From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754900AbdERBv7 convert rfc822-to-8bit (ORCPT <rfc822;w@1wt.eu>);
        Wed, 17 May 2017 21:51:59 -0400
Received: from mga04.intel.com ([192.55.52.120]:63648 "EHLO mga04.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1754376AbdERBv5 (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 17 May 2017 21:51:57 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.38,356,1491289200"; 
   d="scan'208";a="1131722860"
From: "Chen, Xiaoguang" <xiaoguang.chen@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
CC: Gerd Hoffmann <kraxel@redhat.com>,
        "Tian, Kevin" <kevin.tian@intel.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "zhenyuw@linux.intel.com" <zhenyuw@linux.intel.com>,
        "Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
        "intel-gvt-dev@lists.freedesktop.org" 
        <intel-gvt-dev@lists.freedesktop.org>,
        "Wang, Zhi A" <zhi.a.wang@intel.com>
Subject: RE: [RFC PATCH 6/6] drm/i915/gvt: support QEMU getting the dmabuf
Thread-Topic: [RFC PATCH 6/6] drm/i915/gvt: support QEMU getting the dmabuf
Thread-Index: AQHSwAN47y043l0J90m8wsAsHWts2aHgTQ8AgAGNjMCAAazCQIAAU9qAgAD30YCAAIqCgIAJhcZQ///LVYCAACZvAIABMD1g//+Lz4CAAGhigIAAfLgAgARXT/CAAHIJgIABcqeQgAH05ICAAMZxEA==
Date: Thu, 18 May 2017 01:51:38 +0000
Message-ID: <DD379D741F77464281CE7ED1CD7C12DE7065D1C0@SHSMSX101.ccr.corp.intel.com>
References: <1493372130-27727-1-git-send-email-xiaoguang.chen@intel.com>
        <1493372130-27727-7-git-send-email-xiaoguang.chen@intel.com>
        <1493718658.8581.82.camel@redhat.com>
        <DD379D741F77464281CE7ED1CD7C12DE6FC92D25@SHSMSX101.ccr.corp.intel.com>
        <DD379D741F77464281CE7ED1CD7C12DE6FC93698@SHSMSX101.ccr.corp.intel.com>
        <20170504100833.199bc8ba@t450s.home>    <1493967331.371.53.camel@redhat.com>
        <20170505091115.7a680636@t450s.home>
        <DD379D741F77464281CE7ED1CD7C12DE70657AF3@SHSMSX101.ccr.corp.intel.com>
        <1494509273.17970.12.camel@redhat.com>  <20170511094526.164985ee@w520.home>
        <DD379D741F77464281CE7ED1CD7C12DE706583BA@SHSMSX101.ccr.corp.intel.com>
        <20170511205829.672854c3@t450s.home>    <1494580325.14352.57.camel@redhat.com>
        <20170512103828.4a1378a1@t450s.home>
        <DD379D741F77464281CE7ED1CD7C12DE706592D6@SHSMSX101.ccr.corp.intel.com>
        <20170515114409.414d1fdb@w520.home>
        <DD379D741F77464281CE7ED1CD7C12DE7065AA7D@SHSMSX101.ccr.corp.intel.com>
 <20170517154331.1b465f69@w520.home>
In-Reply-To: <20170517154331.1b465f69@w520.home>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
dlp-product: dlpe-windows
dlp-version: 10.0.102.7
dlp-reaction: no-action
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8BIT
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Alex,

>-----Original Message-----
>From: Alex Williamson [mailto:alex.williamson@redhat.com]
>Sent: Thursday, May 18, 2017 5:44 AM
>To: Chen, Xiaoguang <xiaoguang.chen@intel.com>
>Cc: Gerd Hoffmann <kraxel@redhat.com>; Tian, Kevin <kevin.tian@intel.com>;
>linux-kernel@vger.kernel.org; zhenyuw@linux.intel.com; Lv, Zhiyuan
><zhiyuan.lv@intel.com>; intel-gvt-dev@lists.freedesktop.org; Wang, Zhi A
><zhi.a.wang@intel.com>
>Subject: Re: [RFC PATCH 6/6] drm/i915/gvt: support QEMU getting the dmabuf
>
>On Tue, 16 May 2017 10:16:28 +0000
>"Chen, Xiaoguang" <xiaoguang.chen@intel.com> wrote:
>
>> Hi Alex,
>>
>> >-----Original Message-----
>> >From: Alex Williamson [mailto:alex.williamson@redhat.com]
>> >Sent: Tuesday, May 16, 2017 1:44 AM
>> >To: Chen, Xiaoguang <xiaoguang.chen@intel.com>
>> >Cc: Gerd Hoffmann <kraxel@redhat.com>; Tian, Kevin
>> ><kevin.tian@intel.com>; intel-gfx@lists.freedesktop.org;
>> >linux-kernel@vger.kernel.org; zhenyuw@linux.intel.com; Lv, Zhiyuan
>> ><zhiyuan.lv@intel.com>; intel-gvt- dev@lists.freedesktop.org; Wang,
>> >Zhi A <zhi.a.wang@intel.com>
>> >Subject: Re: [RFC PATCH 6/6] drm/i915/gvt: support QEMU getting the
>> >dmabuf
>> >
>> >On Mon, 15 May 2017 03:36:50 +0000
>> >"Chen, Xiaoguang" <xiaoguang.chen@intel.com> wrote:
>> >
>> >> Hi Alex and Gerd,
>> >>
>> >> >-----Original Message-----
>> >> >From: Alex Williamson [mailto:alex.williamson@redhat.com]
>> >> >Sent: Saturday, May 13, 2017 12:38 AM
>> >> >To: Gerd Hoffmann <kraxel@redhat.com>
>> >> >Cc: Chen, Xiaoguang <xiaoguang.chen@intel.com>; Tian, Kevin
>> >> ><kevin.tian@intel.com>; intel-gfx@lists.freedesktop.org; linux-
>> >> >kernel@vger.kernel.org; zhenyuw@linux.intel.com; Lv, Zhiyuan
>> >> ><zhiyuan.lv@intel.com>; intel-gvt-dev@lists.freedesktop.org; Wang,
>> >> >Zhi A <zhi.a.wang@intel.com>
>> >> >Subject: Re: [RFC PATCH 6/6] drm/i915/gvt: support QEMU getting
>> >> >the dmabuf
>> >> >
>> >> >On Fri, 12 May 2017 11:12:05 +0200 Gerd Hoffmann
>> >> ><kraxel@redhat.com> wrote:
>> >> >
>> >> >>   Hi,
>> >> >>
>> >> >> > If the contents of the framebuffer change or if the parameters
>> >> >> > of the framebuffer change?  I can't image that creating a new
>> >> >> > dmabuf fd for every visual change within the framebuffer would
>> >> >> > be efficient, but I don't have any concept of what a dmabuf actually
>does.
>> >> >>
>> >> >> Ok, some background:
>> >> >>
>> >> >> The drm subsystem has the concept of planes.  The most important
>> >> >> plane is the primary framebuffer (i.e. what gets scanned out to
>> >> >> the physical display).  The cursor is a plane too, and there can
>> >> >> be additional overlay planes for stuff like video playback.
>> >> >>
>> >> >> Typically there are multiple planes in a system and only one of
>> >> >> them gets scanned out to the crtc, i.e. the fbdev emulation
>> >> >> creates one plane for the framebuffer console.  The X-Server
>> >> >> creates a plane too, and when you switch between X-Server and
>> >> >> framebuffer console via ctrl-alt-fn the intel driver just
>> >> >> reprograms the encoder to scan out the one or the other plane to the crtc.
>> >> >>
>> >> >> The dma-buf handed out by gvt is a reference to a plane.  I
>> >> >> think on the host side gvt can see only the active plane (from
>> >> >> encoder/crtc register
>> >> >> programming) not the inactive ones.
>> >> >>
>> >> >> The dma-buf can be imported as opengl texture and then be used
>> >> >> to render the guest display to a host window.  I think it is
>> >> >> even possible to use the dma-buf as plane in the host drm driver
>> >> >> and scan it out directly to a physical display.  The actual
>> >> >> framebuffer content stays in gpu memory all the time, the cpu never has
>to touch it.
>> >> >>
>> >> >> It is possible to cache the dma-buf handles, i.e. when the guest
>> >> >> boots you'll get the first for the fbcon plane, when the
>> >> >> x-server starts the second for the x-server framebuffer, and
>> >> >> when the user switches to the text console via ctrl-alt-fn you
>> >> >> can re-use the fbcon dma-buf you already have.
>> >> >>
>> >> >> The caching becomes more important for good performance when the
>> >> >> guest uses pageflipping (wayland does): define two planes,
>> >> >> render into one while displaying the other, then flip the two
>> >> >> for a atomic display update.
>> >> >>
>> >> >> The caching also makes it a bit difficult to create a good interface.
>> >> >> So, the current patch set creates:
>> >> >>
>> >> >>   (a) A way to query the active planes (ioctl
>> >> >>       INTEL_VGPU_QUERY_DMABUF added by patch 5/6 of this series).
>> >> >>   (b) A way to create a dma-buf for the active plane (ioctl
>> >> >>       INTEL_VGPU_GENERATE_DMABUF).
>> >> >>
>> >> >> Typical userspace workflow is to first query the plane, then
>> >> >> check if it already has a dma-buf for it, and if not create one.
>> >> >
>> >> >Thank you!  This is immensely helpful!
>> >> >
>> >> >> > What changes to the framebuffer require a new dmabuf fd?
>> >> >> > Shouldn't the user query the parameters of the framebuffer
>> >> >> > through a dmabuf fd and shouldn't the dmabuf fd have some
>> >> >> > signaling mechanism to the user (eventfd perhaps) to notify
>> >> >> > the user to re-
>> >evaluate the parameters?
>> >> >>
>> >> >> dma-bufs don't support that, they are really just a handle to a
>> >> >> piece of memory, all metadata (format, size) most be
>> >> >> communicated by
>> >other means.
>> >> >>
>> >> >> > Otherwise are you imagining that the user polls the vfio region?
>> >> >>
>> >> >> Hmm, notification support would probably a good reason to have a
>> >> >> separate file handle to manage the dma-bufs (instead of using
>> >> >> driver-specific ioctls on the vfio fd), because the driver could
>> >> >> also use the management fd for notifications then.
>> >> >
>> >> >I like this idea of a separate control fd for dmabufs, it provides
>> >> >not only a central management point, but also a nice abstraction
>> >> >for the vfio device specific interface.  We potentially only need
>> >> >a single
>> >> >VFIO_DEVICE_GET_DMABUF_MGR_FD() ioctl to get a dmabuf management
>> >> >fd (perhaps with a type parameter, ex. GFX) where maybe we could
>> >> >have vfio-core incorporate this reference into the group
>> >> >lifecycle, so the vendor driver only needs to fdget/put this
>> >> >manager fd for the various plane dmabuf fds spawned in order to get core-
>level reference counting.
>> >> Following is my understanding of the management fd idea:
>> >> 1) QEMU will call VFIO_DEVICE_GET_DMABUF_MGR_FD() ioctl to create a
>> >> fd
>> >and saved the fd in vfio group while initializing the vfio.
>> >
>> >Ideally there'd be kernel work here too if we want vfio-core to
>> >incorporate lifecycle of this fd into the device/group/container
>> >lifecycle.  Maybe we even want to generalize it further to something
>> >like VFIO_DEVICE_GET_FD which takes a parameter of what type of FD to
>> >get, GFX_DMABUF_MGR_FD in this case.  vfio- core would probably
>> >allocate the fd, tap into the release hook for reference counting and
>> >pass it to the vfio_device_ops (mdev vendor driver in this case) to attach
>further.
>> I tried to implement this today and now it functionally worked.
>> I am still a little confuse of how to tap the fd into the release hook  of
>device/group/container.
>> I tried to create the fd in vfio core but found it is difficult to get the file
>operations for the fd, the file operations should be supplied by vendor drivers.
>
>I'm not fully convinced there's benefit to having vfio-core attempt to do this, I
>was just hoping to avoid each vendor driver needing to implement their own
>reference counting.  I don't think vfio-core wants to get into tracking each ioctl
>for each vendor specific fd type to know which create new references and which
>don't.  Perhaps we'll come up with easier ways to do this as we go.
Got it.

>
>> So the fd is created in kvmgt.c for now.
>> Below is part of the codes:
>>
>> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c
>> b/drivers/gpu/drm/i915/gvt/kvmgt.c
>> index 389f072..d0649ba 100644
>> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
>> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
>> @@ -41,6 +41,7 @@
>>  #include <linux/kvm_host.h>
>>  #include <linux/vfio.h>
>>  #include <linux/mdev.h>
>> +#include <linux/anon_inodes.h>
>>
>>  #include "i915_drv.h"
>>  #include "gvt.h"
>> @@ -524,6 +525,63 @@ static int intel_vgpu_reg_init_opregion(struct
>intel_vgpu *vgpu)
>>         return ret;
>>  }
>>
>> +static int intel_vgpu_dmabuf_mgr_fd_mmap(struct file *file, struct
>> +vm_area_struct *vma) {
>> +       WARN_ON(1);
>
>A user can abuse this, simply return error.
OK.

>
>> +
>> +       return 0;
>> +}
>> +
>> +static int intel_vgpu_dmabuf_mgr_fd_release(struct inode *inode,
>> +struct file *filp) {
>> +       struct intel_vgpu *vgpu = filp->private_data;
>> +
>> +       if (vgpu->vdev.vfio_device != NULL)
>> +               vfio_device_put(vgpu->vdev.vfio_device);
>
>When does the case occur where we don't have a vfio_device?  This looks a bit
>like a warning flag that reference counting isn't handled properly.
This situation happen only when anonymous fd created successfully but error occur while trying to get the vfio_device.
We should return error while user space trying to create the management fd and print an error message while kernel release this fd.

>
>> +
>> +       return 0;
>> +}
>> +
>> +static long intel_vgpu_dmabuf_mgr_fd_ioctl(struct file *filp,
>> +               unsigned int ioctl, unsigned long arg) {
>> +       struct intel_vgpu *vgpu = filp->private_data;
>> +       int minsz;
>> +       struct intel_vgpu_dmabuf dmabuf;
>> +       int ret;
>> +       struct fd f;
>> +       f = fdget(dmabuf.fd);
>> +       minsz = offsetofend(struct intel_vgpu_dmabuf, tiled);
>> +       if (copy_from_user(&dmabuf, (void __user *)arg, minsz))
>> +               return -EFAULT;
>> +       if (ioctl == INTEL_VGPU_QUERY_DMABUF)
>> +               ret = intel_gvt_ops->vgpu_query_dmabuf(vgpu, &dmabuf);
>> +       else if (ioctl == INTEL_VGPU_GENERATE_DMABUF)
>> +               ret = intel_gvt_ops->vgpu_generate_dmabuf(vgpu,
>> +&dmabuf);
>
>Why do we need vendor specific ioctls here?  Aren't querying the current plane
>and getting an fd for that plane very generic concepts?
>Is the resulting dmabuf Intel specific?
No. not Intel specific. Like Gerd said "Typical userspace workflow is to first query the plane, then 
check if it already has a dma-buf for it, and if not create one".
We first query the plane info(WITHOUT creating a fd).
User space need to check whether there's a dmabuf for the plane(user space usually cached two or three dmabuf to handle double buffer or triple buffer situation) only there's no dmabuf for the plane we will create a dmabuf for it(another ioctl).

>
>> +       else {
>> +               fdput(f);
>> +               gvt_vgpu_err("unsupported dmabuf operation\n");
>> +               return -EINVAL;
>> +       }
>> +
>> +       if (ret != 0) {
>> +               fdput(f);
>> +               gvt_vgpu_err("gvt-g get dmabuf failed:%d\n", ret);
>> +               return -EINVAL;
>> +       }
>> +       fdput(f);
>> +
>> +       return copy_to_user((void __user *)arg, &dmabuf, minsz) ?
>> +-EFAULT : 0; }
>> +
>> +static const struct file_operations intel_vgpu_dmabuf_mgr_fd_ops = {
>> +       .release        = intel_vgpu_dmabuf_mgr_fd_release,
>> +       .unlocked_ioctl = intel_vgpu_dmabuf_mgr_fd_ioctl,
>> +       .mmap           = intel_vgpu_dmabuf_mgr_fd_mmap,
>> +       .llseek         = noop_llseek,
>> +};
>> +
>>  static int intel_vgpu_create(struct kobject *kobj, struct mdev_device
>> *mdev)  {
>>         struct intel_vgpu *vgpu = NULL; @@ -1259,6 +1317,31 @@ static
>> long intel_vgpu_ioctl(struct mdev_device *mdev, unsigned int cmd,
>>         } else if (cmd == VFIO_DEVICE_RESET) {
>>                 intel_gvt_ops->vgpu_reset(vgpu);
>>                 return 0;
>> +       } else if (cmd == VFIO_DEVICE_GET_FD) {
>> +               struct vfio_fd vfio_fd;
>> +               int fd;
>> +               struct vfio_device *device;
>> +
>> +               minsz = offsetofend(struct vfio_fd, fd);
>> +               if (copy_from_user(&vfio_fd, (void __user *)arg, minsz))
>> +                       return -EINVAL;
>> +
>> +               if (vfio_fd.argsz < minsz)
>> +                       return -EINVAL;
>> +
>> +               fd = anon_inode_getfd("vfio_dmabuf_mgr_fd",
>&intel_vgpu_dmabuf_mgr_fd_ops,
>> +                                       vgpu, O_RDWR | O_CLOEXEC);
>> +               if (fd < 0)
>> +                       return -EINVAL;
>> +
>> +               vfio_fd.fd = fd;
>> +               device = vfio_device_get_from_dev(mdev_dev(mdev));
>> +               if (device == NULL)
>> +                       gvt_vgpu_err("kvmgt: vfio device is null\n");
>> +               else
>> +                       vgpu->vdev.vfio_device = device;
>> +
>> +               return copy_to_user((void __user *)arg, &vfio_fd,
>> + minsz) ? -EFAULT : 0;
>>         }
>>
>>         return 0;
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 519eff3..98be2e0 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -484,6 +485,20 @@ struct vfio_pci_hot_reset {
>>
>>  #define VFIO_DEVICE_PCI_HOT_RESET      _IO(VFIO_TYPE, VFIO_BASE + 13)
>>
>> +/**
>> + * VFIO_DEVICE_GET_FD - _IOW(VFIO_TYPE, VFIO_BASE + 21, struct
>> +vfio_fd)
>> + *
>> + * Create a fd for a vfio device.
>> + * This fd can be used for various purpose.
>> + */
>> +struct vfio_fd {
>> +       __u32 argsz;
>> +       __u32 flags;
>> +       /* out */
>> +       __u32 fd;
>> +};
>> +#define VFIO_DEVICE_GET_FD     _IO(VFIO_TYPE, VFIO_BASE + 14)
>
>
>The idea was that we pass some sort of type to VFIO_DEVICE_GET_FD, for
>instance we might ask for a DEVICE_FD_GRAPHICS_DMABUF and the vfio bus
>driver (mdev vendor driver) would test whether it supports that type of thing and
>either return an fd or error.  We can return the fd the same way we do for
>VFIO_DEVICE_GET_FD.  For instance the user should do something like:
>
>dmabuf_fd = ioctl(device_fd,
>		VFIO_DEVICE_GET_FD, DEVICE_FD_GRAPHICS_DMABUF); if
>(dmabuf_fd < 0)
>	/* not supported... */
>else
>	/* do stuff */
OK. Got it.


>
>Thanks,
>Alex