All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jike Song <jike.song@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Neo Jia <cjia@nvidia.com>, Paolo Bonzini <pbonzini@redhat.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	kvm@vger.kernel.org, guangrong.xiao@intel.com,
	Xiaoguang Chen <xiaoguang.chen@intel.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>
Subject: Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
Date: Tue, 18 Oct 2016 20:38:21 +0800	[thread overview]
Message-ID: <580617BD.8000300@intel.com> (raw)
In-Reply-To: <20161017100229.1474ae33@t450s.home>

On 10/18/2016 12:02 AM, Alex Williamson wrote:
> On Fri, 14 Oct 2016 15:19:01 -0700
> Neo Jia <cjia@nvidia.com> wrote:
> 
>> On Fri, Oct 14, 2016 at 10:51:24AM -0600, Alex Williamson wrote:
>>> On Fri, 14 Oct 2016 09:35:45 -0700
>>> Neo Jia <cjia@nvidia.com> wrote:
>>>   
>>>> On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:  
>>>>> On Fri, 14 Oct 2016 08:41:58 -0600
>>>>> Alex Williamson <alex.williamson@redhat.com> wrote:
>>>>>     
>>>>>> On Fri, 14 Oct 2016 18:37:45 +0800
>>>>>> Jike Song <jike.song@intel.com> wrote:
>>>>>>     
>>>>>>> On 10/11/2016 05:47 PM, Paolo Bonzini wrote:      
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/10/2016 11:21, Xiao Guangrong wrote:        
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:        
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/10/2016 04:39, Xiao Guangrong wrote:        
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:        
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/10/2016 20:01, Neo Jia wrote:        
>>>>>>>>>>>>>> Hi Neo,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
>>>>>>>>>>>>>> while nVidia does.        
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Paolo and Xiaoguang,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am just wondering how device driver can register a notifier so he
>>>>>>>>>>>>> can be
>>>>>>>>>>>>> notified for write-protected pages when writes are happening.        
>>>>>>>>>>>>
>>>>>>>>>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
>>>>>>>>>>>> currently where a struct kvm_device* and struct vfio_group* touch.
>>>>>>>>>>>> Given
>>>>>>>>>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
>>>>>>>>>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
>>>>>>>>>>>> that passes the struct kvm_device* to the mdev device.
>>>>>>>>>>>>
>>>>>>>>>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
>>>>>>>>>>>> at KVM Forum but I don't remember the details.        
>>>>>>>>>>>
>>>>>>>>>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
>>>>>>>>>>> figure out the kvm instance based on the fd.
>>>>>>>>>>>
>>>>>>>>>>> We got a new idea, how about search the kvm instance by mm_struct, it
>>>>>>>>>>> can work as KVMGT is running in the vcpu context and it is much more
>>>>>>>>>>> straightforward.        
>>>>>>>>>>
>>>>>>>>>> Perhaps I didn't understand your suggestion, but the same mm_struct can
>>>>>>>>>> have more than 1 struct kvm so I'm not sure that it can work.        
>>>>>>>>>
>>>>>>>>> vcpu->pid is valid during vcpu running so that it can be used to figure
>>>>>>>>> out which kvm instance owns the vcpu whose pid is the one as current
>>>>>>>>> thread, i think it can work. :)        
>>>>>>>>
>>>>>>>> No, don't do that.  There's no reason for a thread to run a single VCPU,
>>>>>>>> and if you can have multiple VCPUs you can also have multiple VCPUs from
>>>>>>>> multiple VMs.
>>>>>>>>
>>>>>>>> Passing file descriptors around are the right way to connect subsystems.        
>>>>>>>
>>>>>>> [CC Alex, Kevin and Qemu-devel]
>>>>>>>
>>>>>>> Hi Paolo & Alex,
>>>>>>>
>>>>>>> IIUC, passing file descriptors means touching QEMU and the UAPI between
>>>>>>> QEMU and VFIO. Would you guys have a look at below draft patch? If it's
>>>>>>> on the correct direction, I'll send the split ones. Thanks!
>>>>>>>
>>>>>>> --
>>>>>>> Thanks,
>>>>>>> Jike
>>>>>>>
>>>>>>>
>>>>>>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>>>>>>> index bec694c..f715d37 100644
>>>>>>> --- a/hw/vfio/pci-quirks.c
>>>>>>> +++ b/hw/vfio/pci-quirks.c
>>>>>>> @@ -10,12 +10,14 @@
>>>>>>>   * the COPYING file in the top-level directory.
>>>>>>>   */
>>>>>>>  
>>>>>>> +#include <sys/ioctl.h>
>>>>>>>  #include "qemu/osdep.h"
>>>>>>>  #include "qemu/error-report.h"
>>>>>>>  #include "qemu/range.h"
>>>>>>>  #include "qapi/error.h"
>>>>>>>  #include "hw/nvram/fw_cfg.h"
>>>>>>>  #include "pci.h"
>>>>>>> +#include "sysemu/kvm.h"
>>>>>>>  #include "trace.h"
>>>>>>>  
>>>>>>>  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
>>>>>>> @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
>>>>>>>          break;
>>>>>>>      }
>>>>>>>  }
>>>>>>> +
>>>>>>> +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
>>>>>>> +{
>>>>>>> +    int vmfd;
>>>>>>> +
>>>>>>> +    if (!kvm_enabled() || !vdev->kvmgt)
>>>>>>> +        return;
>>>>>>> +
>>>>>>> +    /* Tell the device what KVM it attached */
>>>>>>> +    vmfd = kvm_get_vmfd(kvm_state);
>>>>>>> +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
>>>>>>> +}
>>>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>>>>> index a5a620a..8732552 100644
>>>>>>> --- a/hw/vfio/pci.c
>>>>>>> +++ b/hw/vfio/pci.c
>>>>>>> @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
>>>>>>>          return ret;
>>>>>>>      }
>>>>>>>  
>>>>>>> +    vfio_quirk_kvmgt(vdev);
>>>>>>> +
>>>>>>>      /* Get a copy of config space */
>>>>>>>      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
>>>>>>>                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
>>>>>>> @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
>>>>>>>      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
>>>>>>>                         sub_device_id, PCI_ANY_ID),
>>>>>>>      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
>>>>>>> +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),      
>>>>>>
>>>>>> Just a side note, device options are a headache, users are prone to get
>>>>>> them wrong and minimally it requires an entire round to get libvirt
>>>>>> support.  We should be able to detect from the device or vfio API
>>>>>> whether such a call is required.  Obviously if we can use the existing
>>>>>> kvm-vfio device, that's the better option anyway.  Thanks,    
>>>>>
>>>>> Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
>>>>> does, it needs to produce a device failure when unavailable.  Thanks,    
>>>>
>>>> Also, I would like to see this as an generic feature instead of
>>>> kvmgt specific interface, so we don't have to add new options to QEMU and it is
>>>> up to the vendor driver to proceed with or without it.  
>>>
>>> In general this should be decided by lack of some required feature
>>> exclusively provided by KVM.  I would not want to add a generic opt-out
>>> for mdev vendor drivers to decide that they arbitrarily want to disable
>>> that path.  Thanks,  
>>
>> IIUC, you are suggesting that this path should be controlled by KVM feature cap
>> and it will be accessible to VFIO users when such checking is satisfied.
> 
> Maybe we're getting too loose with our pronouns here, I'm starting to
> lose track of what "this" is referring to.  I agree that there's no
> reason for the ioctl, as proposed to be kvmgt specific.  I would hope
> that going through the kvm-vfio device to create that linkage would
> eliminate that, but we'll need to see what Jike can come up with to
> plumb between KVM and vfio.  Vendor drivers can implement their own
> ioctls, now that we pass them through the mdev layer, but someone needs
> to call those ioctls.  Ideally we want something programmatic to
> trigger that, without requiring a user to pass an extra device
> parameter.  Additionally, if there is any hope of making use of the
> device with userspace drivers other than QEMU, hard dependencies on KVM
> should be avoided.  Thanks,
> 
> Alex
> 

Thanks for the advice, so I cooked another patch for your comments.
Basically a 'void *usrdata' is added to vfio_group, external users
can set it (kvm) or get it (kvm or other users like kvmgt).

BTW, in device-model, the open method will return failure to vfio-mdev
in case that such kvm information is not available.

--
Thanks,
Jike



diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index d1d70e0..6b8d1d2 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -86,6 +86,7 @@ struct vfio_group {
 	struct mutex			unbound_lock;
 	atomic_t			opened;
 	bool				noiommu;
+	void				*usrdata;
 };
 
 struct vfio_device {
@@ -447,14 +448,13 @@ static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
 }
 
 static
-struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
+struct vfio_group *__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
 {
 	struct vfio_group *group;
 
 	mutex_lock(&vfio.group_lock);
 	list_for_each_entry(group, &vfio.group_list, vfio_next) {
 		if (group->iommu_group == iommu_group) {
-			vfio_group_get(group);
 			mutex_unlock(&vfio.group_lock);
 			return group;
 		}
@@ -464,6 +464,17 @@ struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
 	return NULL;
 }
 
+static
+struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
+{
+	struct vfio_group *group = __vfio_group_get_from_iommu(iommu_group);
+	if (!group)
+		return NULL;
+
+	vfio_group_get(group);
+	return group;
+}
+
 static struct vfio_group *vfio_group_get_from_minor(int minor)
 {
 	struct vfio_group *group;
@@ -1728,6 +1739,31 @@ long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
 }
 EXPORT_SYMBOL_GPL(vfio_external_check_extension);
 
+void vfio_group_set_usrdata(struct vfio_group *group, void *data)
+{
+	group->usrdata = data;
+}
+EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
+
+void *vfio_group_get_usrdata(struct vfio_group *group)
+{
+	return group->usrdata;
+}
+EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
+
+void *vfio_group_get_usrdata_by_device(struct device *dev)
+{
+	struct vfio_group *vfio_group;
+
+	vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
+	if (!vfio_group)
+		return NULL;
+
+	return vfio_group_get_usrdata(vfio_group);
+}
+EXPORT_SYMBOL_GPL(vfio_group_get_usrdata_by_device);
+
+
 /**
  * Sub-module support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 0ecae0b..712588f 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -91,6 +91,10 @@ extern void vfio_unregister_iommu_driver(
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
 extern long vfio_external_check_extension(struct vfio_group *group,
 					  unsigned long arg);
+extern void vfio_group_set_usrdata(struct vfio_group *group, void *data);
+extern void *vfio_group_get_usrdata(struct vfio_group *group);
+extern void *vfio_group_get_usrdata_by_device(struct device *dev);
+
 
 /*
  * Sub-module helpers
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 1dd087d..e00d401 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -60,6 +60,20 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
 	symbol_put(vfio_group_put_external_user);
 }
 
+static void kvm_vfio_group_set_kvm(struct vfio_group *group, void *kvm)
+{
+	void (*fn)(struct vfio_group *, void *);
+
+	fn = symbol_get(vfio_group_set_usrdata);
+	if (!fn)
+		return;
+
+	fn(group, kvm);
+	kvm_get_kvm(kvm);
+
+	symbol_put(vfio_group_set_usrdata);
+}
+
 static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
 {
 	long (*fn)(struct vfio_group *, unsigned long);
@@ -161,6 +175,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
 
 		kvm_vfio_update_coherency(dev);
 
+		kvm_vfio_group_set_kvm(vfio_group, dev->kvm);
+
 		return 0;
 
 	case KVM_DEV_VFIO_GROUP_DEL:
@@ -200,6 +216,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
 
 		kvm_vfio_update_coherency(dev);
 
+		kvm_put_kvm(dev->kvm);
+
 		return ret;
 	}
 

  reply	other threads:[~2016-10-18 12:41 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-09  7:41 [PATCH 0/2] page track add notifier type track_flush_slot Xiaoguang Chen
2016-10-09  7:41 ` [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot Xiaoguang Chen
2016-10-09  8:31   ` Neo Jia
2016-10-09  8:56     ` Chen, Xiaoguang
2016-10-10 17:06     ` Paolo Bonzini
2016-10-10 18:01       ` Neo Jia
2016-10-10 18:32         ` Paolo Bonzini
2016-10-11  2:39           ` Xiao Guangrong
2016-10-11  8:54             ` Paolo Bonzini
2016-10-11  9:21               ` Xiao Guangrong
2016-10-11  9:47                 ` Paolo Bonzini
2016-10-14 10:37                   ` Jike Song
2016-10-14 10:37                     ` [Qemu-devel] " Jike Song
2016-10-14 10:43                     ` Paolo Bonzini
2016-10-14 10:43                       ` [Qemu-devel] " Paolo Bonzini
2016-10-14 12:26                       ` Jike Song
2016-10-14 12:26                         ` [Qemu-devel] " Jike Song
2016-10-14 14:41                     ` Alex Williamson
2016-10-14 14:46                       ` Alex Williamson
2016-10-14 14:46                         ` [Qemu-devel] " Alex Williamson
2016-10-14 16:35                         ` Neo Jia
2016-10-14 16:35                           ` Neo Jia
2016-10-14 16:51                           ` Alex Williamson
2016-10-14 16:51                             ` Alex Williamson
2016-10-14 22:19                             ` Neo Jia
2016-10-14 22:19                               ` Neo Jia
2016-10-17 16:02                               ` Alex Williamson
2016-10-17 16:02                                 ` Alex Williamson
2016-10-18 12:38                                 ` Jike Song [this message]
2016-10-18 14:59                                   ` Alex Williamson
2016-10-19  2:32                                     ` Jike Song
2016-10-19  5:45                                       ` Xiao Guangrong
2016-10-19 11:56                                         ` Paolo Bonzini
2016-10-19 11:56                                           ` [Qemu-devel] " Paolo Bonzini
2016-10-19 13:39                                           ` Xiao Guangrong
2016-10-19 13:39                                             ` [Qemu-devel] " Xiao Guangrong
2016-10-19 14:14                                             ` Paolo Bonzini
2016-10-19 14:14                                               ` [Qemu-devel] " Paolo Bonzini
2016-10-20  1:48                                               ` Xiao Guangrong
2016-10-20 17:06                                                 ` Paolo Bonzini
2016-10-20 17:19                                                   ` Xiao, Guangrong
2016-10-20 17:19                                                     ` [Qemu-devel] " Xiao, Guangrong
2016-10-21  2:47                                                     ` Jike Song
2016-10-21  2:47                                                       ` Jike Song
2016-10-26 13:44                                                   ` Jike Song
2016-10-26 13:44                                                     ` Jike Song
2016-10-26 14:45                                                     ` Paolo Bonzini
2016-10-29  4:07                                                       ` Jike Song
2016-10-29  4:07                                                         ` Jike Song
2016-10-19 13:56                                       ` Eric Blake
2016-10-19 13:56                                         ` [Qemu-devel] " Eric Blake
2016-10-24  6:32                                         ` Jike Song
2016-10-12 20:48   ` Radim Krčmář
2016-10-09  7:41 ` [PATCH 2/2] KVM: MMU: apply page track notifier type track_flush_slot Xiaoguang Chen
2016-10-10 17:06 ` [PATCH 0/2] page track add " Paolo Bonzini
2016-10-11  2:43   ` Xiao Guangrong
2016-10-11  8:55     ` Paolo Bonzini
2016-10-12 20:52       ` Radim Krčmář

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=580617BD.8000300@intel.com \
    --to=jike.song@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=cjia@nvidia.com \
    --cc=guangrong.xiao@intel.com \
    --cc=guangrong.xiao@linux.intel.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=xiaoguang.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.