Re: [v3 4/5] vfio: implement APIs to set/put kvm to/from vfio group

From: Jike Song <jike.song@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	guangrong.xiao@linux.intel.com, kwankhede@nvidia.com,
	cjia@nvidia.com, kevin.tian@intel.com, kvm@vger.kernel.org
Subject: Re: [v3 4/5] vfio: implement APIs to set/put kvm to/from vfio group
Date: Fri, 11 Nov 2016 15:29:55 +0800	[thread overview]
Message-ID: <58257373.1010905@intel.com> (raw)
In-Reply-To: <20161110083758.3f9a694f@t450s.home>

On 11/10/2016 11:37 PM, Alex Williamson wrote:
> On Thu, 10 Nov 2016 14:04:19 +0800
> Jike Song <jike.song@intel.com> wrote:
> 
>> On 11/10/2016 12:10 PM, Jike Song wrote:
>>> On 11/10/2016 01:53 AM, Alex Williamson wrote:  
>>>> On Wed, 09 Nov 2016 20:49:32 +0800
>>>> Jike Song <jike.song@intel.com> wrote:
>>>>  
>>>>> On 11/08/2016 04:45 AM, Paolo Bonzini wrote:  
>>>>>> On 07/11/2016 19:28, Alex Williamson wrote:    
>>>>>>>>> Can the reference become invalid?      
>>>>>>>>
>>>>>>>> No, this is guaranteed by virt/kvm/vfio.c + the udata.lock mutex (which
>>>>>>>> probably should be renamed...).    
>>>>>>>
>>>>>>> The caller gets a reference to kvm, but there's no guarantee that the
>>>>>>> association of that kvm reference to the group stays valid.  Once we're
>>>>>>> outside of that mutex, we might as well consider that kvm:group
>>>>>>> association stale.
>>>>>>>      
>>>>>>>>> The caller may still hold
>>>>>>>>> a kvm references, but couldn't the group be detached from one kvm
>>>>>>>>> instance and re-attached to another?      
>>>>>>>>
>>>>>>>> Can this be handled by the vendor driver?  Does it get a callback when
>>>>>>>> it's detached from a KVM instance?    
>>>>>>>
>>>>>>> The only release callback through vfio is when the user closes the
>>>>>>> device, the code in this series is the full extent of vfio awareness of
>>>>>>> kvm.  Thanks,    
>>>>>>
>>>>>> Maybe there should be an mdev callback at the point of association and
>>>>>> deassociation between VFIO and KVM.  Then the vendor driver can just use
>>>>>> the same mutex for association, deassociation and usage.  I'm not even
>>>>>> sure that these patches are necessary once you have that callback.    
>>>>>
>>>>> Hi Alex & Paolo,
>>>>>
>>>>> So I cooked another draft version of this, there is no kvm pointer saved
>>>>> in vfio_group in this version, and notifier will be called on attach/detach,
>>>>> please kindly have a look :-)
>>>>>
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Jike
>>>>>
>>>>>
>>>>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>>>>> index ed2361e4..20b5da9 100644
>>>>> --- a/drivers/vfio/vfio.c
>>>>> +++ b/drivers/vfio/vfio.c
>>>>> @@ -34,6 +34,7 @@
>>>>>  #include <linux/uaccess.h>
>>>>>  #include <linux/vfio.h>
>>>>>  #include <linux/wait.h>
>>>>> +#include <linux/kvm_host.h>
>>>>>  
>>>>>  #define DRIVER_VERSION	"0.3"
>>>>>  #define DRIVER_AUTHOR	"Alex Williamson <alex.williamson@redhat.com>"
>>>>> @@ -86,6 +87,10 @@ struct vfio_group {
>>>>>  	struct mutex			unbound_lock;
>>>>>  	atomic_t			opened;
>>>>>  	bool				noiommu;
>>>>> +	struct {
>>>>> +		struct mutex lock;
>>>>> +		struct blocking_notifier_head notifier;
>>>>> +	} udata;
>>>>>  };
>>>>>  
>>>>>  struct vfio_device {
>>>>> @@ -333,6 +338,7 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group)
>>>>>  	mutex_init(&group->device_lock);
>>>>>  	INIT_LIST_HEAD(&group->unbound_list);
>>>>>  	mutex_init(&group->unbound_lock);
>>>>> +	mutex_init(&group->udata.lock);
>>>>>  	atomic_set(&group->container_users, 0);
>>>>>  	atomic_set(&group->opened, 0);
>>>>>  	group->iommu_group = iommu_group;
>>>>> @@ -414,10 +420,11 @@ static void vfio_group_release(struct kref *kref)
>>>>>  	iommu_group_put(iommu_group);
>>>>>  }
>>>>>  
>>>>> -static void vfio_group_put(struct vfio_group *group)
>>>>> +void vfio_group_put(struct vfio_group *group)
>>>>>  {
>>>>>  	kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
>>>>>  }
>>>>> +EXPORT_SYMBOL_GPL(vfio_group_put);
>>>>>  
>>>>>  /* Assume group_lock or group reference is held */
>>>>>  static void vfio_group_get(struct vfio_group *group)
>>>>> @@ -480,7 +487,7 @@ static struct vfio_group *vfio_group_get_from_minor(int minor)
>>>>>  	return group;
>>>>>  }
>>>>>  
>>>>> -static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
>>>>> +struct vfio_group *vfio_group_get_from_dev(struct device *dev)
>>>>>  {
>>>>>  	struct iommu_group *iommu_group;
>>>>>  	struct vfio_group *group;
>>>>> @@ -494,6 +501,7 @@ static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
>>>>>  
>>>>>  	return group;
>>>>>  }
>>>>> +EXPORT_SYMBOL_GPL(vfio_group_get_from_dev);
>>>>>  
>>>>>  /**
>>>>>   * Device objects - create, release, get, put, search
>>>>> @@ -1745,6 +1753,44 @@ long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
>>>>>  }
>>>>>  EXPORT_SYMBOL_GPL(vfio_external_check_extension);
>>>>>  
>>>>> +int vfio_group_register_notifier(struct vfio_group *group, struct notifier_block *nb)
>>>>> +{
>>>>> +	return blocking_notifier_chain_register(&group->udata.notifier, nb);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(vfio_group_register_notifier);
>>>>> +
>>>>> +int vfio_group_unregister_notifier(struct vfio_group *group, struct notifier_block *nb)
>>>>> +{
>>>>> +	return blocking_notifier_chain_unregister(&group->udata.notifier, nb);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(vfio_group_unregister_notifier);  
>>>>
>>>> Kirti is already adding vfio_register_notifier &
>>>> vfio_unregister_notifier, these are not exclusive to the iommu, I
>>>> clarified that in my question that IOVA range invalidation is just one
>>>> aspect of what that notifier might be used for.  The mdev framework
>>>> also automatically registers and unregisters that notifier around
>>>> open/release.  So, I don't think we want a new notifier, we just want
>>>> vfio.c to also consume that notifier.  
>>>
>>> Unfortunately the kvm:group attaching happens before device opening,
>>> so registering the notifier in open() is not functional: the event
>>> has disappeared before we start watching it.
>>>
>>> A possible workaround is, register the notifier in create() instead of
>>> open(). That should be functional, but will cause another issue: being able
>>> to register a notifier means we have a vfio-group reference, when to put
>>> that reference? putting it in remove() is not a good idea since a device
>>> might be open/release multiple times between create/remove, holding the ref
>>> until removal breaks it; putting it in release() is obviously not a
>>> good idea neither.
>>>
>>> IOW, having the notifiers there must be some dirty work in vendor
>>> driver to work around the issue above :(
>>>   
>>>> So I think this patch needs a few components that build on what Kirti
>>>> has, 1) we add a blocking_notifier_head per vfio_group and have
>>>> vfio_{un}regsiter_notifier add and remove that notifier to the group
>>>> chain, 2) we create a vfio_group_notify() function that the kvm-vfio
>>>> pseudo device can call via symbol_get, 3) Have kvm-vfio call
>>>> vfio_group_notify() with VFIO_GROUP_NOTIFY_SET_KVM where the data is a
>>>> pointer to the struct kvm (or NULL to unset, we don't need separate set
>>>> vs unset notifiers).  Does that work?  Thanks,  
>>>
>>> Yes, it works better than the original form of below patch.
>>> vfio side doesn't store any data, nor introduce any lock, only a callback
>>> for kvm to use.
>>>   
>>
>> To make my reply clearer: the notifier can work without two separate
>> set/unset, can be combined with Kirti's iommu notifier, however, the problem
>> of being too late to register from open() still exists, and I still find it
>> difficult to work around.
> 
> Ok, so it's a little bit ugly, but kvm-vfio can tell vfio about struct
> kvm with a vfio_group_set_kvm() callback that it uses via symbol get,
> using the real struct kvm pointer or NULL to unset.  vfio caches this
> on the vfio_group.  When a notifier is registered, it replays the event
> through the notifier.  Any updates while a notifier is connected are
> both cached and immediately replayed through the notifier.  So the
> vendor driver to vfio communication channel is the same, but the vfio
> to kvm-vfio involves some buffering.  Does that work?  Thanks,

Yes it works, thanks for the suggestion, I'll post a v4 series.

--
Thanks,
Jike