All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] page track add notifier type track_flush_slot
@ 2016-10-09  7:41 Xiaoguang Chen
  2016-10-09  7:41 ` [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot Xiaoguang Chen
                   ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Xiaoguang Chen @ 2016-10-09  7:41 UTC (permalink / raw)
  To: kvm, pbonzini, guangrong.xiao, jike.song; +Cc: Xiaoguang Chen

The seires is to add a new notifer type track_flush_slot for page track.
By using this notifer type when a memory slot is being moved or removed
users of page track can be notified.

This notifier type is needed by KVMGT to sync up its shadow page table
when memory slot is being moved or removed.

Xiaoguang Chen (2):
  KVM: page track: add a new notifier type: track_flush_slot
  KVM: MMU: apply page track notifier type track_flush_slot

 arch/x86/include/asm/kvm_page_track.h |  9 +++++++++
 arch/x86/kvm/mmu.c                    |  7 +++++++
 arch/x86/kvm/page_track.c             | 25 +++++++++++++++++++++++++
 arch/x86/kvm/x86.c                    |  2 +-
 4 files changed, 42 insertions(+), 1 deletion(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-09  7:41 [PATCH 0/2] page track add notifier type track_flush_slot Xiaoguang Chen
@ 2016-10-09  7:41 ` Xiaoguang Chen
  2016-10-09  8:31   ` Neo Jia
  2016-10-12 20:48   ` Radim Krčmář
  2016-10-09  7:41 ` [PATCH 2/2] KVM: MMU: apply page track notifier type track_flush_slot Xiaoguang Chen
  2016-10-10 17:06 ` [PATCH 0/2] page track add " Paolo Bonzini
  2 siblings, 2 replies; 58+ messages in thread
From: Xiaoguang Chen @ 2016-10-09  7:41 UTC (permalink / raw)
  To: kvm, pbonzini, guangrong.xiao, jike.song; +Cc: Xiaoguang Chen

When a memory slot is being moved or removed users of page track
can be notified. So users can drop write-protection for the pages
in that memory slot.

This notifier type is needed by KVMGT to sync up its shadow page
table when memory slot is being moved or removed.

Reviewed-by: Xiao Guangrong <guangrong.xiao@intel.com>
Signed-off-by: Chen Xiaoguang <xiaoguang.chen@intel.com>
---
 arch/x86/include/asm/kvm_page_track.h |  9 +++++++++
 arch/x86/kvm/page_track.c             | 25 +++++++++++++++++++++++++
 arch/x86/kvm/x86.c                    |  2 +-
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index c2b8d24..5f66597 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -32,6 +32,14 @@ struct kvm_page_track_notifier_node {
 	 */
 	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			    int bytes);
+	/*
+	 * It is called when memory slot is being moved or removed
+	 * users can drop write-protection for the pages in that memory slot
+	 *
+	 * @kvm: the kvm where memory slot being moved or removed
+	 * @slot: the memory slot being moved or removed
+	 */
+	void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot *slot);
 };
 
 void kvm_page_track_init(struct kvm *kvm);
@@ -58,4 +66,5 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
+void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 #endif
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index b431539..e79bb25 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -225,3 +225,28 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			n->track_write(vcpu, gpa, new, bytes);
 	srcu_read_unlock(&head->track_srcu, idx);
 }
+
+/*
+ * Notify the node that memory slot is being removed or moved so that it can
+ * drop write-protection for the pages in the memory slot.
+ *
+ * The node should figure out it has any write-protected pages in this slot
+ * by itself.
+ */
+void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+
+	head = &kvm->arch.track_notifier_head;
+
+	if (hlist_empty(&head->track_notifier_list))
+		return;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
+		if (n->track_flush_slot)
+			n->track_flush_slot(kvm, slot);
+	srcu_read_unlock(&head->track_srcu, idx);
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 87f5dbb..f8ae90c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8278,7 +8278,7 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
-	kvm_mmu_invalidate_zap_all_pages(kvm);
+	kvm_page_track_flush_slot(kvm, slot);
 }
 
 static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 2/2] KVM: MMU: apply page track notifier type track_flush_slot
  2016-10-09  7:41 [PATCH 0/2] page track add notifier type track_flush_slot Xiaoguang Chen
  2016-10-09  7:41 ` [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot Xiaoguang Chen
@ 2016-10-09  7:41 ` Xiaoguang Chen
  2016-10-10 17:06 ` [PATCH 0/2] page track add " Paolo Bonzini
  2 siblings, 0 replies; 58+ messages in thread
From: Xiaoguang Chen @ 2016-10-09  7:41 UTC (permalink / raw)
  To: kvm, pbonzini, guangrong.xiao, jike.song; +Cc: Xiaoguang Chen

Register the notifier type track_flush_slot to receive memslot move
and remove event so that we can sync up our shadow page table

Reviewed-by: Xiao Guangrong <guangrong.xiao@intel.com>
Signed-off-by: Chen Xiaoguang <xiaoguang.chen@intel.com>
---
 arch/x86/kvm/mmu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fa7fdd1..8537a21 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4634,11 +4634,18 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu)
 	init_kvm_mmu(vcpu);
 }
 
+static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
+			struct kvm_memory_slot *slot)
+{
+	kvm_mmu_invalidate_zap_all_pages(kvm);
+}
+
 void kvm_mmu_init_vm(struct kvm *kvm)
 {
 	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
 
 	node->track_write = kvm_mmu_pte_write;
+	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
 	kvm_page_track_register_notifier(kvm, node);
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-09  7:41 ` [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot Xiaoguang Chen
@ 2016-10-09  8:31   ` Neo Jia
  2016-10-09  8:56     ` Chen, Xiaoguang
  2016-10-10 17:06     ` Paolo Bonzini
  2016-10-12 20:48   ` Radim Krčmář
  1 sibling, 2 replies; 58+ messages in thread
From: Neo Jia @ 2016-10-09  8:31 UTC (permalink / raw)
  To: Xiaoguang Chen; +Cc: kvm, pbonzini, guangrong.xiao, jike.song, Kirti Wankhede

On Sun, Oct 09, 2016 at 03:41:43PM +0800, Xiaoguang Chen wrote:
> When a memory slot is being moved or removed users of page track
> can be notified. So users can drop write-protection for the pages
> in that memory slot.
> 
> This notifier type is needed by KVMGT to sync up its shadow page
> table when memory slot is being moved or removed.

Hi Xiaoguang,

How is this supposed to be used by the kvmgt?

Thanks,
Neo

> 
> Reviewed-by: Xiao Guangrong <guangrong.xiao@intel.com>
> Signed-off-by: Chen Xiaoguang <xiaoguang.chen@intel.com>
> ---
>  arch/x86/include/asm/kvm_page_track.h |  9 +++++++++
>  arch/x86/kvm/page_track.c             | 25 +++++++++++++++++++++++++
>  arch/x86/kvm/x86.c                    |  2 +-
>  3 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index c2b8d24..5f66597 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -32,6 +32,14 @@ struct kvm_page_track_notifier_node {
>  	 */
>  	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>  			    int bytes);
> +	/*
> +	 * It is called when memory slot is being moved or removed
> +	 * users can drop write-protection for the pages in that memory slot
> +	 *
> +	 * @kvm: the kvm where memory slot being moved or removed
> +	 * @slot: the memory slot being moved or removed
> +	 */
> +	void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot *slot);
>  };
>  
>  void kvm_page_track_init(struct kvm *kvm);
> @@ -58,4 +66,5 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
>  				   struct kvm_page_track_notifier_node *n);
>  void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>  			  int bytes);
> +void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
>  #endif
> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
> index b431539..e79bb25 100644
> --- a/arch/x86/kvm/page_track.c
> +++ b/arch/x86/kvm/page_track.c
> @@ -225,3 +225,28 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>  			n->track_write(vcpu, gpa, new, bytes);
>  	srcu_read_unlock(&head->track_srcu, idx);
>  }
> +
> +/*
> + * Notify the node that memory slot is being removed or moved so that it can
> + * drop write-protection for the pages in the memory slot.
> + *
> + * The node should figure out it has any write-protected pages in this slot
> + * by itself.
> + */
> +void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
> +{
> +	struct kvm_page_track_notifier_head *head;
> +	struct kvm_page_track_notifier_node *n;
> +	int idx;
> +
> +	head = &kvm->arch.track_notifier_head;
> +
> +	if (hlist_empty(&head->track_notifier_list))
> +		return;
> +
> +	idx = srcu_read_lock(&head->track_srcu);
> +	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
> +		if (n->track_flush_slot)
> +			n->track_flush_slot(kvm, slot);
> +	srcu_read_unlock(&head->track_srcu, idx);
> +}
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 87f5dbb..f8ae90c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8278,7 +8278,7 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  				   struct kvm_memory_slot *slot)
>  {
> -	kvm_mmu_invalidate_zap_all_pages(kvm);
> +	kvm_page_track_flush_slot(kvm, slot);
>  }
>  
>  static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-09  8:31   ` Neo Jia
@ 2016-10-09  8:56     ` Chen, Xiaoguang
  2016-10-10 17:06     ` Paolo Bonzini
  1 sibling, 0 replies; 58+ messages in thread
From: Chen, Xiaoguang @ 2016-10-09  8:56 UTC (permalink / raw)
  To: Neo Jia; +Cc: kvm, pbonzini, Xiao, Guangrong, Song, Jike, Kirti Wankhede



>-----Original Message-----
>From: Neo Jia [mailto:cjia@nvidia.com]
>Sent: Sunday, October 09, 2016 4:32 PM
>To: Chen, Xiaoguang <xiaoguang.chen@intel.com>
>Cc: kvm@vger.kernel.org; pbonzini@redhat.com; Xiao, Guangrong
><guangrong.xiao@intel.com>; Song, Jike <jike.song@intel.com>; Kirti Wankhede
><kwankhede@nvidia.com>
>Subject: Re: [PATCH 1/2] KVM: page track: add a new notifier type:
>track_flush_slot
>
>On Sun, Oct 09, 2016 at 03:41:43PM +0800, Xiaoguang Chen wrote:
>> When a memory slot is being moved or removed users of page track can
>> be notified. So users can drop write-protection for the pages in that
>> memory slot.
>>
>> This notifier type is needed by KVMGT to sync up its shadow page table
>> when memory slot is being moved or removed.
>
>Hi Xiaoguang,
>
>How is this supposed to be used by the kvmgt?
Hi Neo,
This is related to KVMGT device model.
For KVMGT some ram will be allocated and used as ppgtt page table and the page table is write protected.
We use page track to implement write protect. But get problem while memslot is moved or removed.
We must clear the write protect for the pages in the memslot or there will be errors while ppgtt do cleanup.
So KVMGT must register the track_flush_slot to get notified while memslot is moved or removed.

>
>Thanks,
>Neo
>
>>
>> Reviewed-by: Xiao Guangrong <guangrong.xiao@intel.com>
>> Signed-off-by: Chen Xiaoguang <xiaoguang.chen@intel.com>
>> ---
>>  arch/x86/include/asm/kvm_page_track.h |  9 +++++++++
>>  arch/x86/kvm/page_track.c             | 25 +++++++++++++++++++++++++
>>  arch/x86/kvm/x86.c                    |  2 +-
>>  3 files changed, 35 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_page_track.h
>> b/arch/x86/include/asm/kvm_page_track.h
>> index c2b8d24..5f66597 100644
>> --- a/arch/x86/include/asm/kvm_page_track.h
>> +++ b/arch/x86/include/asm/kvm_page_track.h
>> @@ -32,6 +32,14 @@ struct kvm_page_track_notifier_node {
>>  	 */
>>  	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>>  			    int bytes);
>> +	/*
>> +	 * It is called when memory slot is being moved or removed
>> +	 * users can drop write-protection for the pages in that memory slot
>> +	 *
>> +	 * @kvm: the kvm where memory slot being moved or removed
>> +	 * @slot: the memory slot being moved or removed
>> +	 */
>> +	void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot
>> +*slot);
>>  };
>>
>>  void kvm_page_track_init(struct kvm *kvm); @@ -58,4 +66,5 @@
>> kvm_page_track_unregister_notifier(struct kvm *kvm,
>>  				   struct kvm_page_track_notifier_node *n);
>void
>> kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>>  			  int bytes);
>> +void kvm_page_track_flush_slot(struct kvm *kvm, struct
>> +kvm_memory_slot *slot);
>>  #endif
>> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
>> index b431539..e79bb25 100644
>> --- a/arch/x86/kvm/page_track.c
>> +++ b/arch/x86/kvm/page_track.c
>> @@ -225,3 +225,28 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu,
>gpa_t gpa, const u8 *new,
>>  			n->track_write(vcpu, gpa, new, bytes);
>>  	srcu_read_unlock(&head->track_srcu, idx);  }
>> +
>> +/*
>> + * Notify the node that memory slot is being removed or moved so that
>> +it can
>> + * drop write-protection for the pages in the memory slot.
>> + *
>> + * The node should figure out it has any write-protected pages in
>> +this slot
>> + * by itself.
>> + */
>> +void kvm_page_track_flush_slot(struct kvm *kvm, struct
>> +kvm_memory_slot *slot) {
>> +	struct kvm_page_track_notifier_head *head;
>> +	struct kvm_page_track_notifier_node *n;
>> +	int idx;
>> +
>> +	head = &kvm->arch.track_notifier_head;
>> +
>> +	if (hlist_empty(&head->track_notifier_list))
>> +		return;
>> +
>> +	idx = srcu_read_lock(&head->track_srcu);
>> +	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
>> +		if (n->track_flush_slot)
>> +			n->track_flush_slot(kvm, slot);
>> +	srcu_read_unlock(&head->track_srcu, idx); }
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index
>> 87f5dbb..f8ae90c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -8278,7 +8278,7 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
>> void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>>  				   struct kvm_memory_slot *slot)
>>  {
>> -	kvm_mmu_invalidate_zap_all_pages(kvm);
>> +	kvm_page_track_flush_slot(kvm, slot);
>>  }
>>
>>  static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
>> --
>> 1.9.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in the
>> body of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] page track add notifier type track_flush_slot
  2016-10-09  7:41 [PATCH 0/2] page track add notifier type track_flush_slot Xiaoguang Chen
  2016-10-09  7:41 ` [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot Xiaoguang Chen
  2016-10-09  7:41 ` [PATCH 2/2] KVM: MMU: apply page track notifier type track_flush_slot Xiaoguang Chen
@ 2016-10-10 17:06 ` Paolo Bonzini
  2016-10-11  2:43   ` Xiao Guangrong
  2 siblings, 1 reply; 58+ messages in thread
From: Paolo Bonzini @ 2016-10-10 17:06 UTC (permalink / raw)
  To: Xiaoguang Chen, kvm, guangrong.xiao, jike.song



On 09/10/2016 09:41, Xiaoguang Chen wrote:
> The seires is to add a new notifer type track_flush_slot for page track.
> By using this notifer type when a memory slot is being moved or removed
> users of page track can be notified.
> 
> This notifier type is needed by KVMGT to sync up its shadow page table
> when memory slot is being moved or removed.
> 
> Xiaoguang Chen (2):
>   KVM: page track: add a new notifier type: track_flush_slot
>   KVM: MMU: apply page track notifier type track_flush_slot
> 
>  arch/x86/include/asm/kvm_page_track.h |  9 +++++++++
>  arch/x86/kvm/mmu.c                    |  7 +++++++
>  arch/x86/kvm/page_track.c             | 25 +++++++++++++++++++++++++
>  arch/x86/kvm/x86.c                    |  2 +-
>  4 files changed, 42 insertions(+), 1 deletion(-)
> 

Hi,

the two patches should be squashed for bisectability (alternatively, do
not remove the call to kvm_mmu_invalidate_zap_all_pages in patch 1, and
only drop it in patch 2).

Paolo

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-09  8:31   ` Neo Jia
  2016-10-09  8:56     ` Chen, Xiaoguang
@ 2016-10-10 17:06     ` Paolo Bonzini
  2016-10-10 18:01       ` Neo Jia
  1 sibling, 1 reply; 58+ messages in thread
From: Paolo Bonzini @ 2016-10-10 17:06 UTC (permalink / raw)
  To: Neo Jia, Xiaoguang Chen; +Cc: kvm, guangrong.xiao, jike.song, Kirti Wankhede



On 09/10/2016 10:31, Neo Jia wrote:
> On Sun, Oct 09, 2016 at 03:41:43PM +0800, Xiaoguang Chen wrote:
>> When a memory slot is being moved or removed users of page track
>> can be notified. So users can drop write-protection for the pages
>> in that memory slot.
>>
>> This notifier type is needed by KVMGT to sync up its shadow page
>> table when memory slot is being moved or removed.
> 
> Hi Xiaoguang,
> 
> How is this supposed to be used by the kvmgt?

Hi Neo,

AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
while nVidia does.

Paolo

> Thanks,
> Neo
> 
>>
>> Reviewed-by: Xiao Guangrong <guangrong.xiao@intel.com>
>> Signed-off-by: Chen Xiaoguang <xiaoguang.chen@intel.com>
>> ---
>>  arch/x86/include/asm/kvm_page_track.h |  9 +++++++++
>>  arch/x86/kvm/page_track.c             | 25 +++++++++++++++++++++++++
>>  arch/x86/kvm/x86.c                    |  2 +-
>>  3 files changed, 35 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
>> index c2b8d24..5f66597 100644
>> --- a/arch/x86/include/asm/kvm_page_track.h
>> +++ b/arch/x86/include/asm/kvm_page_track.h
>> @@ -32,6 +32,14 @@ struct kvm_page_track_notifier_node {
>>  	 */
>>  	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>>  			    int bytes);
>> +	/*
>> +	 * It is called when memory slot is being moved or removed
>> +	 * users can drop write-protection for the pages in that memory slot
>> +	 *
>> +	 * @kvm: the kvm where memory slot being moved or removed
>> +	 * @slot: the memory slot being moved or removed
>> +	 */
>> +	void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot *slot);
>>  };
>>  
>>  void kvm_page_track_init(struct kvm *kvm);
>> @@ -58,4 +66,5 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
>>  				   struct kvm_page_track_notifier_node *n);
>>  void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>>  			  int bytes);
>> +void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
>>  #endif
>> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
>> index b431539..e79bb25 100644
>> --- a/arch/x86/kvm/page_track.c
>> +++ b/arch/x86/kvm/page_track.c
>> @@ -225,3 +225,28 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>>  			n->track_write(vcpu, gpa, new, bytes);
>>  	srcu_read_unlock(&head->track_srcu, idx);
>>  }
>> +
>> +/*
>> + * Notify the node that memory slot is being removed or moved so that it can
>> + * drop write-protection for the pages in the memory slot.
>> + *
>> + * The node should figure out it has any write-protected pages in this slot
>> + * by itself.
>> + */
>> +void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
>> +{
>> +	struct kvm_page_track_notifier_head *head;
>> +	struct kvm_page_track_notifier_node *n;
>> +	int idx;
>> +
>> +	head = &kvm->arch.track_notifier_head;
>> +
>> +	if (hlist_empty(&head->track_notifier_list))
>> +		return;
>> +
>> +	idx = srcu_read_lock(&head->track_srcu);
>> +	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
>> +		if (n->track_flush_slot)
>> +			n->track_flush_slot(kvm, slot);
>> +	srcu_read_unlock(&head->track_srcu, idx);
>> +}
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 87f5dbb..f8ae90c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -8278,7 +8278,7 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
>>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>>  				   struct kvm_memory_slot *slot)
>>  {
>> -	kvm_mmu_invalidate_zap_all_pages(kvm);
>> +	kvm_page_track_flush_slot(kvm, slot);
>>  }
>>  
>>  static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
>> -- 
>> 1.9.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-10 17:06     ` Paolo Bonzini
@ 2016-10-10 18:01       ` Neo Jia
  2016-10-10 18:32         ` Paolo Bonzini
  0 siblings, 1 reply; 58+ messages in thread
From: Neo Jia @ 2016-10-10 18:01 UTC (permalink / raw)
  To: Paolo Bonzini, Xiaoguang Chen
  Cc: kvm, guangrong.xiao, jike.song, Kirti Wankhede

On Mon, Oct 10, 2016 at 07:06:27PM +0200, Paolo Bonzini wrote:
> 
> 
> On 09/10/2016 10:31, Neo Jia wrote:
> > On Sun, Oct 09, 2016 at 03:41:43PM +0800, Xiaoguang Chen wrote:
> >> When a memory slot is being moved or removed users of page track
> >> can be notified. So users can drop write-protection for the pages
> >> in that memory slot.
> >>
> >> This notifier type is needed by KVMGT to sync up its shadow page
> >> table when memory slot is being moved or removed.
> > 
> > Hi Xiaoguang,
> > 
> > How is this supposed to be used by the kvmgt?
> 
> Hi Neo,
> 
> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
> while nVidia does.

> "From Xiaoguang:

> This is related to KVMGT device model.
> For KVMGT some ram will be allocated and used as ppgtt page table and
> the page table is write protected.
> We use page track to implement write protect. But get problem while
> memslot is moved or removed.
> We must clear the write protect for the pages in the memslot or there
> will be errors while ppgtt do cleanup.
> So KVMGT must register the track_flush_slot to get notified while
> memslot is moved or removed."

(merging thread)

Hi Paolo and Xiaoguang,

I am just wondering how device driver can register a notifier so he can be 
notified for write-protected pages when writes are happening.

Thanks,
Neo

> 
> Paolo
> 
> > Thanks,
> > Neo
> > 
> >>
> >> Reviewed-by: Xiao Guangrong <guangrong.xiao@intel.com>
> >> Signed-off-by: Chen Xiaoguang <xiaoguang.chen@intel.com>
> >> ---
> >>  arch/x86/include/asm/kvm_page_track.h |  9 +++++++++
> >>  arch/x86/kvm/page_track.c             | 25 +++++++++++++++++++++++++
> >>  arch/x86/kvm/x86.c                    |  2 +-
> >>  3 files changed, 35 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> >> index c2b8d24..5f66597 100644
> >> --- a/arch/x86/include/asm/kvm_page_track.h
> >> +++ b/arch/x86/include/asm/kvm_page_track.h
> >> @@ -32,6 +32,14 @@ struct kvm_page_track_notifier_node {
> >>  	 */
> >>  	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> >>  			    int bytes);
> >> +	/*
> >> +	 * It is called when memory slot is being moved or removed
> >> +	 * users can drop write-protection for the pages in that memory slot
> >> +	 *
> >> +	 * @kvm: the kvm where memory slot being moved or removed
> >> +	 * @slot: the memory slot being moved or removed
> >> +	 */
> >> +	void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot *slot);
> >>  };
> >>  
> >>  void kvm_page_track_init(struct kvm *kvm);
> >> @@ -58,4 +66,5 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
> >>  				   struct kvm_page_track_notifier_node *n);
> >>  void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> >>  			  int bytes);
> >> +void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
> >>  #endif
> >> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
> >> index b431539..e79bb25 100644
> >> --- a/arch/x86/kvm/page_track.c
> >> +++ b/arch/x86/kvm/page_track.c
> >> @@ -225,3 +225,28 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> >>  			n->track_write(vcpu, gpa, new, bytes);
> >>  	srcu_read_unlock(&head->track_srcu, idx);
> >>  }
> >> +
> >> +/*
> >> + * Notify the node that memory slot is being removed or moved so that it can
> >> + * drop write-protection for the pages in the memory slot.
> >> + *
> >> + * The node should figure out it has any write-protected pages in this slot
> >> + * by itself.
> >> + */
> >> +void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
> >> +{
> >> +	struct kvm_page_track_notifier_head *head;
> >> +	struct kvm_page_track_notifier_node *n;
> >> +	int idx;
> >> +
> >> +	head = &kvm->arch.track_notifier_head;
> >> +
> >> +	if (hlist_empty(&head->track_notifier_list))
> >> +		return;
> >> +
> >> +	idx = srcu_read_lock(&head->track_srcu);
> >> +	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
> >> +		if (n->track_flush_slot)
> >> +			n->track_flush_slot(kvm, slot);
> >> +	srcu_read_unlock(&head->track_srcu, idx);
> >> +}
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index 87f5dbb..f8ae90c 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -8278,7 +8278,7 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
> >>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> >>  				   struct kvm_memory_slot *slot)
> >>  {
> >> -	kvm_mmu_invalidate_zap_all_pages(kvm);
> >> +	kvm_page_track_flush_slot(kvm, slot);
> >>  }
> >>  
> >>  static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
> >> -- 
> >> 1.9.1
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-10 18:01       ` Neo Jia
@ 2016-10-10 18:32         ` Paolo Bonzini
  2016-10-11  2:39           ` Xiao Guangrong
  0 siblings, 1 reply; 58+ messages in thread
From: Paolo Bonzini @ 2016-10-10 18:32 UTC (permalink / raw)
  To: Neo Jia, Xiaoguang Chen; +Cc: kvm, guangrong.xiao, jike.song, Kirti Wankhede



On 10/10/2016 20:01, Neo Jia wrote:
>> Hi Neo,
>>
>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
>> while nVidia does.
> 
> Hi Paolo and Xiaoguang,
> 
> I am just wondering how device driver can register a notifier so he can be 
> notified for write-protected pages when writes are happening.

It can't yet, but the API is ready for that.  kvm_vfio_set_group is
currently where a struct kvm_device* and struct vfio_group* touch. Given
a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
kvm_page_track_register_notifier.  So I guess you could add a callback
that passes the struct kvm_device* to the mdev device.

Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
at KVM Forum but I don't remember the details.

Paolo

> Thanks,
> Neo
> 
>>
>> Paolo
>>
>>> Thanks,
>>> Neo
>>>
>>>>
>>>> Reviewed-by: Xiao Guangrong <guangrong.xiao@intel.com>
>>>> Signed-off-by: Chen Xiaoguang <xiaoguang.chen@intel.com>
>>>> ---
>>>>  arch/x86/include/asm/kvm_page_track.h |  9 +++++++++
>>>>  arch/x86/kvm/page_track.c             | 25 +++++++++++++++++++++++++
>>>>  arch/x86/kvm/x86.c                    |  2 +-
>>>>  3 files changed, 35 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
>>>> index c2b8d24..5f66597 100644
>>>> --- a/arch/x86/include/asm/kvm_page_track.h
>>>> +++ b/arch/x86/include/asm/kvm_page_track.h
>>>> @@ -32,6 +32,14 @@ struct kvm_page_track_notifier_node {
>>>>  	 */
>>>>  	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>>>>  			    int bytes);
>>>> +	/*
>>>> +	 * It is called when memory slot is being moved or removed
>>>> +	 * users can drop write-protection for the pages in that memory slot
>>>> +	 *
>>>> +	 * @kvm: the kvm where memory slot being moved or removed
>>>> +	 * @slot: the memory slot being moved or removed
>>>> +	 */
>>>> +	void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot *slot);
>>>>  };
>>>>  
>>>>  void kvm_page_track_init(struct kvm *kvm);
>>>> @@ -58,4 +66,5 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
>>>>  				   struct kvm_page_track_notifier_node *n);
>>>>  void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>>>>  			  int bytes);
>>>> +void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
>>>>  #endif
>>>> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
>>>> index b431539..e79bb25 100644
>>>> --- a/arch/x86/kvm/page_track.c
>>>> +++ b/arch/x86/kvm/page_track.c
>>>> @@ -225,3 +225,28 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>>>>  			n->track_write(vcpu, gpa, new, bytes);
>>>>  	srcu_read_unlock(&head->track_srcu, idx);
>>>>  }
>>>> +
>>>> +/*
>>>> + * Notify the node that memory slot is being removed or moved so that it can
>>>> + * drop write-protection for the pages in the memory slot.
>>>> + *
>>>> + * The node should figure out it has any write-protected pages in this slot
>>>> + * by itself.
>>>> + */
>>>> +void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
>>>> +{
>>>> +	struct kvm_page_track_notifier_head *head;
>>>> +	struct kvm_page_track_notifier_node *n;
>>>> +	int idx;
>>>> +
>>>> +	head = &kvm->arch.track_notifier_head;
>>>> +
>>>> +	if (hlist_empty(&head->track_notifier_list))
>>>> +		return;
>>>> +
>>>> +	idx = srcu_read_lock(&head->track_srcu);
>>>> +	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
>>>> +		if (n->track_flush_slot)
>>>> +			n->track_flush_slot(kvm, slot);
>>>> +	srcu_read_unlock(&head->track_srcu, idx);
>>>> +}
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index 87f5dbb..f8ae90c 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -8278,7 +8278,7 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
>>>>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>>>>  				   struct kvm_memory_slot *slot)
>>>>  {
>>>> -	kvm_mmu_invalidate_zap_all_pages(kvm);
>>>> +	kvm_page_track_flush_slot(kvm, slot);
>>>>  }
>>>>  
>>>>  static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
>>>> -- 
>>>> 1.9.1
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-10 18:32         ` Paolo Bonzini
@ 2016-10-11  2:39           ` Xiao Guangrong
  2016-10-11  8:54             ` Paolo Bonzini
  0 siblings, 1 reply; 58+ messages in thread
From: Xiao Guangrong @ 2016-10-11  2:39 UTC (permalink / raw)
  To: Paolo Bonzini, Neo Jia, Xiaoguang Chen
  Cc: kvm, guangrong.xiao, jike.song, Kirti Wankhede



On 10/11/2016 02:32 AM, Paolo Bonzini wrote:
>
>
> On 10/10/2016 20:01, Neo Jia wrote:
>>> Hi Neo,
>>>
>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
>>> while nVidia does.
>>
>> Hi Paolo and Xiaoguang,
>>
>> I am just wondering how device driver can register a notifier so he can be
>> notified for write-protected pages when writes are happening.
>
> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
> currently where a struct kvm_device* and struct vfio_group* touch. Given
> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
> kvm_page_track_register_notifier.  So I guess you could add a callback
> that passes the struct kvm_device* to the mdev device.
>
> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
> at KVM Forum but I don't remember the details.
>

Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
figure out the kvm instance based on the fd.

We got a new idea, how about search the kvm instance by mm_struct, it
can work as KVMGT is running in the vcpu context and it is much more
straightforward.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] page track add notifier type track_flush_slot
  2016-10-10 17:06 ` [PATCH 0/2] page track add " Paolo Bonzini
@ 2016-10-11  2:43   ` Xiao Guangrong
  2016-10-11  8:55     ` Paolo Bonzini
  0 siblings, 1 reply; 58+ messages in thread
From: Xiao Guangrong @ 2016-10-11  2:43 UTC (permalink / raw)
  To: Paolo Bonzini, Xiaoguang Chen, kvm, guangrong.xiao, jike.song



On 10/11/2016 01:06 AM, Paolo Bonzini wrote:
>
>
> On 09/10/2016 09:41, Xiaoguang Chen wrote:
>> The seires is to add a new notifer type track_flush_slot for page track.
>> By using this notifer type when a memory slot is being moved or removed
>> users of page track can be notified.
>>
>> This notifier type is needed by KVMGT to sync up its shadow page table
>> when memory slot is being moved or removed.
>>
>> Xiaoguang Chen (2):
>>   KVM: page track: add a new notifier type: track_flush_slot
>>   KVM: MMU: apply page track notifier type track_flush_slot
>>
>>  arch/x86/include/asm/kvm_page_track.h |  9 +++++++++
>>  arch/x86/kvm/mmu.c                    |  7 +++++++
>>  arch/x86/kvm/page_track.c             | 25 +++++++++++++++++++++++++
>>  arch/x86/kvm/x86.c                    |  2 +-
>>  4 files changed, 42 insertions(+), 1 deletion(-)
>>
>
> Hi,
>
> the two patches should be squashed for bisectability (alternatively, do
> not remove the call to kvm_mmu_invalidate_zap_all_pages in patch 1, and
> only drop it in patch 2).

Indeed.

Sorry for the carelessness, we will adapt these patches.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-11  2:39           ` Xiao Guangrong
@ 2016-10-11  8:54             ` Paolo Bonzini
  2016-10-11  9:21               ` Xiao Guangrong
  0 siblings, 1 reply; 58+ messages in thread
From: Paolo Bonzini @ 2016-10-11  8:54 UTC (permalink / raw)
  To: Xiao Guangrong, Neo Jia, Xiaoguang Chen
  Cc: kvm, guangrong.xiao, jike.song, Kirti Wankhede



On 11/10/2016 04:39, Xiao Guangrong wrote:
> 
> 
> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:
>>
>>
>> On 10/10/2016 20:01, Neo Jia wrote:
>>>> Hi Neo,
>>>>
>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
>>>> while nVidia does.
>>>
>>> Hi Paolo and Xiaoguang,
>>>
>>> I am just wondering how device driver can register a notifier so he
>>> can be
>>> notified for write-protected pages when writes are happening.
>>
>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
>> currently where a struct kvm_device* and struct vfio_group* touch. Given
>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
>> kvm_page_track_register_notifier.  So I guess you could add a callback
>> that passes the struct kvm_device* to the mdev device.
>>
>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
>> at KVM Forum but I don't remember the details.
>>
> 
> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
> figure out the kvm instance based on the fd.
> 
> We got a new idea, how about search the kvm instance by mm_struct, it
> can work as KVMGT is running in the vcpu context and it is much more
> straightforward.

Perhaps I didn't understand your suggestion, but the same mm_struct can
have more than 1 struct kvm so I'm not sure that it can work.

Paolo

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] page track add notifier type track_flush_slot
  2016-10-11  2:43   ` Xiao Guangrong
@ 2016-10-11  8:55     ` Paolo Bonzini
  2016-10-12 20:52       ` Radim Krčmář
  0 siblings, 1 reply; 58+ messages in thread
From: Paolo Bonzini @ 2016-10-11  8:55 UTC (permalink / raw)
  To: Xiao Guangrong, Xiaoguang Chen, kvm, guangrong.xiao, jike.song



On 11/10/2016 04:43, Xiao Guangrong wrote:
> 
> 
> On 10/11/2016 01:06 AM, Paolo Bonzini wrote:
>>
>>
>> On 09/10/2016 09:41, Xiaoguang Chen wrote:
>>> The seires is to add a new notifer type track_flush_slot for page track.
>>> By using this notifer type when a memory slot is being moved or removed
>>> users of page track can be notified.
>>>
>>> This notifier type is needed by KVMGT to sync up its shadow page table
>>> when memory slot is being moved or removed.
>>>
>>> Xiaoguang Chen (2):
>>>   KVM: page track: add a new notifier type: track_flush_slot
>>>   KVM: MMU: apply page track notifier type track_flush_slot
>>>
>>>  arch/x86/include/asm/kvm_page_track.h |  9 +++++++++
>>>  arch/x86/kvm/mmu.c                    |  7 +++++++
>>>  arch/x86/kvm/page_track.c             | 25 +++++++++++++++++++++++++
>>>  arch/x86/kvm/x86.c                    |  2 +-
>>>  4 files changed, 42 insertions(+), 1 deletion(-)
>>>
>>
>> Hi,
>>
>> the two patches should be squashed for bisectability (alternatively, do
>> not remove the call to kvm_mmu_invalidate_zap_all_pages in patch 1, and
>> only drop it in patch 2).
> 
> Indeed.
> 
> Sorry for the carelessness, we will adapt these patches.

We can squash them too, it's easy.

Paolo

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-11  8:54             ` Paolo Bonzini
@ 2016-10-11  9:21               ` Xiao Guangrong
  2016-10-11  9:47                 ` Paolo Bonzini
  0 siblings, 1 reply; 58+ messages in thread
From: Xiao Guangrong @ 2016-10-11  9:21 UTC (permalink / raw)
  To: Paolo Bonzini, Neo Jia, Xiaoguang Chen
  Cc: kvm, guangrong.xiao, jike.song, Kirti Wankhede



On 10/11/2016 04:54 PM, Paolo Bonzini wrote:
>
>
> On 11/10/2016 04:39, Xiao Guangrong wrote:
>>
>>
>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:
>>>
>>>
>>> On 10/10/2016 20:01, Neo Jia wrote:
>>>>> Hi Neo,
>>>>>
>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
>>>>> while nVidia does.
>>>>
>>>> Hi Paolo and Xiaoguang,
>>>>
>>>> I am just wondering how device driver can register a notifier so he
>>>> can be
>>>> notified for write-protected pages when writes are happening.
>>>
>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
>>> currently where a struct kvm_device* and struct vfio_group* touch. Given
>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
>>> kvm_page_track_register_notifier.  So I guess you could add a callback
>>> that passes the struct kvm_device* to the mdev device.
>>>
>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
>>> at KVM Forum but I don't remember the details.
>>>
>>
>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
>> figure out the kvm instance based on the fd.
>>
>> We got a new idea, how about search the kvm instance by mm_struct, it
>> can work as KVMGT is running in the vcpu context and it is much more
>> straightforward.
>
> Perhaps I didn't understand your suggestion, but the same mm_struct can
> have more than 1 struct kvm so I'm not sure that it can work.

vcpu->pid is valid during vcpu running so that it can be used to figure
out which kvm instance owns the vcpu whose pid is the one as current
thread, i think it can work. :)




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-11  9:21               ` Xiao Guangrong
@ 2016-10-11  9:47                 ` Paolo Bonzini
  2016-10-14 10:37                     ` [Qemu-devel] " Jike Song
  0 siblings, 1 reply; 58+ messages in thread
From: Paolo Bonzini @ 2016-10-11  9:47 UTC (permalink / raw)
  To: Xiao Guangrong, Neo Jia, Xiaoguang Chen
  Cc: kvm, guangrong.xiao, jike.song, Kirti Wankhede



On 11/10/2016 11:21, Xiao Guangrong wrote:
> 
> 
> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:
>>
>>
>> On 11/10/2016 04:39, Xiao Guangrong wrote:
>>>
>>>
>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 10/10/2016 20:01, Neo Jia wrote:
>>>>>> Hi Neo,
>>>>>>
>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
>>>>>> while nVidia does.
>>>>>
>>>>> Hi Paolo and Xiaoguang,
>>>>>
>>>>> I am just wondering how device driver can register a notifier so he
>>>>> can be
>>>>> notified for write-protected pages when writes are happening.
>>>>
>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
>>>> currently where a struct kvm_device* and struct vfio_group* touch.
>>>> Given
>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
>>>> that passes the struct kvm_device* to the mdev device.
>>>>
>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
>>>> at KVM Forum but I don't remember the details.
>>>
>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
>>> figure out the kvm instance based on the fd.
>>>
>>> We got a new idea, how about search the kvm instance by mm_struct, it
>>> can work as KVMGT is running in the vcpu context and it is much more
>>> straightforward.
>>
>> Perhaps I didn't understand your suggestion, but the same mm_struct can
>> have more than 1 struct kvm so I'm not sure that it can work.
> 
> vcpu->pid is valid during vcpu running so that it can be used to figure
> out which kvm instance owns the vcpu whose pid is the one as current
> thread, i think it can work. :)

No, don't do that.  There's no reason for a thread to run a single VCPU,
and if you can have multiple VCPUs you can also have multiple VCPUs from
multiple VMs.

Passing file descriptors around are the right way to connect subsystems.

Paolo

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-09  7:41 ` [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot Xiaoguang Chen
  2016-10-09  8:31   ` Neo Jia
@ 2016-10-12 20:48   ` Radim Krčmář
  1 sibling, 0 replies; 58+ messages in thread
From: Radim Krčmář @ 2016-10-12 20:48 UTC (permalink / raw)
  To: Xiaoguang Chen; +Cc: kvm, pbonzini, guangrong.xiao, jike.song

2016-10-09 15:41+0800, Xiaoguang Chen:
> When a memory slot is being moved or removed users of page track
> can be notified. So users can drop write-protection for the pages
> in that memory slot.
> 
> This notifier type is needed by KVMGT to sync up its shadow page
> table when memory slot is being moved or removed.
> 
> Reviewed-by: Xiao Guangrong <guangrong.xiao@intel.com>
> Signed-off-by: Chen Xiaoguang <xiaoguang.chen@intel.com>
> ---
> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
> @@ -225,3 +225,28 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> +
> +/*
> + * Notify the node that memory slot is being removed or moved so that it can
> + * drop write-protection for the pages in the memory slot.
> + *
> + * The node should figure out it has any write-protected pages in this slot
> + * by itself.
> + */
> +void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
> +{
> +	struct kvm_page_track_notifier_head *head;
> +	struct kvm_page_track_notifier_node *n;
> +	int idx;
> +
> +	head = &kvm->arch.track_notifier_head;
> +
> +	if (hlist_empty(&head->track_notifier_list))
> +		return;
> +
> +	idx = srcu_read_lock(&head->track_srcu);
> +	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
> +		if (n->track_flush_slot)
> +			n->track_flush_slot(kvm, slot);
> +	srcu_read_unlock(&head->track_srcu, idx);
> +}

We repeat the same drill for the other page_track_notifier as well ...
I was thinking it would be nice to have something like:

  void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
  {
  	struct kvm_page_track_notifier_node *n;
  	int i;

  	kvm_for_each_track_notifier(n, &kvm->arch.track_notifier_head, i)
  		if (n->track_flush_slot)
  			n->track_flush_slot(kvm, slot);
  }

which requires this monster:

  #define kvm_for_each_track_notifier(notifier, head, tmp) \
  	for (tmp = !hlist_empty(&(head)->track_notifier_list); \
  	     tmp && ({tmp = srcu_read_lock(&(head)->track_srcu); true;}); \
  	     srcu_read_unlock(&(head)->track_srcu, tmp), tmp = 0) \
  		hlist_for_each_entry_rcu(notifier, &(head)->track_notifier_list, node)

so waiting for more notifiers doesn't seem that bad. :)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 0/2] page track add notifier type track_flush_slot
  2016-10-11  8:55     ` Paolo Bonzini
@ 2016-10-12 20:52       ` Radim Krčmář
  0 siblings, 0 replies; 58+ messages in thread
From: Radim Krčmář @ 2016-10-12 20:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiao Guangrong, Xiaoguang Chen, kvm, guangrong.xiao, jike.song

2016-10-11 10:55+0200, Paolo Bonzini:
> On 11/10/2016 04:43, Xiao Guangrong wrote:
>> On 10/11/2016 01:06 AM, Paolo Bonzini wrote:
>>> On 09/10/2016 09:41, Xiaoguang Chen wrote:
>>>> The seires is to add a new notifer type track_flush_slot for page track.
>>>> By using this notifer type when a memory slot is being moved or removed
>>>> users of page track can be notified.
>>>>
>>>> This notifier type is needed by KVMGT to sync up its shadow page table
>>>> when memory slot is being moved or removed.
>>>>
>>>> Xiaoguang Chen (2):
>>>>   KVM: page track: add a new notifier type: track_flush_slot
>>>>   KVM: MMU: apply page track notifier type track_flush_slot
>>>>
>>>>  arch/x86/include/asm/kvm_page_track.h |  9 +++++++++
>>>>  arch/x86/kvm/mmu.c                    |  7 +++++++
>>>>  arch/x86/kvm/page_track.c             | 25 +++++++++++++++++++++++++
>>>>  arch/x86/kvm/x86.c                    |  2 +-
>>>>  4 files changed, 42 insertions(+), 1 deletion(-)
>>>>
>>>
>>> Hi,
>>>
>>> the two patches should be squashed for bisectability (alternatively, do
>>> not remove the call to kvm_mmu_invalidate_zap_all_pages in patch 1, and
>>> only drop it in patch 2).
>> 
>> Indeed.
>> 
>> Sorry for the carelessness, we will adapt these patches.
> 
> We can squash them too, it's easy.

I did that and applied to kvm/queue, thanks.

(Please check whether the result is agreeable.)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-11  9:47                 ` Paolo Bonzini
@ 2016-10-14 10:37                     ` Jike Song
  0 siblings, 0 replies; 58+ messages in thread
From: Jike Song @ 2016-10-14 10:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiao Guangrong, Neo Jia, Xiaoguang Chen, kvm, guangrong.xiao,
	Kirti Wankhede, Alex Williamson, Tian, Kevin, qemu-devel

On 10/11/2016 05:47 PM, Paolo Bonzini wrote:
> 
> 
> On 11/10/2016 11:21, Xiao Guangrong wrote:
>>
>>
>> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:
>>>
>>>
>>> On 11/10/2016 04:39, Xiao Guangrong wrote:
>>>>
>>>>
>>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:
>>>>>
>>>>>
>>>>> On 10/10/2016 20:01, Neo Jia wrote:
>>>>>>> Hi Neo,
>>>>>>>
>>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
>>>>>>> while nVidia does.
>>>>>>
>>>>>> Hi Paolo and Xiaoguang,
>>>>>>
>>>>>> I am just wondering how device driver can register a notifier so he
>>>>>> can be
>>>>>> notified for write-protected pages when writes are happening.
>>>>>
>>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
>>>>> currently where a struct kvm_device* and struct vfio_group* touch.
>>>>> Given
>>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
>>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
>>>>> that passes the struct kvm_device* to the mdev device.
>>>>>
>>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
>>>>> at KVM Forum but I don't remember the details.
>>>>
>>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
>>>> figure out the kvm instance based on the fd.
>>>>
>>>> We got a new idea, how about search the kvm instance by mm_struct, it
>>>> can work as KVMGT is running in the vcpu context and it is much more
>>>> straightforward.
>>>
>>> Perhaps I didn't understand your suggestion, but the same mm_struct can
>>> have more than 1 struct kvm so I'm not sure that it can work.
>>
>> vcpu->pid is valid during vcpu running so that it can be used to figure
>> out which kvm instance owns the vcpu whose pid is the one as current
>> thread, i think it can work. :)
> 
> No, don't do that.  There's no reason for a thread to run a single VCPU,
> and if you can have multiple VCPUs you can also have multiple VCPUs from
> multiple VMs.
> 
> Passing file descriptors around are the right way to connect subsystems.

[CC Alex, Kevin and Qemu-devel]

Hi Paolo & Alex,

IIUC, passing file descriptors means touching QEMU and the UAPI between
QEMU and VFIO. Would you guys have a look at below draft patch? If it's
on the correct direction, I'll send the split ones. Thanks!

--
Thanks,
Jike


diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index bec694c..f715d37 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -10,12 +10,14 @@
  * the COPYING file in the top-level directory.
  */
 
+#include <sys/ioctl.h>
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
 #include "qemu/range.h"
 #include "qapi/error.h"
 #include "hw/nvram/fw_cfg.h"
 #include "pci.h"
+#include "sysemu/kvm.h"
 #include "trace.h"
 
 /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
@@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
         break;
     }
 }
+
+void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
+{
+    int vmfd;
+
+    if (!kvm_enabled() || !vdev->kvmgt)
+        return;
+
+    /* Tell the device what KVM it attached */
+    vmfd = kvm_get_vmfd(kvm_state);
+    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
+}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a5a620a..8732552 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
         return ret;
     }
 
+    vfio_quirk_kvmgt(vdev);
+
     /* Get a copy of config space */
     ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
                 MIN(pci_config_size(&vdev->pdev), vdev->config_size),
@@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
                        sub_device_id, PCI_ANY_ID),
     DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
+    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),
     /*
      * TODO - support passed fds... is this necessary?
      * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 7d482d9..813832c 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -143,6 +143,7 @@ typedef struct VFIOPCIDevice {
     bool no_kvm_intx;
     bool no_kvm_msi;
     bool no_kvm_msix;
+    bool kvmgt;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
@@ -166,4 +167,6 @@ int vfio_populate_vga(VFIOPCIDevice *vdev);
 int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
                                struct vfio_region_info *info);
 
+void vfio_quirk_kvmgt(VFIOPCIDevice *vdev);
+
 #endif /* HW_VFIO_VFIO_PCI_H */
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index df67cc0..dd8320a 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -254,6 +254,7 @@ void phys_mem_set_alloc(void *(*alloc)(size_t, uint64_t *align));
 int kvm_ioctl(KVMState *s, int type, ...);
 
 int kvm_vm_ioctl(KVMState *s, int type, ...);
+int kvm_get_vmfd(KVMState *s);
 
 int kvm_vcpu_ioctl(CPUState *cpu, int type, ...);
 
diff --git a/kvm-all.c b/kvm-all.c
index efb5fe3..bd72ce3 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2065,6 +2065,11 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
     return ret;
 }
 
+int kvm_get_vmfd(KVMState *s)
+{
+	return s->vmfd;
+}
+
 int kvm_vcpu_ioctl(CPUState *cpu, int type, ...)
 {
     int ret;
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 759b850..952303f 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -686,6 +686,12 @@ struct vfio_iommu_spapr_tce_remove {
 };
 #define VFIO_IOMMU_SPAPR_TCE_REMOVE	_IO(VFIO_TYPE, VFIO_BASE + 20)
 
+
+/**
+ * VFIO_SET_KVMFD - _IO(VFIO_TYPE, VFIO_BASE + 21, __u32)
+ */
+#define VFIO_SET_KVMFD		_IO(VFIO_TYPE, VFIO_BASE + 21)
+
 /* ***************************************************************** */
 
 #endif /* VFIO_H */

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-14 10:37                     ` Jike Song
  0 siblings, 0 replies; 58+ messages in thread
From: Jike Song @ 2016-10-14 10:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiao Guangrong, Neo Jia, Xiaoguang Chen, kvm, guangrong.xiao,
	Kirti Wankhede, Alex Williamson, Tian, Kevin, qemu-devel

On 10/11/2016 05:47 PM, Paolo Bonzini wrote:
> 
> 
> On 11/10/2016 11:21, Xiao Guangrong wrote:
>>
>>
>> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:
>>>
>>>
>>> On 11/10/2016 04:39, Xiao Guangrong wrote:
>>>>
>>>>
>>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:
>>>>>
>>>>>
>>>>> On 10/10/2016 20:01, Neo Jia wrote:
>>>>>>> Hi Neo,
>>>>>>>
>>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
>>>>>>> while nVidia does.
>>>>>>
>>>>>> Hi Paolo and Xiaoguang,
>>>>>>
>>>>>> I am just wondering how device driver can register a notifier so he
>>>>>> can be
>>>>>> notified for write-protected pages when writes are happening.
>>>>>
>>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
>>>>> currently where a struct kvm_device* and struct vfio_group* touch.
>>>>> Given
>>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
>>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
>>>>> that passes the struct kvm_device* to the mdev device.
>>>>>
>>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
>>>>> at KVM Forum but I don't remember the details.
>>>>
>>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
>>>> figure out the kvm instance based on the fd.
>>>>
>>>> We got a new idea, how about search the kvm instance by mm_struct, it
>>>> can work as KVMGT is running in the vcpu context and it is much more
>>>> straightforward.
>>>
>>> Perhaps I didn't understand your suggestion, but the same mm_struct can
>>> have more than 1 struct kvm so I'm not sure that it can work.
>>
>> vcpu->pid is valid during vcpu running so that it can be used to figure
>> out which kvm instance owns the vcpu whose pid is the one as current
>> thread, i think it can work. :)
> 
> No, don't do that.  There's no reason for a thread to run a single VCPU,
> and if you can have multiple VCPUs you can also have multiple VCPUs from
> multiple VMs.
> 
> Passing file descriptors around are the right way to connect subsystems.

[CC Alex, Kevin and Qemu-devel]

Hi Paolo & Alex,

IIUC, passing file descriptors means touching QEMU and the UAPI between
QEMU and VFIO. Would you guys have a look at below draft patch? If it's
on the correct direction, I'll send the split ones. Thanks!

--
Thanks,
Jike


diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index bec694c..f715d37 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -10,12 +10,14 @@
  * the COPYING file in the top-level directory.
  */
 
+#include <sys/ioctl.h>
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
 #include "qemu/range.h"
 #include "qapi/error.h"
 #include "hw/nvram/fw_cfg.h"
 #include "pci.h"
+#include "sysemu/kvm.h"
 #include "trace.h"
 
 /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
@@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
         break;
     }
 }
+
+void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
+{
+    int vmfd;
+
+    if (!kvm_enabled() || !vdev->kvmgt)
+        return;
+
+    /* Tell the device what KVM it attached */
+    vmfd = kvm_get_vmfd(kvm_state);
+    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
+}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a5a620a..8732552 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
         return ret;
     }
 
+    vfio_quirk_kvmgt(vdev);
+
     /* Get a copy of config space */
     ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
                 MIN(pci_config_size(&vdev->pdev), vdev->config_size),
@@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
                        sub_device_id, PCI_ANY_ID),
     DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
+    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),
     /*
      * TODO - support passed fds... is this necessary?
      * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 7d482d9..813832c 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -143,6 +143,7 @@ typedef struct VFIOPCIDevice {
     bool no_kvm_intx;
     bool no_kvm_msi;
     bool no_kvm_msix;
+    bool kvmgt;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
@@ -166,4 +167,6 @@ int vfio_populate_vga(VFIOPCIDevice *vdev);
 int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
                                struct vfio_region_info *info);
 
+void vfio_quirk_kvmgt(VFIOPCIDevice *vdev);
+
 #endif /* HW_VFIO_VFIO_PCI_H */
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index df67cc0..dd8320a 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -254,6 +254,7 @@ void phys_mem_set_alloc(void *(*alloc)(size_t, uint64_t *align));
 int kvm_ioctl(KVMState *s, int type, ...);
 
 int kvm_vm_ioctl(KVMState *s, int type, ...);
+int kvm_get_vmfd(KVMState *s);
 
 int kvm_vcpu_ioctl(CPUState *cpu, int type, ...);
 
diff --git a/kvm-all.c b/kvm-all.c
index efb5fe3..bd72ce3 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2065,6 +2065,11 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
     return ret;
 }
 
+int kvm_get_vmfd(KVMState *s)
+{
+	return s->vmfd;
+}
+
 int kvm_vcpu_ioctl(CPUState *cpu, int type, ...)
 {
     int ret;
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 759b850..952303f 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -686,6 +686,12 @@ struct vfio_iommu_spapr_tce_remove {
 };
 #define VFIO_IOMMU_SPAPR_TCE_REMOVE	_IO(VFIO_TYPE, VFIO_BASE + 20)
 
+
+/**
+ * VFIO_SET_KVMFD - _IO(VFIO_TYPE, VFIO_BASE + 21, __u32)
+ */
+#define VFIO_SET_KVMFD		_IO(VFIO_TYPE, VFIO_BASE + 21)
+
 /* ***************************************************************** */
 
 #endif /* VFIO_H */

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-14 10:37                     ` [Qemu-devel] " Jike Song
@ 2016-10-14 10:43                       ` Paolo Bonzini
  -1 siblings, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2016-10-14 10:43 UTC (permalink / raw)
  To: Jike Song
  Cc: Xiao Guangrong, Neo Jia, Xiaoguang Chen, kvm, guangrong.xiao,
	Kirti Wankhede, Alex Williamson, Tian, Kevin, qemu-devel



On 14/10/2016 12:37, Jike Song wrote:
> Hi Paolo & Alex,
> 
> IIUC, passing file descriptors means touching QEMU and the UAPI between
> QEMU and VFIO. Would you guys have a look at below draft patch? If it's
> on the correct direction, I'll send the split ones. Thanks!
> 
> --
> Thanks,
> Jike
> 
> 
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index bec694c..f715d37 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -10,12 +10,14 @@
>   * the COPYING file in the top-level directory.
>   */
>  
> +#include <sys/ioctl.h>
>  #include "qemu/osdep.h"
>  #include "qemu/error-report.h"
>  #include "qemu/range.h"
>  #include "qapi/error.h"
>  #include "hw/nvram/fw_cfg.h"
>  #include "pci.h"
> +#include "sysemu/kvm.h"
>  #include "trace.h"
>  
>  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
> @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
>          break;
>      }
>  }
> +
> +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
> +{
> +    int vmfd;
> +
> +    if (!kvm_enabled() || !vdev->kvmgt)
> +        return;
> +
> +    /* Tell the device what KVM it attached */
> +    vmfd = kvm_get_vmfd(kvm_state);
> +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
> +}

vfio_kvm_device_add_group is already telling the group id file
descriptor to KVM.  You can use that existing hook (whose kernel side is
virt/kvm/vfio.c).

Paolo

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-14 10:43                       ` Paolo Bonzini
  0 siblings, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2016-10-14 10:43 UTC (permalink / raw)
  To: Jike Song
  Cc: Xiao Guangrong, Neo Jia, Xiaoguang Chen, kvm, guangrong.xiao,
	Kirti Wankhede, Alex Williamson, Tian, Kevin, qemu-devel



On 14/10/2016 12:37, Jike Song wrote:
> Hi Paolo & Alex,
> 
> IIUC, passing file descriptors means touching QEMU and the UAPI between
> QEMU and VFIO. Would you guys have a look at below draft patch? If it's
> on the correct direction, I'll send the split ones. Thanks!
> 
> --
> Thanks,
> Jike
> 
> 
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index bec694c..f715d37 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -10,12 +10,14 @@
>   * the COPYING file in the top-level directory.
>   */
>  
> +#include <sys/ioctl.h>
>  #include "qemu/osdep.h"
>  #include "qemu/error-report.h"
>  #include "qemu/range.h"
>  #include "qapi/error.h"
>  #include "hw/nvram/fw_cfg.h"
>  #include "pci.h"
> +#include "sysemu/kvm.h"
>  #include "trace.h"
>  
>  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
> @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
>          break;
>      }
>  }
> +
> +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
> +{
> +    int vmfd;
> +
> +    if (!kvm_enabled() || !vdev->kvmgt)
> +        return;
> +
> +    /* Tell the device what KVM it attached */
> +    vmfd = kvm_get_vmfd(kvm_state);
> +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
> +}

vfio_kvm_device_add_group is already telling the group id file
descriptor to KVM.  You can use that existing hook (whose kernel side is
virt/kvm/vfio.c).

Paolo

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-14 10:43                       ` [Qemu-devel] " Paolo Bonzini
@ 2016-10-14 12:26                         ` Jike Song
  -1 siblings, 0 replies; 58+ messages in thread
From: Jike Song @ 2016-10-14 12:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Tian, Kevin, Neo Jia, kvm, guangrong.xiao, Alex Williamson,
	Xiaoguang Chen, qemu-devel, Kirti Wankhede, Xiao Guangrong


On 10/14/2016 06:43 PM, Paolo Bonzini wrote:
> 
> 
> On 14/10/2016 12:37, Jike Song wrote:
>> Hi Paolo & Alex,
>>
>> IIUC, passing file descriptors means touching QEMU and the UAPI between
>> QEMU and VFIO. Would you guys have a look at below draft patch? If it's
>> on the correct direction, I'll send the split ones. Thanks!
>>
>> --
>> Thanks,
>> Jike
>>
>>
>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>> index bec694c..f715d37 100644
>> --- a/hw/vfio/pci-quirks.c
>> +++ b/hw/vfio/pci-quirks.c
>> @@ -10,12 +10,14 @@
>>   * the COPYING file in the top-level directory.
>>   */
>>  
>> +#include <sys/ioctl.h>
>>  #include "qemu/osdep.h"
>>  #include "qemu/error-report.h"
>>  #include "qemu/range.h"
>>  #include "qapi/error.h"
>>  #include "hw/nvram/fw_cfg.h"
>>  #include "pci.h"
>> +#include "sysemu/kvm.h"
>>  #include "trace.h"
>>  
>>  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
>> @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
>>          break;
>>      }
>>  }
>> +
>> +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
>> +{
>> +    int vmfd;
>> +
>> +    if (!kvm_enabled() || !vdev->kvmgt)
>> +        return;
>> +
>> +    /* Tell the device what KVM it attached */
>> +    vmfd = kvm_get_vmfd(kvm_state);
>> +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
>> +}
> 
> vfio_kvm_device_add_group is already telling the group id file
> descriptor to KVM.  You can use that existing hook (whose kernel side is
> virt/kvm/vfio.c).

Thanks for quick reply. I'll do some homework and report back :)

--
Thanks,
Jike

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-14 12:26                         ` Jike Song
  0 siblings, 0 replies; 58+ messages in thread
From: Jike Song @ 2016-10-14 12:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiao Guangrong, Neo Jia, Xiaoguang Chen, kvm, guangrong.xiao,
	Kirti Wankhede, Alex Williamson, Tian, Kevin, qemu-devel


On 10/14/2016 06:43 PM, Paolo Bonzini wrote:
> 
> 
> On 14/10/2016 12:37, Jike Song wrote:
>> Hi Paolo & Alex,
>>
>> IIUC, passing file descriptors means touching QEMU and the UAPI between
>> QEMU and VFIO. Would you guys have a look at below draft patch? If it's
>> on the correct direction, I'll send the split ones. Thanks!
>>
>> --
>> Thanks,
>> Jike
>>
>>
>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>> index bec694c..f715d37 100644
>> --- a/hw/vfio/pci-quirks.c
>> +++ b/hw/vfio/pci-quirks.c
>> @@ -10,12 +10,14 @@
>>   * the COPYING file in the top-level directory.
>>   */
>>  
>> +#include <sys/ioctl.h>
>>  #include "qemu/osdep.h"
>>  #include "qemu/error-report.h"
>>  #include "qemu/range.h"
>>  #include "qapi/error.h"
>>  #include "hw/nvram/fw_cfg.h"
>>  #include "pci.h"
>> +#include "sysemu/kvm.h"
>>  #include "trace.h"
>>  
>>  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
>> @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
>>          break;
>>      }
>>  }
>> +
>> +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
>> +{
>> +    int vmfd;
>> +
>> +    if (!kvm_enabled() || !vdev->kvmgt)
>> +        return;
>> +
>> +    /* Tell the device what KVM it attached */
>> +    vmfd = kvm_get_vmfd(kvm_state);
>> +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
>> +}
> 
> vfio_kvm_device_add_group is already telling the group id file
> descriptor to KVM.  You can use that existing hook (whose kernel side is
> virt/kvm/vfio.c).

Thanks for quick reply. I'll do some homework and report back :)

--
Thanks,
Jike

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-14 10:37                     ` [Qemu-devel] " Jike Song
  (?)
  (?)
@ 2016-10-14 14:41                     ` Alex Williamson
  2016-10-14 14:46                         ` [Qemu-devel] " Alex Williamson
  -1 siblings, 1 reply; 58+ messages in thread
From: Alex Williamson @ 2016-10-14 14:41 UTC (permalink / raw)
  To: Jike Song
  Cc: Paolo Bonzini, Tian, Kevin, Neo Jia, kvm, guangrong.xiao,
	Xiaoguang Chen, qemu-devel, Kirti Wankhede, Xiao Guangrong

On Fri, 14 Oct 2016 18:37:45 +0800
Jike Song <jike.song@intel.com> wrote:

> On 10/11/2016 05:47 PM, Paolo Bonzini wrote:
> > 
> > 
> > On 11/10/2016 11:21, Xiao Guangrong wrote:  
> >>
> >>
> >> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:  
> >>>
> >>>
> >>> On 11/10/2016 04:39, Xiao Guangrong wrote:  
> >>>>
> >>>>
> >>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:  
> >>>>>
> >>>>>
> >>>>> On 10/10/2016 20:01, Neo Jia wrote:  
> >>>>>>> Hi Neo,
> >>>>>>>
> >>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
> >>>>>>> while nVidia does.  
> >>>>>>
> >>>>>> Hi Paolo and Xiaoguang,
> >>>>>>
> >>>>>> I am just wondering how device driver can register a notifier so he
> >>>>>> can be
> >>>>>> notified for write-protected pages when writes are happening.  
> >>>>>
> >>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
> >>>>> currently where a struct kvm_device* and struct vfio_group* touch.
> >>>>> Given
> >>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
> >>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
> >>>>> that passes the struct kvm_device* to the mdev device.
> >>>>>
> >>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
> >>>>> at KVM Forum but I don't remember the details.  
> >>>>
> >>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
> >>>> figure out the kvm instance based on the fd.
> >>>>
> >>>> We got a new idea, how about search the kvm instance by mm_struct, it
> >>>> can work as KVMGT is running in the vcpu context and it is much more
> >>>> straightforward.  
> >>>
> >>> Perhaps I didn't understand your suggestion, but the same mm_struct can
> >>> have more than 1 struct kvm so I'm not sure that it can work.  
> >>
> >> vcpu->pid is valid during vcpu running so that it can be used to figure
> >> out which kvm instance owns the vcpu whose pid is the one as current
> >> thread, i think it can work. :)  
> > 
> > No, don't do that.  There's no reason for a thread to run a single VCPU,
> > and if you can have multiple VCPUs you can also have multiple VCPUs from
> > multiple VMs.
> > 
> > Passing file descriptors around are the right way to connect subsystems.  
> 
> [CC Alex, Kevin and Qemu-devel]
> 
> Hi Paolo & Alex,
> 
> IIUC, passing file descriptors means touching QEMU and the UAPI between
> QEMU and VFIO. Would you guys have a look at below draft patch? If it's
> on the correct direction, I'll send the split ones. Thanks!
> 
> --
> Thanks,
> Jike
> 
> 
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index bec694c..f715d37 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -10,12 +10,14 @@
>   * the COPYING file in the top-level directory.
>   */
>  
> +#include <sys/ioctl.h>
>  #include "qemu/osdep.h"
>  #include "qemu/error-report.h"
>  #include "qemu/range.h"
>  #include "qapi/error.h"
>  #include "hw/nvram/fw_cfg.h"
>  #include "pci.h"
> +#include "sysemu/kvm.h"
>  #include "trace.h"
>  
>  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
> @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
>          break;
>      }
>  }
> +
> +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
> +{
> +    int vmfd;
> +
> +    if (!kvm_enabled() || !vdev->kvmgt)
> +        return;
> +
> +    /* Tell the device what KVM it attached */
> +    vmfd = kvm_get_vmfd(kvm_state);
> +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
> +}
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index a5a620a..8732552 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
>          return ret;
>      }
>  
> +    vfio_quirk_kvmgt(vdev);
> +
>      /* Get a copy of config space */
>      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
>                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
> @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
>      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
>                         sub_device_id, PCI_ANY_ID),
>      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
> +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),

Just a side note, device options are a headache, users are prone to get
them wrong and minimally it requires an entire round to get libvirt
support.  We should be able to detect from the device or vfio API
whether such a call is required.  Obviously if we can use the existing
kvm-vfio device, that's the better option anyway.  Thanks,

Alex

>      /*
>       * TODO - support passed fds... is this necessary?
>       * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 7d482d9..813832c 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -143,6 +143,7 @@ typedef struct VFIOPCIDevice {
>      bool no_kvm_intx;
>      bool no_kvm_msi;
>      bool no_kvm_msix;
> +    bool kvmgt;
>  } VFIOPCIDevice;
>  
>  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
> @@ -166,4 +167,6 @@ int vfio_populate_vga(VFIOPCIDevice *vdev);
>  int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
>                                 struct vfio_region_info *info);
>  
> +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev);
> +
>  #endif /* HW_VFIO_VFIO_PCI_H */
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index df67cc0..dd8320a 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -254,6 +254,7 @@ void phys_mem_set_alloc(void *(*alloc)(size_t, uint64_t *align));
>  int kvm_ioctl(KVMState *s, int type, ...);
>  
>  int kvm_vm_ioctl(KVMState *s, int type, ...);
> +int kvm_get_vmfd(KVMState *s);
>  
>  int kvm_vcpu_ioctl(CPUState *cpu, int type, ...);
>  
> diff --git a/kvm-all.c b/kvm-all.c
> index efb5fe3..bd72ce3 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -2065,6 +2065,11 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
>      return ret;
>  }
>  
> +int kvm_get_vmfd(KVMState *s)
> +{
> +	return s->vmfd;
> +}
> +
>  int kvm_vcpu_ioctl(CPUState *cpu, int type, ...)
>  {
>      int ret;
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index 759b850..952303f 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -686,6 +686,12 @@ struct vfio_iommu_spapr_tce_remove {
>  };
>  #define VFIO_IOMMU_SPAPR_TCE_REMOVE	_IO(VFIO_TYPE, VFIO_BASE + 20)
>  
> +
> +/**
> + * VFIO_SET_KVMFD - _IO(VFIO_TYPE, VFIO_BASE + 21, __u32)
> + */
> +#define VFIO_SET_KVMFD		_IO(VFIO_TYPE, VFIO_BASE + 21)
> +
>  /* ***************************************************************** */
>  
>  #endif /* VFIO_H */
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-14 14:41                     ` Alex Williamson
@ 2016-10-14 14:46                         ` Alex Williamson
  0 siblings, 0 replies; 58+ messages in thread
From: Alex Williamson @ 2016-10-14 14:46 UTC (permalink / raw)
  To: Jike Song
  Cc: Tian, Kevin, Neo Jia, kvm, guangrong.xiao, Xiao Guangrong,
	qemu-devel, Xiaoguang Chen, Kirti Wankhede, Paolo Bonzini

On Fri, 14 Oct 2016 08:41:58 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Fri, 14 Oct 2016 18:37:45 +0800
> Jike Song <jike.song@intel.com> wrote:
> 
> > On 10/11/2016 05:47 PM, Paolo Bonzini wrote:  
> > > 
> > > 
> > > On 11/10/2016 11:21, Xiao Guangrong wrote:    
> > >>
> > >>
> > >> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:    
> > >>>
> > >>>
> > >>> On 11/10/2016 04:39, Xiao Guangrong wrote:    
> > >>>>
> > >>>>
> > >>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:    
> > >>>>>
> > >>>>>
> > >>>>> On 10/10/2016 20:01, Neo Jia wrote:    
> > >>>>>>> Hi Neo,
> > >>>>>>>
> > >>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
> > >>>>>>> while nVidia does.    
> > >>>>>>
> > >>>>>> Hi Paolo and Xiaoguang,
> > >>>>>>
> > >>>>>> I am just wondering how device driver can register a notifier so he
> > >>>>>> can be
> > >>>>>> notified for write-protected pages when writes are happening.    
> > >>>>>
> > >>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
> > >>>>> currently where a struct kvm_device* and struct vfio_group* touch.
> > >>>>> Given
> > >>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
> > >>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
> > >>>>> that passes the struct kvm_device* to the mdev device.
> > >>>>>
> > >>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
> > >>>>> at KVM Forum but I don't remember the details.    
> > >>>>
> > >>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
> > >>>> figure out the kvm instance based on the fd.
> > >>>>
> > >>>> We got a new idea, how about search the kvm instance by mm_struct, it
> > >>>> can work as KVMGT is running in the vcpu context and it is much more
> > >>>> straightforward.    
> > >>>
> > >>> Perhaps I didn't understand your suggestion, but the same mm_struct can
> > >>> have more than 1 struct kvm so I'm not sure that it can work.    
> > >>
> > >> vcpu->pid is valid during vcpu running so that it can be used to figure
> > >> out which kvm instance owns the vcpu whose pid is the one as current
> > >> thread, i think it can work. :)    
> > > 
> > > No, don't do that.  There's no reason for a thread to run a single VCPU,
> > > and if you can have multiple VCPUs you can also have multiple VCPUs from
> > > multiple VMs.
> > > 
> > > Passing file descriptors around are the right way to connect subsystems.    
> > 
> > [CC Alex, Kevin and Qemu-devel]
> > 
> > Hi Paolo & Alex,
> > 
> > IIUC, passing file descriptors means touching QEMU and the UAPI between
> > QEMU and VFIO. Would you guys have a look at below draft patch? If it's
> > on the correct direction, I'll send the split ones. Thanks!
> > 
> > --
> > Thanks,
> > Jike
> > 
> > 
> > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> > index bec694c..f715d37 100644
> > --- a/hw/vfio/pci-quirks.c
> > +++ b/hw/vfio/pci-quirks.c
> > @@ -10,12 +10,14 @@
> >   * the COPYING file in the top-level directory.
> >   */
> >  
> > +#include <sys/ioctl.h>
> >  #include "qemu/osdep.h"
> >  #include "qemu/error-report.h"
> >  #include "qemu/range.h"
> >  #include "qapi/error.h"
> >  #include "hw/nvram/fw_cfg.h"
> >  #include "pci.h"
> > +#include "sysemu/kvm.h"
> >  #include "trace.h"
> >  
> >  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
> > @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
> >          break;
> >      }
> >  }
> > +
> > +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
> > +{
> > +    int vmfd;
> > +
> > +    if (!kvm_enabled() || !vdev->kvmgt)
> > +        return;
> > +
> > +    /* Tell the device what KVM it attached */
> > +    vmfd = kvm_get_vmfd(kvm_state);
> > +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
> > +}
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index a5a620a..8732552 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
> >          return ret;
> >      }
> >  
> > +    vfio_quirk_kvmgt(vdev);
> > +
> >      /* Get a copy of config space */
> >      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
> >                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
> > @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
> >      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
> >                         sub_device_id, PCI_ANY_ID),
> >      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
> > +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),  
> 
> Just a side note, device options are a headache, users are prone to get
> them wrong and minimally it requires an entire round to get libvirt
> support.  We should be able to detect from the device or vfio API
> whether such a call is required.  Obviously if we can use the existing
> kvm-vfio device, that's the better option anyway.  Thanks,

Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
does, it needs to produce a device failure when unavailable.  Thanks,

Alex

> >      /*
> >       * TODO - support passed fds... is this necessary?
> >       * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
> > diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> > index 7d482d9..813832c 100644
> > --- a/hw/vfio/pci.h
> > +++ b/hw/vfio/pci.h
> > @@ -143,6 +143,7 @@ typedef struct VFIOPCIDevice {
> >      bool no_kvm_intx;
> >      bool no_kvm_msi;
> >      bool no_kvm_msix;
> > +    bool kvmgt;
> >  } VFIOPCIDevice;
> >  
> >  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
> > @@ -166,4 +167,6 @@ int vfio_populate_vga(VFIOPCIDevice *vdev);
> >  int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
> >                                 struct vfio_region_info *info);
> >  
> > +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev);
> > +
> >  #endif /* HW_VFIO_VFIO_PCI_H */
> > diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> > index df67cc0..dd8320a 100644
> > --- a/include/sysemu/kvm.h
> > +++ b/include/sysemu/kvm.h
> > @@ -254,6 +254,7 @@ void phys_mem_set_alloc(void *(*alloc)(size_t, uint64_t *align));
> >  int kvm_ioctl(KVMState *s, int type, ...);
> >  
> >  int kvm_vm_ioctl(KVMState *s, int type, ...);
> > +int kvm_get_vmfd(KVMState *s);
> >  
> >  int kvm_vcpu_ioctl(CPUState *cpu, int type, ...);
> >  
> > diff --git a/kvm-all.c b/kvm-all.c
> > index efb5fe3..bd72ce3 100644
> > --- a/kvm-all.c
> > +++ b/kvm-all.c
> > @@ -2065,6 +2065,11 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
> >      return ret;
> >  }
> >  
> > +int kvm_get_vmfd(KVMState *s)
> > +{
> > +	return s->vmfd;
> > +}
> > +
> >  int kvm_vcpu_ioctl(CPUState *cpu, int type, ...)
> >  {
> >      int ret;
> > diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> > index 759b850..952303f 100644
> > --- a/linux-headers/linux/vfio.h
> > +++ b/linux-headers/linux/vfio.h
> > @@ -686,6 +686,12 @@ struct vfio_iommu_spapr_tce_remove {
> >  };
> >  #define VFIO_IOMMU_SPAPR_TCE_REMOVE	_IO(VFIO_TYPE, VFIO_BASE + 20)
> >  
> > +
> > +/**
> > + * VFIO_SET_KVMFD - _IO(VFIO_TYPE, VFIO_BASE + 21, __u32)
> > + */
> > +#define VFIO_SET_KVMFD		_IO(VFIO_TYPE, VFIO_BASE + 21)
> > +
> >  /* ***************************************************************** */
> >  
> >  #endif /* VFIO_H */
> >   
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-14 14:46                         ` Alex Williamson
  0 siblings, 0 replies; 58+ messages in thread
From: Alex Williamson @ 2016-10-14 14:46 UTC (permalink / raw)
  To: Jike Song
  Cc: Paolo Bonzini, Tian, Kevin, Neo Jia, kvm, guangrong.xiao,
	Xiaoguang Chen, qemu-devel, Kirti Wankhede, Xiao Guangrong

On Fri, 14 Oct 2016 08:41:58 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Fri, 14 Oct 2016 18:37:45 +0800
> Jike Song <jike.song@intel.com> wrote:
> 
> > On 10/11/2016 05:47 PM, Paolo Bonzini wrote:  
> > > 
> > > 
> > > On 11/10/2016 11:21, Xiao Guangrong wrote:    
> > >>
> > >>
> > >> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:    
> > >>>
> > >>>
> > >>> On 11/10/2016 04:39, Xiao Guangrong wrote:    
> > >>>>
> > >>>>
> > >>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:    
> > >>>>>
> > >>>>>
> > >>>>> On 10/10/2016 20:01, Neo Jia wrote:    
> > >>>>>>> Hi Neo,
> > >>>>>>>
> > >>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
> > >>>>>>> while nVidia does.    
> > >>>>>>
> > >>>>>> Hi Paolo and Xiaoguang,
> > >>>>>>
> > >>>>>> I am just wondering how device driver can register a notifier so he
> > >>>>>> can be
> > >>>>>> notified for write-protected pages when writes are happening.    
> > >>>>>
> > >>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
> > >>>>> currently where a struct kvm_device* and struct vfio_group* touch.
> > >>>>> Given
> > >>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
> > >>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
> > >>>>> that passes the struct kvm_device* to the mdev device.
> > >>>>>
> > >>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
> > >>>>> at KVM Forum but I don't remember the details.    
> > >>>>
> > >>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
> > >>>> figure out the kvm instance based on the fd.
> > >>>>
> > >>>> We got a new idea, how about search the kvm instance by mm_struct, it
> > >>>> can work as KVMGT is running in the vcpu context and it is much more
> > >>>> straightforward.    
> > >>>
> > >>> Perhaps I didn't understand your suggestion, but the same mm_struct can
> > >>> have more than 1 struct kvm so I'm not sure that it can work.    
> > >>
> > >> vcpu->pid is valid during vcpu running so that it can be used to figure
> > >> out which kvm instance owns the vcpu whose pid is the one as current
> > >> thread, i think it can work. :)    
> > > 
> > > No, don't do that.  There's no reason for a thread to run a single VCPU,
> > > and if you can have multiple VCPUs you can also have multiple VCPUs from
> > > multiple VMs.
> > > 
> > > Passing file descriptors around are the right way to connect subsystems.    
> > 
> > [CC Alex, Kevin and Qemu-devel]
> > 
> > Hi Paolo & Alex,
> > 
> > IIUC, passing file descriptors means touching QEMU and the UAPI between
> > QEMU and VFIO. Would you guys have a look at below draft patch? If it's
> > on the correct direction, I'll send the split ones. Thanks!
> > 
> > --
> > Thanks,
> > Jike
> > 
> > 
> > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> > index bec694c..f715d37 100644
> > --- a/hw/vfio/pci-quirks.c
> > +++ b/hw/vfio/pci-quirks.c
> > @@ -10,12 +10,14 @@
> >   * the COPYING file in the top-level directory.
> >   */
> >  
> > +#include <sys/ioctl.h>
> >  #include "qemu/osdep.h"
> >  #include "qemu/error-report.h"
> >  #include "qemu/range.h"
> >  #include "qapi/error.h"
> >  #include "hw/nvram/fw_cfg.h"
> >  #include "pci.h"
> > +#include "sysemu/kvm.h"
> >  #include "trace.h"
> >  
> >  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
> > @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
> >          break;
> >      }
> >  }
> > +
> > +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
> > +{
> > +    int vmfd;
> > +
> > +    if (!kvm_enabled() || !vdev->kvmgt)
> > +        return;
> > +
> > +    /* Tell the device what KVM it attached */
> > +    vmfd = kvm_get_vmfd(kvm_state);
> > +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
> > +}
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index a5a620a..8732552 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
> >          return ret;
> >      }
> >  
> > +    vfio_quirk_kvmgt(vdev);
> > +
> >      /* Get a copy of config space */
> >      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
> >                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
> > @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
> >      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
> >                         sub_device_id, PCI_ANY_ID),
> >      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
> > +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),  
> 
> Just a side note, device options are a headache, users are prone to get
> them wrong and minimally it requires an entire round to get libvirt
> support.  We should be able to detect from the device or vfio API
> whether such a call is required.  Obviously if we can use the existing
> kvm-vfio device, that's the better option anyway.  Thanks,

Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
does, it needs to produce a device failure when unavailable.  Thanks,

Alex

> >      /*
> >       * TODO - support passed fds... is this necessary?
> >       * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
> > diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> > index 7d482d9..813832c 100644
> > --- a/hw/vfio/pci.h
> > +++ b/hw/vfio/pci.h
> > @@ -143,6 +143,7 @@ typedef struct VFIOPCIDevice {
> >      bool no_kvm_intx;
> >      bool no_kvm_msi;
> >      bool no_kvm_msix;
> > +    bool kvmgt;
> >  } VFIOPCIDevice;
> >  
> >  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
> > @@ -166,4 +167,6 @@ int vfio_populate_vga(VFIOPCIDevice *vdev);
> >  int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
> >                                 struct vfio_region_info *info);
> >  
> > +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev);
> > +
> >  #endif /* HW_VFIO_VFIO_PCI_H */
> > diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> > index df67cc0..dd8320a 100644
> > --- a/include/sysemu/kvm.h
> > +++ b/include/sysemu/kvm.h
> > @@ -254,6 +254,7 @@ void phys_mem_set_alloc(void *(*alloc)(size_t, uint64_t *align));
> >  int kvm_ioctl(KVMState *s, int type, ...);
> >  
> >  int kvm_vm_ioctl(KVMState *s, int type, ...);
> > +int kvm_get_vmfd(KVMState *s);
> >  
> >  int kvm_vcpu_ioctl(CPUState *cpu, int type, ...);
> >  
> > diff --git a/kvm-all.c b/kvm-all.c
> > index efb5fe3..bd72ce3 100644
> > --- a/kvm-all.c
> > +++ b/kvm-all.c
> > @@ -2065,6 +2065,11 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
> >      return ret;
> >  }
> >  
> > +int kvm_get_vmfd(KVMState *s)
> > +{
> > +	return s->vmfd;
> > +}
> > +
> >  int kvm_vcpu_ioctl(CPUState *cpu, int type, ...)
> >  {
> >      int ret;
> > diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> > index 759b850..952303f 100644
> > --- a/linux-headers/linux/vfio.h
> > +++ b/linux-headers/linux/vfio.h
> > @@ -686,6 +686,12 @@ struct vfio_iommu_spapr_tce_remove {
> >  };
> >  #define VFIO_IOMMU_SPAPR_TCE_REMOVE	_IO(VFIO_TYPE, VFIO_BASE + 20)
> >  
> > +
> > +/**
> > + * VFIO_SET_KVMFD - _IO(VFIO_TYPE, VFIO_BASE + 21, __u32)
> > + */
> > +#define VFIO_SET_KVMFD		_IO(VFIO_TYPE, VFIO_BASE + 21)
> > +
> >  /* ***************************************************************** */
> >  
> >  #endif /* VFIO_H */
> >   
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-14 14:46                         ` [Qemu-devel] " Alex Williamson
@ 2016-10-14 16:35                           ` Neo Jia
  -1 siblings, 0 replies; 58+ messages in thread
From: Neo Jia @ 2016-10-14 16:35 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jike Song, Paolo Bonzini, Tian, Kevin, kvm, guangrong.xiao,
	Xiaoguang Chen, qemu-devel, Kirti Wankhede, Xiao Guangrong

On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:
> On Fri, 14 Oct 2016 08:41:58 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Fri, 14 Oct 2016 18:37:45 +0800
> > Jike Song <jike.song@intel.com> wrote:
> > 
> > > On 10/11/2016 05:47 PM, Paolo Bonzini wrote:  
> > > > 
> > > > 
> > > > On 11/10/2016 11:21, Xiao Guangrong wrote:    
> > > >>
> > > >>
> > > >> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:    
> > > >>>
> > > >>>
> > > >>> On 11/10/2016 04:39, Xiao Guangrong wrote:    
> > > >>>>
> > > >>>>
> > > >>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:    
> > > >>>>>
> > > >>>>>
> > > >>>>> On 10/10/2016 20:01, Neo Jia wrote:    
> > > >>>>>>> Hi Neo,
> > > >>>>>>>
> > > >>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
> > > >>>>>>> while nVidia does.    
> > > >>>>>>
> > > >>>>>> Hi Paolo and Xiaoguang,
> > > >>>>>>
> > > >>>>>> I am just wondering how device driver can register a notifier so he
> > > >>>>>> can be
> > > >>>>>> notified for write-protected pages when writes are happening.    
> > > >>>>>
> > > >>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
> > > >>>>> currently where a struct kvm_device* and struct vfio_group* touch.
> > > >>>>> Given
> > > >>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
> > > >>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
> > > >>>>> that passes the struct kvm_device* to the mdev device.
> > > >>>>>
> > > >>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
> > > >>>>> at KVM Forum but I don't remember the details.    
> > > >>>>
> > > >>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
> > > >>>> figure out the kvm instance based on the fd.
> > > >>>>
> > > >>>> We got a new idea, how about search the kvm instance by mm_struct, it
> > > >>>> can work as KVMGT is running in the vcpu context and it is much more
> > > >>>> straightforward.    
> > > >>>
> > > >>> Perhaps I didn't understand your suggestion, but the same mm_struct can
> > > >>> have more than 1 struct kvm so I'm not sure that it can work.    
> > > >>
> > > >> vcpu->pid is valid during vcpu running so that it can be used to figure
> > > >> out which kvm instance owns the vcpu whose pid is the one as current
> > > >> thread, i think it can work. :)    
> > > > 
> > > > No, don't do that.  There's no reason for a thread to run a single VCPU,
> > > > and if you can have multiple VCPUs you can also have multiple VCPUs from
> > > > multiple VMs.
> > > > 
> > > > Passing file descriptors around are the right way to connect subsystems.    
> > > 
> > > [CC Alex, Kevin and Qemu-devel]
> > > 
> > > Hi Paolo & Alex,
> > > 
> > > IIUC, passing file descriptors means touching QEMU and the UAPI between
> > > QEMU and VFIO. Would you guys have a look at below draft patch? If it's
> > > on the correct direction, I'll send the split ones. Thanks!
> > > 
> > > --
> > > Thanks,
> > > Jike
> > > 
> > > 
> > > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> > > index bec694c..f715d37 100644
> > > --- a/hw/vfio/pci-quirks.c
> > > +++ b/hw/vfio/pci-quirks.c
> > > @@ -10,12 +10,14 @@
> > >   * the COPYING file in the top-level directory.
> > >   */
> > >  
> > > +#include <sys/ioctl.h>
> > >  #include "qemu/osdep.h"
> > >  #include "qemu/error-report.h"
> > >  #include "qemu/range.h"
> > >  #include "qapi/error.h"
> > >  #include "hw/nvram/fw_cfg.h"
> > >  #include "pci.h"
> > > +#include "sysemu/kvm.h"
> > >  #include "trace.h"
> > >  
> > >  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
> > > @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
> > >          break;
> > >      }
> > >  }
> > > +
> > > +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
> > > +{
> > > +    int vmfd;
> > > +
> > > +    if (!kvm_enabled() || !vdev->kvmgt)
> > > +        return;
> > > +
> > > +    /* Tell the device what KVM it attached */
> > > +    vmfd = kvm_get_vmfd(kvm_state);
> > > +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
> > > +}
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index a5a620a..8732552 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
> > >          return ret;
> > >      }
> > >  
> > > +    vfio_quirk_kvmgt(vdev);
> > > +
> > >      /* Get a copy of config space */
> > >      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
> > >                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
> > > @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
> > >      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
> > >                         sub_device_id, PCI_ANY_ID),
> > >      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
> > > +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),  
> > 
> > Just a side note, device options are a headache, users are prone to get
> > them wrong and minimally it requires an entire round to get libvirt
> > support.  We should be able to detect from the device or vfio API
> > whether such a call is required.  Obviously if we can use the existing
> > kvm-vfio device, that's the better option anyway.  Thanks,
> 
> Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
> does, it needs to produce a device failure when unavailable.  Thanks,

Also, I would like to see this as an generic feature instead of
kvmgt specific interface, so we don't have to add new options to QEMU and it is
up to the vendor driver to proceed with or without it.

Thanks,
Neo

> 
> Alex
> 
> > >      /*
> > >       * TODO - support passed fds... is this necessary?
> > >       * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
> > > diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> > > index 7d482d9..813832c 100644
> > > --- a/hw/vfio/pci.h
> > > +++ b/hw/vfio/pci.h
> > > @@ -143,6 +143,7 @@ typedef struct VFIOPCIDevice {
> > >      bool no_kvm_intx;
> > >      bool no_kvm_msi;
> > >      bool no_kvm_msix;
> > > +    bool kvmgt;
> > >  } VFIOPCIDevice;
> > >  
> > >  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
> > > @@ -166,4 +167,6 @@ int vfio_populate_vga(VFIOPCIDevice *vdev);
> > >  int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
> > >                                 struct vfio_region_info *info);
> > >  
> > > +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev);
> > > +
> > >  #endif /* HW_VFIO_VFIO_PCI_H */
> > > diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> > > index df67cc0..dd8320a 100644
> > > --- a/include/sysemu/kvm.h
> > > +++ b/include/sysemu/kvm.h
> > > @@ -254,6 +254,7 @@ void phys_mem_set_alloc(void *(*alloc)(size_t, uint64_t *align));
> > >  int kvm_ioctl(KVMState *s, int type, ...);
> > >  
> > >  int kvm_vm_ioctl(KVMState *s, int type, ...);
> > > +int kvm_get_vmfd(KVMState *s);
> > >  
> > >  int kvm_vcpu_ioctl(CPUState *cpu, int type, ...);
> > >  
> > > diff --git a/kvm-all.c b/kvm-all.c
> > > index efb5fe3..bd72ce3 100644
> > > --- a/kvm-all.c
> > > +++ b/kvm-all.c
> > > @@ -2065,6 +2065,11 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
> > >      return ret;
> > >  }
> > >  
> > > +int kvm_get_vmfd(KVMState *s)
> > > +{
> > > +	return s->vmfd;
> > > +}
> > > +
> > >  int kvm_vcpu_ioctl(CPUState *cpu, int type, ...)
> > >  {
> > >      int ret;
> > > diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> > > index 759b850..952303f 100644
> > > --- a/linux-headers/linux/vfio.h
> > > +++ b/linux-headers/linux/vfio.h
> > > @@ -686,6 +686,12 @@ struct vfio_iommu_spapr_tce_remove {
> > >  };
> > >  #define VFIO_IOMMU_SPAPR_TCE_REMOVE	_IO(VFIO_TYPE, VFIO_BASE + 20)
> > >  
> > > +
> > > +/**
> > > + * VFIO_SET_KVMFD - _IO(VFIO_TYPE, VFIO_BASE + 21, __u32)
> > > + */
> > > +#define VFIO_SET_KVMFD		_IO(VFIO_TYPE, VFIO_BASE + 21)
> > > +
> > >  /* ***************************************************************** */
> > >  
> > >  #endif /* VFIO_H */
> > >   
> > 
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-14 16:35                           ` Neo Jia
  0 siblings, 0 replies; 58+ messages in thread
From: Neo Jia @ 2016-10-14 16:35 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jike Song, Paolo Bonzini, Tian, Kevin, kvm, guangrong.xiao,
	Xiaoguang Chen, qemu-devel, Kirti Wankhede, Xiao Guangrong

On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:
> On Fri, 14 Oct 2016 08:41:58 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Fri, 14 Oct 2016 18:37:45 +0800
> > Jike Song <jike.song@intel.com> wrote:
> > 
> > > On 10/11/2016 05:47 PM, Paolo Bonzini wrote:  
> > > > 
> > > > 
> > > > On 11/10/2016 11:21, Xiao Guangrong wrote:    
> > > >>
> > > >>
> > > >> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:    
> > > >>>
> > > >>>
> > > >>> On 11/10/2016 04:39, Xiao Guangrong wrote:    
> > > >>>>
> > > >>>>
> > > >>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:    
> > > >>>>>
> > > >>>>>
> > > >>>>> On 10/10/2016 20:01, Neo Jia wrote:    
> > > >>>>>>> Hi Neo,
> > > >>>>>>>
> > > >>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
> > > >>>>>>> while nVidia does.    
> > > >>>>>>
> > > >>>>>> Hi Paolo and Xiaoguang,
> > > >>>>>>
> > > >>>>>> I am just wondering how device driver can register a notifier so he
> > > >>>>>> can be
> > > >>>>>> notified for write-protected pages when writes are happening.    
> > > >>>>>
> > > >>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
> > > >>>>> currently where a struct kvm_device* and struct vfio_group* touch.
> > > >>>>> Given
> > > >>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
> > > >>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
> > > >>>>> that passes the struct kvm_device* to the mdev device.
> > > >>>>>
> > > >>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
> > > >>>>> at KVM Forum but I don't remember the details.    
> > > >>>>
> > > >>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
> > > >>>> figure out the kvm instance based on the fd.
> > > >>>>
> > > >>>> We got a new idea, how about search the kvm instance by mm_struct, it
> > > >>>> can work as KVMGT is running in the vcpu context and it is much more
> > > >>>> straightforward.    
> > > >>>
> > > >>> Perhaps I didn't understand your suggestion, but the same mm_struct can
> > > >>> have more than 1 struct kvm so I'm not sure that it can work.    
> > > >>
> > > >> vcpu->pid is valid during vcpu running so that it can be used to figure
> > > >> out which kvm instance owns the vcpu whose pid is the one as current
> > > >> thread, i think it can work. :)    
> > > > 
> > > > No, don't do that.  There's no reason for a thread to run a single VCPU,
> > > > and if you can have multiple VCPUs you can also have multiple VCPUs from
> > > > multiple VMs.
> > > > 
> > > > Passing file descriptors around are the right way to connect subsystems.    
> > > 
> > > [CC Alex, Kevin and Qemu-devel]
> > > 
> > > Hi Paolo & Alex,
> > > 
> > > IIUC, passing file descriptors means touching QEMU and the UAPI between
> > > QEMU and VFIO. Would you guys have a look at below draft patch? If it's
> > > on the correct direction, I'll send the split ones. Thanks!
> > > 
> > > --
> > > Thanks,
> > > Jike
> > > 
> > > 
> > > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> > > index bec694c..f715d37 100644
> > > --- a/hw/vfio/pci-quirks.c
> > > +++ b/hw/vfio/pci-quirks.c
> > > @@ -10,12 +10,14 @@
> > >   * the COPYING file in the top-level directory.
> > >   */
> > >  
> > > +#include <sys/ioctl.h>
> > >  #include "qemu/osdep.h"
> > >  #include "qemu/error-report.h"
> > >  #include "qemu/range.h"
> > >  #include "qapi/error.h"
> > >  #include "hw/nvram/fw_cfg.h"
> > >  #include "pci.h"
> > > +#include "sysemu/kvm.h"
> > >  #include "trace.h"
> > >  
> > >  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
> > > @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
> > >          break;
> > >      }
> > >  }
> > > +
> > > +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
> > > +{
> > > +    int vmfd;
> > > +
> > > +    if (!kvm_enabled() || !vdev->kvmgt)
> > > +        return;
> > > +
> > > +    /* Tell the device what KVM it attached */
> > > +    vmfd = kvm_get_vmfd(kvm_state);
> > > +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
> > > +}
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index a5a620a..8732552 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
> > >          return ret;
> > >      }
> > >  
> > > +    vfio_quirk_kvmgt(vdev);
> > > +
> > >      /* Get a copy of config space */
> > >      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
> > >                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
> > > @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
> > >      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
> > >                         sub_device_id, PCI_ANY_ID),
> > >      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
> > > +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),  
> > 
> > Just a side note, device options are a headache, users are prone to get
> > them wrong and minimally it requires an entire round to get libvirt
> > support.  We should be able to detect from the device or vfio API
> > whether such a call is required.  Obviously if we can use the existing
> > kvm-vfio device, that's the better option anyway.  Thanks,
> 
> Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
> does, it needs to produce a device failure when unavailable.  Thanks,

Also, I would like to see this as an generic feature instead of
kvmgt specific interface, so we don't have to add new options to QEMU and it is
up to the vendor driver to proceed with or without it.

Thanks,
Neo

> 
> Alex
> 
> > >      /*
> > >       * TODO - support passed fds... is this necessary?
> > >       * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
> > > diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> > > index 7d482d9..813832c 100644
> > > --- a/hw/vfio/pci.h
> > > +++ b/hw/vfio/pci.h
> > > @@ -143,6 +143,7 @@ typedef struct VFIOPCIDevice {
> > >      bool no_kvm_intx;
> > >      bool no_kvm_msi;
> > >      bool no_kvm_msix;
> > > +    bool kvmgt;
> > >  } VFIOPCIDevice;
> > >  
> > >  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
> > > @@ -166,4 +167,6 @@ int vfio_populate_vga(VFIOPCIDevice *vdev);
> > >  int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
> > >                                 struct vfio_region_info *info);
> > >  
> > > +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev);
> > > +
> > >  #endif /* HW_VFIO_VFIO_PCI_H */
> > > diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> > > index df67cc0..dd8320a 100644
> > > --- a/include/sysemu/kvm.h
> > > +++ b/include/sysemu/kvm.h
> > > @@ -254,6 +254,7 @@ void phys_mem_set_alloc(void *(*alloc)(size_t, uint64_t *align));
> > >  int kvm_ioctl(KVMState *s, int type, ...);
> > >  
> > >  int kvm_vm_ioctl(KVMState *s, int type, ...);
> > > +int kvm_get_vmfd(KVMState *s);
> > >  
> > >  int kvm_vcpu_ioctl(CPUState *cpu, int type, ...);
> > >  
> > > diff --git a/kvm-all.c b/kvm-all.c
> > > index efb5fe3..bd72ce3 100644
> > > --- a/kvm-all.c
> > > +++ b/kvm-all.c
> > > @@ -2065,6 +2065,11 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
> > >      return ret;
> > >  }
> > >  
> > > +int kvm_get_vmfd(KVMState *s)
> > > +{
> > > +	return s->vmfd;
> > > +}
> > > +
> > >  int kvm_vcpu_ioctl(CPUState *cpu, int type, ...)
> > >  {
> > >      int ret;
> > > diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> > > index 759b850..952303f 100644
> > > --- a/linux-headers/linux/vfio.h
> > > +++ b/linux-headers/linux/vfio.h
> > > @@ -686,6 +686,12 @@ struct vfio_iommu_spapr_tce_remove {
> > >  };
> > >  #define VFIO_IOMMU_SPAPR_TCE_REMOVE	_IO(VFIO_TYPE, VFIO_BASE + 20)
> > >  
> > > +
> > > +/**
> > > + * VFIO_SET_KVMFD - _IO(VFIO_TYPE, VFIO_BASE + 21, __u32)
> > > + */
> > > +#define VFIO_SET_KVMFD		_IO(VFIO_TYPE, VFIO_BASE + 21)
> > > +
> > >  /* ***************************************************************** */
> > >  
> > >  #endif /* VFIO_H */
> > >   
> > 
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-14 16:35                           ` Neo Jia
@ 2016-10-14 16:51                             ` Alex Williamson
  -1 siblings, 0 replies; 58+ messages in thread
From: Alex Williamson @ 2016-10-14 16:51 UTC (permalink / raw)
  To: Neo Jia
  Cc: Jike Song, Paolo Bonzini, Tian, Kevin, kvm, guangrong.xiao,
	Xiaoguang Chen, qemu-devel, Kirti Wankhede, Xiao Guangrong

On Fri, 14 Oct 2016 09:35:45 -0700
Neo Jia <cjia@nvidia.com> wrote:

> On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:
> > On Fri, 14 Oct 2016 08:41:58 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >   
> > > On Fri, 14 Oct 2016 18:37:45 +0800
> > > Jike Song <jike.song@intel.com> wrote:
> > >   
> > > > On 10/11/2016 05:47 PM, Paolo Bonzini wrote:    
> > > > > 
> > > > > 
> > > > > On 11/10/2016 11:21, Xiao Guangrong wrote:      
> > > > >>
> > > > >>
> > > > >> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:      
> > > > >>>
> > > > >>>
> > > > >>> On 11/10/2016 04:39, Xiao Guangrong wrote:      
> > > > >>>>
> > > > >>>>
> > > > >>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:      
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On 10/10/2016 20:01, Neo Jia wrote:      
> > > > >>>>>>> Hi Neo,
> > > > >>>>>>>
> > > > >>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
> > > > >>>>>>> while nVidia does.      
> > > > >>>>>>
> > > > >>>>>> Hi Paolo and Xiaoguang,
> > > > >>>>>>
> > > > >>>>>> I am just wondering how device driver can register a notifier so he
> > > > >>>>>> can be
> > > > >>>>>> notified for write-protected pages when writes are happening.      
> > > > >>>>>
> > > > >>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
> > > > >>>>> currently where a struct kvm_device* and struct vfio_group* touch.
> > > > >>>>> Given
> > > > >>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
> > > > >>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
> > > > >>>>> that passes the struct kvm_device* to the mdev device.
> > > > >>>>>
> > > > >>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
> > > > >>>>> at KVM Forum but I don't remember the details.      
> > > > >>>>
> > > > >>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
> > > > >>>> figure out the kvm instance based on the fd.
> > > > >>>>
> > > > >>>> We got a new idea, how about search the kvm instance by mm_struct, it
> > > > >>>> can work as KVMGT is running in the vcpu context and it is much more
> > > > >>>> straightforward.      
> > > > >>>
> > > > >>> Perhaps I didn't understand your suggestion, but the same mm_struct can
> > > > >>> have more than 1 struct kvm so I'm not sure that it can work.      
> > > > >>
> > > > >> vcpu->pid is valid during vcpu running so that it can be used to figure
> > > > >> out which kvm instance owns the vcpu whose pid is the one as current
> > > > >> thread, i think it can work. :)      
> > > > > 
> > > > > No, don't do that.  There's no reason for a thread to run a single VCPU,
> > > > > and if you can have multiple VCPUs you can also have multiple VCPUs from
> > > > > multiple VMs.
> > > > > 
> > > > > Passing file descriptors around are the right way to connect subsystems.      
> > > > 
> > > > [CC Alex, Kevin and Qemu-devel]
> > > > 
> > > > Hi Paolo & Alex,
> > > > 
> > > > IIUC, passing file descriptors means touching QEMU and the UAPI between
> > > > QEMU and VFIO. Would you guys have a look at below draft patch? If it's
> > > > on the correct direction, I'll send the split ones. Thanks!
> > > > 
> > > > --
> > > > Thanks,
> > > > Jike
> > > > 
> > > > 
> > > > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> > > > index bec694c..f715d37 100644
> > > > --- a/hw/vfio/pci-quirks.c
> > > > +++ b/hw/vfio/pci-quirks.c
> > > > @@ -10,12 +10,14 @@
> > > >   * the COPYING file in the top-level directory.
> > > >   */
> > > >  
> > > > +#include <sys/ioctl.h>
> > > >  #include "qemu/osdep.h"
> > > >  #include "qemu/error-report.h"
> > > >  #include "qemu/range.h"
> > > >  #include "qapi/error.h"
> > > >  #include "hw/nvram/fw_cfg.h"
> > > >  #include "pci.h"
> > > > +#include "sysemu/kvm.h"
> > > >  #include "trace.h"
> > > >  
> > > >  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
> > > > @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
> > > >          break;
> > > >      }
> > > >  }
> > > > +
> > > > +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
> > > > +{
> > > > +    int vmfd;
> > > > +
> > > > +    if (!kvm_enabled() || !vdev->kvmgt)
> > > > +        return;
> > > > +
> > > > +    /* Tell the device what KVM it attached */
> > > > +    vmfd = kvm_get_vmfd(kvm_state);
> > > > +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
> > > > +}
> > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > index a5a620a..8732552 100644
> > > > --- a/hw/vfio/pci.c
> > > > +++ b/hw/vfio/pci.c
> > > > @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
> > > >          return ret;
> > > >      }
> > > >  
> > > > +    vfio_quirk_kvmgt(vdev);
> > > > +
> > > >      /* Get a copy of config space */
> > > >      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
> > > >                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
> > > > @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
> > > >      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
> > > >                         sub_device_id, PCI_ANY_ID),
> > > >      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
> > > > +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),    
> > > 
> > > Just a side note, device options are a headache, users are prone to get
> > > them wrong and minimally it requires an entire round to get libvirt
> > > support.  We should be able to detect from the device or vfio API
> > > whether such a call is required.  Obviously if we can use the existing
> > > kvm-vfio device, that's the better option anyway.  Thanks,  
> > 
> > Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
> > does, it needs to produce a device failure when unavailable.  Thanks,  
> 
> Also, I would like to see this as an generic feature instead of
> kvmgt specific interface, so we don't have to add new options to QEMU and it is
> up to the vendor driver to proceed with or without it.

In general this should be decided by lack of some required feature
exclusively provided by KVM.  I would not want to add a generic opt-out
for mdev vendor drivers to decide that they arbitrarily want to disable
that path.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-14 16:51                             ` Alex Williamson
  0 siblings, 0 replies; 58+ messages in thread
From: Alex Williamson @ 2016-10-14 16:51 UTC (permalink / raw)
  To: Neo Jia
  Cc: Jike Song, Paolo Bonzini, Tian, Kevin, kvm, guangrong.xiao,
	Xiaoguang Chen, qemu-devel, Kirti Wankhede, Xiao Guangrong

On Fri, 14 Oct 2016 09:35:45 -0700
Neo Jia <cjia@nvidia.com> wrote:

> On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:
> > On Fri, 14 Oct 2016 08:41:58 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >   
> > > On Fri, 14 Oct 2016 18:37:45 +0800
> > > Jike Song <jike.song@intel.com> wrote:
> > >   
> > > > On 10/11/2016 05:47 PM, Paolo Bonzini wrote:    
> > > > > 
> > > > > 
> > > > > On 11/10/2016 11:21, Xiao Guangrong wrote:      
> > > > >>
> > > > >>
> > > > >> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:      
> > > > >>>
> > > > >>>
> > > > >>> On 11/10/2016 04:39, Xiao Guangrong wrote:      
> > > > >>>>
> > > > >>>>
> > > > >>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:      
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On 10/10/2016 20:01, Neo Jia wrote:      
> > > > >>>>>>> Hi Neo,
> > > > >>>>>>>
> > > > >>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
> > > > >>>>>>> while nVidia does.      
> > > > >>>>>>
> > > > >>>>>> Hi Paolo and Xiaoguang,
> > > > >>>>>>
> > > > >>>>>> I am just wondering how device driver can register a notifier so he
> > > > >>>>>> can be
> > > > >>>>>> notified for write-protected pages when writes are happening.      
> > > > >>>>>
> > > > >>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
> > > > >>>>> currently where a struct kvm_device* and struct vfio_group* touch.
> > > > >>>>> Given
> > > > >>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
> > > > >>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
> > > > >>>>> that passes the struct kvm_device* to the mdev device.
> > > > >>>>>
> > > > >>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
> > > > >>>>> at KVM Forum but I don't remember the details.      
> > > > >>>>
> > > > >>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
> > > > >>>> figure out the kvm instance based on the fd.
> > > > >>>>
> > > > >>>> We got a new idea, how about search the kvm instance by mm_struct, it
> > > > >>>> can work as KVMGT is running in the vcpu context and it is much more
> > > > >>>> straightforward.      
> > > > >>>
> > > > >>> Perhaps I didn't understand your suggestion, but the same mm_struct can
> > > > >>> have more than 1 struct kvm so I'm not sure that it can work.      
> > > > >>
> > > > >> vcpu->pid is valid during vcpu running so that it can be used to figure
> > > > >> out which kvm instance owns the vcpu whose pid is the one as current
> > > > >> thread, i think it can work. :)      
> > > > > 
> > > > > No, don't do that.  There's no reason for a thread to run a single VCPU,
> > > > > and if you can have multiple VCPUs you can also have multiple VCPUs from
> > > > > multiple VMs.
> > > > > 
> > > > > Passing file descriptors around are the right way to connect subsystems.      
> > > > 
> > > > [CC Alex, Kevin and Qemu-devel]
> > > > 
> > > > Hi Paolo & Alex,
> > > > 
> > > > IIUC, passing file descriptors means touching QEMU and the UAPI between
> > > > QEMU and VFIO. Would you guys have a look at below draft patch? If it's
> > > > on the correct direction, I'll send the split ones. Thanks!
> > > > 
> > > > --
> > > > Thanks,
> > > > Jike
> > > > 
> > > > 
> > > > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> > > > index bec694c..f715d37 100644
> > > > --- a/hw/vfio/pci-quirks.c
> > > > +++ b/hw/vfio/pci-quirks.c
> > > > @@ -10,12 +10,14 @@
> > > >   * the COPYING file in the top-level directory.
> > > >   */
> > > >  
> > > > +#include <sys/ioctl.h>
> > > >  #include "qemu/osdep.h"
> > > >  #include "qemu/error-report.h"
> > > >  #include "qemu/range.h"
> > > >  #include "qapi/error.h"
> > > >  #include "hw/nvram/fw_cfg.h"
> > > >  #include "pci.h"
> > > > +#include "sysemu/kvm.h"
> > > >  #include "trace.h"
> > > >  
> > > >  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
> > > > @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
> > > >          break;
> > > >      }
> > > >  }
> > > > +
> > > > +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
> > > > +{
> > > > +    int vmfd;
> > > > +
> > > > +    if (!kvm_enabled() || !vdev->kvmgt)
> > > > +        return;
> > > > +
> > > > +    /* Tell the device what KVM it attached */
> > > > +    vmfd = kvm_get_vmfd(kvm_state);
> > > > +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
> > > > +}
> > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > index a5a620a..8732552 100644
> > > > --- a/hw/vfio/pci.c
> > > > +++ b/hw/vfio/pci.c
> > > > @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
> > > >          return ret;
> > > >      }
> > > >  
> > > > +    vfio_quirk_kvmgt(vdev);
> > > > +
> > > >      /* Get a copy of config space */
> > > >      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
> > > >                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
> > > > @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
> > > >      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
> > > >                         sub_device_id, PCI_ANY_ID),
> > > >      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
> > > > +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),    
> > > 
> > > Just a side note, device options are a headache, users are prone to get
> > > them wrong and minimally it requires an entire round to get libvirt
> > > support.  We should be able to detect from the device or vfio API
> > > whether such a call is required.  Obviously if we can use the existing
> > > kvm-vfio device, that's the better option anyway.  Thanks,  
> > 
> > Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
> > does, it needs to produce a device failure when unavailable.  Thanks,  
> 
> Also, I would like to see this as an generic feature instead of
> kvmgt specific interface, so we don't have to add new options to QEMU and it is
> up to the vendor driver to proceed with or without it.

In general this should be decided by lack of some required feature
exclusively provided by KVM.  I would not want to add a generic opt-out
for mdev vendor drivers to decide that they arbitrarily want to disable
that path.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-14 16:51                             ` Alex Williamson
@ 2016-10-14 22:19                               ` Neo Jia
  -1 siblings, 0 replies; 58+ messages in thread
From: Neo Jia @ 2016-10-14 22:19 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jike Song, Paolo Bonzini, Tian, Kevin, kvm, guangrong.xiao,
	Xiaoguang Chen, qemu-devel, Kirti Wankhede, Xiao Guangrong

On Fri, Oct 14, 2016 at 10:51:24AM -0600, Alex Williamson wrote:
> On Fri, 14 Oct 2016 09:35:45 -0700
> Neo Jia <cjia@nvidia.com> wrote:
> 
> > On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:
> > > On Fri, 14 Oct 2016 08:41:58 -0600
> > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > >   
> > > > On Fri, 14 Oct 2016 18:37:45 +0800
> > > > Jike Song <jike.song@intel.com> wrote:
> > > >   
> > > > > On 10/11/2016 05:47 PM, Paolo Bonzini wrote:    
> > > > > > 
> > > > > > 
> > > > > > On 11/10/2016 11:21, Xiao Guangrong wrote:      
> > > > > >>
> > > > > >>
> > > > > >> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:      
> > > > > >>>
> > > > > >>>
> > > > > >>> On 11/10/2016 04:39, Xiao Guangrong wrote:      
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:      
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> On 10/10/2016 20:01, Neo Jia wrote:      
> > > > > >>>>>>> Hi Neo,
> > > > > >>>>>>>
> > > > > >>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
> > > > > >>>>>>> while nVidia does.      
> > > > > >>>>>>
> > > > > >>>>>> Hi Paolo and Xiaoguang,
> > > > > >>>>>>
> > > > > >>>>>> I am just wondering how device driver can register a notifier so he
> > > > > >>>>>> can be
> > > > > >>>>>> notified for write-protected pages when writes are happening.      
> > > > > >>>>>
> > > > > >>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
> > > > > >>>>> currently where a struct kvm_device* and struct vfio_group* touch.
> > > > > >>>>> Given
> > > > > >>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
> > > > > >>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
> > > > > >>>>> that passes the struct kvm_device* to the mdev device.
> > > > > >>>>>
> > > > > >>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
> > > > > >>>>> at KVM Forum but I don't remember the details.      
> > > > > >>>>
> > > > > >>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
> > > > > >>>> figure out the kvm instance based on the fd.
> > > > > >>>>
> > > > > >>>> We got a new idea, how about search the kvm instance by mm_struct, it
> > > > > >>>> can work as KVMGT is running in the vcpu context and it is much more
> > > > > >>>> straightforward.      
> > > > > >>>
> > > > > >>> Perhaps I didn't understand your suggestion, but the same mm_struct can
> > > > > >>> have more than 1 struct kvm so I'm not sure that it can work.      
> > > > > >>
> > > > > >> vcpu->pid is valid during vcpu running so that it can be used to figure
> > > > > >> out which kvm instance owns the vcpu whose pid is the one as current
> > > > > >> thread, i think it can work. :)      
> > > > > > 
> > > > > > No, don't do that.  There's no reason for a thread to run a single VCPU,
> > > > > > and if you can have multiple VCPUs you can also have multiple VCPUs from
> > > > > > multiple VMs.
> > > > > > 
> > > > > > Passing file descriptors around are the right way to connect subsystems.      
> > > > > 
> > > > > [CC Alex, Kevin and Qemu-devel]
> > > > > 
> > > > > Hi Paolo & Alex,
> > > > > 
> > > > > IIUC, passing file descriptors means touching QEMU and the UAPI between
> > > > > QEMU and VFIO. Would you guys have a look at below draft patch? If it's
> > > > > on the correct direction, I'll send the split ones. Thanks!
> > > > > 
> > > > > --
> > > > > Thanks,
> > > > > Jike
> > > > > 
> > > > > 
> > > > > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> > > > > index bec694c..f715d37 100644
> > > > > --- a/hw/vfio/pci-quirks.c
> > > > > +++ b/hw/vfio/pci-quirks.c
> > > > > @@ -10,12 +10,14 @@
> > > > >   * the COPYING file in the top-level directory.
> > > > >   */
> > > > >  
> > > > > +#include <sys/ioctl.h>
> > > > >  #include "qemu/osdep.h"
> > > > >  #include "qemu/error-report.h"
> > > > >  #include "qemu/range.h"
> > > > >  #include "qapi/error.h"
> > > > >  #include "hw/nvram/fw_cfg.h"
> > > > >  #include "pci.h"
> > > > > +#include "sysemu/kvm.h"
> > > > >  #include "trace.h"
> > > > >  
> > > > >  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
> > > > > @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
> > > > >          break;
> > > > >      }
> > > > >  }
> > > > > +
> > > > > +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
> > > > > +{
> > > > > +    int vmfd;
> > > > > +
> > > > > +    if (!kvm_enabled() || !vdev->kvmgt)
> > > > > +        return;
> > > > > +
> > > > > +    /* Tell the device what KVM it attached */
> > > > > +    vmfd = kvm_get_vmfd(kvm_state);
> > > > > +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
> > > > > +}
> > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > > index a5a620a..8732552 100644
> > > > > --- a/hw/vfio/pci.c
> > > > > +++ b/hw/vfio/pci.c
> > > > > @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
> > > > >          return ret;
> > > > >      }
> > > > >  
> > > > > +    vfio_quirk_kvmgt(vdev);
> > > > > +
> > > > >      /* Get a copy of config space */
> > > > >      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
> > > > >                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
> > > > > @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
> > > > >      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
> > > > >                         sub_device_id, PCI_ANY_ID),
> > > > >      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
> > > > > +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),    
> > > > 
> > > > Just a side note, device options are a headache, users are prone to get
> > > > them wrong and minimally it requires an entire round to get libvirt
> > > > support.  We should be able to detect from the device or vfio API
> > > > whether such a call is required.  Obviously if we can use the existing
> > > > kvm-vfio device, that's the better option anyway.  Thanks,  
> > > 
> > > Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
> > > does, it needs to produce a device failure when unavailable.  Thanks,  
> > 
> > Also, I would like to see this as an generic feature instead of
> > kvmgt specific interface, so we don't have to add new options to QEMU and it is
> > up to the vendor driver to proceed with or without it.
> 
> In general this should be decided by lack of some required feature
> exclusively provided by KVM.  I would not want to add a generic opt-out
> for mdev vendor drivers to decide that they arbitrarily want to disable
> that path.  Thanks,

IIUC, you are suggesting that this path should be controlled by KVM feature cap
and it will be accessible to VFIO users when such checking is satisfied.

Thanks,
Neo

> 
> Alex

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-14 22:19                               ` Neo Jia
  0 siblings, 0 replies; 58+ messages in thread
From: Neo Jia @ 2016-10-14 22:19 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jike Song, Paolo Bonzini, Tian, Kevin, kvm, guangrong.xiao,
	Xiaoguang Chen, qemu-devel, Kirti Wankhede, Xiao Guangrong

On Fri, Oct 14, 2016 at 10:51:24AM -0600, Alex Williamson wrote:
> On Fri, 14 Oct 2016 09:35:45 -0700
> Neo Jia <cjia@nvidia.com> wrote:
> 
> > On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:
> > > On Fri, 14 Oct 2016 08:41:58 -0600
> > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > >   
> > > > On Fri, 14 Oct 2016 18:37:45 +0800
> > > > Jike Song <jike.song@intel.com> wrote:
> > > >   
> > > > > On 10/11/2016 05:47 PM, Paolo Bonzini wrote:    
> > > > > > 
> > > > > > 
> > > > > > On 11/10/2016 11:21, Xiao Guangrong wrote:      
> > > > > >>
> > > > > >>
> > > > > >> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:      
> > > > > >>>
> > > > > >>>
> > > > > >>> On 11/10/2016 04:39, Xiao Guangrong wrote:      
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:      
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> On 10/10/2016 20:01, Neo Jia wrote:      
> > > > > >>>>>>> Hi Neo,
> > > > > >>>>>>>
> > > > > >>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
> > > > > >>>>>>> while nVidia does.      
> > > > > >>>>>>
> > > > > >>>>>> Hi Paolo and Xiaoguang,
> > > > > >>>>>>
> > > > > >>>>>> I am just wondering how device driver can register a notifier so he
> > > > > >>>>>> can be
> > > > > >>>>>> notified for write-protected pages when writes are happening.      
> > > > > >>>>>
> > > > > >>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
> > > > > >>>>> currently where a struct kvm_device* and struct vfio_group* touch.
> > > > > >>>>> Given
> > > > > >>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
> > > > > >>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
> > > > > >>>>> that passes the struct kvm_device* to the mdev device.
> > > > > >>>>>
> > > > > >>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
> > > > > >>>>> at KVM Forum but I don't remember the details.      
> > > > > >>>>
> > > > > >>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
> > > > > >>>> figure out the kvm instance based on the fd.
> > > > > >>>>
> > > > > >>>> We got a new idea, how about search the kvm instance by mm_struct, it
> > > > > >>>> can work as KVMGT is running in the vcpu context and it is much more
> > > > > >>>> straightforward.      
> > > > > >>>
> > > > > >>> Perhaps I didn't understand your suggestion, but the same mm_struct can
> > > > > >>> have more than 1 struct kvm so I'm not sure that it can work.      
> > > > > >>
> > > > > >> vcpu->pid is valid during vcpu running so that it can be used to figure
> > > > > >> out which kvm instance owns the vcpu whose pid is the one as current
> > > > > >> thread, i think it can work. :)      
> > > > > > 
> > > > > > No, don't do that.  There's no reason for a thread to run a single VCPU,
> > > > > > and if you can have multiple VCPUs you can also have multiple VCPUs from
> > > > > > multiple VMs.
> > > > > > 
> > > > > > Passing file descriptors around are the right way to connect subsystems.      
> > > > > 
> > > > > [CC Alex, Kevin and Qemu-devel]
> > > > > 
> > > > > Hi Paolo & Alex,
> > > > > 
> > > > > IIUC, passing file descriptors means touching QEMU and the UAPI between
> > > > > QEMU and VFIO. Would you guys have a look at below draft patch? If it's
> > > > > on the correct direction, I'll send the split ones. Thanks!
> > > > > 
> > > > > --
> > > > > Thanks,
> > > > > Jike
> > > > > 
> > > > > 
> > > > > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> > > > > index bec694c..f715d37 100644
> > > > > --- a/hw/vfio/pci-quirks.c
> > > > > +++ b/hw/vfio/pci-quirks.c
> > > > > @@ -10,12 +10,14 @@
> > > > >   * the COPYING file in the top-level directory.
> > > > >   */
> > > > >  
> > > > > +#include <sys/ioctl.h>
> > > > >  #include "qemu/osdep.h"
> > > > >  #include "qemu/error-report.h"
> > > > >  #include "qemu/range.h"
> > > > >  #include "qapi/error.h"
> > > > >  #include "hw/nvram/fw_cfg.h"
> > > > >  #include "pci.h"
> > > > > +#include "sysemu/kvm.h"
> > > > >  #include "trace.h"
> > > > >  
> > > > >  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
> > > > > @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
> > > > >          break;
> > > > >      }
> > > > >  }
> > > > > +
> > > > > +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
> > > > > +{
> > > > > +    int vmfd;
> > > > > +
> > > > > +    if (!kvm_enabled() || !vdev->kvmgt)
> > > > > +        return;
> > > > > +
> > > > > +    /* Tell the device what KVM it attached */
> > > > > +    vmfd = kvm_get_vmfd(kvm_state);
> > > > > +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
> > > > > +}
> > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > > index a5a620a..8732552 100644
> > > > > --- a/hw/vfio/pci.c
> > > > > +++ b/hw/vfio/pci.c
> > > > > @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
> > > > >          return ret;
> > > > >      }
> > > > >  
> > > > > +    vfio_quirk_kvmgt(vdev);
> > > > > +
> > > > >      /* Get a copy of config space */
> > > > >      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
> > > > >                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
> > > > > @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
> > > > >      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
> > > > >                         sub_device_id, PCI_ANY_ID),
> > > > >      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
> > > > > +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),    
> > > > 
> > > > Just a side note, device options are a headache, users are prone to get
> > > > them wrong and minimally it requires an entire round to get libvirt
> > > > support.  We should be able to detect from the device or vfio API
> > > > whether such a call is required.  Obviously if we can use the existing
> > > > kvm-vfio device, that's the better option anyway.  Thanks,  
> > > 
> > > Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
> > > does, it needs to produce a device failure when unavailable.  Thanks,  
> > 
> > Also, I would like to see this as an generic feature instead of
> > kvmgt specific interface, so we don't have to add new options to QEMU and it is
> > up to the vendor driver to proceed with or without it.
> 
> In general this should be decided by lack of some required feature
> exclusively provided by KVM.  I would not want to add a generic opt-out
> for mdev vendor drivers to decide that they arbitrarily want to disable
> that path.  Thanks,

IIUC, you are suggesting that this path should be controlled by KVM feature cap
and it will be accessible to VFIO users when such checking is satisfied.

Thanks,
Neo

> 
> Alex

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-14 22:19                               ` Neo Jia
@ 2016-10-17 16:02                                 ` Alex Williamson
  -1 siblings, 0 replies; 58+ messages in thread
From: Alex Williamson @ 2016-10-17 16:02 UTC (permalink / raw)
  To: Neo Jia
  Cc: Jike Song, Paolo Bonzini, Tian, Kevin, kvm, guangrong.xiao,
	Xiaoguang Chen, qemu-devel, Kirti Wankhede, Xiao Guangrong

On Fri, 14 Oct 2016 15:19:01 -0700
Neo Jia <cjia@nvidia.com> wrote:

> On Fri, Oct 14, 2016 at 10:51:24AM -0600, Alex Williamson wrote:
> > On Fri, 14 Oct 2016 09:35:45 -0700
> > Neo Jia <cjia@nvidia.com> wrote:
> >   
> > > On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:  
> > > > On Fri, 14 Oct 2016 08:41:58 -0600
> > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > >     
> > > > > On Fri, 14 Oct 2016 18:37:45 +0800
> > > > > Jike Song <jike.song@intel.com> wrote:
> > > > >     
> > > > > > On 10/11/2016 05:47 PM, Paolo Bonzini wrote:      
> > > > > > > 
> > > > > > > 
> > > > > > > On 11/10/2016 11:21, Xiao Guangrong wrote:        
> > > > > > >>
> > > > > > >>
> > > > > > >> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:        
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> On 11/10/2016 04:39, Xiao Guangrong wrote:        
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:        
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> On 10/10/2016 20:01, Neo Jia wrote:        
> > > > > > >>>>>>> Hi Neo,
> > > > > > >>>>>>>
> > > > > > >>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
> > > > > > >>>>>>> while nVidia does.        
> > > > > > >>>>>>
> > > > > > >>>>>> Hi Paolo and Xiaoguang,
> > > > > > >>>>>>
> > > > > > >>>>>> I am just wondering how device driver can register a notifier so he
> > > > > > >>>>>> can be
> > > > > > >>>>>> notified for write-protected pages when writes are happening.        
> > > > > > >>>>>
> > > > > > >>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
> > > > > > >>>>> currently where a struct kvm_device* and struct vfio_group* touch.
> > > > > > >>>>> Given
> > > > > > >>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
> > > > > > >>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
> > > > > > >>>>> that passes the struct kvm_device* to the mdev device.
> > > > > > >>>>>
> > > > > > >>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
> > > > > > >>>>> at KVM Forum but I don't remember the details.        
> > > > > > >>>>
> > > > > > >>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
> > > > > > >>>> figure out the kvm instance based on the fd.
> > > > > > >>>>
> > > > > > >>>> We got a new idea, how about search the kvm instance by mm_struct, it
> > > > > > >>>> can work as KVMGT is running in the vcpu context and it is much more
> > > > > > >>>> straightforward.        
> > > > > > >>>
> > > > > > >>> Perhaps I didn't understand your suggestion, but the same mm_struct can
> > > > > > >>> have more than 1 struct kvm so I'm not sure that it can work.        
> > > > > > >>
> > > > > > >> vcpu->pid is valid during vcpu running so that it can be used to figure
> > > > > > >> out which kvm instance owns the vcpu whose pid is the one as current
> > > > > > >> thread, i think it can work. :)        
> > > > > > > 
> > > > > > > No, don't do that.  There's no reason for a thread to run a single VCPU,
> > > > > > > and if you can have multiple VCPUs you can also have multiple VCPUs from
> > > > > > > multiple VMs.
> > > > > > > 
> > > > > > > Passing file descriptors around are the right way to connect subsystems.        
> > > > > > 
> > > > > > [CC Alex, Kevin and Qemu-devel]
> > > > > > 
> > > > > > Hi Paolo & Alex,
> > > > > > 
> > > > > > IIUC, passing file descriptors means touching QEMU and the UAPI between
> > > > > > QEMU and VFIO. Would you guys have a look at below draft patch? If it's
> > > > > > on the correct direction, I'll send the split ones. Thanks!
> > > > > > 
> > > > > > --
> > > > > > Thanks,
> > > > > > Jike
> > > > > > 
> > > > > > 
> > > > > > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> > > > > > index bec694c..f715d37 100644
> > > > > > --- a/hw/vfio/pci-quirks.c
> > > > > > +++ b/hw/vfio/pci-quirks.c
> > > > > > @@ -10,12 +10,14 @@
> > > > > >   * the COPYING file in the top-level directory.
> > > > > >   */
> > > > > >  
> > > > > > +#include <sys/ioctl.h>
> > > > > >  #include "qemu/osdep.h"
> > > > > >  #include "qemu/error-report.h"
> > > > > >  #include "qemu/range.h"
> > > > > >  #include "qapi/error.h"
> > > > > >  #include "hw/nvram/fw_cfg.h"
> > > > > >  #include "pci.h"
> > > > > > +#include "sysemu/kvm.h"
> > > > > >  #include "trace.h"
> > > > > >  
> > > > > >  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
> > > > > > @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
> > > > > >          break;
> > > > > >      }
> > > > > >  }
> > > > > > +
> > > > > > +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
> > > > > > +{
> > > > > > +    int vmfd;
> > > > > > +
> > > > > > +    if (!kvm_enabled() || !vdev->kvmgt)
> > > > > > +        return;
> > > > > > +
> > > > > > +    /* Tell the device what KVM it attached */
> > > > > > +    vmfd = kvm_get_vmfd(kvm_state);
> > > > > > +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
> > > > > > +}
> > > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > > > index a5a620a..8732552 100644
> > > > > > --- a/hw/vfio/pci.c
> > > > > > +++ b/hw/vfio/pci.c
> > > > > > @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
> > > > > >          return ret;
> > > > > >      }
> > > > > >  
> > > > > > +    vfio_quirk_kvmgt(vdev);
> > > > > > +
> > > > > >      /* Get a copy of config space */
> > > > > >      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
> > > > > >                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
> > > > > > @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
> > > > > >      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
> > > > > >                         sub_device_id, PCI_ANY_ID),
> > > > > >      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
> > > > > > +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),      
> > > > > 
> > > > > Just a side note, device options are a headache, users are prone to get
> > > > > them wrong and minimally it requires an entire round to get libvirt
> > > > > support.  We should be able to detect from the device or vfio API
> > > > > whether such a call is required.  Obviously if we can use the existing
> > > > > kvm-vfio device, that's the better option anyway.  Thanks,    
> > > > 
> > > > Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
> > > > does, it needs to produce a device failure when unavailable.  Thanks,    
> > > 
> > > Also, I would like to see this as an generic feature instead of
> > > kvmgt specific interface, so we don't have to add new options to QEMU and it is
> > > up to the vendor driver to proceed with or without it.  
> > 
> > In general this should be decided by lack of some required feature
> > exclusively provided by KVM.  I would not want to add a generic opt-out
> > for mdev vendor drivers to decide that they arbitrarily want to disable
> > that path.  Thanks,  
> 
> IIUC, you are suggesting that this path should be controlled by KVM feature cap
> and it will be accessible to VFIO users when such checking is satisfied.

Maybe we're getting too loose with our pronouns here, I'm starting to
lose track of what "this" is referring to.  I agree that there's no
reason for the ioctl, as proposed to be kvmgt specific.  I would hope
that going through the kvm-vfio device to create that linkage would
eliminate that, but we'll need to see what Jike can come up with to
plumb between KVM and vfio.  Vendor drivers can implement their own
ioctls, now that we pass them through the mdev layer, but someone needs
to call those ioctls.  Ideally we want something programmatic to
trigger that, without requiring a user to pass an extra device
parameter.  Additionally, if there is any hope of making use of the
device with userspace drivers other than QEMU, hard dependencies on KVM
should be avoided.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-17 16:02                                 ` Alex Williamson
  0 siblings, 0 replies; 58+ messages in thread
From: Alex Williamson @ 2016-10-17 16:02 UTC (permalink / raw)
  To: Neo Jia
  Cc: Jike Song, Paolo Bonzini, Tian, Kevin, kvm, guangrong.xiao,
	Xiaoguang Chen, qemu-devel, Kirti Wankhede, Xiao Guangrong

On Fri, 14 Oct 2016 15:19:01 -0700
Neo Jia <cjia@nvidia.com> wrote:

> On Fri, Oct 14, 2016 at 10:51:24AM -0600, Alex Williamson wrote:
> > On Fri, 14 Oct 2016 09:35:45 -0700
> > Neo Jia <cjia@nvidia.com> wrote:
> >   
> > > On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:  
> > > > On Fri, 14 Oct 2016 08:41:58 -0600
> > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > >     
> > > > > On Fri, 14 Oct 2016 18:37:45 +0800
> > > > > Jike Song <jike.song@intel.com> wrote:
> > > > >     
> > > > > > On 10/11/2016 05:47 PM, Paolo Bonzini wrote:      
> > > > > > > 
> > > > > > > 
> > > > > > > On 11/10/2016 11:21, Xiao Guangrong wrote:        
> > > > > > >>
> > > > > > >>
> > > > > > >> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:        
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> On 11/10/2016 04:39, Xiao Guangrong wrote:        
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:        
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> On 10/10/2016 20:01, Neo Jia wrote:        
> > > > > > >>>>>>> Hi Neo,
> > > > > > >>>>>>>
> > > > > > >>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
> > > > > > >>>>>>> while nVidia does.        
> > > > > > >>>>>>
> > > > > > >>>>>> Hi Paolo and Xiaoguang,
> > > > > > >>>>>>
> > > > > > >>>>>> I am just wondering how device driver can register a notifier so he
> > > > > > >>>>>> can be
> > > > > > >>>>>> notified for write-protected pages when writes are happening.        
> > > > > > >>>>>
> > > > > > >>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
> > > > > > >>>>> currently where a struct kvm_device* and struct vfio_group* touch.
> > > > > > >>>>> Given
> > > > > > >>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
> > > > > > >>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
> > > > > > >>>>> that passes the struct kvm_device* to the mdev device.
> > > > > > >>>>>
> > > > > > >>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
> > > > > > >>>>> at KVM Forum but I don't remember the details.        
> > > > > > >>>>
> > > > > > >>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
> > > > > > >>>> figure out the kvm instance based on the fd.
> > > > > > >>>>
> > > > > > >>>> We got a new idea, how about search the kvm instance by mm_struct, it
> > > > > > >>>> can work as KVMGT is running in the vcpu context and it is much more
> > > > > > >>>> straightforward.        
> > > > > > >>>
> > > > > > >>> Perhaps I didn't understand your suggestion, but the same mm_struct can
> > > > > > >>> have more than 1 struct kvm so I'm not sure that it can work.        
> > > > > > >>
> > > > > > >> vcpu->pid is valid during vcpu running so that it can be used to figure
> > > > > > >> out which kvm instance owns the vcpu whose pid is the one as current
> > > > > > >> thread, i think it can work. :)        
> > > > > > > 
> > > > > > > No, don't do that.  There's no reason for a thread to run a single VCPU,
> > > > > > > and if you can have multiple VCPUs you can also have multiple VCPUs from
> > > > > > > multiple VMs.
> > > > > > > 
> > > > > > > Passing file descriptors around are the right way to connect subsystems.        
> > > > > > 
> > > > > > [CC Alex, Kevin and Qemu-devel]
> > > > > > 
> > > > > > Hi Paolo & Alex,
> > > > > > 
> > > > > > IIUC, passing file descriptors means touching QEMU and the UAPI between
> > > > > > QEMU and VFIO. Would you guys have a look at below draft patch? If it's
> > > > > > on the correct direction, I'll send the split ones. Thanks!
> > > > > > 
> > > > > > --
> > > > > > Thanks,
> > > > > > Jike
> > > > > > 
> > > > > > 
> > > > > > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> > > > > > index bec694c..f715d37 100644
> > > > > > --- a/hw/vfio/pci-quirks.c
> > > > > > +++ b/hw/vfio/pci-quirks.c
> > > > > > @@ -10,12 +10,14 @@
> > > > > >   * the COPYING file in the top-level directory.
> > > > > >   */
> > > > > >  
> > > > > > +#include <sys/ioctl.h>
> > > > > >  #include "qemu/osdep.h"
> > > > > >  #include "qemu/error-report.h"
> > > > > >  #include "qemu/range.h"
> > > > > >  #include "qapi/error.h"
> > > > > >  #include "hw/nvram/fw_cfg.h"
> > > > > >  #include "pci.h"
> > > > > > +#include "sysemu/kvm.h"
> > > > > >  #include "trace.h"
> > > > > >  
> > > > > >  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
> > > > > > @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
> > > > > >          break;
> > > > > >      }
> > > > > >  }
> > > > > > +
> > > > > > +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
> > > > > > +{
> > > > > > +    int vmfd;
> > > > > > +
> > > > > > +    if (!kvm_enabled() || !vdev->kvmgt)
> > > > > > +        return;
> > > > > > +
> > > > > > +    /* Tell the device what KVM it attached */
> > > > > > +    vmfd = kvm_get_vmfd(kvm_state);
> > > > > > +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
> > > > > > +}
> > > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > > > index a5a620a..8732552 100644
> > > > > > --- a/hw/vfio/pci.c
> > > > > > +++ b/hw/vfio/pci.c
> > > > > > @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
> > > > > >          return ret;
> > > > > >      }
> > > > > >  
> > > > > > +    vfio_quirk_kvmgt(vdev);
> > > > > > +
> > > > > >      /* Get a copy of config space */
> > > > > >      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
> > > > > >                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
> > > > > > @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
> > > > > >      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
> > > > > >                         sub_device_id, PCI_ANY_ID),
> > > > > >      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
> > > > > > +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),      
> > > > > 
> > > > > Just a side note, device options are a headache, users are prone to get
> > > > > them wrong and minimally it requires an entire round to get libvirt
> > > > > support.  We should be able to detect from the device or vfio API
> > > > > whether such a call is required.  Obviously if we can use the existing
> > > > > kvm-vfio device, that's the better option anyway.  Thanks,    
> > > > 
> > > > Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
> > > > does, it needs to produce a device failure when unavailable.  Thanks,    
> > > 
> > > Also, I would like to see this as an generic feature instead of
> > > kvmgt specific interface, so we don't have to add new options to QEMU and it is
> > > up to the vendor driver to proceed with or without it.  
> > 
> > In general this should be decided by lack of some required feature
> > exclusively provided by KVM.  I would not want to add a generic opt-out
> > for mdev vendor drivers to decide that they arbitrarily want to disable
> > that path.  Thanks,  
> 
> IIUC, you are suggesting that this path should be controlled by KVM feature cap
> and it will be accessible to VFIO users when such checking is satisfied.

Maybe we're getting too loose with our pronouns here, I'm starting to
lose track of what "this" is referring to.  I agree that there's no
reason for the ioctl, as proposed to be kvmgt specific.  I would hope
that going through the kvm-vfio device to create that linkage would
eliminate that, but we'll need to see what Jike can come up with to
plumb between KVM and vfio.  Vendor drivers can implement their own
ioctls, now that we pass them through the mdev layer, but someone needs
to call those ioctls.  Ideally we want something programmatic to
trigger that, without requiring a user to pass an extra device
parameter.  Additionally, if there is any hope of making use of the
device with userspace drivers other than QEMU, hard dependencies on KVM
should be avoided.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-17 16:02                                 ` Alex Williamson
  (?)
@ 2016-10-18 12:38                                 ` Jike Song
  2016-10-18 14:59                                   ` Alex Williamson
  -1 siblings, 1 reply; 58+ messages in thread
From: Jike Song @ 2016-10-18 12:38 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Neo Jia, Paolo Bonzini, Tian, Kevin, kvm, guangrong.xiao,
	Xiaoguang Chen, qemu-devel, Kirti Wankhede, Xiao Guangrong

On 10/18/2016 12:02 AM, Alex Williamson wrote:
> On Fri, 14 Oct 2016 15:19:01 -0700
> Neo Jia <cjia@nvidia.com> wrote:
> 
>> On Fri, Oct 14, 2016 at 10:51:24AM -0600, Alex Williamson wrote:
>>> On Fri, 14 Oct 2016 09:35:45 -0700
>>> Neo Jia <cjia@nvidia.com> wrote:
>>>   
>>>> On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:  
>>>>> On Fri, 14 Oct 2016 08:41:58 -0600
>>>>> Alex Williamson <alex.williamson@redhat.com> wrote:
>>>>>     
>>>>>> On Fri, 14 Oct 2016 18:37:45 +0800
>>>>>> Jike Song <jike.song@intel.com> wrote:
>>>>>>     
>>>>>>> On 10/11/2016 05:47 PM, Paolo Bonzini wrote:      
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/10/2016 11:21, Xiao Guangrong wrote:        
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:        
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/10/2016 04:39, Xiao Guangrong wrote:        
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:        
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/10/2016 20:01, Neo Jia wrote:        
>>>>>>>>>>>>>> Hi Neo,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
>>>>>>>>>>>>>> while nVidia does.        
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Paolo and Xiaoguang,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am just wondering how device driver can register a notifier so he
>>>>>>>>>>>>> can be
>>>>>>>>>>>>> notified for write-protected pages when writes are happening.        
>>>>>>>>>>>>
>>>>>>>>>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
>>>>>>>>>>>> currently where a struct kvm_device* and struct vfio_group* touch.
>>>>>>>>>>>> Given
>>>>>>>>>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
>>>>>>>>>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
>>>>>>>>>>>> that passes the struct kvm_device* to the mdev device.
>>>>>>>>>>>>
>>>>>>>>>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
>>>>>>>>>>>> at KVM Forum but I don't remember the details.        
>>>>>>>>>>>
>>>>>>>>>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
>>>>>>>>>>> figure out the kvm instance based on the fd.
>>>>>>>>>>>
>>>>>>>>>>> We got a new idea, how about search the kvm instance by mm_struct, it
>>>>>>>>>>> can work as KVMGT is running in the vcpu context and it is much more
>>>>>>>>>>> straightforward.        
>>>>>>>>>>
>>>>>>>>>> Perhaps I didn't understand your suggestion, but the same mm_struct can
>>>>>>>>>> have more than 1 struct kvm so I'm not sure that it can work.        
>>>>>>>>>
>>>>>>>>> vcpu->pid is valid during vcpu running so that it can be used to figure
>>>>>>>>> out which kvm instance owns the vcpu whose pid is the one as current
>>>>>>>>> thread, i think it can work. :)        
>>>>>>>>
>>>>>>>> No, don't do that.  There's no reason for a thread to run a single VCPU,
>>>>>>>> and if you can have multiple VCPUs you can also have multiple VCPUs from
>>>>>>>> multiple VMs.
>>>>>>>>
>>>>>>>> Passing file descriptors around are the right way to connect subsystems.        
>>>>>>>
>>>>>>> [CC Alex, Kevin and Qemu-devel]
>>>>>>>
>>>>>>> Hi Paolo & Alex,
>>>>>>>
>>>>>>> IIUC, passing file descriptors means touching QEMU and the UAPI between
>>>>>>> QEMU and VFIO. Would you guys have a look at below draft patch? If it's
>>>>>>> on the correct direction, I'll send the split ones. Thanks!
>>>>>>>
>>>>>>> --
>>>>>>> Thanks,
>>>>>>> Jike
>>>>>>>
>>>>>>>
>>>>>>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>>>>>>> index bec694c..f715d37 100644
>>>>>>> --- a/hw/vfio/pci-quirks.c
>>>>>>> +++ b/hw/vfio/pci-quirks.c
>>>>>>> @@ -10,12 +10,14 @@
>>>>>>>   * the COPYING file in the top-level directory.
>>>>>>>   */
>>>>>>>  
>>>>>>> +#include <sys/ioctl.h>
>>>>>>>  #include "qemu/osdep.h"
>>>>>>>  #include "qemu/error-report.h"
>>>>>>>  #include "qemu/range.h"
>>>>>>>  #include "qapi/error.h"
>>>>>>>  #include "hw/nvram/fw_cfg.h"
>>>>>>>  #include "pci.h"
>>>>>>> +#include "sysemu/kvm.h"
>>>>>>>  #include "trace.h"
>>>>>>>  
>>>>>>>  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
>>>>>>> @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
>>>>>>>          break;
>>>>>>>      }
>>>>>>>  }
>>>>>>> +
>>>>>>> +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
>>>>>>> +{
>>>>>>> +    int vmfd;
>>>>>>> +
>>>>>>> +    if (!kvm_enabled() || !vdev->kvmgt)
>>>>>>> +        return;
>>>>>>> +
>>>>>>> +    /* Tell the device what KVM it attached */
>>>>>>> +    vmfd = kvm_get_vmfd(kvm_state);
>>>>>>> +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
>>>>>>> +}
>>>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>>>>> index a5a620a..8732552 100644
>>>>>>> --- a/hw/vfio/pci.c
>>>>>>> +++ b/hw/vfio/pci.c
>>>>>>> @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
>>>>>>>          return ret;
>>>>>>>      }
>>>>>>>  
>>>>>>> +    vfio_quirk_kvmgt(vdev);
>>>>>>> +
>>>>>>>      /* Get a copy of config space */
>>>>>>>      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
>>>>>>>                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
>>>>>>> @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
>>>>>>>      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
>>>>>>>                         sub_device_id, PCI_ANY_ID),
>>>>>>>      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
>>>>>>> +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),      
>>>>>>
>>>>>> Just a side note, device options are a headache, users are prone to get
>>>>>> them wrong and minimally it requires an entire round to get libvirt
>>>>>> support.  We should be able to detect from the device or vfio API
>>>>>> whether such a call is required.  Obviously if we can use the existing
>>>>>> kvm-vfio device, that's the better option anyway.  Thanks,    
>>>>>
>>>>> Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
>>>>> does, it needs to produce a device failure when unavailable.  Thanks,    
>>>>
>>>> Also, I would like to see this as an generic feature instead of
>>>> kvmgt specific interface, so we don't have to add new options to QEMU and it is
>>>> up to the vendor driver to proceed with or without it.  
>>>
>>> In general this should be decided by lack of some required feature
>>> exclusively provided by KVM.  I would not want to add a generic opt-out
>>> for mdev vendor drivers to decide that they arbitrarily want to disable
>>> that path.  Thanks,  
>>
>> IIUC, you are suggesting that this path should be controlled by KVM feature cap
>> and it will be accessible to VFIO users when such checking is satisfied.
> 
> Maybe we're getting too loose with our pronouns here, I'm starting to
> lose track of what "this" is referring to.  I agree that there's no
> reason for the ioctl, as proposed to be kvmgt specific.  I would hope
> that going through the kvm-vfio device to create that linkage would
> eliminate that, but we'll need to see what Jike can come up with to
> plumb between KVM and vfio.  Vendor drivers can implement their own
> ioctls, now that we pass them through the mdev layer, but someone needs
> to call those ioctls.  Ideally we want something programmatic to
> trigger that, without requiring a user to pass an extra device
> parameter.  Additionally, if there is any hope of making use of the
> device with userspace drivers other than QEMU, hard dependencies on KVM
> should be avoided.  Thanks,
> 
> Alex
> 

Thanks for the advice, so I cooked another patch for your comments.
Basically a 'void *usrdata' is added to vfio_group, external users
can set it (kvm) or get it (kvm or other users like kvmgt).

BTW, in device-model, the open method will return failure to vfio-mdev
in case that such kvm information is not available.

--
Thanks,
Jike



diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index d1d70e0..6b8d1d2 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -86,6 +86,7 @@ struct vfio_group {
 	struct mutex			unbound_lock;
 	atomic_t			opened;
 	bool				noiommu;
+	void				*usrdata;
 };
 
 struct vfio_device {
@@ -447,14 +448,13 @@ static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
 }
 
 static
-struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
+struct vfio_group *__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
 {
 	struct vfio_group *group;
 
 	mutex_lock(&vfio.group_lock);
 	list_for_each_entry(group, &vfio.group_list, vfio_next) {
 		if (group->iommu_group == iommu_group) {
-			vfio_group_get(group);
 			mutex_unlock(&vfio.group_lock);
 			return group;
 		}
@@ -464,6 +464,17 @@ struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
 	return NULL;
 }
 
+static
+struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
+{
+	struct vfio_group *group = __vfio_group_get_from_iommu(iommu_group);
+	if (!group)
+		return NULL;
+
+	vfio_group_get(group);
+	return group;
+}
+
 static struct vfio_group *vfio_group_get_from_minor(int minor)
 {
 	struct vfio_group *group;
@@ -1728,6 +1739,31 @@ long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
 }
 EXPORT_SYMBOL_GPL(vfio_external_check_extension);
 
+void vfio_group_set_usrdata(struct vfio_group *group, void *data)
+{
+	group->usrdata = data;
+}
+EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
+
+void *vfio_group_get_usrdata(struct vfio_group *group)
+{
+	return group->usrdata;
+}
+EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
+
+void *vfio_group_get_usrdata_by_device(struct device *dev)
+{
+	struct vfio_group *vfio_group;
+
+	vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
+	if (!vfio_group)
+		return NULL;
+
+	return vfio_group_get_usrdata(vfio_group);
+}
+EXPORT_SYMBOL_GPL(vfio_group_get_usrdata_by_device);
+
+
 /**
  * Sub-module support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 0ecae0b..712588f 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -91,6 +91,10 @@ extern void vfio_unregister_iommu_driver(
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
 extern long vfio_external_check_extension(struct vfio_group *group,
 					  unsigned long arg);
+extern void vfio_group_set_usrdata(struct vfio_group *group, void *data);
+extern void *vfio_group_get_usrdata(struct vfio_group *group);
+extern void *vfio_group_get_usrdata_by_device(struct device *dev);
+
 
 /*
  * Sub-module helpers
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 1dd087d..e00d401 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -60,6 +60,20 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
 	symbol_put(vfio_group_put_external_user);
 }
 
+static void kvm_vfio_group_set_kvm(struct vfio_group *group, void *kvm)
+{
+	void (*fn)(struct vfio_group *, void *);
+
+	fn = symbol_get(vfio_group_set_usrdata);
+	if (!fn)
+		return;
+
+	fn(group, kvm);
+	kvm_get_kvm(kvm);
+
+	symbol_put(vfio_group_set_usrdata);
+}
+
 static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
 {
 	long (*fn)(struct vfio_group *, unsigned long);
@@ -161,6 +175,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
 
 		kvm_vfio_update_coherency(dev);
 
+		kvm_vfio_group_set_kvm(vfio_group, dev->kvm);
+
 		return 0;
 
 	case KVM_DEV_VFIO_GROUP_DEL:
@@ -200,6 +216,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
 
 		kvm_vfio_update_coherency(dev);
 
+		kvm_put_kvm(dev->kvm);
+
 		return ret;
 	}
 

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-18 12:38                                 ` Jike Song
@ 2016-10-18 14:59                                   ` Alex Williamson
  2016-10-19  2:32                                     ` Jike Song
  0 siblings, 1 reply; 58+ messages in thread
From: Alex Williamson @ 2016-10-18 14:59 UTC (permalink / raw)
  To: Jike Song
  Cc: Tian, Kevin, Neo Jia, kvm, guangrong.xiao, Xiao Guangrong,
	qemu-devel, Xiaoguang Chen, Kirti Wankhede, Paolo Bonzini

On Tue, 18 Oct 2016 20:38:21 +0800
Jike Song <jike.song@intel.com> wrote:

> On 10/18/2016 12:02 AM, Alex Williamson wrote:
> > On Fri, 14 Oct 2016 15:19:01 -0700
> > Neo Jia <cjia@nvidia.com> wrote:
> >   
> >> On Fri, Oct 14, 2016 at 10:51:24AM -0600, Alex Williamson wrote:  
> >>> On Fri, 14 Oct 2016 09:35:45 -0700
> >>> Neo Jia <cjia@nvidia.com> wrote:
> >>>     
> >>>> On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:    
> >>>>> On Fri, 14 Oct 2016 08:41:58 -0600
> >>>>> Alex Williamson <alex.williamson@redhat.com> wrote:
> >>>>>       
> >>>>>> On Fri, 14 Oct 2016 18:37:45 +0800
> >>>>>> Jike Song <jike.song@intel.com> wrote:
> >>>>>>       
> >>>>>>> On 10/11/2016 05:47 PM, Paolo Bonzini wrote:        
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 11/10/2016 11:21, Xiao Guangrong wrote:          
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:          
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 11/10/2016 04:39, Xiao Guangrong wrote:          
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:          
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 10/10/2016 20:01, Neo Jia wrote:          
> >>>>>>>>>>>>>> Hi Neo,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
> >>>>>>>>>>>>>> while nVidia does.          
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Paolo and Xiaoguang,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I am just wondering how device driver can register a notifier so he
> >>>>>>>>>>>>> can be
> >>>>>>>>>>>>> notified for write-protected pages when writes are happening.          
> >>>>>>>>>>>>
> >>>>>>>>>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
> >>>>>>>>>>>> currently where a struct kvm_device* and struct vfio_group* touch.
> >>>>>>>>>>>> Given
> >>>>>>>>>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
> >>>>>>>>>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
> >>>>>>>>>>>> that passes the struct kvm_device* to the mdev device.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
> >>>>>>>>>>>> at KVM Forum but I don't remember the details.          
> >>>>>>>>>>>
> >>>>>>>>>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
> >>>>>>>>>>> figure out the kvm instance based on the fd.
> >>>>>>>>>>>
> >>>>>>>>>>> We got a new idea, how about search the kvm instance by mm_struct, it
> >>>>>>>>>>> can work as KVMGT is running in the vcpu context and it is much more
> >>>>>>>>>>> straightforward.          
> >>>>>>>>>>
> >>>>>>>>>> Perhaps I didn't understand your suggestion, but the same mm_struct can
> >>>>>>>>>> have more than 1 struct kvm so I'm not sure that it can work.          
> >>>>>>>>>
> >>>>>>>>> vcpu->pid is valid during vcpu running so that it can be used to figure
> >>>>>>>>> out which kvm instance owns the vcpu whose pid is the one as current
> >>>>>>>>> thread, i think it can work. :)          
> >>>>>>>>
> >>>>>>>> No, don't do that.  There's no reason for a thread to run a single VCPU,
> >>>>>>>> and if you can have multiple VCPUs you can also have multiple VCPUs from
> >>>>>>>> multiple VMs.
> >>>>>>>>
> >>>>>>>> Passing file descriptors around are the right way to connect subsystems.          
> >>>>>>>
> >>>>>>> [CC Alex, Kevin and Qemu-devel]
> >>>>>>>
> >>>>>>> Hi Paolo & Alex,
> >>>>>>>
> >>>>>>> IIUC, passing file descriptors means touching QEMU and the UAPI between
> >>>>>>> QEMU and VFIO. Would you guys have a look at below draft patch? If it's
> >>>>>>> on the correct direction, I'll send the split ones. Thanks!
> >>>>>>>
> >>>>>>> --
> >>>>>>> Thanks,
> >>>>>>> Jike
> >>>>>>>
> >>>>>>>
> >>>>>>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> >>>>>>> index bec694c..f715d37 100644
> >>>>>>> --- a/hw/vfio/pci-quirks.c
> >>>>>>> +++ b/hw/vfio/pci-quirks.c
> >>>>>>> @@ -10,12 +10,14 @@
> >>>>>>>   * the COPYING file in the top-level directory.
> >>>>>>>   */
> >>>>>>>  
> >>>>>>> +#include <sys/ioctl.h>
> >>>>>>>  #include "qemu/osdep.h"
> >>>>>>>  #include "qemu/error-report.h"
> >>>>>>>  #include "qemu/range.h"
> >>>>>>>  #include "qapi/error.h"
> >>>>>>>  #include "hw/nvram/fw_cfg.h"
> >>>>>>>  #include "pci.h"
> >>>>>>> +#include "sysemu/kvm.h"
> >>>>>>>  #include "trace.h"
> >>>>>>>  
> >>>>>>>  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
> >>>>>>> @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
> >>>>>>>          break;
> >>>>>>>      }
> >>>>>>>  }
> >>>>>>> +
> >>>>>>> +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
> >>>>>>> +{
> >>>>>>> +    int vmfd;
> >>>>>>> +
> >>>>>>> +    if (!kvm_enabled() || !vdev->kvmgt)
> >>>>>>> +        return;
> >>>>>>> +
> >>>>>>> +    /* Tell the device what KVM it attached */
> >>>>>>> +    vmfd = kvm_get_vmfd(kvm_state);
> >>>>>>> +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
> >>>>>>> +}
> >>>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >>>>>>> index a5a620a..8732552 100644
> >>>>>>> --- a/hw/vfio/pci.c
> >>>>>>> +++ b/hw/vfio/pci.c
> >>>>>>> @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
> >>>>>>>          return ret;
> >>>>>>>      }
> >>>>>>>  
> >>>>>>> +    vfio_quirk_kvmgt(vdev);
> >>>>>>> +
> >>>>>>>      /* Get a copy of config space */
> >>>>>>>      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
> >>>>>>>                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
> >>>>>>> @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
> >>>>>>>      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
> >>>>>>>                         sub_device_id, PCI_ANY_ID),
> >>>>>>>      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
> >>>>>>> +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),        
> >>>>>>
> >>>>>> Just a side note, device options are a headache, users are prone to get
> >>>>>> them wrong and minimally it requires an entire round to get libvirt
> >>>>>> support.  We should be able to detect from the device or vfio API
> >>>>>> whether such a call is required.  Obviously if we can use the existing
> >>>>>> kvm-vfio device, that's the better option anyway.  Thanks,      
> >>>>>
> >>>>> Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
> >>>>> does, it needs to produce a device failure when unavailable.  Thanks,      
> >>>>
> >>>> Also, I would like to see this as an generic feature instead of
> >>>> kvmgt specific interface, so we don't have to add new options to QEMU and it is
> >>>> up to the vendor driver to proceed with or without it.    
> >>>
> >>> In general this should be decided by lack of some required feature
> >>> exclusively provided by KVM.  I would not want to add a generic opt-out
> >>> for mdev vendor drivers to decide that they arbitrarily want to disable
> >>> that path.  Thanks,    
> >>
> >> IIUC, you are suggesting that this path should be controlled by KVM feature cap
> >> and it will be accessible to VFIO users when such checking is satisfied.  
> > 
> > Maybe we're getting too loose with our pronouns here, I'm starting to
> > lose track of what "this" is referring to.  I agree that there's no
> > reason for the ioctl, as proposed to be kvmgt specific.  I would hope
> > that going through the kvm-vfio device to create that linkage would
> > eliminate that, but we'll need to see what Jike can come up with to
> > plumb between KVM and vfio.  Vendor drivers can implement their own
> > ioctls, now that we pass them through the mdev layer, but someone needs
> > to call those ioctls.  Ideally we want something programmatic to
> > trigger that, without requiring a user to pass an extra device
> > parameter.  Additionally, if there is any hope of making use of the
> > device with userspace drivers other than QEMU, hard dependencies on KVM
> > should be avoided.  Thanks,
> > 
> > Alex
> >   
> 
> Thanks for the advice, so I cooked another patch for your comments.
> Basically a 'void *usrdata' is added to vfio_group, external users
> can set it (kvm) or get it (kvm or other users like kvmgt).
> 
> BTW, in device-model, the open method will return failure to vfio-mdev
> in case that such kvm information is not available.
> 
> --
> Thanks,
> Jike
> 
> 
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index d1d70e0..6b8d1d2 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -86,6 +86,7 @@ struct vfio_group {
>  	struct mutex			unbound_lock;
>  	atomic_t			opened;
>  	bool				noiommu;
> +	void				*usrdata;
>  };
>  
>  struct vfio_device {
> @@ -447,14 +448,13 @@ static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
>  }
>  
>  static
> -struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
> +struct vfio_group *__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>  {
>  	struct vfio_group *group;
>  
>  	mutex_lock(&vfio.group_lock);
>  	list_for_each_entry(group, &vfio.group_list, vfio_next) {
>  		if (group->iommu_group == iommu_group) {
> -			vfio_group_get(group);

This is wrong, we can't add our reference after we release the lock.

>  			mutex_unlock(&vfio.group_lock);
>  			return group;
>  		}
> @@ -464,6 +464,17 @@ struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>  	return NULL;
>  }
>  
> +static
> +struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
> +{
> +	struct vfio_group *group = __vfio_group_get_from_iommu(iommu_group);
> +	if (!group)
> +		return NULL;
> +
> +	vfio_group_get(group);

We have no basis to get a reference here.  This function cannot exist
separate from the existing function above.

> +	return group;
> +}
> +
>  static struct vfio_group *vfio_group_get_from_minor(int minor)
>  {
>  	struct vfio_group *group;
> @@ -1728,6 +1739,31 @@ long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
>  }
>  EXPORT_SYMBOL_GPL(vfio_external_check_extension);
>  
> +void vfio_group_set_usrdata(struct vfio_group *group, void *data)
> +{
> +	group->usrdata = data;
> +}
> +EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
> +
> +void *vfio_group_get_usrdata(struct vfio_group *group)
> +{
> +	return group->usrdata;
> +}
> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
> +
> +void *vfio_group_get_usrdata_by_device(struct device *dev)
> +{
> +	struct vfio_group *vfio_group;
> +
> +	vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);

We actually need to use iommu_group_get() here.  Kirti adds a
vfio_group_get_from_dev() in v9 03/12 that does this properly.

> +	if (!vfio_group)
> +		return NULL;
> +
> +	return vfio_group_get_usrdata(vfio_group);

This operates on a group for which we have no reference.

> +}
> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata_by_device);
> +
> +
>  /**
>   * Sub-module support
>   */
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 0ecae0b..712588f 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -91,6 +91,10 @@ extern void vfio_unregister_iommu_driver(
>  extern int vfio_external_user_iommu_id(struct vfio_group *group);
>  extern long vfio_external_check_extension(struct vfio_group *group,
>  					  unsigned long arg);
> +extern void vfio_group_set_usrdata(struct vfio_group *group, void *data);
> +extern void *vfio_group_get_usrdata(struct vfio_group *group);
> +extern void *vfio_group_get_usrdata_by_device(struct device *dev);
> +
>  
>  /*
>   * Sub-module helpers
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 1dd087d..e00d401 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -60,6 +60,20 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
>  	symbol_put(vfio_group_put_external_user);
>  }
>  
> +static void kvm_vfio_group_set_kvm(struct vfio_group *group, void *kvm)
> +{
> +	void (*fn)(struct vfio_group *, void *);
> +
> +	fn = symbol_get(vfio_group_set_usrdata);
> +	if (!fn)
> +		return;
> +
> +	fn(group, kvm);
> +	kvm_get_kvm(kvm);
> +
> +	symbol_put(vfio_group_set_usrdata);
> +}
> +
>  static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
>  {
>  	long (*fn)(struct vfio_group *, unsigned long);
> @@ -161,6 +175,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>  
>  		kvm_vfio_update_coherency(dev);
>  
> +		kvm_vfio_group_set_kvm(vfio_group, dev->kvm);
> +
>  		return 0;
>  
>  	case KVM_DEV_VFIO_GROUP_DEL:
> @@ -200,6 +216,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>  
>  		kvm_vfio_update_coherency(dev);
>  
> +		kvm_put_kvm(dev->kvm);
> +
>  		return ret;
>  	}

How does anyone get'ing the usrdata know what it contains?  Does the
vendor driver compare it to a pointer it found elsewhere?  How does the
vendor driver generate an error back to the user if this linkage is
necessary but unavailable?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-18 14:59                                   ` Alex Williamson
@ 2016-10-19  2:32                                     ` Jike Song
  2016-10-19  5:45                                       ` Xiao Guangrong
  2016-10-19 13:56                                         ` [Qemu-devel] " Eric Blake
  0 siblings, 2 replies; 58+ messages in thread
From: Jike Song @ 2016-10-19  2:32 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Neo Jia, kvm, guangrong.xiao, Xiao Guangrong,
	qemu-devel, Xiaoguang Chen, Kirti Wankhede, Paolo Bonzini

On 10/18/2016 10:59 PM, Alex Williamson wrote:
> On Tue, 18 Oct 2016 20:38:21 +0800
> Jike Song <jike.song@intel.com> wrote:
>> On 10/18/2016 12:02 AM, Alex Williamson wrote:
>>> On Fri, 14 Oct 2016 15:19:01 -0700
>>> Neo Jia <cjia@nvidia.com> wrote:
>>>   
>>>> On Fri, Oct 14, 2016 at 10:51:24AM -0600, Alex Williamson wrote:  
>>>>> On Fri, 14 Oct 2016 09:35:45 -0700
>>>>> Neo Jia <cjia@nvidia.com> wrote:
>>>>>     
>>>>>> On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:    
>>>>>>> On Fri, 14 Oct 2016 08:41:58 -0600
>>>>>>> Alex Williamson <alex.williamson@redhat.com> wrote:
>>>>>>>       
>>>>>>>> On Fri, 14 Oct 2016 18:37:45 +0800
>>>>>>>> Jike Song <jike.song@intel.com> wrote:
>>>>>>>>       
>>>>>>>>> On 10/11/2016 05:47 PM, Paolo Bonzini wrote:        
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/10/2016 11:21, Xiao Guangrong wrote:          
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:          
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/10/2016 04:39, Xiao Guangrong wrote:          
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:          
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10/10/2016 20:01, Neo Jia wrote:          
>>>>>>>>>>>>>>>> Hi Neo,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
>>>>>>>>>>>>>>>> while nVidia does.          
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Paolo and Xiaoguang,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am just wondering how device driver can register a notifier so he
>>>>>>>>>>>>>>> can be
>>>>>>>>>>>>>>> notified for write-protected pages when writes are happening.          
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It can't yet, but the API is ready for that.  kvm_vfio_set_group is
>>>>>>>>>>>>>> currently where a struct kvm_device* and struct vfio_group* touch.
>>>>>>>>>>>>>> Given
>>>>>>>>>>>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
>>>>>>>>>>>>>> kvm_page_track_register_notifier.  So I guess you could add a callback
>>>>>>>>>>>>>> that passes the struct kvm_device* to the mdev device.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Xiaoguang and Guangrong, what were your plans?  We discussed it briefly
>>>>>>>>>>>>>> at KVM Forum but I don't remember the details.          
>>>>>>>>>>>>>
>>>>>>>>>>>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
>>>>>>>>>>>>> figure out the kvm instance based on the fd.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We got a new idea, how about search the kvm instance by mm_struct, it
>>>>>>>>>>>>> can work as KVMGT is running in the vcpu context and it is much more
>>>>>>>>>>>>> straightforward.          
>>>>>>>>>>>>
>>>>>>>>>>>> Perhaps I didn't understand your suggestion, but the same mm_struct can
>>>>>>>>>>>> have more than 1 struct kvm so I'm not sure that it can work.          
>>>>>>>>>>>
>>>>>>>>>>> vcpu->pid is valid during vcpu running so that it can be used to figure
>>>>>>>>>>> out which kvm instance owns the vcpu whose pid is the one as current
>>>>>>>>>>> thread, i think it can work. :)          
>>>>>>>>>>
>>>>>>>>>> No, don't do that.  There's no reason for a thread to run a single VCPU,
>>>>>>>>>> and if you can have multiple VCPUs you can also have multiple VCPUs from
>>>>>>>>>> multiple VMs.
>>>>>>>>>>
>>>>>>>>>> Passing file descriptors around are the right way to connect subsystems.          
>>>>>>>>>
>>>>>>>>> [CC Alex, Kevin and Qemu-devel]
>>>>>>>>>
>>>>>>>>> Hi Paolo & Alex,
>>>>>>>>>
>>>>>>>>> IIUC, passing file descriptors means touching QEMU and the UAPI between
>>>>>>>>> QEMU and VFIO. Would you guys have a look at below draft patch? If it's
>>>>>>>>> on the correct direction, I'll send the split ones. Thanks!
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks,
>>>>>>>>> Jike
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>>>>>>>>> index bec694c..f715d37 100644
>>>>>>>>> --- a/hw/vfio/pci-quirks.c
>>>>>>>>> +++ b/hw/vfio/pci-quirks.c
>>>>>>>>> @@ -10,12 +10,14 @@
>>>>>>>>>   * the COPYING file in the top-level directory.
>>>>>>>>>   */
>>>>>>>>>  
>>>>>>>>> +#include <sys/ioctl.h>
>>>>>>>>>  #include "qemu/osdep.h"
>>>>>>>>>  #include "qemu/error-report.h"
>>>>>>>>>  #include "qemu/range.h"
>>>>>>>>>  #include "qapi/error.h"
>>>>>>>>>  #include "hw/nvram/fw_cfg.h"
>>>>>>>>>  #include "pci.h"
>>>>>>>>> +#include "sysemu/kvm.h"
>>>>>>>>>  #include "trace.h"
>>>>>>>>>  
>>>>>>>>>  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
>>>>>>>>> @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
>>>>>>>>>          break;
>>>>>>>>>      }
>>>>>>>>>  }
>>>>>>>>> +
>>>>>>>>> +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
>>>>>>>>> +{
>>>>>>>>> +    int vmfd;
>>>>>>>>> +
>>>>>>>>> +    if (!kvm_enabled() || !vdev->kvmgt)
>>>>>>>>> +        return;
>>>>>>>>> +
>>>>>>>>> +    /* Tell the device what KVM it attached */
>>>>>>>>> +    vmfd = kvm_get_vmfd(kvm_state);
>>>>>>>>> +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
>>>>>>>>> +}
>>>>>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>>>>>>> index a5a620a..8732552 100644
>>>>>>>>> --- a/hw/vfio/pci.c
>>>>>>>>> +++ b/hw/vfio/pci.c
>>>>>>>>> @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
>>>>>>>>>          return ret;
>>>>>>>>>      }
>>>>>>>>>  
>>>>>>>>> +    vfio_quirk_kvmgt(vdev);
>>>>>>>>> +
>>>>>>>>>      /* Get a copy of config space */
>>>>>>>>>      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
>>>>>>>>>                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
>>>>>>>>> @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
>>>>>>>>>      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
>>>>>>>>>                         sub_device_id, PCI_ANY_ID),
>>>>>>>>>      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
>>>>>>>>> +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),        
>>>>>>>>
>>>>>>>> Just a side note, device options are a headache, users are prone to get
>>>>>>>> them wrong and minimally it requires an entire round to get libvirt
>>>>>>>> support.  We should be able to detect from the device or vfio API
>>>>>>>> whether such a call is required.  Obviously if we can use the existing
>>>>>>>> kvm-vfio device, that's the better option anyway.  Thanks,      
>>>>>>>
>>>>>>> Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
>>>>>>> does, it needs to produce a device failure when unavailable.  Thanks,      
>>>>>>
>>>>>> Also, I would like to see this as an generic feature instead of
>>>>>> kvmgt specific interface, so we don't have to add new options to QEMU and it is
>>>>>> up to the vendor driver to proceed with or without it.    
>>>>>
>>>>> In general this should be decided by lack of some required feature
>>>>> exclusively provided by KVM.  I would not want to add a generic opt-out
>>>>> for mdev vendor drivers to decide that they arbitrarily want to disable
>>>>> that path.  Thanks,    
>>>>
>>>> IIUC, you are suggesting that this path should be controlled by KVM feature cap
>>>> and it will be accessible to VFIO users when such checking is satisfied.  
>>>
>>> Maybe we're getting too loose with our pronouns here, I'm starting to
>>> lose track of what "this" is referring to.  I agree that there's no
>>> reason for the ioctl, as proposed to be kvmgt specific.  I would hope
>>> that going through the kvm-vfio device to create that linkage would
>>> eliminate that, but we'll need to see what Jike can come up with to
>>> plumb between KVM and vfio.  Vendor drivers can implement their own
>>> ioctls, now that we pass them through the mdev layer, but someone needs
>>> to call those ioctls.  Ideally we want something programmatic to
>>> trigger that, without requiring a user to pass an extra device
>>> parameter.  Additionally, if there is any hope of making use of the
>>> device with userspace drivers other than QEMU, hard dependencies on KVM
>>> should be avoided.  Thanks,
>>>
>>> Alex
>>>   
>>
>> Thanks for the advice, so I cooked another patch for your comments.
>> Basically a 'void *usrdata' is added to vfio_group, external users
>> can set it (kvm) or get it (kvm or other users like kvmgt).
>>
>> BTW, in device-model, the open method will return failure to vfio-mdev
>> in case that such kvm information is not available.
>>
>> --
>> Thanks,
>> Jike
>>
>>
>>
>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>> index d1d70e0..6b8d1d2 100644
>> --- a/drivers/vfio/vfio.c
>> +++ b/drivers/vfio/vfio.c
>> @@ -86,6 +86,7 @@ struct vfio_group {
>>  	struct mutex			unbound_lock;
>>  	atomic_t			opened;
>>  	bool				noiommu;
>> +	void				*usrdata;
>>  };
>>  
>>  struct vfio_device {
>> @@ -447,14 +448,13 @@ static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
>>  }
>>  
>>  static
>> -struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>> +struct vfio_group *__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>>  {
>>  	struct vfio_group *group;
>>  
>>  	mutex_lock(&vfio.group_lock);
>>  	list_for_each_entry(group, &vfio.group_list, vfio_next) {
>>  		if (group->iommu_group == iommu_group) {
>> -			vfio_group_get(group);
> 
> This is wrong, we can't add our reference after we release the lock.
> 

Thanks for pointing it out :)

>>  			mutex_unlock(&vfio.group_lock);
>>  			return group;
>>  		}
>> @@ -464,6 +464,17 @@ struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>>  	return NULL;
>>  }
>>  
>> +static
>> +struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>> +{
>> +	struct vfio_group *group = __vfio_group_get_from_iommu(iommu_group);
>> +	if (!group)
>> +		return NULL;
>> +
>> +	vfio_group_get(group);
> 
> We have no basis to get a reference here.  This function cannot exist
> separate from the existing function above.
> 
>> +	return group;
>> +}
>> +
>>  static struct vfio_group *vfio_group_get_from_minor(int minor)
>>  {
>>  	struct vfio_group *group;
>> @@ -1728,6 +1739,31 @@ long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
>>  }
>>  EXPORT_SYMBOL_GPL(vfio_external_check_extension);
>>  
>> +void vfio_group_set_usrdata(struct vfio_group *group, void *data)
>> +{
>> +	group->usrdata = data;
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
>> +
>> +void *vfio_group_get_usrdata(struct vfio_group *group)
>> +{
>> +	return group->usrdata;
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
>> +
>> +void *vfio_group_get_usrdata_by_device(struct device *dev)
>> +{
>> +	struct vfio_group *vfio_group;
>> +
>> +	vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
> 
> We actually need to use iommu_group_get() here.  Kirti adds a
> vfio_group_get_from_dev() in v9 03/12 that does this properly.
> 
>> +	if (!vfio_group)
>> +		return NULL;
>> +
>> +	return vfio_group_get_usrdata(vfio_group);
> 
> This operates on a group for which we have no reference.

Great to know Kirti's work! BTW, this means user need to
call vfio_group_put_external_user afterwards, right?

>> +}
>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata_by_device);
>> +
>> +
>>  /**
>>   * Sub-module support
>>   */
>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>> index 0ecae0b..712588f 100644
>> --- a/include/linux/vfio.h
>> +++ b/include/linux/vfio.h
>> @@ -91,6 +91,10 @@ extern void vfio_unregister_iommu_driver(
>>  extern int vfio_external_user_iommu_id(struct vfio_group *group);
>>  extern long vfio_external_check_extension(struct vfio_group *group,
>>  					  unsigned long arg);
>> +extern void vfio_group_set_usrdata(struct vfio_group *group, void *data);
>> +extern void *vfio_group_get_usrdata(struct vfio_group *group);
>> +extern void *vfio_group_get_usrdata_by_device(struct device *dev);
>> +
>>  
>>  /*
>>   * Sub-module helpers
>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>> index 1dd087d..e00d401 100644
>> --- a/virt/kvm/vfio.c
>> +++ b/virt/kvm/vfio.c
>> @@ -60,6 +60,20 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
>>  	symbol_put(vfio_group_put_external_user);
>>  }
>>  
>> +static void kvm_vfio_group_set_kvm(struct vfio_group *group, void *kvm)
>> +{
>> +	void (*fn)(struct vfio_group *, void *);
>> +
>> +	fn = symbol_get(vfio_group_set_usrdata);
>> +	if (!fn)
>> +		return;
>> +
>> +	fn(group, kvm);
>> +	kvm_get_kvm(kvm);
>> +
>> +	symbol_put(vfio_group_set_usrdata);
>> +}
>> +
>>  static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
>>  {
>>  	long (*fn)(struct vfio_group *, unsigned long);
>> @@ -161,6 +175,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>>  
>>  		kvm_vfio_update_coherency(dev);
>>  
>> +		kvm_vfio_group_set_kvm(vfio_group, dev->kvm);
>> +
>>  		return 0;
>>  
>>  	case KVM_DEV_VFIO_GROUP_DEL:
>> @@ -200,6 +216,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>>  
>>  		kvm_vfio_update_coherency(dev);
>>  
>> +		kvm_put_kvm(dev->kvm);
>> +
>>  		return ret;
>>  	}
> 
> How does anyone get'ing the usrdata know what it contains?

Currently only the KVM instance. Maybe we can add other data along with
flags in the future?

> Does the
> vendor driver compare it to a pointer it found elsewhere?  How does the
> vendor driver generate an error back to the user if this linkage is
> necessary but unavailable?

For the data == kvm scenario, yes, I think it's only valid to use it
inside the kvm thread context, IIUC, comparing kvm->mm with current->mm
does the trick.  If not equal, in our case, the parent_ops->open()
will get an -ESRCH indicating that this mdev must be used along with KVM.


--
Thanks,
Jike


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-19  2:32                                     ` Jike Song
@ 2016-10-19  5:45                                       ` Xiao Guangrong
  2016-10-19 11:56                                           ` [Qemu-devel] " Paolo Bonzini
  2016-10-19 13:56                                         ` [Qemu-devel] " Eric Blake
  1 sibling, 1 reply; 58+ messages in thread
From: Xiao Guangrong @ 2016-10-19  5:45 UTC (permalink / raw)
  To: Jike Song, Alex Williamson
  Cc: Tian, Kevin, Neo Jia, kvm, Xiao Guangrong, qemu-devel,
	Xiaoguang Chen, Kirti Wankhede, Paolo Bonzini



On 10/19/2016 10:32 AM, Jike Song wrote:
+EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
>>> +
>>> +void *vfio_group_get_usrdata(struct vfio_group *group)
>>> +{
>>> +	return group->usrdata;
>>> +}
>>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
>>> +
>>> +void *vfio_group_get_usrdata_by_device(struct device *dev)
>>> +{
>>> +	struct vfio_group *vfio_group;
>>> +
>>> +	vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
>>
>> We actually need to use iommu_group_get() here.  Kirti adds a
>> vfio_group_get_from_dev() in v9 03/12 that does this properly.
>>
>>> +	if (!vfio_group)
>>> +		return NULL;
>>> +
>>> +	return vfio_group_get_usrdata(vfio_group);

I am worrying if the kvm instance got from group->usrdata is safe
enough? What happens if you get the instance after kvm released
kvm-vfio device?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-19  5:45                                       ` Xiao Guangrong
@ 2016-10-19 11:56                                           ` Paolo Bonzini
  0 siblings, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2016-10-19 11:56 UTC (permalink / raw)
  To: Xiao Guangrong, Jike Song, Alex Williamson
  Cc: Tian, Kevin, Xiao Guangrong, kvm, qemu-devel, Xiaoguang Chen,
	Kirti Wankhede, Neo Jia



On 19/10/2016 07:45, Xiao Guangrong wrote:
> 
> 
> On 10/19/2016 10:32 AM, Jike Song wrote:
> +EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
>>>> +
>>>> +void *vfio_group_get_usrdata(struct vfio_group *group)
>>>> +{
>>>> +    return group->usrdata;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
>>>> +
>>>> +void *vfio_group_get_usrdata_by_device(struct device *dev)
>>>> +{
>>>> +    struct vfio_group *vfio_group;
>>>> +
>>>> +    vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
>>>
>>> We actually need to use iommu_group_get() here.  Kirti adds a
>>> vfio_group_get_from_dev() in v9 03/12 that does this properly.
>>>
>>>> +    if (!vfio_group)
>>>> +        return NULL;
>>>> +
>>>> +    return vfio_group_get_usrdata(vfio_group);
> 
> I am worrying if the kvm instance got from group->usrdata is safe
> enough? What happens if you get the instance after kvm released
> kvm-vfio device?

It shouldn't happen if you use kvm_get_kvm and kvm_put_kvm properly.  It
is almost okay in the patch, just:

> @@ -200,6 +216,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>  
>  		kvm_vfio_update_coherency(dev);
>  
> +		kvm_put_kvm(dev->kvm);
> +
>  		return ret;
>  	}

... please add a new function kvm_vfio_group_clear_kvm(vfio_group) here,
that does vfio_group_set_usrdata(vfio_group, NULL) and kvm_put_kvm.
This should avoid use-after-free.

Paolo

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-19 11:56                                           ` Paolo Bonzini
  0 siblings, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2016-10-19 11:56 UTC (permalink / raw)
  To: Xiao Guangrong, Jike Song, Alex Williamson
  Cc: Tian, Kevin, Neo Jia, kvm, Xiao Guangrong, qemu-devel,
	Xiaoguang Chen, Kirti Wankhede



On 19/10/2016 07:45, Xiao Guangrong wrote:
> 
> 
> On 10/19/2016 10:32 AM, Jike Song wrote:
> +EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
>>>> +
>>>> +void *vfio_group_get_usrdata(struct vfio_group *group)
>>>> +{
>>>> +    return group->usrdata;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
>>>> +
>>>> +void *vfio_group_get_usrdata_by_device(struct device *dev)
>>>> +{
>>>> +    struct vfio_group *vfio_group;
>>>> +
>>>> +    vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
>>>
>>> We actually need to use iommu_group_get() here.  Kirti adds a
>>> vfio_group_get_from_dev() in v9 03/12 that does this properly.
>>>
>>>> +    if (!vfio_group)
>>>> +        return NULL;
>>>> +
>>>> +    return vfio_group_get_usrdata(vfio_group);
> 
> I am worrying if the kvm instance got from group->usrdata is safe
> enough? What happens if you get the instance after kvm released
> kvm-vfio device?

It shouldn't happen if you use kvm_get_kvm and kvm_put_kvm properly.  It
is almost okay in the patch, just:

> @@ -200,6 +216,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>  
>  		kvm_vfio_update_coherency(dev);
>  
> +		kvm_put_kvm(dev->kvm);
> +
>  		return ret;
>  	}

... please add a new function kvm_vfio_group_clear_kvm(vfio_group) here,
that does vfio_group_set_usrdata(vfio_group, NULL) and kvm_put_kvm.
This should avoid use-after-free.

Paolo

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-19 11:56                                           ` [Qemu-devel] " Paolo Bonzini
@ 2016-10-19 13:39                                             ` Xiao Guangrong
  -1 siblings, 0 replies; 58+ messages in thread
From: Xiao Guangrong @ 2016-10-19 13:39 UTC (permalink / raw)
  To: Paolo Bonzini, Xiao Guangrong, Jike Song, Alex Williamson
  Cc: Tian, Kevin, Neo Jia, kvm, qemu-devel, Xiaoguang Chen, Kirti Wankhede



On 10/19/2016 07:56 PM, Paolo Bonzini wrote:
>
>
> On 19/10/2016 07:45, Xiao Guangrong wrote:
>>
>>
>> On 10/19/2016 10:32 AM, Jike Song wrote:
>> +EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
>>>>> +
>>>>> +void *vfio_group_get_usrdata(struct vfio_group *group)
>>>>> +{
>>>>> +    return group->usrdata;
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
>>>>> +
>>>>> +void *vfio_group_get_usrdata_by_device(struct device *dev)
>>>>> +{
>>>>> +    struct vfio_group *vfio_group;
>>>>> +
>>>>> +    vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
>>>>
>>>> We actually need to use iommu_group_get() here.  Kirti adds a
>>>> vfio_group_get_from_dev() in v9 03/12 that does this properly.
>>>>
>>>>> +    if (!vfio_group)
>>>>> +        return NULL;
>>>>> +
>>>>> +    return vfio_group_get_usrdata(vfio_group);
>>
>> I am worrying if the kvm instance got from group->usrdata is safe
>> enough? What happens if you get the instance after kvm released
>> kvm-vfio device?
>
> It shouldn't happen if you use kvm_get_kvm and kvm_put_kvm properly.  It
> is almost okay in the patch, just:
>

How about if KVM releases kvm-vfio device between vfio_group_get_usrdata()
and get_kvm()?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-19 13:39                                             ` Xiao Guangrong
  0 siblings, 0 replies; 58+ messages in thread
From: Xiao Guangrong @ 2016-10-19 13:39 UTC (permalink / raw)
  To: Paolo Bonzini, Xiao Guangrong, Jike Song, Alex Williamson
  Cc: Tian, Kevin, Neo Jia, kvm, qemu-devel, Xiaoguang Chen, Kirti Wankhede



On 10/19/2016 07:56 PM, Paolo Bonzini wrote:
>
>
> On 19/10/2016 07:45, Xiao Guangrong wrote:
>>
>>
>> On 10/19/2016 10:32 AM, Jike Song wrote:
>> +EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
>>>>> +
>>>>> +void *vfio_group_get_usrdata(struct vfio_group *group)
>>>>> +{
>>>>> +    return group->usrdata;
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
>>>>> +
>>>>> +void *vfio_group_get_usrdata_by_device(struct device *dev)
>>>>> +{
>>>>> +    struct vfio_group *vfio_group;
>>>>> +
>>>>> +    vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
>>>>
>>>> We actually need to use iommu_group_get() here.  Kirti adds a
>>>> vfio_group_get_from_dev() in v9 03/12 that does this properly.
>>>>
>>>>> +    if (!vfio_group)
>>>>> +        return NULL;
>>>>> +
>>>>> +    return vfio_group_get_usrdata(vfio_group);
>>
>> I am worrying if the kvm instance got from group->usrdata is safe
>> enough? What happens if you get the instance after kvm released
>> kvm-vfio device?
>
> It shouldn't happen if you use kvm_get_kvm and kvm_put_kvm properly.  It
> is almost okay in the patch, just:
>

How about if KVM releases kvm-vfio device between vfio_group_get_usrdata()
and get_kvm()?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-19  2:32                                     ` Jike Song
@ 2016-10-19 13:56                                         ` Eric Blake
  2016-10-19 13:56                                         ` [Qemu-devel] " Eric Blake
  1 sibling, 0 replies; 58+ messages in thread
From: Eric Blake @ 2016-10-19 13:56 UTC (permalink / raw)
  To: Jike Song, Alex Williamson
  Cc: Tian, Kevin, Xiao Guangrong, kvm, guangrong.xiao, Neo Jia,
	qemu-devel, Xiaoguang Chen, Kirti Wankhede, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 925 bytes --]

[meta-comment]

On 10/18/2016 09:32 PM, Jike Song wrote:
> On 10/18/2016 10:59 PM, Alex Williamson wrote:
...
>>>>>>>>>>>>>>> On 10/10/2016 20:01, Neo Jia wrote:          
>>>>>>>>>>>>>>>>> Hi Neo,

17 levels of quoting is rather over-the-top.  It is OKAY (and in fact
DESIRABLE) to trim your emails to relevant portions, when posting to a
high-volume list.  Readers shouldn't have to scroll through pages of
deeply-nested quoting...

>>>  
>>>  	mutex_lock(&vfio.group_lock);
>>>  	list_for_each_entry(group, &vfio.group_list, vfio_next) {
>>>  		if (group->iommu_group == iommu_group) {
>>> -			vfio_group_get(group);
>>
>> This is wrong, we can't add our reference after we release the lock.
>>
> 
> Thanks for pointing it out :)
> 

...to get to the much smaller meat of the message.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-19 13:56                                         ` Eric Blake
  0 siblings, 0 replies; 58+ messages in thread
From: Eric Blake @ 2016-10-19 13:56 UTC (permalink / raw)
  To: Jike Song, Alex Williamson
  Cc: Tian, Kevin, Xiao Guangrong, kvm, guangrong.xiao, qemu-devel,
	Xiaoguang Chen, Kirti Wankhede, Paolo Bonzini, Neo Jia

[-- Attachment #1: Type: text/plain, Size: 925 bytes --]

[meta-comment]

On 10/18/2016 09:32 PM, Jike Song wrote:
> On 10/18/2016 10:59 PM, Alex Williamson wrote:
...
>>>>>>>>>>>>>>> On 10/10/2016 20:01, Neo Jia wrote:          
>>>>>>>>>>>>>>>>> Hi Neo,

17 levels of quoting is rather over-the-top.  It is OKAY (and in fact
DESIRABLE) to trim your emails to relevant portions, when posting to a
high-volume list.  Readers shouldn't have to scroll through pages of
deeply-nested quoting...

>>>  
>>>  	mutex_lock(&vfio.group_lock);
>>>  	list_for_each_entry(group, &vfio.group_list, vfio_next) {
>>>  		if (group->iommu_group == iommu_group) {
>>> -			vfio_group_get(group);
>>
>> This is wrong, we can't add our reference after we release the lock.
>>
> 
> Thanks for pointing it out :)
> 

...to get to the much smaller meat of the message.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-19 13:39                                             ` [Qemu-devel] " Xiao Guangrong
@ 2016-10-19 14:14                                               ` Paolo Bonzini
  -1 siblings, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2016-10-19 14:14 UTC (permalink / raw)
  To: Xiao Guangrong, Xiao Guangrong, Jike Song, Alex Williamson
  Cc: Tian, Kevin, Neo Jia, kvm, qemu-devel, Xiaoguang Chen, Kirti Wankhede



On 19/10/2016 15:39, Xiao Guangrong wrote:
> 
> 
> On 10/19/2016 07:56 PM, Paolo Bonzini wrote:
>>
>>
>> On 19/10/2016 07:45, Xiao Guangrong wrote:
>>>
>>>
>>> On 10/19/2016 10:32 AM, Jike Song wrote:
>>> +EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
>>>>>> +
>>>>>> +void *vfio_group_get_usrdata(struct vfio_group *group)
>>>>>> +{
>>>>>> +    return group->usrdata;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
>>>>>> +
>>>>>> +void *vfio_group_get_usrdata_by_device(struct device *dev)
>>>>>> +{
>>>>>> +    struct vfio_group *vfio_group;
>>>>>> +
>>>>>> +    vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
>>>>>
>>>>> We actually need to use iommu_group_get() here.  Kirti adds a
>>>>> vfio_group_get_from_dev() in v9 03/12 that does this properly.
>>>>>
>>>>>> +    if (!vfio_group)
>>>>>> +        return NULL;
>>>>>> +
>>>>>> +    return vfio_group_get_usrdata(vfio_group);
>>>
>>> I am worrying if the kvm instance got from group->usrdata is safe
>>> enough? What happens if you get the instance after kvm released
>>> kvm-vfio device?
>>
>> It shouldn't happen if you use kvm_get_kvm and kvm_put_kvm properly.  It
>> is almost okay in the patch, just:
> 
> How about if KVM releases kvm-vfio device between vfio_group_get_usrdata()
> and get_kvm()?

That cannot happen as long as there is a struct file* for the device
(see kvm_ioctl_create_device and kvm_device_release).  Since you're
sending a ioctl to it, it's fine.

Paolo

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-19 14:14                                               ` Paolo Bonzini
  0 siblings, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2016-10-19 14:14 UTC (permalink / raw)
  To: Xiao Guangrong, Xiao Guangrong, Jike Song, Alex Williamson
  Cc: Tian, Kevin, Neo Jia, kvm, qemu-devel, Xiaoguang Chen, Kirti Wankhede



On 19/10/2016 15:39, Xiao Guangrong wrote:
> 
> 
> On 10/19/2016 07:56 PM, Paolo Bonzini wrote:
>>
>>
>> On 19/10/2016 07:45, Xiao Guangrong wrote:
>>>
>>>
>>> On 10/19/2016 10:32 AM, Jike Song wrote:
>>> +EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
>>>>>> +
>>>>>> +void *vfio_group_get_usrdata(struct vfio_group *group)
>>>>>> +{
>>>>>> +    return group->usrdata;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
>>>>>> +
>>>>>> +void *vfio_group_get_usrdata_by_device(struct device *dev)
>>>>>> +{
>>>>>> +    struct vfio_group *vfio_group;
>>>>>> +
>>>>>> +    vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
>>>>>
>>>>> We actually need to use iommu_group_get() here.  Kirti adds a
>>>>> vfio_group_get_from_dev() in v9 03/12 that does this properly.
>>>>>
>>>>>> +    if (!vfio_group)
>>>>>> +        return NULL;
>>>>>> +
>>>>>> +    return vfio_group_get_usrdata(vfio_group);
>>>
>>> I am worrying if the kvm instance got from group->usrdata is safe
>>> enough? What happens if you get the instance after kvm released
>>> kvm-vfio device?
>>
>> It shouldn't happen if you use kvm_get_kvm and kvm_put_kvm properly.  It
>> is almost okay in the patch, just:
> 
> How about if KVM releases kvm-vfio device between vfio_group_get_usrdata()
> and get_kvm()?

That cannot happen as long as there is a struct file* for the device
(see kvm_ioctl_create_device and kvm_device_release).  Since you're
sending a ioctl to it, it's fine.

Paolo

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-19 14:14                                               ` [Qemu-devel] " Paolo Bonzini
  (?)
@ 2016-10-20  1:48                                               ` Xiao Guangrong
  2016-10-20 17:06                                                 ` Paolo Bonzini
  -1 siblings, 1 reply; 58+ messages in thread
From: Xiao Guangrong @ 2016-10-20  1:48 UTC (permalink / raw)
  To: Paolo Bonzini, Xiao Guangrong, Jike Song, Alex Williamson
  Cc: Tian, Kevin, Neo Jia, kvm, qemu-devel, Xiaoguang Chen, Kirti Wankhede



On 10/19/2016 10:14 PM, Paolo Bonzini wrote:
>
>
> On 19/10/2016 15:39, Xiao Guangrong wrote:
>>
>>
>> On 10/19/2016 07:56 PM, Paolo Bonzini wrote:
>>>
>>>
>>> On 19/10/2016 07:45, Xiao Guangrong wrote:
>>>>
>>>>
>>>> On 10/19/2016 10:32 AM, Jike Song wrote:
>>>> +EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
>>>>>>> +
>>>>>>> +void *vfio_group_get_usrdata(struct vfio_group *group)
>>>>>>> +{
>>>>>>> +    return group->usrdata;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
>>>>>>> +
>>>>>>> +void *vfio_group_get_usrdata_by_device(struct device *dev)
>>>>>>> +{
>>>>>>> +    struct vfio_group *vfio_group;
>>>>>>> +
>>>>>>> +    vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
>>>>>>
>>>>>> We actually need to use iommu_group_get() here.  Kirti adds a
>>>>>> vfio_group_get_from_dev() in v9 03/12 that does this properly.
>>>>>>
>>>>>>> +    if (!vfio_group)
>>>>>>> +        return NULL;
>>>>>>> +
>>>>>>> +    return vfio_group_get_usrdata(vfio_group);
>>>>
>>>> I am worrying if the kvm instance got from group->usrdata is safe
>>>> enough? What happens if you get the instance after kvm released
>>>> kvm-vfio device?
>>>
>>> It shouldn't happen if you use kvm_get_kvm and kvm_put_kvm properly.  It
>>> is almost okay in the patch, just:
>>
>> How about if KVM releases kvm-vfio device between vfio_group_get_usrdata()
>> and get_kvm()?
>
> That cannot happen as long as there is a struct file* for the device
> (see kvm_ioctl_create_device and kvm_device_release).  Since you're
> sending a ioctl to it, it's fine.

I understood that KVM side is safe, however, vfio side is independent with
kvm and the user of usrdata can fetch kvm struct at any time, consider
this scenario:

CPU 0                         CPU 1
KVM:                         VFIO/userdata user
   kvm_ioctl_create_device
      get_kvm()
                             vfio_group_get_usrdata(vfio_group)
   kvm_device_release
     put_kvm()
                             !!! kvm refcount has gone
                             use KVM struct

Then, the user of userdata have fetched kvm struct but the refcount has already gone.

What i missed?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-20  1:48                                               ` Xiao Guangrong
@ 2016-10-20 17:06                                                 ` Paolo Bonzini
  2016-10-20 17:19                                                     ` [Qemu-devel] " Xiao, Guangrong
  2016-10-26 13:44                                                     ` Jike Song
  0 siblings, 2 replies; 58+ messages in thread
From: Paolo Bonzini @ 2016-10-20 17:06 UTC (permalink / raw)
  To: Xiao Guangrong, Xiao Guangrong, Jike Song, Alex Williamson
  Cc: Tian, Kevin, Neo Jia, kvm, qemu-devel, Xiaoguang Chen, Kirti Wankhede



On 20/10/2016 03:48, Xiao Guangrong wrote:
> 
> 
> On 10/19/2016 10:14 PM, Paolo Bonzini wrote:
>>
>>
>> On 19/10/2016 15:39, Xiao Guangrong wrote:
>>>
>>>
>>> On 10/19/2016 07:56 PM, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 19/10/2016 07:45, Xiao Guangrong wrote:
>>>>>
>>>>>
>>>>> On 10/19/2016 10:32 AM, Jike Song wrote:
>>>>> +EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
>>>>>>>> +
>>>>>>>> +void *vfio_group_get_usrdata(struct vfio_group *group)
>>>>>>>> +{
>>>>>>>> +    return group->usrdata;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
>>>>>>>> +
>>>>>>>> +void *vfio_group_get_usrdata_by_device(struct device *dev)
>>>>>>>> +{
>>>>>>>> +    struct vfio_group *vfio_group;
>>>>>>>> +
>>>>>>>> +    vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
>>>>>>>
>>>>>>> We actually need to use iommu_group_get() here.  Kirti adds a
>>>>>>> vfio_group_get_from_dev() in v9 03/12 that does this properly.
>>>>>>>
>>>>>>>> +    if (!vfio_group)
>>>>>>>> +        return NULL;
>>>>>>>> +
>>>>>>>> +    return vfio_group_get_usrdata(vfio_group);
>>>>>
>>>>> I am worrying if the kvm instance got from group->usrdata is safe
>>>>> enough? What happens if you get the instance after kvm released
>>>>> kvm-vfio device?
>>>>
>>>> It shouldn't happen if you use kvm_get_kvm and kvm_put_kvm
>>>> properly.  It
>>>> is almost okay in the patch, just:
>>>
>>> How about if KVM releases kvm-vfio device between
>>> vfio_group_get_usrdata()
>>> and get_kvm()?
>>
>> That cannot happen as long as there is a struct file* for the device
>> (see kvm_ioctl_create_device and kvm_device_release).  Since you're
>> sending a ioctl to it, it's fine.
> 
> I understood that KVM side is safe, however, vfio side is independent with
> kvm and the user of usrdata can fetch kvm struct at any time, consider
> this scenario:
> 
> CPU 0                         CPU 1
> KVM:                         VFIO/userdata user
>   kvm_ioctl_create_device
>      get_kvm()
>                             vfio_group_get_usrdata(vfio_group)
>   kvm_device_release
>     put_kvm()
>                             !!! kvm refcount has gone
>                             use KVM struct
> 
> Then, the user of userdata have fetched kvm struct but the refcount has
> already gone.

vfio_group_set_usrdata (actually) kvm_vfio_group_set_kvm has called
kvm_get_kvm too, however.  What you need is a mutex that is taken by
vfio_group_set_usrdata and by the callers of vfio_group_get_usrdata.

Paolo

> What i missed?
> -- 
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-20 17:06                                                 ` Paolo Bonzini
@ 2016-10-20 17:19                                                     ` Xiao, Guangrong
  2016-10-26 13:44                                                     ` Jike Song
  1 sibling, 0 replies; 58+ messages in thread
From: Xiao, Guangrong @ 2016-10-20 17:19 UTC (permalink / raw)
  To: Paolo Bonzini, Xiao Guangrong, Song, Jike, Alex Williamson
  Cc: Tian, Kevin, Neo Jia, kvm, qemu-devel, Chen, Xiaoguang, Kirti Wankhede



-----Original Message-----
From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo Bonzini
Sent: Friday, October 21, 2016 1:07 AM
To: Xiao Guangrong <guangrong.xiao@linux.intel.com>; Xiao, Guangrong <guangrong.xiao@intel.com>; Song, Jike <jike.song@intel.com>; Alex Williamson <alex.williamson@redhat.com>
Cc: Tian, Kevin <kevin.tian@intel.com>; Neo Jia <cjia@nvidia.com>; kvm@vger.kernel.org; qemu-devel <qemu-devel@nongnu.org>; Chen, Xiaoguang <xiaoguang.chen@intel.com>; Kirti Wankhede <kwankhede@nvidia.com>
Subject: Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot



On 20/10/2016 03:48, Xiao Guangrong wrote:
> 
> 
> On 10/19/2016 10:14 PM, Paolo Bonzini wrote:
>>
>>
>> On 19/10/2016 15:39, Xiao Guangrong wrote:
>>>
>>>
>>> On 10/19/2016 07:56 PM, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 19/10/2016 07:45, Xiao Guangrong wrote:
>>>>>
>>>>>
>>>>> On 10/19/2016 10:32 AM, Jike Song wrote:
>>>>> +EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
>>>>>>>> +
>>>>>>>> +void *vfio_group_get_usrdata(struct vfio_group *group) {
>>>>>>>> +    return group->usrdata;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
>>>>>>>> +
>>>>>>>> +void *vfio_group_get_usrdata_by_device(struct device *dev) {
>>>>>>>> +    struct vfio_group *vfio_group;
>>>>>>>> +
>>>>>>>> +    vfio_group = 
>>>>>>>> + __vfio_group_get_from_iommu(dev->iommu_group);
>>>>>>>
>>>>>>> We actually need to use iommu_group_get() here.  Kirti adds a
>>>>>>> vfio_group_get_from_dev() in v9 03/12 that does this properly.
>>>>>>>
>>>>>>>> +    if (!vfio_group)
>>>>>>>> +        return NULL;
>>>>>>>> +
>>>>>>>> +    return vfio_group_get_usrdata(vfio_group);
>>>>>
>>>>> I am worrying if the kvm instance got from group->usrdata is safe 
>>>>> enough? What happens if you get the instance after kvm released 
>>>>> kvm-vfio device?
>>>>
>>>> It shouldn't happen if you use kvm_get_kvm and kvm_put_kvm 
>>>> properly.  It is almost okay in the patch, just:
>>>
>>> How about if KVM releases kvm-vfio device between
>>> vfio_group_get_usrdata()
>>> and get_kvm()?
>>
>> That cannot happen as long as there is a struct file* for the device 
>> (see kvm_ioctl_create_device and kvm_device_release).  Since you're 
>> sending a ioctl to it, it's fine.
> 
> I understood that KVM side is safe, however, vfio side is independent 
> with kvm and the user of usrdata can fetch kvm struct at any time, 
> consider this scenario:
> 
> CPU 0                         CPU 1
> KVM:                         VFIO/userdata user
>   kvm_ioctl_create_device
>      get_kvm()
>                             vfio_group_get_usrdata(vfio_group)
>   kvm_device_release
>     put_kvm()
>                             !!! kvm refcount has gone
>                             use KVM struct
> 
> Then, the user of userdata have fetched kvm struct but the refcount 
> has already gone.

vfio_group_set_usrdata (actually) kvm_vfio_group_set_kvm has called kvm_get_kvm too, however.  What you need is a mutex that is taken by vfio_group_set_usrdata and by the callers of vfio_group_get_usrdata.

Yes, mutex can fix it and is good to me. :)


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-20 17:19                                                     ` Xiao, Guangrong
  0 siblings, 0 replies; 58+ messages in thread
From: Xiao, Guangrong @ 2016-10-20 17:19 UTC (permalink / raw)
  To: Paolo Bonzini, Xiao Guangrong, Song, Jike, Alex Williamson
  Cc: Tian, Kevin, Neo Jia, kvm, qemu-devel, Chen, Xiaoguang, Kirti Wankhede



-----Original Message-----
From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo Bonzini
Sent: Friday, October 21, 2016 1:07 AM
To: Xiao Guangrong <guangrong.xiao@linux.intel.com>; Xiao, Guangrong <guangrong.xiao@intel.com>; Song, Jike <jike.song@intel.com>; Alex Williamson <alex.williamson@redhat.com>
Cc: Tian, Kevin <kevin.tian@intel.com>; Neo Jia <cjia@nvidia.com>; kvm@vger.kernel.org; qemu-devel <qemu-devel@nongnu.org>; Chen, Xiaoguang <xiaoguang.chen@intel.com>; Kirti Wankhede <kwankhede@nvidia.com>
Subject: Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot



On 20/10/2016 03:48, Xiao Guangrong wrote:
> 
> 
> On 10/19/2016 10:14 PM, Paolo Bonzini wrote:
>>
>>
>> On 19/10/2016 15:39, Xiao Guangrong wrote:
>>>
>>>
>>> On 10/19/2016 07:56 PM, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 19/10/2016 07:45, Xiao Guangrong wrote:
>>>>>
>>>>>
>>>>> On 10/19/2016 10:32 AM, Jike Song wrote:
>>>>> +EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
>>>>>>>> +
>>>>>>>> +void *vfio_group_get_usrdata(struct vfio_group *group) {
>>>>>>>> +    return group->usrdata;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
>>>>>>>> +
>>>>>>>> +void *vfio_group_get_usrdata_by_device(struct device *dev) {
>>>>>>>> +    struct vfio_group *vfio_group;
>>>>>>>> +
>>>>>>>> +    vfio_group = 
>>>>>>>> + __vfio_group_get_from_iommu(dev->iommu_group);
>>>>>>>
>>>>>>> We actually need to use iommu_group_get() here.  Kirti adds a
>>>>>>> vfio_group_get_from_dev() in v9 03/12 that does this properly.
>>>>>>>
>>>>>>>> +    if (!vfio_group)
>>>>>>>> +        return NULL;
>>>>>>>> +
>>>>>>>> +    return vfio_group_get_usrdata(vfio_group);
>>>>>
>>>>> I am worrying if the kvm instance got from group->usrdata is safe 
>>>>> enough? What happens if you get the instance after kvm released 
>>>>> kvm-vfio device?
>>>>
>>>> It shouldn't happen if you use kvm_get_kvm and kvm_put_kvm 
>>>> properly.  It is almost okay in the patch, just:
>>>
>>> How about if KVM releases kvm-vfio device between
>>> vfio_group_get_usrdata()
>>> and get_kvm()?
>>
>> That cannot happen as long as there is a struct file* for the device 
>> (see kvm_ioctl_create_device and kvm_device_release).  Since you're 
>> sending a ioctl to it, it's fine.
> 
> I understood that KVM side is safe, however, vfio side is independent 
> with kvm and the user of usrdata can fetch kvm struct at any time, 
> consider this scenario:
> 
> CPU 0                         CPU 1
> KVM:                         VFIO/userdata user
>   kvm_ioctl_create_device
>      get_kvm()
>                             vfio_group_get_usrdata(vfio_group)
>   kvm_device_release
>     put_kvm()
>                             !!! kvm refcount has gone
>                             use KVM struct
> 
> Then, the user of userdata have fetched kvm struct but the refcount 
> has already gone.

vfio_group_set_usrdata (actually) kvm_vfio_group_set_kvm has called kvm_get_kvm too, however.  What you need is a mutex that is taken by vfio_group_set_usrdata and by the callers of vfio_group_get_usrdata.

Yes, mutex can fix it and is good to me. :)


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-20 17:19                                                     ` [Qemu-devel] " Xiao, Guangrong
@ 2016-10-21  2:47                                                       ` Jike Song
  -1 siblings, 0 replies; 58+ messages in thread
From: Jike Song @ 2016-10-21  2:47 UTC (permalink / raw)
  To: Xiao, Guangrong
  Cc: Paolo Bonzini, Xiao Guangrong, Alex Williamson, Tian, Kevin,
	Neo Jia, kvm, qemu-devel, Chen, Xiaoguang, Kirti Wankhede

On 10/21/2016 01:19 AM, Xiao, Guangrong wrote:
>> On 10/19/2016 10:14 PM, Paolo Bonzini wrote:
>>> On 19/10/2016 15:39, Xiao Guangrong wrote:
>>>
>>>
>>> I understood that KVM side is safe, however, vfio side is independent 
>>> with kvm and the user of usrdata can fetch kvm struct at any time, 
>>> consider this scenario:
>>>
>>> CPU 0                         CPU 1
>>> KVM:                         VFIO/userdata user
>>>   kvm_ioctl_create_device
>>>      get_kvm()
>>>                             vfio_group_get_usrdata(vfio_group)
>>>   kvm_device_release
>>>     put_kvm()
>>>                             !!! kvm refcount has gone
>>>                             use KVM struct
>>>
>>> Then, the user of userdata have fetched kvm struct but the refcount 
>>> has already gone.
>> 
>> vfio_group_set_usrdata (actually) kvm_vfio_group_set_kvm has called
>>kvm_get_kvm too, however.  What you need is a mutex that is taken by
>>vfio_group_set_usrdata and by the callers of vfio_group_get_usrdata.
> 
> Yes, mutex can fix it and is good to me. :)

Thanks everyone, I'll cook another patch according to your guidance.

--
Thanks,
Jike


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-21  2:47                                                       ` Jike Song
  0 siblings, 0 replies; 58+ messages in thread
From: Jike Song @ 2016-10-21  2:47 UTC (permalink / raw)
  To: Xiao, Guangrong
  Cc: Paolo Bonzini, Xiao Guangrong, Alex Williamson, Tian, Kevin,
	Neo Jia, kvm, qemu-devel, Chen, Xiaoguang, Kirti Wankhede

On 10/21/2016 01:19 AM, Xiao, Guangrong wrote:
>> On 10/19/2016 10:14 PM, Paolo Bonzini wrote:
>>> On 19/10/2016 15:39, Xiao Guangrong wrote:
>>>
>>>
>>> I understood that KVM side is safe, however, vfio side is independent 
>>> with kvm and the user of usrdata can fetch kvm struct at any time, 
>>> consider this scenario:
>>>
>>> CPU 0                         CPU 1
>>> KVM:                         VFIO/userdata user
>>>   kvm_ioctl_create_device
>>>      get_kvm()
>>>                             vfio_group_get_usrdata(vfio_group)
>>>   kvm_device_release
>>>     put_kvm()
>>>                             !!! kvm refcount has gone
>>>                             use KVM struct
>>>
>>> Then, the user of userdata have fetched kvm struct but the refcount 
>>> has already gone.
>> 
>> vfio_group_set_usrdata (actually) kvm_vfio_group_set_kvm has called
>>kvm_get_kvm too, however.  What you need is a mutex that is taken by
>>vfio_group_set_usrdata and by the callers of vfio_group_get_usrdata.
> 
> Yes, mutex can fix it and is good to me. :)

Thanks everyone, I'll cook another patch according to your guidance.

--
Thanks,
Jike

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-19 13:56                                         ` [Qemu-devel] " Eric Blake
  (?)
@ 2016-10-24  6:32                                         ` Jike Song
  -1 siblings, 0 replies; 58+ messages in thread
From: Jike Song @ 2016-10-24  6:32 UTC (permalink / raw)
  To: Eric Blake
  Cc: Alex Williamson, Tian, Kevin, Xiao Guangrong, kvm,
	guangrong.xiao, qemu-devel, Xiaoguang Chen, Kirti Wankhede,
	Paolo Bonzini, Neo Jia

On 10/19/2016 09:56 PM, Eric Blake wrote:
> 17 levels of quoting is rather over-the-top.  It is OKAY (and in fact
> DESIRABLE) to trim your emails to relevant portions, when posting to a
> high-volume list.  Readers shouldn't have to scroll through pages of
> deeply-nested quoting...

Hi Eric,

Sorry for that, will trim the quotation next time. Thanks for
reminding!

--
Thanks,
Jike


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-20 17:06                                                 ` Paolo Bonzini
@ 2016-10-26 13:44                                                     ` Jike Song
  2016-10-26 13:44                                                     ` Jike Song
  1 sibling, 0 replies; 58+ messages in thread
From: Jike Song @ 2016-10-26 13:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiao Guangrong, Xiao Guangrong, Alex Williamson, Tian, Kevin,
	Neo Jia, kvm, qemu-devel, Xiaoguang Chen, Kirti Wankhede

On 10/21/2016 01:06 AM, Paolo Bonzini wrote:
> On 20/10/2016 03:48, Xiao Guangrong wrote:
>> I understood that KVM side is safe, however, vfio side is independent with
>> kvm and the user of usrdata can fetch kvm struct at any time, consider
>> this scenario:
>>
>> CPU 0                         CPU 1
>> KVM:                         VFIO/userdata user
>>   kvm_ioctl_create_device
>>      get_kvm()
>>                             vfio_group_get_usrdata(vfio_group)
>>   kvm_device_release
>>     put_kvm()
>>                             !!! kvm refcount has gone
>>                             use KVM struct
>>
>> Then, the user of userdata have fetched kvm struct but the refcount has
>> already gone.
> 
> vfio_group_set_usrdata (actually) kvm_vfio_group_set_kvm has called
> kvm_get_kvm too, however.  What you need is a mutex that is taken by
> vfio_group_set_usrdata and by the callers of vfio_group_get_usrdata.

Hi Paolo & Guangrong,

I walked the whole thread and became a little nervous: I don't want
to introduce a global mutex.

The problem is, as I understand, vfio_group_get_usrdata() returns a
KVM pointer but it may be stale. To make the pointer always valid,
it can call kvm_get_kvm() *before* return the pointer.

I would apologize in advance if this idea turns out totally
nonsense, but hey, please kindly help fix my whim :-)


[vfio.h]

	struct vfio_usrdata {
		void *data;
		void (*get)(void *data);
		void (*put)(void *data)
	};

	vfio_group {
		...
		vfio_usrdata *usrdata;

[kvm.ko]

	struvt vfio_usrdata kvmdata = {
		.data = kvm,
		.get = kvm_get_kvm,
		.put = kvm_put_kvm,
	};

	fn = symbol_get(vfio_group_set_usrdata)
	fn(vfio_group, &kvmdata)


[vfio.ko]

	vfio_group_set_usrdata
		lock
		vfio_group->d = kvmdata
		unlock

	void *vfio_group_get_usrdata
		lock
		struct vfio_usrdata *d = vfio_group->usrdata;
		d->get(d->data);
		unlock
		return d->data;

	void vfio_group_put_usrdata
		lock
		struct vfio_usrdata *d = vfio_group->usrdata;
		d->put(d->data)
		unlock

[kvmgt.ko]

	call vfio_group_get_usrdata to get kvm,
	call vfio_group_put_usrdata to release it
	*never* call kvm_get_kvm/kvm_put_kvm

--
Thanks,
Jike

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-26 13:44                                                     ` Jike Song
  0 siblings, 0 replies; 58+ messages in thread
From: Jike Song @ 2016-10-26 13:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiao Guangrong, Xiao Guangrong, Alex Williamson, Tian, Kevin,
	Neo Jia, kvm, qemu-devel, Xiaoguang Chen, Kirti Wankhede

On 10/21/2016 01:06 AM, Paolo Bonzini wrote:
> On 20/10/2016 03:48, Xiao Guangrong wrote:
>> I understood that KVM side is safe, however, vfio side is independent with
>> kvm and the user of usrdata can fetch kvm struct at any time, consider
>> this scenario:
>>
>> CPU 0                         CPU 1
>> KVM:                         VFIO/userdata user
>>   kvm_ioctl_create_device
>>      get_kvm()
>>                             vfio_group_get_usrdata(vfio_group)
>>   kvm_device_release
>>     put_kvm()
>>                             !!! kvm refcount has gone
>>                             use KVM struct
>>
>> Then, the user of userdata have fetched kvm struct but the refcount has
>> already gone.
> 
> vfio_group_set_usrdata (actually) kvm_vfio_group_set_kvm has called
> kvm_get_kvm too, however.  What you need is a mutex that is taken by
> vfio_group_set_usrdata and by the callers of vfio_group_get_usrdata.

Hi Paolo & Guangrong,

I walked the whole thread and became a little nervous: I don't want
to introduce a global mutex.

The problem is, as I understand, vfio_group_get_usrdata() returns a
KVM pointer but it may be stale. To make the pointer always valid,
it can call kvm_get_kvm() *before* return the pointer.

I would apologize in advance if this idea turns out totally
nonsense, but hey, please kindly help fix my whim :-)


[vfio.h]

	struct vfio_usrdata {
		void *data;
		void (*get)(void *data);
		void (*put)(void *data)
	};

	vfio_group {
		...
		vfio_usrdata *usrdata;

[kvm.ko]

	struvt vfio_usrdata kvmdata = {
		.data = kvm,
		.get = kvm_get_kvm,
		.put = kvm_put_kvm,
	};

	fn = symbol_get(vfio_group_set_usrdata)
	fn(vfio_group, &kvmdata)


[vfio.ko]

	vfio_group_set_usrdata
		lock
		vfio_group->d = kvmdata
		unlock

	void *vfio_group_get_usrdata
		lock
		struct vfio_usrdata *d = vfio_group->usrdata;
		d->get(d->data);
		unlock
		return d->data;

	void vfio_group_put_usrdata
		lock
		struct vfio_usrdata *d = vfio_group->usrdata;
		d->put(d->data)
		unlock

[kvmgt.ko]

	call vfio_group_get_usrdata to get kvm,
	call vfio_group_put_usrdata to release it
	*never* call kvm_get_kvm/kvm_put_kvm

--
Thanks,
Jike

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-26 13:44                                                     ` Jike Song
  (?)
@ 2016-10-26 14:45                                                     ` Paolo Bonzini
  2016-10-29  4:07                                                         ` Jike Song
  -1 siblings, 1 reply; 58+ messages in thread
From: Paolo Bonzini @ 2016-10-26 14:45 UTC (permalink / raw)
  To: Jike Song
  Cc: Xiao Guangrong, Xiao Guangrong, Alex Williamson, Tian, Kevin,
	Neo Jia, kvm, qemu-devel, Xiaoguang Chen, Kirti Wankhede



On 26/10/2016 15:44, Jike Song wrote:
> On 10/21/2016 01:06 AM, Paolo Bonzini wrote:
>> On 20/10/2016 03:48, Xiao Guangrong wrote:
>>> I understood that KVM side is safe, however, vfio side is independent with
>>> kvm and the user of usrdata can fetch kvm struct at any time, consider
>>> this scenario:
>>>
>>> CPU 0                         CPU 1
>>> KVM:                         VFIO/userdata user
>>>   kvm_ioctl_create_device
>>>      get_kvm()
>>>                             vfio_group_get_usrdata(vfio_group)
>>>   kvm_device_release
>>>     put_kvm()
>>>                             !!! kvm refcount has gone
>>>                             use KVM struct
>>>
>>> Then, the user of userdata have fetched kvm struct but the refcount has
>>> already gone.
>>
>> vfio_group_set_usrdata (actually) kvm_vfio_group_set_kvm has called
>> kvm_get_kvm too, however.  What you need is a mutex that is taken by
>> vfio_group_set_usrdata and by the callers of vfio_group_get_usrdata.
> 
> Hi Paolo & Guangrong,
> 
> I walked the whole thread and became a little nervous: I don't want
> to introduce a global mutex.
> 
> The problem is, as I understand, vfio_group_get_usrdata() returns a
> KVM pointer but it may be stale. To make the pointer always valid,
> it can call kvm_get_kvm() *before* return the pointer.

That doesn't work, you still have to protect get against concurrent set.
 But the mutex need not be global, it is specific to the vfio device.
You probably have such a mutex anyway...

Paolo

> I would apologize in advance if this idea turns out totally
> nonsense, but hey, please kindly help fix my whim :-)
> 
> 
> [vfio.h]
> 
> 	struct vfio_usrdata {
> 		void *data;
> 		void (*get)(void *data);
> 		void (*put)(void *data)
> 	};
> 
> 	vfio_group {
> 		...
> 		vfio_usrdata *usrdata;
> 
> [kvm.ko]
> 
> 	struvt vfio_usrdata kvmdata = {
> 		.data = kvm,
> 		.get = kvm_get_kvm,
> 		.put = kvm_put_kvm,
> 	};
> 
> 	fn = symbol_get(vfio_group_set_usrdata)
> 	fn(vfio_group, &kvmdata)
> 
> 
> [vfio.ko]
> 
> 	vfio_group_set_usrdata
> 		lock
> 		vfio_group->d = kvmdata
> 		unlock
> 
> 	void *vfio_group_get_usrdata
> 		lock
> 		struct vfio_usrdata *d = vfio_group->usrdata;
> 		d->get(d->data);
> 		unlock
> 		return d->data;
> 
> 	void vfio_group_put_usrdata
> 		lock
> 		struct vfio_usrdata *d = vfio_group->usrdata;
> 		d->put(d->data)
> 		unlock
> 
> [kvmgt.ko]
> 
> 	call vfio_group_get_usrdata to get kvm,
> 	call vfio_group_put_usrdata to release it
> 	*never* call kvm_get_kvm/kvm_put_kvm
> 
> --
> Thanks,
> Jike
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
  2016-10-26 14:45                                                     ` Paolo Bonzini
@ 2016-10-29  4:07                                                         ` Jike Song
  0 siblings, 0 replies; 58+ messages in thread
From: Jike Song @ 2016-10-29  4:07 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiao Guangrong, Xiao Guangrong, Alex Williamson, Tian, Kevin,
	Neo Jia, kvm, qemu-devel, Xiaoguang Chen, Kirti Wankhede

On 10/26/2016 10:45 PM, Paolo Bonzini wrote:
> On 26/10/2016 15:44, Jike Song wrote:
>> On 10/21/2016 01:06 AM, Paolo Bonzini wrote:
>>> On 20/10/2016 03:48, Xiao Guangrong wrote:
>>>> I understood that KVM side is safe, however, vfio side is independent with
>>>> kvm and the user of usrdata can fetch kvm struct at any time, consider
>>>> this scenario:
>>>>
>>>> CPU 0                         CPU 1
>>>> KVM:                         VFIO/userdata user
>>>>   kvm_ioctl_create_device
>>>>      get_kvm()
>>>>                             vfio_group_get_usrdata(vfio_group)
>>>>   kvm_device_release
>>>>     put_kvm()
>>>>                             !!! kvm refcount has gone
>>>>                             use KVM struct
>>>>
>>>> Then, the user of userdata have fetched kvm struct but the refcount has
>>>> already gone.
>>>
>>> vfio_group_set_usrdata (actually) kvm_vfio_group_set_kvm has called
>>> kvm_get_kvm too, however.  What you need is a mutex that is taken by
>>> vfio_group_set_usrdata and by the callers of vfio_group_get_usrdata.
>>
>> Hi Paolo & Guangrong,
>>
>> I walked the whole thread and became a little nervous: I don't want
>> to introduce a global mutex.
>>
>> The problem is, as I understand, vfio_group_get_usrdata() returns a
>> KVM pointer but it may be stale. To make the pointer always valid,
>> it can call kvm_get_kvm() *before* return the pointer.
> 
> That doesn't work, you still have to protect get against concurrent set.
>  But the mutex need not be global, it is specific to the vfio device.
> You probably have such a mutex anyway...

Thanks Paolo, I agree whatsoever a mutex is necessary. I cooked a patch
sent to you and Alex, please kindly have a look :-)

--
Thanks,
Jike

>> I would apologize in advance if this idea turns out totally
>> nonsense, but hey, please kindly help fix my whim :-)
>>
>>
>> [vfio.h]
>>
>> 	struct vfio_usrdata {
>> 		void *data;
>> 		void (*get)(void *data);
>> 		void (*put)(void *data)
>> 	};
>>
>> 	vfio_group {
>> 		...
>> 		vfio_usrdata *usrdata;
>>
>> [kvm.ko]
>>
>> 	struvt vfio_usrdata kvmdata = {
>> 		.data = kvm,
>> 		.get = kvm_get_kvm,
>> 		.put = kvm_put_kvm,
>> 	};
>>
>> 	fn = symbol_get(vfio_group_set_usrdata)
>> 	fn(vfio_group, &kvmdata)
>>
>>
>> [vfio.ko]
>>
>> 	vfio_group_set_usrdata
>> 		lock
>> 		vfio_group->d = kvmdata
>> 		unlock
>>
>> 	void *vfio_group_get_usrdata
>> 		lock
>> 		struct vfio_usrdata *d = vfio_group->usrdata;
>> 		d->get(d->data);
>> 		unlock
>> 		return d->data;
>>
>> 	void vfio_group_put_usrdata
>> 		lock
>> 		struct vfio_usrdata *d = vfio_group->usrdata;
>> 		d->put(d->data)
>> 		unlock
>>
>> [kvmgt.ko]
>>
>> 	call vfio_group_get_usrdata to get kvm,
>> 	call vfio_group_put_usrdata to release it
>> 	*never* call kvm_get_kvm/kvm_put_kvm

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
@ 2016-10-29  4:07                                                         ` Jike Song
  0 siblings, 0 replies; 58+ messages in thread
From: Jike Song @ 2016-10-29  4:07 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiao Guangrong, Xiao Guangrong, Alex Williamson, Tian, Kevin,
	Neo Jia, kvm, qemu-devel, Xiaoguang Chen, Kirti Wankhede

On 10/26/2016 10:45 PM, Paolo Bonzini wrote:
> On 26/10/2016 15:44, Jike Song wrote:
>> On 10/21/2016 01:06 AM, Paolo Bonzini wrote:
>>> On 20/10/2016 03:48, Xiao Guangrong wrote:
>>>> I understood that KVM side is safe, however, vfio side is independent with
>>>> kvm and the user of usrdata can fetch kvm struct at any time, consider
>>>> this scenario:
>>>>
>>>> CPU 0                         CPU 1
>>>> KVM:                         VFIO/userdata user
>>>>   kvm_ioctl_create_device
>>>>      get_kvm()
>>>>                             vfio_group_get_usrdata(vfio_group)
>>>>   kvm_device_release
>>>>     put_kvm()
>>>>                             !!! kvm refcount has gone
>>>>                             use KVM struct
>>>>
>>>> Then, the user of userdata have fetched kvm struct but the refcount has
>>>> already gone.
>>>
>>> vfio_group_set_usrdata (actually) kvm_vfio_group_set_kvm has called
>>> kvm_get_kvm too, however.  What you need is a mutex that is taken by
>>> vfio_group_set_usrdata and by the callers of vfio_group_get_usrdata.
>>
>> Hi Paolo & Guangrong,
>>
>> I walked the whole thread and became a little nervous: I don't want
>> to introduce a global mutex.
>>
>> The problem is, as I understand, vfio_group_get_usrdata() returns a
>> KVM pointer but it may be stale. To make the pointer always valid,
>> it can call kvm_get_kvm() *before* return the pointer.
> 
> That doesn't work, you still have to protect get against concurrent set.
>  But the mutex need not be global, it is specific to the vfio device.
> You probably have such a mutex anyway...

Thanks Paolo, I agree whatsoever a mutex is necessary. I cooked a patch
sent to you and Alex, please kindly have a look :-)

--
Thanks,
Jike

>> I would apologize in advance if this idea turns out totally
>> nonsense, but hey, please kindly help fix my whim :-)
>>
>>
>> [vfio.h]
>>
>> 	struct vfio_usrdata {
>> 		void *data;
>> 		void (*get)(void *data);
>> 		void (*put)(void *data)
>> 	};
>>
>> 	vfio_group {
>> 		...
>> 		vfio_usrdata *usrdata;
>>
>> [kvm.ko]
>>
>> 	struvt vfio_usrdata kvmdata = {
>> 		.data = kvm,
>> 		.get = kvm_get_kvm,
>> 		.put = kvm_put_kvm,
>> 	};
>>
>> 	fn = symbol_get(vfio_group_set_usrdata)
>> 	fn(vfio_group, &kvmdata)
>>
>>
>> [vfio.ko]
>>
>> 	vfio_group_set_usrdata
>> 		lock
>> 		vfio_group->d = kvmdata
>> 		unlock
>>
>> 	void *vfio_group_get_usrdata
>> 		lock
>> 		struct vfio_usrdata *d = vfio_group->usrdata;
>> 		d->get(d->data);
>> 		unlock
>> 		return d->data;
>>
>> 	void vfio_group_put_usrdata
>> 		lock
>> 		struct vfio_usrdata *d = vfio_group->usrdata;
>> 		d->put(d->data)
>> 		unlock
>>
>> [kvmgt.ko]
>>
>> 	call vfio_group_get_usrdata to get kvm,
>> 	call vfio_group_put_usrdata to release it
>> 	*never* call kvm_get_kvm/kvm_put_kvm

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2016-10-29  4:10 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-09  7:41 [PATCH 0/2] page track add notifier type track_flush_slot Xiaoguang Chen
2016-10-09  7:41 ` [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot Xiaoguang Chen
2016-10-09  8:31   ` Neo Jia
2016-10-09  8:56     ` Chen, Xiaoguang
2016-10-10 17:06     ` Paolo Bonzini
2016-10-10 18:01       ` Neo Jia
2016-10-10 18:32         ` Paolo Bonzini
2016-10-11  2:39           ` Xiao Guangrong
2016-10-11  8:54             ` Paolo Bonzini
2016-10-11  9:21               ` Xiao Guangrong
2016-10-11  9:47                 ` Paolo Bonzini
2016-10-14 10:37                   ` Jike Song
2016-10-14 10:37                     ` [Qemu-devel] " Jike Song
2016-10-14 10:43                     ` Paolo Bonzini
2016-10-14 10:43                       ` [Qemu-devel] " Paolo Bonzini
2016-10-14 12:26                       ` Jike Song
2016-10-14 12:26                         ` [Qemu-devel] " Jike Song
2016-10-14 14:41                     ` Alex Williamson
2016-10-14 14:46                       ` Alex Williamson
2016-10-14 14:46                         ` [Qemu-devel] " Alex Williamson
2016-10-14 16:35                         ` Neo Jia
2016-10-14 16:35                           ` Neo Jia
2016-10-14 16:51                           ` Alex Williamson
2016-10-14 16:51                             ` Alex Williamson
2016-10-14 22:19                             ` Neo Jia
2016-10-14 22:19                               ` Neo Jia
2016-10-17 16:02                               ` Alex Williamson
2016-10-17 16:02                                 ` Alex Williamson
2016-10-18 12:38                                 ` Jike Song
2016-10-18 14:59                                   ` Alex Williamson
2016-10-19  2:32                                     ` Jike Song
2016-10-19  5:45                                       ` Xiao Guangrong
2016-10-19 11:56                                         ` Paolo Bonzini
2016-10-19 11:56                                           ` [Qemu-devel] " Paolo Bonzini
2016-10-19 13:39                                           ` Xiao Guangrong
2016-10-19 13:39                                             ` [Qemu-devel] " Xiao Guangrong
2016-10-19 14:14                                             ` Paolo Bonzini
2016-10-19 14:14                                               ` [Qemu-devel] " Paolo Bonzini
2016-10-20  1:48                                               ` Xiao Guangrong
2016-10-20 17:06                                                 ` Paolo Bonzini
2016-10-20 17:19                                                   ` Xiao, Guangrong
2016-10-20 17:19                                                     ` [Qemu-devel] " Xiao, Guangrong
2016-10-21  2:47                                                     ` Jike Song
2016-10-21  2:47                                                       ` Jike Song
2016-10-26 13:44                                                   ` Jike Song
2016-10-26 13:44                                                     ` Jike Song
2016-10-26 14:45                                                     ` Paolo Bonzini
2016-10-29  4:07                                                       ` Jike Song
2016-10-29  4:07                                                         ` Jike Song
2016-10-19 13:56                                       ` Eric Blake
2016-10-19 13:56                                         ` [Qemu-devel] " Eric Blake
2016-10-24  6:32                                         ` Jike Song
2016-10-12 20:48   ` Radim Krčmář
2016-10-09  7:41 ` [PATCH 2/2] KVM: MMU: apply page track notifier type track_flush_slot Xiaoguang Chen
2016-10-10 17:06 ` [PATCH 0/2] page track add " Paolo Bonzini
2016-10-11  2:43   ` Xiao Guangrong
2016-10-11  8:55     ` Paolo Bonzini
2016-10-12 20:52       ` Radim Krčmář

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.