xen-devel.lists.xenproject.org archive mirror
* [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
@ 2016-05-20  8:53 Feng Wu
  2016-05-20  8:53 ` [PATCH 1/3] VMX: Properly adjust the status of pi descriptor Feng Wu
                   ` (3 more replies)
  0 siblings, 4 replies; 49+ messages in thread
From: Feng Wu @ 2016-05-20  8:53 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	jbeulich, Feng Wu

The current VT-d PI related code may operate incorrectly in the following
scenarios:
- When the last assigned device is detached from the domain, all the
PI related hooks are removed; however, the vCPU can still be blocked,
switched to another pCPU, etc., all without PI being aware of it. When
we later attach another device to the domain, which makes the PI
related hooks available again, the state of the PI descriptor is stale.
Besides that, the blocked vCPU may still remain in the per-cpu blocking
list in this case.
- After the domain is destroyed, the blocked vCPU may also remain in
the per-cpu blocking list.

This series fixes the above issues.
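
For reference, the per-cpu blocking list discussed in this series is,
roughly, the following per-pCPU structure (a sketch based on the
existing definitions in xen/arch/x86/hvm/vmx/vmx.c; comments added
here): each pCPU keeps a lock-protected list of the vCPUs that blocked
on it with posted interrupts in use, and the wakeup interrupt handler
on that pCPU walks this list to wake the right vCPU.

    struct vmx_pi_blocking_vcpu {
        struct list_head     list;   /* vCPUs blocked on this pCPU */
        spinlock_t           lock;   /* protects the list above */
    };
    static DEFINE_PER_CPU(struct vmx_pi_blocking_vcpu, vmx_pi_blocking);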

I still have two opens, which need comments/suggestions from you guys.
- What should we do for the per-cpu blocking list during vcpu hotplug?
- What should we do for the per-cpu blocking list during pcpu hotplug?

Feng Wu (3):
  VMX: Properly adjust the status of pi descriptor
  VMX: Make hook pi_do_resume always available
  VMX: Remove the vcpu from the per-cpu blocking list after domain
    termination

 xen/arch/x86/hvm/vmx/vmx.c         | 65 +++++++++++++++++++++++++++++++++++---
 xen/include/asm-x86/hvm/vmx/vmcs.h |  1 +
 2 files changed, 61 insertions(+), 5 deletions(-)

-- 
2.1.0



^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 1/3] VMX: Properly adjust the status of pi descriptor
  2016-05-20  8:53 [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list Feng Wu
@ 2016-05-20  8:53 ` Feng Wu
  2016-05-23  5:15   ` Tian, Kevin
  2016-05-23 12:30   ` Jan Beulich
  2016-05-20  8:53 ` [PATCH 2/3] VMX: Make hook pi_do_resume always available Feng Wu
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 49+ messages in thread
From: Feng Wu @ 2016-05-20  8:53 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	jbeulich, Feng Wu

When the last assigned device is detached from the domain, all
the PI related hooks are removed; however, the vCPU can still be
blocked, switched to another pCPU, etc., all without PI being
aware of it. When we later attach another device to the domain,
which makes the PI related hooks available again, the state
of the PI descriptor is stale, so we need to properly adjust
it.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c         | 29 ++++++++++++++++++++++++++---
 xen/include/asm-x86/hvm/vmx/vmcs.h |  1 +
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index bc4410f..3fbc7b1 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -107,12 +107,22 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
 static void vmx_vcpu_block(struct vcpu *v)
 {
     unsigned long flags;
-    unsigned int dest;
+    unsigned int dest = cpu_physical_id(v->processor);
     spinlock_t *old_lock;
     spinlock_t *pi_blocking_list_lock =
 		&per_cpu(vmx_pi_blocking, v->processor).lock;
     struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
 
+    if (v->arch.hvm_vmx.pi_back_from_hotplug == 1)
+    {
+        write_atomic(&pi_desc->ndst,
+                     x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
+        write_atomic(&pi_desc->nv, posted_intr_vector);
+        pi_clear_sn(pi_desc);
+
+        v->arch.hvm_vmx.pi_back_from_hotplug = 0;
+    }
+
     spin_lock_irqsave(pi_blocking_list_lock, flags);
     old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
                        pi_blocking_list_lock);
@@ -130,8 +140,6 @@ static void vmx_vcpu_block(struct vcpu *v)
 
     ASSERT(!pi_test_sn(pi_desc));
 
-    dest = cpu_physical_id(v->processor);
-
     ASSERT(pi_desc->ndst ==
            (x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK)));
 
@@ -164,6 +172,16 @@ static void vmx_pi_do_resume(struct vcpu *v)
     unsigned long flags;
     spinlock_t *pi_blocking_list_lock;
     struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+    unsigned int dest = cpu_physical_id(v->processor);
+
+    if (v->arch.hvm_vmx.pi_back_from_hotplug == 1)
+    {
+        write_atomic(&pi_desc->ndst,
+                     x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
+        pi_clear_sn(pi_desc);
+
+        v->arch.hvm_vmx.pi_back_from_hotplug = 0;
+    }
 
     ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
 
@@ -202,9 +220,14 @@ static void vmx_pi_do_resume(struct vcpu *v)
 /* This function is called when pcidevs_lock is held */
 void vmx_pi_hooks_assign(struct domain *d)
 {
+    struct vcpu *v;
+
     if ( !iommu_intpost || !has_hvm_container_domain(d) )
         return;
 
+    for_each_vcpu ( d, v )
+        v->arch.hvm_vmx.pi_back_from_hotplug = 1;
+
     ASSERT(!d->arch.hvm_domain.vmx.vcpu_block);
 
     d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index b54f52f..3feb60a 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -231,6 +231,7 @@ struct arch_vmx_struct {
      * pCPU and wakeup the related vCPU.
      */
     struct pi_blocking_vcpu pi_blocking;
+    int pi_back_from_hotplug;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
-- 
2.1.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 2/3] VMX: Make hook pi_do_resume always available
  2016-05-20  8:53 [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list Feng Wu
  2016-05-20  8:53 ` [PATCH 1/3] VMX: Properly adjust the status of pi descriptor Feng Wu
@ 2016-05-20  8:53 ` Feng Wu
  2016-05-23 12:32   ` Jan Beulich
  2016-05-20  8:53 ` [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination Feng Wu
  2016-05-20 10:27 ` [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list Jan Beulich
  3 siblings, 1 reply; 49+ messages in thread
From: Feng Wu @ 2016-05-20  8:53 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	jbeulich, Feng Wu

Make the pi_do_resume hook always available, so that when the last
assigned device is detached from a domain, the blocked
vCPU can be removed from the per-cpu blocking list properly.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 3fbc7b1..4862b13 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -233,7 +233,6 @@ void vmx_pi_hooks_assign(struct domain *d)
     d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
     d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
     d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;
-    d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
 }
 
 /* This function is called when pcidevs_lock is held */
@@ -247,13 +246,14 @@ void vmx_pi_hooks_deassign(struct domain *d)
     d->arch.hvm_domain.vmx.vcpu_block = NULL;
     d->arch.hvm_domain.vmx.pi_switch_from = NULL;
     d->arch.hvm_domain.vmx.pi_switch_to = NULL;
-    d->arch.hvm_domain.vmx.pi_do_resume = NULL;
 }
 
 static int vmx_domain_initialise(struct domain *d)
 {
     int rc;
 
+    d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
+
     if ( !has_vlapic(d) )
         return 0;
 
-- 
2.1.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-20  8:53 [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list Feng Wu
  2016-05-20  8:53 ` [PATCH 1/3] VMX: Properly adjust the status of pi descriptor Feng Wu
  2016-05-20  8:53 ` [PATCH 2/3] VMX: Make hook pi_do_resume always available Feng Wu
@ 2016-05-20  8:53 ` Feng Wu
  2016-05-23  5:19   ` Tian, Kevin
                     ` (2 more replies)
  2016-05-20 10:27 ` [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list Jan Beulich
  3 siblings, 3 replies; 49+ messages in thread
From: Feng Wu @ 2016-05-20  8:53 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	jbeulich, Feng Wu

We need to make sure the blocked vCPU is not in any per-cpu blocking list
when the associated domain is going to be destroyed.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 4862b13..e74b3e7 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
     d->arch.hvm_domain.vmx.pi_switch_to = NULL;
 }
 
+static void vmx_pi_blocking_list_cleanup(struct domain *d)
+{
+    unsigned int cpu;
+
+    for_each_online_cpu ( cpu )
+    {
+        struct vcpu *v;
+        unsigned long flags;
+        struct arch_vmx_struct *vmx, *tmp;
+        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
+        struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
+
+        spin_lock_irqsave(lock, flags);
+
+        list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
+        {
+            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
+
+            if (v->domain == d)
+            {
+                list_del(&vmx->pi_blocking.list);
+                ASSERT(vmx->pi_blocking.lock == lock);
+                vmx->pi_blocking.lock = NULL;
+            }
+        }
+
+        spin_unlock_irqrestore(lock, flags);
+    }
+}
+
 static int vmx_domain_initialise(struct domain *d)
 {
     int rc;
@@ -265,6 +295,8 @@ static int vmx_domain_initialise(struct domain *d)
 
 static void vmx_domain_destroy(struct domain *d)
 {
+    vmx_pi_blocking_list_cleanup(d);
+
     if ( !has_vlapic(d) )
         return;
 
-- 
2.1.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-20  8:53 [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list Feng Wu
                   ` (2 preceding siblings ...)
  2016-05-20  8:53 ` [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination Feng Wu
@ 2016-05-20 10:27 ` Jan Beulich
  2016-05-20 10:46   ` Wu, Feng
  3 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-05-20 10:27 UTC (permalink / raw)
  To: Feng Wu
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 20.05.16 at 10:53, <feng.wu@intel.com> wrote:
> I still have two opens, which need comments/suggestions from you guys.
> - What should we do for the per-cpu blocking list during vcpu hotplug?

What do you mean with vcpu hotplug? vcpus never get removed
from a VM (from hypervisor perspective), and all the guest may
ever use need to be created before the guest starts.

> - What should we do for the per-cpu blocking list during pcpu hotplug?

I think it would help if you said what issue you see.

Jan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-20 10:27 ` [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list Jan Beulich
@ 2016-05-20 10:46   ` Wu, Feng
  2016-05-23  8:08     ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2016-05-20 10:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, May 20, 2016 6:27 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu
> blocking list
> 
> >>> On 20.05.16 at 10:53, <feng.wu@intel.com> wrote:
> > I still have two opens, which needs comments/suggestions from you guys.
> > - What should we do for the per-cpu blocking list during vcpu hotplug?
> 
> What do you mean with vcpu hotplug? vcpus never get removed
> from a VM (from hypervisor perspective), and all the guest may
> ever use need to be created before the guest starts.

Thanks for the reply, Jan. First of all, I am not familiar with vcpu hotplug/pcpu hotplug,
and that is why I list them as two opens here. When I wrote "vcpu hotplug", I was planning
to refer to the following case:
1. use 'xl vcpu-set ${dom_id} ${vcpu_num}', where ${vcpu_num} is less than the current
online vcpus.
2. In the guest, use ' echo 0 > /sys/devices/system/cpu/cpuX/online ' to make the vcpus
offline.

Yes, I know the maximum vcpus are allocated for the guest, but I am not quite sure
whether the above operations will affect the blocking list. If you can elaborate a bit
more on what the hypervisor will do for the above operations, that would be very helpful.
For instance, will the vcpu's state be changed, etc.? And if the vcpu is currently in the
blocking state while we try to offline it, it will still remain in the blocking list, right?
Will this cause some problems?

> 
> > - What should we do for the per-cpu blocking list during pcpu hotplug?
> 
> I think it would help if you said what issue you see.

I didn't see any issues related to this; this open question just came to mind when I wrote
this patch to fix the bug. As mentioned above, when a pCPU is removed, what is the
status of its blocking list? But thinking a bit more about this, I feel it should work this way:
before the pCPU is removed, all the vCPUs running on it have been moved to another
pCPU, right? If this is the case, it addresses part of my concern. Another concern is:
if a vCPU is blocking on a pCPU and that pCPU is going to be unplugged, what does
Xen do for the blocking vCPU?

Thanks,
Feng

> 
> Jan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 1/3] VMX: Properly adjust the status of pi descriptor
  2016-05-20  8:53 ` [PATCH 1/3] VMX: Properly adjust the status of pi descriptor Feng Wu
@ 2016-05-23  5:15   ` Tian, Kevin
  2016-05-23  5:27     ` Wu, Feng
  2016-05-23 12:30   ` Jan Beulich
  1 sibling, 1 reply; 49+ messages in thread
From: Tian, Kevin @ 2016-05-23  5:15 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: keir, george.dunlap, andrew.cooper3, dario.faggioli, jbeulich

> From: Wu, Feng
> Sent: Friday, May 20, 2016 4:54 PM
> 
> When the last assigned device is detached from the domain, all
> the PI related hooks are removed; however, the vCPU can still be
> blocked, switched to another pCPU, etc., all without PI being
> aware of it. When we later attach another device to the domain,
> which makes the PI related hooks available again, the state
> of the PI descriptor is stale, so we need to properly adjust
> it.

Instead of adjusting the pi descriptor in multiple places, can we
simply reset the status (including removing from the blocking list)
right when hooks are removed at detach?

> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c         | 29
> ++++++++++++++++++++++++++---
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  1 +
>  2 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index bc4410f..3fbc7b1 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -107,12 +107,22 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
>  static void vmx_vcpu_block(struct vcpu *v)
>  {
>      unsigned long flags;
> -    unsigned int dest;
> +    unsigned int dest = cpu_physical_id(v->processor);
>      spinlock_t *old_lock;
>      spinlock_t *pi_blocking_list_lock =
>  		&per_cpu(vmx_pi_blocking, v->processor).lock;
>      struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> 
> +    if (v->arch.hvm_vmx.pi_back_from_hotplug == 1)
> +    {
> +        write_atomic(&pi_desc->ndst,
> +                     x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> +        pi_clear_sn(pi_desc);
> +
> +        v->arch.hvm_vmx.pi_back_from_hotplug = 0;
> +    }
> +
>      spin_lock_irqsave(pi_blocking_list_lock, flags);
>      old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
>                         pi_blocking_list_lock);
> @@ -130,8 +140,6 @@ static void vmx_vcpu_block(struct vcpu *v)
> 
>      ASSERT(!pi_test_sn(pi_desc));
> 
> -    dest = cpu_physical_id(v->processor);
> -
>      ASSERT(pi_desc->ndst ==
>             (x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK)));
> 
> @@ -164,6 +172,16 @@ static void vmx_pi_do_resume(struct vcpu *v)
>      unsigned long flags;
>      spinlock_t *pi_blocking_list_lock;
>      struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    unsigned int dest = cpu_physical_id(v->processor);
> +
> +    if (v->arch.hvm_vmx.pi_back_from_hotplug == 1)
> +    {
> +        write_atomic(&pi_desc->ndst,
> +                     x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
> +        pi_clear_sn(pi_desc);
> +
> +        v->arch.hvm_vmx.pi_back_from_hotplug = 0;
> +    }
> 
>      ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
> 
> @@ -202,9 +220,14 @@ static void vmx_pi_do_resume(struct vcpu *v)
>  /* This function is called when pcidevs_lock is held */
>  void vmx_pi_hooks_assign(struct domain *d)
>  {
> +    struct vcpu *v;
> +
>      if ( !iommu_intpost || !has_hvm_container_domain(d) )
>          return;
> 
> +    for_each_vcpu ( d, v )
> +        v->arch.hvm_vmx.pi_back_from_hotplug = 1;
> +
>      ASSERT(!d->arch.hvm_domain.vmx.vcpu_block);
> 
>      d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h
> b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index b54f52f..3feb60a 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -231,6 +231,7 @@ struct arch_vmx_struct {
>       * pCPU and wakeup the related vCPU.
>       */
>      struct pi_blocking_vcpu pi_blocking;
> +    int pi_back_from_hotplug;
>  };
> 
>  int vmx_create_vmcs(struct vcpu *v);
> --
> 2.1.0



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-20  8:53 ` [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination Feng Wu
@ 2016-05-23  5:19   ` Tian, Kevin
  2016-05-23  5:48     ` Wu, Feng
  2016-05-23 12:30   ` Dario Faggioli
  2016-05-23 12:35   ` Jan Beulich
  2 siblings, 1 reply; 49+ messages in thread
From: Tian, Kevin @ 2016-05-23  5:19 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: keir, george.dunlap, andrew.cooper3, dario.faggioli, jbeulich

> From: Wu, Feng
> Sent: Friday, May 20, 2016 4:54 PM
> 
> We need to make sure the blocked vCPU is not in any per-cpu blocking list
> when the associated domain is going to be destroyed.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 32
> ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 4862b13..e74b3e7 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
>      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
>  }
> 
> +static void vmx_pi_blocking_list_cleanup(struct domain *d)

Is it more natural to move such cleanup under vcpu destroy?

> +{
> +    unsigned int cpu;
> +
> +    for_each_online_cpu ( cpu )
> +    {
> +        struct vcpu *v;
> +        unsigned long flags;
> +        struct arch_vmx_struct *vmx, *tmp;
> +        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> +        struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
> +
> +        spin_lock_irqsave(lock, flags);
> +
> +        list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
> +        {
> +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> +
> +            if (v->domain == d)
> +            {
> +                list_del(&vmx->pi_blocking.list);
> +                ASSERT(vmx->pi_blocking.lock == lock);
> +                vmx->pi_blocking.lock = NULL;
> +            }
> +        }
> +
> +        spin_unlock_irqrestore(lock, flags);
> +    }
> +}
> +
>  static int vmx_domain_initialise(struct domain *d)
>  {
>      int rc;
> @@ -265,6 +295,8 @@ static int vmx_domain_initialise(struct domain *d)
> 
>  static void vmx_domain_destroy(struct domain *d)
>  {
> +    vmx_pi_blocking_list_cleanup(d);
> +
>      if ( !has_vlapic(d) )
>          return;
> 
> --
> 2.1.0



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 1/3] VMX: Properly adjust the status of pi descriptor
  2016-05-23  5:15   ` Tian, Kevin
@ 2016-05-23  5:27     ` Wu, Feng
  2016-05-23  6:52       ` Tian, Kevin
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2016-05-23  5:27 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: keir, george.dunlap, andrew.cooper3, dario.faggioli, jbeulich, Wu, Feng



> -----Original Message-----
> From: Tian, Kevin
> Sent: Monday, May 23, 2016 1:15 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: keir@xen.org; jbeulich@suse.com; andrew.cooper3@citrix.com;
> george.dunlap@eu.citrix.com; dario.faggioli@citrix.com;
> konrad.wilk@oracle.com
> Subject: RE: [PATCH 1/3] VMX: Properly adjust the status of pi descriptor
> 
> > From: Wu, Feng
> > Sent: Friday, May 20, 2016 4:54 PM
> >
> > When the last assigned device is detached from the domain, all
> > the PI related hooks are removed; however, the vCPU can still be
> > blocked, switched to another pCPU, etc., all without PI being
> > aware of it. When we later attach another device to the domain,
> > which makes the PI related hooks available again, the state
> > of the PI descriptor is stale, so we need to properly adjust
> > it.
> 
> Instead of adjusting pi descriptor in multiple places, can we
> simply reset the status (including removing from block list)
> right when hooks are removed at deattach?
> 

I also thought about this idea before; the problem is that when
the hooks are being removed at the PCI device's detachment,
the hooks may be running at the same time, which may cause
problems. For example, after we have removed the vCPU from the
blocking list, vmx_vcpu_block() (which had been running before the
hooks were removed and has not finished yet) adds the vCPU to the same
blocking list, and then this vCPU will remain in the list after the device is
detached from the domain. For the same reason, if we change the PI desc
in vmx_pi_hooks_deassign() while it is being used in the hooks, it
may cause problems.

Thanks,
Feng


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-23  5:19   ` Tian, Kevin
@ 2016-05-23  5:48     ` Wu, Feng
  2016-05-23  6:54       ` Tian, Kevin
  2016-05-23  9:08       ` Jan Beulich
  0 siblings, 2 replies; 49+ messages in thread
From: Wu, Feng @ 2016-05-23  5:48 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: keir, george.dunlap, andrew.cooper3, dario.faggioli, jbeulich, Wu, Feng



> -----Original Message-----
> From: Tian, Kevin
> Sent: Monday, May 23, 2016 1:19 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: keir@xen.org; jbeulich@suse.com; andrew.cooper3@citrix.com;
> george.dunlap@eu.citrix.com; dario.faggioli@citrix.com;
> konrad.wilk@oracle.com
> Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> after domain termination
> 
> > From: Wu, Feng
> > Sent: Friday, May 20, 2016 4:54 PM
> >
> > We need to make sure the blocked vCPU is not in any per-cpu blocking list
> > when the associated domain is going to be destroyed.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  xen/arch/x86/hvm/vmx/vmx.c | 32
> > ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index 4862b13..e74b3e7 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
> >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> >  }
> >
> > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> 
> Is it more natural to move such cleanup under vcpu destroy?

Yes, indeed it is more natural to add this function when the vCPU is destroyed;
however, the reason I add it here is that I still have some concerns about the timing.
When the VM is destroyed, here is the calling path:

- vmx_pi_hooks_deassign() -->
......
- vmx_vcpu_destroy() --> 
......
- vmx_domain_destroy()
......

As I replied in the previous mail, when we remove the vCPUs from the blocking
list, there might be some _in-flight_ calls to the hooks, so I put the cleanup
code in vmx_domain_destroy(), which is a bit further away from vmx_pi_hooks_deassign(),
and hence safer. If you have any other good ideas, I am all ears! :)

Thanks,
Feng



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 1/3] VMX: Properly adjust the status of pi descriptor
  2016-05-23  5:27     ` Wu, Feng
@ 2016-05-23  6:52       ` Tian, Kevin
  2016-05-23  7:16         ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Tian, Kevin @ 2016-05-23  6:52 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: keir, george.dunlap, andrew.cooper3, dario.faggioli, jbeulich

> From: Wu, Feng
> Sent: Monday, May 23, 2016 1:28 PM
> 
> 
> > -----Original Message-----
> > From: Tian, Kevin
> > Sent: Monday, May 23, 2016 1:15 PM
> > To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> > Cc: keir@xen.org; jbeulich@suse.com; andrew.cooper3@citrix.com;
> > george.dunlap@eu.citrix.com; dario.faggioli@citrix.com;
> > konrad.wilk@oracle.com
> > Subject: RE: [PATCH 1/3] VMX: Properly adjust the status of pi descriptor
> >
> > > From: Wu, Feng
> > > Sent: Friday, May 20, 2016 4:54 PM
> > >
> > > When the last assigned device is detached from the domain, all
> > > the PI related hooks are removed; however, the vCPU can still be
> > > blocked, switched to another pCPU, etc., all without PI being
> > > aware of it. When we later attach another device to the domain,
> > > which makes the PI related hooks available again, the state
> > > of the PI descriptor is stale, so we need to properly adjust
> > > it.
> >
> > Instead of adjusting pi descriptor in multiple places, can we
> > simply reset the status (including removing from block list)
> > right when hooks are removed at detach?
> >
> 
> I also thought about this idea before, the problem is that when
> the hooks are being removed at the pci device's detachment,
> the hooks may be running at the same time, which may cause
> problems, such as, after we have removed the vCPU from the
> blocking list, vmx_vcpu_block() (which has been running before the
> hooks were removed and not finished yet.) adds a vCPU to the same
> blocking list, then this vCPU will remain in the list after the device is
> detached from the domain. For the same reason, if we change PI desc
> in vmx_pi_hooks_deassign() while it is being used in the hooks, it
> may cause problems.
> 

It's not just about updating PI desc. The race could exist for changing
hooks too:

void vmx_pi_hooks_deassign(struct domain *d)
	...
	d->arch.hvm_domain.vmx.pi_switch_from = NULL;
	...

static void vmx_ctxt_switch_from(struct vcpu *v)
	...
	if ( v->domain->arch.hvm_domain.vmx.pi_switch_from )
        v->domain->arch.hvm_domain.vmx.pi_switch_from(v);

If the callback is cleared between the above two lines, you'll dereference
a NULL pointer in the 2nd line.

I thought there should be some protection mechanism in place for
such a fundamental race, which might be extended to cover the PI desc
too. But it looks like that is not the case. Would you mind pointing it
out if it is already well handled? :-)

Somehow I'm wondering whether we really need such dynamic
callback changes in an SMP environment. Is it safer to introduce
some per-domain flag to indicate the above scenario, so each callback
can return early with negligible cost and there is no need to reset
them once registered?
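
A minimal sketch of that idea (hedged; the flag below is invented purely
for illustration and is not an existing field):

    static void vmx_vcpu_block(struct vcpu *v)
    {
        /* Hypothetical per-domain flag checked at the top of each hook. */
        if ( !v->domain->arch.hvm_domain.vmx.pi_hooks_active )
            return;                     /* no device assigned, bail out */
        /* ... existing blocking-list handling ... */
    }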

Thanks
Kevin


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-23  5:48     ` Wu, Feng
@ 2016-05-23  6:54       ` Tian, Kevin
  2016-05-23  9:08       ` Jan Beulich
  1 sibling, 0 replies; 49+ messages in thread
From: Tian, Kevin @ 2016-05-23  6:54 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: keir, george.dunlap, andrew.cooper3, dario.faggioli, jbeulich

> From: Wu, Feng
> Sent: Monday, May 23, 2016 1:48 PM
> > -----Original Message-----
> > From: Tian, Kevin
> > Sent: Monday, May 23, 2016 1:19 PM
> > To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> > Cc: keir@xen.org; jbeulich@suse.com; andrew.cooper3@citrix.com;
> > george.dunlap@eu.citrix.com; dario.faggioli@citrix.com;
> > konrad.wilk@oracle.com
> > Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> > after domain termination
> >
> > > From: Wu, Feng
> > > Sent: Friday, May 20, 2016 4:54 PM
> > >
> > > We need to make sure the blocked vCPU is not in any per-cpu blocking list
> > > when the associated domain is going to be destroyed.
> > >
> > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > ---
> > >  xen/arch/x86/hvm/vmx/vmx.c | 32
> > > ++++++++++++++++++++++++++++++++
> > >  1 file changed, 32 insertions(+)
> > >
> > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > > index 4862b13..e74b3e7 100644
> > > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
> > >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> > >  }
> > >
> > > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> >
> > Is it more natural to move such cleanup under vcpu destroy?
> 
> Yes, indeed it is more natural to add this function when vcpu is destroyed,
> however, the reason I add it here is I still have some concerns about the timing.
> When the VM is destroyed, here is the calling path:
> 
> - vmx_pi_hooks_deassign() -->
> ......
> - vmx_vcpu_destroy() -->
> ......
> - vmx_domain_destroy()
> ......
> 
> As I replied in the previous mail, when we remove the vcpus from the blocking
> list, there might be some _in-flight_ call to the hooks, so I put the cleanup
> code in the vmx_domain_destroy(), which is a bit more far from vmx_pi_hooks_deassign,
> and hence safer. If you have any other good ideas, I am all ears!:)
> 

If we don't need to reset the callbacks in the deassign path, as commented for patch 1/3,
would that make the above puzzle go away? :-)

Thanks
Kevin


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 1/3] VMX: Properly adjust the status of pi descriptor
  2016-05-23  6:52       ` Tian, Kevin
@ 2016-05-23  7:16         ` Wu, Feng
  2016-05-23  9:03           ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2016-05-23  7:16 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel, Jan Beulich (JBeulich@suse.com)
  Cc: keir, george.dunlap, andrew.cooper3, dario.faggioli, Wu, Feng



> -----Original Message-----
> From: Tian, Kevin
> Sent: Monday, May 23, 2016 2:52 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: keir@xen.org; jbeulich@suse.com; andrew.cooper3@citrix.com;
> george.dunlap@eu.citrix.com; dario.faggioli@citrix.com;
> konrad.wilk@oracle.com
> Subject: RE: [PATCH 1/3] VMX: Properly adjust the status of pi descriptor
> 
> > From: Wu, Feng
> > Sent: Monday, May 23, 2016 1:28 PM
> >
> >
> > > -----Original Message-----
> > > From: Tian, Kevin
> > > Sent: Monday, May 23, 2016 1:15 PM
> > > To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> > > Cc: keir@xen.org; jbeulich@suse.com; andrew.cooper3@citrix.com;
> > > george.dunlap@eu.citrix.com; dario.faggioli@citrix.com;
> > > konrad.wilk@oracle.com
> > > Subject: RE: [PATCH 1/3] VMX: Properly adjust the status of pi descriptor
> > >
> > > > From: Wu, Feng
> > > > Sent: Friday, May 20, 2016 4:54 PM
> > > >
> > > > When the last assigned device is detached from the domain, all
> > > > the PI related hooks are removed; however, the vCPU can still be
> > > > blocked, switched to another pCPU, etc., all without PI being
> > > > aware of it. When we later attach another device to the domain,
> > > > which makes the PI related hooks available again, the state
> > > > of the PI descriptor is stale, so we need to properly adjust
> > > > it.
> > >
> > > Instead of adjusting pi descriptor in multiple places, can we
> > > simply reset the status (including removing from block list)
> > > right when hooks are removed at detach?
> > >
> >
> > I also thought about this idea before, the problem is that when
> > the hooks are being removed at the pci device's detachment,
> > the hooks may be running at the same time, which may cause
> > problems, such as, after we have removed the vCPU from the
> > blocking list, vmx_vcpu_block() (which has been running before the
> > hooks were removed and not finished yet.) adds a vCPU to the same
> > blocking list, then this vCPU will remain in the list after the device is
> > detached from the domain. For the same reason, if we change PI desc
> > in vmx_pi_hooks_deassign() while it is being used in the hooks, it
> > may cause problems.
> >
> 
> It's not just about updating PI desc. The race could exist for changing
> hooks too:
> 
> void vmx_pi_hooks_deassign(struct domain *d)
> 	...
> 	d->arch.hvm_domain.vmx.pi_switch_from = NULL;
> 	...
> 
> static void vmx_ctxt_switch_from(struct vcpu *v)
> 	...
> 	if ( v->domain->arch.hvm_domain.vmx.pi_switch_from )
>         v->domain->arch.hvm_domain.vmx.pi_switch_from(v);
> 
> If the callback is cleared between above two lines, you'll refer to
> a NULL pointer in the 2nd line.
> 
> I thought there should be some protection mechanism in place for
> such fundamental race, which might be extended to cover PI desc
> too. But looks not the case. Would you mind pointing it out if already
> well handled? :-)

You are right, that can happen. However, in this case, it will not introduce
any bad impact other than the PI desc retaining its old value.
But the device is detached, so it doesn't matter, as long as we
re-adjust it to the right value once a device is attached again. And this
is what I am doing in this patch.

> 
> Somehow I'm thinking whether we really need such dynamic
> callback changes in a SMP environment. Is it safer to introduce
> some per-domain flag to indicate above scenario so each callback
> can return early with negligible cost so no need to reset them once
> registered?

Yes, I agree with you. As you can see, the current issue is partly because
of dynamically assigning/deassigning the PI hooks, and there are
lots of cases of concern when using that approach. I think it is worth trying
what you mentioned above. However, this method was suggested by Jan before,
so I'd like to know Jan's opinion about this.

Hi Jan, any ideas? Thanks!

Thanks,
Feng

> 
> Thanks
> Kevin


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-20 10:46   ` Wu, Feng
@ 2016-05-23  8:08     ` Jan Beulich
  2016-05-23  8:44       ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-05-23  8:08 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 20.05.16 at 12:46, <feng.wu@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, May 20, 2016 6:27 PM
>> To: Wu, Feng <feng.wu@intel.com>
>> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
>> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
>> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org 
>> Subject: Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu
>> blocking list
>> 
>> >>> On 20.05.16 at 10:53, <feng.wu@intel.com> wrote:
>> > I still have two opens, which needs comments/suggestions from you guys.
>> > - What should we do for the per-cpu blocking list during vcpu hotplug?
>> 
>> What do you mean with vcpu hotplug? vcpus never get removed
>> from a VM (from hypervisor perspective), and all the guest may
>> ever use need to be created before the guest starts.
> 
> Thanks for the reply, Jan. First of all, I am not familiar with vcpu 
> hotplug/pcpu hotplug,
> and that is why I list them as two opens here. When I wrote "vcpu hotplug", 
> I was planning
> to refer to the following case:
> 1. use 'xl vcpu-set ${dom_id} ${vcpu_num}', where ${vcpu_num} is less than 
> the current
> online vcpus.
> 2. In the guest, use ' echo 0 > /sys/devices/system/cpu/cpuX/online ' to make
> the vcpus
> offline.
> 
> Yes, I know the maximum vcpus are allocated for the guest, but I am not quite sure
> whether the above operations will affect the blocking list. If you can elaborate a bit
> more what hypervisor will do for the above operations, That should be very helpful.

The hypervisor just offlines the vCPU (in response to the guest
invoking VCPUOP_down).
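
A rough sketch of what that amounts to inside Xen (based on the
VCPUOP_down handling in do_vcpu_op(); simplified here, not a verbatim
quote): the vCPU is merely marked as down and de-scheduled, while the
vCPU structure and its PI descriptor stay allocated.

    /* Simplified sketch of the VCPUOP_down case in do_vcpu_op(). */
    if ( !test_and_set_bit(_VPF_down, &v->pause_flags) )
        vcpu_sleep_nosync(v);   /* pause the vCPU; nothing is torn down */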

> such as, will the vcpu's state be changed, etc? And if the vcpu is current 
> in blocking
> state, while try to offline it, it will still remain in the blocking list, 
> right, will this cause
> some problems? 

Since _VPF_down is just one of many possible pause flags, I don't
see what problems you're suspecting. But maybe I'm overlooking
something?

>> > - What should we do for the per-cpu blocking list during pcpu hotplug?
>> 
>> I think it would help if you said what issue you see.
> 
> I didn't see any issues related to this, this open just comes out in my mind 
> when I wrote
> this patch to fix the bug. As mentioned above, when the pCPU is removed, 
> what is the
> status of its blocking list? But thinking a bit more about this, I feel it 
> should work this way,
> before the pCPU is removed, all the vCPUs running on it has been moved to 
> another
> pCPU, right?

Yes.

> If this is the case, it can address part of my concern. Another 
> concern is
> if a vCPU is blocking on a pCPU, then the pCPU is going to be unplugged, 
> what does
> Xen do for the blocking vCPU?

A pCPU is never blocking "on a pCPU", it is always just blocking.
vCPU-s currently having their v->processor set to the pCPU being
hot removed would simply get migrated elsewhere. If that's not
accompanied by respective PI blocking list adjustments, then I can
only refer you back to the original series' review process, where
it was (iirc) pointed out more than once that letting v->processor
and the PI list a vCPU sits on get out of sync may cause
issues. And if there is an issue here, I'm not sure what a good
solution would be.

Jan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-23  8:08     ` Jan Beulich
@ 2016-05-23  8:44       ` Wu, Feng
  2016-05-23  8:51         ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2016-05-23  8:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, May 23, 2016 4:09 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: RE: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu
> blocking list
> 
> >>> On 20.05.16 at 12:46, <feng.wu@intel.com> wrote:
> 
> >
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Friday, May 20, 2016 6:27 PM
> >> To: Wu, Feng <feng.wu@intel.com>
> >> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> >> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> >> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> >> Subject: Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu
> >> blocking list
> >>
> >> >>> On 20.05.16 at 10:53, <feng.wu@intel.com> wrote:
> >> > I still have two opens, which needs comments/suggestions from you guys.
> >> > - What should we do for the per-cpu blocking list during vcpu hotplug?
> >>
> >> What do you mean with vcpu hotplug? vcpus never get removed
> >> from a VM (from hypervisor perspective), and all the guest may
> >> ever use need to be created before the guest starts.
> >
> > Thanks for the reply, Jan. First of all, I am not familiar with vcpu
> > hotplug/pcpu hotplug,
> > and that is why I list them as two opens here. When I wrote "vcpu hotplug",
> > I was planning
> > to refer to the following case:
> > 1. use 'xl vcpu-set ${dom_id} ${vcpu_num}', where ${vcpu_num} is less than
> > the current
> > online vcpus.
> > 2. In the guest, use ' echo 0 > /sys/devices/system/cpu/cpuX/online ' to make
> > the vcpus
> > offline.
> >
> > Yes, I know the maximum vcpus are allocated for the guest, but I am not quite
> sure
> > whether the above operations will affect the blocking list. If you can elaborate
> a bit
> > more what hypervisor will do for the above operations, That should be very
> helpful.
> 
> The hypervisor just offlines the vCPU (in response to the guest
> invoking VCPUOP_down).
> 
> > such as, will the vcpu's state be changed, etc? And if the vcpu is current
> > in blocking
> > state, while try to offline it, it will still remain in the blocking list,
> > right, will this cause
> > some problems?
> 
> Since _VPF_down is just one of many possible pause flags, I don't
> see what problems you're suspecting. But maybe I'm overlooking
> something?

Thanks for the explanation. If the hypervisor just offlines the vCPU, I
think there is nothing special we need to do here, since the current
logic can handle the state transition well.

> 
> >> > - What should we do for the per-cpu blocking list during pcpu hotplug?
> >>
> >> I think it would help if you said what issue you see.
> >
> > I didn't see any issues related to this, this open just comes out in my mind
> > when I wrote
> > this patch to fix the bug. As mentioned above, when the pCPU is removed,
> > what is the
> > status of its blocking list? But thinking a bit more about this, I feel it
> > should work this way,
> > before the pCPU is removed, all the vCPUs running on it has been moved to
> > another
> > pCPU, right?
> 
> Yes.
> 
> > If this is the case, it can address part of my concern. Another
> > concern is
> > if a vCPU is blocking on a pCPU, then the pCPU is going to be unplugged,
> > what does
> > Xen do for the blocking vCPU?
> 
> A pCPU is never blocking "on a pCPU", it is always just blocking.

I think you meant "vCPU is never ...."

> vCPU-s currently having their v->processor set to the pCPU being
> hot removed would simply get migrated elsewhere. If that's not
> accompanied by respective PI blocking list adjustments, then I can
> only refer you back to the original series' review process, where
> it was (iirc) more than once that it was pointed out that getting out
> of sync v->processor and the PI list a vCPU sits on may cause
> issues. And if there is an issue here, I'm not sure what a good
> solution would be.

Okay, here is my understanding of the blocking and vCPU
migration behaviour: if a vCPU is blocking and its v->processor is pCPU0,
and at some point pCPU0 is removed from the system, Xen will also
migrate the vCPU to another pCPU, say pCPU1, as the response
to pCPU0 being unplugged, hence v->processor = pCPU1, right?

If that is the case, I think the current code may have issues in the
above case, since the 'NDST' field in the PI descriptor is still pCPU0 even
though it has been removed from the system. I need to consider how to handle
this case. But this is not an issue in the non pCPU hotplug scenario.

Thanks,
Feng

> 
> Jan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-23  8:44       ` Wu, Feng
@ 2016-05-23  8:51         ` Jan Beulich
  2016-05-23 12:39           ` Dario Faggioli
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-05-23  8:51 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 23.05.16 at 10:44, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Monday, May 23, 2016 4:09 PM
>> >>> On 20.05.16 at 12:46, <feng.wu@intel.com> wrote:
>> > If this is the case, it can address part of my concern. Another
>> > concern is
>> > if a vCPU is blocking on a pCPU, then the pCPU is going to be unplugged,
>> > what does
>> > Xen do for the blocking vCPU?
>> 
>> A pCPU is never blocking "on a pCPU", it is always just blocking.
> 
> I think you meant "vCPU is never ...."

Oh, indeed.

>> vCPU-s currently having their v->processor set to the pCPU being
>> hot removed would simply get migrated elsewhere. If that's not
>> accompanied by respective PI blocking list adjustments, then I can
>> only refer you back to the original series' review process, where
>> it was (iirc) more than once that it was pointed out that getting out
>> of sync v->processor and the PI list a vCPU sits on may cause
>> issues. And if there is an issue here, I'm not sure what a good
>> solution would be.
> 
> Okay, here is my understanding about the blocking and vCPU
> migration things, if vCPU is blocking and its v->processor is pCPU 0,
> at some point, pCPU0 is removed from the system, it will also
> migrate the vCPU to another pCPU, say, pCPU1 as the response
> to pCPU0 being unplugged, hence, v->processor = pCPU1, right?

Yes.

Jan

> If that is the case, I think the current code may have issues in the
> above case, since the 'NDST' field in the PI descriptor is still pCPU0 even
> it is removed from the system. I need to consider how to handle
> this case. But this is not an issue  in non pCPU hotplug scenario.
> 
> Thanks,
> Feng




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 1/3] VMX: Properly adjust the status of pi descriptor
  2016-05-23  7:16         ` Wu, Feng
@ 2016-05-23  9:03           ` Jan Beulich
  2016-05-23  9:21             ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-05-23  9:03 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 23.05.16 at 09:16, <feng.wu@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Tian, Kevin
>> Sent: Monday, May 23, 2016 2:52 PM
>> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org 
>> Cc: keir@xen.org; jbeulich@suse.com; andrew.cooper3@citrix.com;
>> george.dunlap@eu.citrix.com; dario.faggioli@citrix.com;
>> konrad.wilk@oracle.com 
>> Subject: RE: [PATCH 1/3] VMX: Properly adjust the status of pi descriptor
>> 
>> > From: Wu, Feng
>> > Sent: Monday, May 23, 2016 1:28 PM
>> >
>> >
>> > > -----Original Message-----
>> > > From: Tian, Kevin
>> > > Sent: Monday, May 23, 2016 1:15 PM
>> > > To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org 
>> > > Cc: keir@xen.org; jbeulich@suse.com; andrew.cooper3@citrix.com;
>> > > george.dunlap@eu.citrix.com; dario.faggioli@citrix.com;
>> > > konrad.wilk@oracle.com 
>> > > Subject: RE: [PATCH 1/3] VMX: Properly adjust the status of pi descriptor
>> > >
>> > > > From: Wu, Feng
>> > > > Sent: Friday, May 20, 2016 4:54 PM
>> > > >
>> > > > When the last assigned device is detached from the domain, all
>> > > > the PI related hooks are removed; however, the vCPU can still be
>> > > > blocked, switched to another pCPU, etc., all without PI being
>> > > > aware of it. When we later attach another device to the domain,
>> > > > which makes the PI related hooks available again, the state
>> > > > of the PI descriptor is stale, so we need to properly adjust
>> > > > it.
>> > >
>> > > Instead of adjusting pi descriptor in multiple places, can we
>> > > simply reset the status (including removing from block list)
>> > > right when hooks are removed at detach?
>> > >
>> >
>> > I also thought about this idea before, the problem is that when
>> > the hooks are being removed at the pci device's detachment,
>> > the hooks may be running at the same time, which may cause
>> > problems, such as, after we have removed the vCPU from the
>> > blocking list, vmx_vcpu_block() (which has been running before the
>> > hooks were removed and not finished yet.) adds a vCPU to the same
>> > blocking list, then this vCPU will remain in the list after the device is
>> > detached from the domain. For the same reason, if we change PI desc
>> > in vmx_pi_hooks_deassign() while it is being used in the hooks, it
>> > may cause problems.
>> >
>> 
>> It's not just about updating PI desc. The race could exist for changing
>> hooks too:
>> 
>> void vmx_pi_hooks_deassign(struct domain *d)
>> 	...
>> 	d->arch.hvm_domain.vmx.pi_switch_from = NULL;
>> 	...
>> 
>> static void vmx_ctxt_switch_from(struct vcpu *v)
>> 	...
>> 	if ( v->domain->arch.hvm_domain.vmx.pi_switch_from )
>>         v->domain->arch.hvm_domain.vmx.pi_switch_from(v);
>> 
>> If the callback is cleared between above two lines, you'll refer to
>> a NULL pointer in the 2nd line.
>> 
>> I thought there should be some protection mechanism in place for
>> such fundamental race, which might be extended to cover PI desc
>> too. But looks not the case. Would you mind pointing it out if already
>> well handled? :-)
> 
> You are right, that can happen. However, in this case, it will not introduce
> any bad impacts other than the PI desc is remaining the old value.

How that, if you agree that what Kevin described can happen? He's
pointing out a possible NULL deref...

>> Somehow I'm thinking whether we really need such dynamic
>> callback changes in a SMP environment. Is it safer to introduce
>> some per-domain flag to indicate above scenario so each callback
>> can return early with negligible cost so no need to reset them once
>> registered?
> 
> Yes, I agree with you. As you can see, the current issue is kind of because
> of the dynamically assigning/deassigning the pi hooks, and there are
> lots of concern cases by using it. I think it worth try what you mentioned
> above. However, this method was suggested by Jan before, so I'd like
> to know Jan's opinion about this.

Avoiding the indirect calls would certainly be nice. Apart from
reading the function pointer into a local variable, use of which is
separated from the read by a barrier() (to address the NULL
deref concern), did you consider zapping the pointers in an RCU
callback instead of right when being requested?
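
A minimal sketch of that first variant (hedged; this is not existing
code, just an illustration of the pattern, using the hook field shown
earlier in the thread):

    static void vmx_ctxt_switch_from(struct vcpu *v)
    {
        void (*pi_switch_from)(struct vcpu *) =
            v->domain->arch.hvm_domain.vmx.pi_switch_from;

        /* Keep the compiler from re-reading the pointer after the check. */
        barrier();

        if ( pi_switch_from )
            pi_switch_from(v);
    }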

Jan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-23  5:48     ` Wu, Feng
  2016-05-23  6:54       ` Tian, Kevin
@ 2016-05-23  9:08       ` Jan Beulich
  2016-05-23  9:17         ` Wu, Feng
  1 sibling, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-05-23  9:08 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 23.05.16 at 07:48, <feng.wu@intel.com> wrote:
>> From: Tian, Kevin
>> Sent: Monday, May 23, 2016 1:19 PM
>> > From: Wu, Feng
>> > Sent: Friday, May 20, 2016 4:54 PM
>> > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
>> >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
>> >  }
>> >
>> > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
>> 
>> Is it more natural to move such cleanup under vcpu destroy?
> 
> Yes, indeed it is more natural to add this function when vcpu is destroyed,
> however, the reason I add it here is I still have some concerns about the 
> timing.
> When the VM is destroyed, here is the calling path:
> 
> - vmx_pi_hooks_deassign() -->
> ......
> - vmx_vcpu_destroy() --> 
> ......
> - vmx_domain_destroy()
> ......
> 
> As I replied in the previous mail, when we remove the vcpus from the 
> blocking
> list, there might be some _in-flight_ call to the hooks, so I put the cleanup
> code in the vmx_domain_destroy(), which is a bit more far from 
> vmx_pi_hooks_deassign,
> and hence safer. If you have any other good ideas, I am all ears!:)

Well, either there is a possible race (then moving the addition
later just reduces the chances, but doesn't eliminate it), or there
isn't (in which case Kevin's suggestion should probably be followed).

Jan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-23  9:08       ` Jan Beulich
@ 2016-05-23  9:17         ` Wu, Feng
  2016-05-23 10:35           ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2016-05-23  9:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, May 23, 2016 5:08 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> after domain termination
> 
> >>> On 23.05.16 at 07:48, <feng.wu@intel.com> wrote:
> >> From: Tian, Kevin
> >> Sent: Monday, May 23, 2016 1:19 PM
> >> > From: Wu, Feng
> >> > Sent: Friday, May 20, 2016 4:54 PM
> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >> > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
> >> >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> >> >  }
> >> >
> >> > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> >>
> >> Is it more natural to move such cleanup under vcpu destroy?
> >
> > Yes, indeed it is more natural to add this function when vcpu is destroyed,
> > however, the reason I add it here is I still have some concerns about the
> > timing.
> > When the VM is destroyed, here is the calling path:
> >
> > - vmx_pi_hooks_deassign() -->
> > ......
> > - vmx_vcpu_destroy() -->
> > ......
> > - vmx_domain_destroy()
> > ......
> >
> > As I replied in the previous mail, when we remove the vcpus from the
> > blocking
> > list, there might be some _in-flight_ call to the hooks, so I put the cleanup
> > code in the vmx_domain_destroy(), which is a bit more far from
> > vmx_pi_hooks_deassign,
> > and hence safer. If you have any other good ideas, I am all ears!:)
> 
> Well, either there is a possible race (then moving the addition
> later just reduces the chances, but doesn't eliminate it), or there
> isn't (in which case Kevin's suggestion should probably be followed).

Yes, I agree, adding the cleanup code at domain destroy rather than at
vcpu destroy just reduces the risk, but doesn't eliminate it. So far I don't
have a perfect solution for this possible race condition.

Thanks,
Feng

> 
> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 1/3] VMX: Properly adjuest the status of pi descriptor
  2016-05-23  9:03           ` Jan Beulich
@ 2016-05-23  9:21             ` Wu, Feng
  2016-05-23 11:04               ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2016-05-23  9:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, May 23, 2016 5:04 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: RE: [PATCH 1/3] VMX: Properly adjuest the status of pi descriptor
> 
> >>> On 23.05.16 at 09:16, <feng.wu@intel.com> wrote:
> 
> >
> >> -----Original Message-----
> >> From: Tian, Kevin
> >> Sent: Monday, May 23, 2016 2:52 PM
> >> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> >> Cc: keir@xen.org; jbeulich@suse.com; andrew.cooper3@citrix.com;
> >> george.dunlap@eu.citrix.com; dario.faggioli@citrix.com;
> >> konrad.wilk@oracle.com
> >> Subject: RE: [PATCH 1/3] VMX: Properly adjuest the status of pi descriptor
> >>
> >> > From: Wu, Feng
> >> > Sent: Monday, May 23, 2016 1:28 PM
> >> >
> >> >
> >> > > -----Original Message-----
> >> > > From: Tian, Kevin
> >> > > Sent: Monday, May 23, 2016 1:15 PM
> >> > > To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> >> > > Cc: keir@xen.org; jbeulich@suse.com; andrew.cooper3@citrix.com;
> >> > > george.dunlap@eu.citrix.com; dario.faggioli@citrix.com;
> >> > > konrad.wilk@oracle.com
> >> > > Subject: RE: [PATCH 1/3] VMX: Properly adjuest the status of pi descriptor
> >> > >
> >> > > > From: Wu, Feng
> >> > > > Sent: Friday, May 20, 2016 4:54 PM
> >> > > >
> >> > > > When the last assigned device is dettached from the domain, all
> >> > > > the PI related hooks are removed then, however, the vCPU can be
> >> > > > blocked, switched to another pCPU, etc, all without the aware of
> >> > > > PI. After the next time we attach another device to the domain,
> >> > > > which makes the PI realted hooks avaliable again, the status
> >> > > > of the pi descriptor is not true, we need to properly adjust
> >> > > > it.
> >> > >
> >> > > Instead of adjusting pi descriptor in multiple places, can we
> >> > > simply reset the status (including removing from block list)
> >> > > right when hooks are removed at deattach?
> >> > >
> >> >
> >> > I also thought about this idea before, the problem is that when
> >> > the hooks are being removed at the pci device's deattachment,
> >> > the hooks may be running at the same time, which may cause
> >> > problems, such as, after we have removed the vCPU from the
> >> > blocking list, vmx_vcpu_block() (which has been running before the
> >> > hooks were removed and not finished yet.) adds a vCPU to the same
> >> > blocking list, then this vCPU will remain in the list after the device is
> >> > deattached from the domain. At the same reason, if we change PI desc
> >> > in vmx_pi_hooks_deassign() while it is being used in the hooks, it
> >> > may cause problems.
> >> >
> >>
> >> It's not just about updating PI desc. The race could exist for changing
> >> hooks too:
> >>
> >> void vmx_pi_hooks_deassign(struct domain *d)
> >> 	...
> >> 	d->arch.hvm_domain.vmx.pi_switch_from = NULL;
> >> 	...
> >>
> >> static void vmx_ctxt_switch_from(struct vcpu *v)
> >> 	...
> >> 	if ( v->domain->arch.hvm_domain.vmx.pi_switch_from )
> >>         v->domain->arch.hvm_domain.vmx.pi_switch_from(v);
> >>
> >> If the callback is cleared between above two lines, you'll refer to
> >> a NULL pointer in the 2nd line.
> >>
> >> I thought there should be some protection mechanism in place for
> >> such fundamental race, which might be extended to cover PI desc
> >> too. But looks not the case. Would you mind pointing it out if already
> >> well handled? :-)
> >
> > You are right, that can happen. However, in this case, it will not introduce
> > any bad impacts other than the PI desc is remaining the old value.
> 
> How that, if you agree that what Kevin described can happen? He's
> pointing out a possible NULL deref...

Oh, sorry, I misunderstood Kevin's point. Yes, that should be an issue.

> 
> >> Somehow I'm thinking whether we really need such dynamic
> >> callback changes in a SMP environment. Is it safer to introduce
> >> some per-domain flag to indicate above scenario so each callback
> >> can return early with negligible cost so no need to reset them once
> >> registered?
> >
> > Yes, I agree with you. As you can see, the current issue is kind of because
> > of the dynamically assigning/deassigning the pi hooks, and there are
> > lots of concern cases by using it. I think it worth try what you mentioned
> > above. However, this method was sugguested by Jan before, so I'd like
> > to know what is Jan's option about this.
> 
> Avoiding the indirect calls would certainly be nice. Apart from
> reading the function pointer into a local variable, use of which is
> separated from the read by a barrier() (to address the NULL
> deref concern), did you consider zapping the pointers in an RCU
> callback instead of right when being requested?

Is there any example of an RCU callback in the current Xen code,
so that I can do some investigation on it first?

Thanks,
Feng

> 
> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-23  9:17         ` Wu, Feng
@ 2016-05-23 10:35           ` Wu, Feng
  2016-05-23 11:11             ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2016-05-23 10:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Wu, Feng
> Sent: Monday, May 23, 2016 5:18 PM
> To: Jan Beulich <JBeulich@suse.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org; Wu, Feng
> <feng.wu@intel.com>
> Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> after domain termination
> 
> 
> 
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Monday, May 23, 2016 5:08 PM
> > To: Wu, Feng <feng.wu@intel.com>
> > Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> > george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> > devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> > Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> > after domain termination
> >
> > >>> On 23.05.16 at 07:48, <feng.wu@intel.com> wrote:
> > >> From: Tian, Kevin
> > >> Sent: Monday, May 23, 2016 1:19 PM
> > >> > From: Wu, Feng
> > >> > Sent: Friday, May 20, 2016 4:54 PM
> > >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > >> > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
> > >> >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> > >> >  }
> > >> >
> > >> > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> > >>
> > >> Is it more natural to move such cleanup under vcpu destroy?
> > >
> > > Yes, indeed it is more natural to add this function when vcpu is destroyed,
> > > however, the reason I add it here is I still have some concerns about the
> > > timing.
> > > When the VM is destroyed, here is the calling path:
> > >
> > > - vmx_pi_hooks_deassign() -->
> > > ......
> > > - vmx_vcpu_destroy() -->
> > > ......
> > > - vmx_domain_destroy()
> > > ......
> > >
> > > As I replied in the previous mail, when we remove the vcpus from the
> > > blocking
> > > list, there might be some _in-flight_ call to the hooks, so I put the cleanup
> > > code in the vmx_domain_destroy(), which is a bit more far from
> > > vmx_pi_hooks_deassign,
> > > and hence safer. If you have any other good ideas, I am all ears!:)
> >
> > Well, either there is a possible race (then moving the addition
> > later just reduces the chances, but doesn't eliminate it), or there
> > isn't (in which case Kevin's suggestion should probably be followed).
> 
> Yes, I agree, adding the cleanup code in domain destroy other than
> vcpu destroy point just reduces the risk, but not eliminate. So far I don't
> get a perfect solution to solve this possible race condition.

After thinking more about this, I think this race condition can be resolved
in the following way:
1. Define a per-vCPU flag, say, 'v->arch.hvm_vmx.pi_back_from_hotplug'
2. In vmx_pi_blocking_list_cleanup(), when we find a vCPU of the domain on a
blocking list, set the flag to 1 after removing it
3. In vmx_vcpu_block(), add the following check:

     spin_lock_irqsave(pi_blocking_list_lock, flags);
+    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up == 1) )
+    {
+        /*
+         * The vcpu is to be destroyed and it has already been removed
+         * from the per-CPU list if it is blocking, we shouldn't add
+         * new vCPUs to the list.
+         */
+        spin_unlock_irqrestore(pi_blocking_list_lock, flags);
+        return;
+    }
+
     old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
                        pi_blocking_list_lock);

Then we can follow Kevin's suggestion and move the addition
to vmx_vcpu_destroy().

Any ideas?

Thanks,
Feng


> 
> Thanks,
> Feng
> 
> >
> > Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 1/3] VMX: Properly adjuest the status of pi descriptor
  2016-05-23  9:21             ` Wu, Feng
@ 2016-05-23 11:04               ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2016-05-23 11:04 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 23.05.16 at 11:21, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Monday, May 23, 2016 5:04 PM
>> >>> On 23.05.16 at 09:16, <feng.wu@intel.com> wrote:
>> >> From: Tian, Kevin
>> >> Sent: Monday, May 23, 2016 2:52 PM
>> >> Somehow I'm thinking whether we really need such dynamic
>> >> callback changes in a SMP environment. Is it safer to introduce
>> >> some per-domain flag to indicate above scenario so each callback
>> >> can return early with negligible cost so no need to reset them once
>> >> registered?
>> >
>> > Yes, I agree with you. As you can see, the current issue is kind of because
>> > of the dynamically assigning/deassigning the pi hooks, and there are
>> > lots of concern cases by using it. I think it worth try what you mentioned
>> > above. However, this method was sugguested by Jan before, so I'd like
>> > to know what is Jan's option about this.
>> 
>> Avoiding the indirect calls would certainly be nice. Apart from
>> reading the function pointer into a local variable, use of which is
>> separated from the read by a barrier() (to address the NULL
>> deref concern), did you consider zapping the pointers in an RCU
>> callback instead of right when being requested?
> 
> Is there any example for the RCU callback in current Xen code,
> so I can first have some investigation on it.

Did you try to find any before asking, e.g. by grep-ing for "rcu",
or (more precisely) "call_rcu"?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-23 10:35           ` Wu, Feng
@ 2016-05-23 11:11             ` Jan Beulich
  2016-05-23 12:24               ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-05-23 11:11 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 23.05.16 at 12:35, <feng.wu@intel.com> wrote:
>> From: Wu, Feng
>> Sent: Monday, May 23, 2016 5:18 PM
>> > From: Jan Beulich [mailto:JBeulich@suse.com]
>> > Sent: Monday, May 23, 2016 5:08 PM
>> > To: Wu, Feng <feng.wu@intel.com>
>> > >>> On 23.05.16 at 07:48, <feng.wu@intel.com> wrote:
>> > > Yes, indeed it is more natural to add this function when vcpu is destroyed,
>> > > however, the reason I add it here is I still have some concerns about the
>> > > timing.
>> > > When the VM is destroyed, here is the calling path:
>> > >
>> > > - vmx_pi_hooks_deassign() -->
>> > > ......
>> > > - vmx_vcpu_destroy() -->
>> > > ......
>> > > - vmx_domain_destroy()
>> > > ......
>> > >
>> > > As I replied in the previous mail, when we remove the vcpus from the
>> > > blocking
>> > > list, there might be some _in-flight_ call to the hooks, so I put the cleanup
>> > > code in the vmx_domain_destroy(), which is a bit more far from
>> > > vmx_pi_hooks_deassign,
>> > > and hence safer. If you have any other good ideas, I am all ears!:)
>> >
>> > Well, either there is a possible race (then moving the addition
>> > later just reduces the chances, but doesn't eliminate it), or there
>> > isn't (in which case Kevin's suggestion should probably be followed).
>> 
>> Yes, I agree, adding the cleanup code in domain destroy other than
>> vcpu destroy point just reduces the risk, but not eliminate. So far I don't
>> get a perfect solution to solve this possible race condition.
> 
> After more thinking about this, I think this race condition can be resolve
> in the following way:
> 1. Define a per-vCPU flag, say, 'v->arch.hvm_vmx.pi_back_from_hotplug'
> 2. In vmx_pi_blocking_list_cleanup(), when we find the vCPU from an
> blocking list, after removing it, set the flag to 1
> 3. In vmx_vcpu_block(), add the following check:
> 
>      spin_lock_irqsave(pi_blocking_list_lock, flags);
> +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up == 1) )
> +    {
> +        /*
> +         * The vcpu is to be destroyed and it has already been removed
> +         * from the per-CPU list if it is blocking, we shouldn't add
> +         * new vCPUs to the list.
> +         */
> +        spin_unlock_irqrestore(pi_blocking_list_lock, flags);
> +        return;
> +    }
> +
>      old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
>                         pi_blocking_list_lock);
> 
> Then we can following Kevin's suggestion to move the addition
> to vmx_vcpu_destory().

Before adding yet another PI-related field, I'd really like to see other
alternatives explored. In particular - can determination be based on
some other state (considering the subject, e.g. per-domain one)?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-23 11:11             ` Jan Beulich
@ 2016-05-23 12:24               ` Wu, Feng
  2016-05-23 12:46                 ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2016-05-23 12:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, May 23, 2016 7:11 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> after domain termination
> 
> >>> On 23.05.16 at 12:35, <feng.wu@intel.com> wrote:
> >> From: Wu, Feng
> >> Sent: Monday, May 23, 2016 5:18 PM
> >> > From: Jan Beulich [mailto:JBeulich@suse.com]
> >> > Sent: Monday, May 23, 2016 5:08 PM
> >> > To: Wu, Feng <feng.wu@intel.com>
> >> > >>> On 23.05.16 at 07:48, <feng.wu@intel.com> wrote:
> >> > > Yes, indeed it is more natural to add this function when vcpu is destroyed,
> >> > > however, the reason I add it here is I still have some concerns about the
> >> > > timing.
> >> > > When the VM is destroyed, here is the calling path:
> >> > >
> >> > > - vmx_pi_hooks_deassign() -->
> >> > > ......
> >> > > - vmx_vcpu_destroy() -->
> >> > > ......
> >> > > - vmx_domain_destroy()
> >> > > ......
> >> > >
> >> > > As I replied in the previous mail, when we remove the vcpus from the
> >> > > blocking
> >> > > list, there might be some _in-flight_ call to the hooks, so I put the cleanup
> >> > > code in the vmx_domain_destroy(), which is a bit more far from
> >> > > vmx_pi_hooks_deassign,
> >> > > and hence safer. If you have any other good ideas, I am all ears!:)
> >> >
> >> > Well, either there is a possible race (then moving the addition
> >> > later just reduces the chances, but doesn't eliminate it), or there
> >> > isn't (in which case Kevin's suggestion should probably be followed).
> >>
> >> Yes, I agree, adding the cleanup code in domain destroy other than
> >> vcpu destroy point just reduces the risk, but not eliminate. So far I don't
> >> get a perfect solution to solve this possible race condition.
> >
> > After more thinking about this, I think this race condition can be resolve
> > in the following way:
> > 1. Define a per-vCPU flag, say, 'v->arch.hvm_vmx.pi_back_from_hotplug'
> > 2. In vmx_pi_blocking_list_cleanup(), when we find the vCPU from an
> > blocking list, after removing it, set the flag to 1
> > 3. In vmx_vcpu_block(), add the following check:
> >
> >      spin_lock_irqsave(pi_blocking_list_lock, flags);
> > +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up == 1) )
> > +    {
> > +        /*
> > +         * The vcpu is to be destroyed and it has already been removed
> > +         * from the per-CPU list if it is blocking, we shouldn't add
> > +         * new vCPUs to the list.
> > +         */
> > +        spin_unlock_irqrestore(pi_blocking_list_lock, flags);
> > +        return;
> > +    }
> > +
> >      old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
> >                         pi_blocking_list_lock);
> >
> > Then we can following Kevin's suggestion to move the addition
> > to vmx_vcpu_destory().
> 
> Before adding yet another PI-related field, I'd really like to see other
> alternatives explored. In particular - can determination be based on
> some other state (considering the subject, e.g. per-domain one)?

I think the point is that we need to set some flag inside the
spin_lock_irqsave()/spin_unlock_irqrestore() section in
vmx_pi_blocking_list_cleanup() and check it after acquiring the lock
in vmx_vcpu_block(), so that the race condition can be eliminated, right?
If that is the case, I am not sure how we could use some other state.

Thanks,
Feng

> 
> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 1/3] VMX: Properly adjuest the status of pi descriptor
  2016-05-20  8:53 ` [PATCH 1/3] VMX: Properly adjuest the status of pi descriptor Feng Wu
  2016-05-23  5:15   ` Tian, Kevin
@ 2016-05-23 12:30   ` Jan Beulich
  1 sibling, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2016-05-23 12:30 UTC (permalink / raw)
  To: Feng Wu
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 20.05.16 at 10:53, <feng.wu@intel.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -107,12 +107,22 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
>  static void vmx_vcpu_block(struct vcpu *v)
>  {
>      unsigned long flags;
> -    unsigned int dest;
> +    unsigned int dest = cpu_physical_id(v->processor);
>      spinlock_t *old_lock;
>      spinlock_t *pi_blocking_list_lock =
>  		&per_cpu(vmx_pi_blocking, v->processor).lock;
>      struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>  
> +    if (v->arch.hvm_vmx.pi_back_from_hotplug == 1)

Coding style (missing blanks). Also the flag is of boolean nature, so
it should be declared bool_t and you should drop the "== 1" here.

> +    {
> +        write_atomic(&pi_desc->ndst,
> +                     x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> +        pi_clear_sn(pi_desc);
> +
> +        v->arch.hvm_vmx.pi_back_from_hotplug = 0;
> +    }
> +
>      spin_lock_irqsave(pi_blocking_list_lock, flags);
>      old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
>                         pi_blocking_list_lock);
> @@ -130,8 +140,6 @@ static void vmx_vcpu_block(struct vcpu *v)
>  
>      ASSERT(!pi_test_sn(pi_desc));
>  
> -    dest = cpu_physical_id(v->processor);
> -
>      ASSERT(pi_desc->ndst ==
>             (x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK)));
>  
> @@ -164,6 +172,16 @@ static void vmx_pi_do_resume(struct vcpu *v)
>      unsigned long flags;
>      spinlock_t *pi_blocking_list_lock;
>      struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    unsigned int dest = cpu_physical_id(v->processor);
> +
> +    if (v->arch.hvm_vmx.pi_back_from_hotplug == 1)
> +    {
> +        write_atomic(&pi_desc->ndst,
> +                     x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
> +        pi_clear_sn(pi_desc);
> +
> +        v->arch.hvm_vmx.pi_back_from_hotplug = 0;
> +    }

Compared with the adjustment above you don't write ->nv here, as
that happens ...

>      ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));

after this ASSERT() (which suggests that your code addition would
better go here). That raises the question of moving the entire body
of the first hunk's if() into a helper function, which then would also
be used here. That would then also address my question about
ordering: Shouldn't pi_clear_sn() always happen last?
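
Just to illustrate the shape of such a helper (a sketch only, reusing the
identifiers from the patch; the function name is made up, and whether the
ordering below is the right one is exactly the question raised above):

static void vmx_pi_adjust_state(struct vcpu *v)
{
    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
    unsigned int dest = cpu_physical_id(v->processor);

    if ( !v->arch.hvm_vmx.pi_back_from_hotplug )
        return;

    write_atomic(&pi_desc->ndst,
                 x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
    write_atomic(&pi_desc->nv, posted_intr_vector);

    /* Un-suppress notifications only once NDST/NV are consistent again. */
    pi_clear_sn(pi_desc);

    v->arch.hvm_vmx.pi_back_from_hotplug = 0;
}

which could then be called from both vmx_vcpu_block() and vmx_pi_do_resume().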

Furthermore, are both of these additions really eliminating the
issue rather than just reducing the likelihood for it to occur? I.e.
what if the new flag gets set immediately after either of the if()-s
above?

> @@ -202,9 +220,14 @@ static void vmx_pi_do_resume(struct vcpu *v)
>  /* This function is called when pcidevs_lock is held */
>  void vmx_pi_hooks_assign(struct domain *d)
>  {
> +    struct vcpu *v;
> +
>      if ( !iommu_intpost || !has_hvm_container_domain(d) )
>          return;
>  
> +    for_each_vcpu ( d, v )
> +        v->arch.hvm_vmx.pi_back_from_hotplug = 1;

Thinking of possible alternatives: What about doing the necessary
adjustments right here, instead of setting the new flag?

Of course it may still be the case that the whole issue means we
indeed shouldn't zap those hooks upon device removal.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-20  8:53 ` [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination Feng Wu
  2016-05-23  5:19   ` Tian, Kevin
@ 2016-05-23 12:30   ` Dario Faggioli
  2016-05-23 13:32     ` Wu, Feng
  2016-05-23 12:35   ` Jan Beulich
  2 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2016-05-23 12:30 UTC (permalink / raw)
  To: Feng Wu, xen-devel
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, jbeulich


[-- Attachment #1.1: Type: text/plain, Size: 2022 bytes --]

On Fri, 2016-05-20 at 16:53 +0800, Feng Wu wrote:
> We need to make sure the bocking vcpu is not in any per-cpu blocking
> list
> when the associated domain is going to be destroyed.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> 
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
>      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
>  }
>  
> +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> +{
> +    unsigned int cpu;
> +
> +    for_each_online_cpu ( cpu )
> +    {
> +        struct vcpu *v;
> +        unsigned long flags;
> +        struct arch_vmx_struct *vmx, *tmp;
> +        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> +        struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking,
> cpu).list;
> +
> +        spin_lock_irqsave(lock, flags);
> +
> +        list_for_each_entry_safe(vmx, tmp, blocked_vcpus,
> pi_blocking.list)
> +        {
> +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> +
> +            if (v->domain == d)
> +            {
> +                list_del(&vmx->pi_blocking.list);
> +                ASSERT(vmx->pi_blocking.lock == lock);
> +                vmx->pi_blocking.lock = NULL;
> +            }
> +        }
> +
> +        spin_unlock_irqrestore(lock, flags);
> +    }
>
So, I'm probably missing something very, very basic, but I don't see
why we need this loop... can't we arrange for
checking

 list_empty(&v->arch.hvm_vmx.pi_blocking.list)

?

:-O

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/3] VMX: Make hook pi_do_resume always available
  2016-05-20  8:53 ` [PATCH 2/3] VMX: Make hook pi_do_resume always available Feng Wu
@ 2016-05-23 12:32   ` Jan Beulich
  2016-05-23 12:51     ` Dario Faggioli
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-05-23 12:32 UTC (permalink / raw)
  To: Feng Wu
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 20.05.16 at 10:53, <feng.wu@intel.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -233,7 +233,6 @@ void vmx_pi_hooks_assign(struct domain *d)
>      d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
>      d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
>      d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;
> -    d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
>  }
>  
>  /* This function is called when pcidevs_lock is held */
> @@ -247,13 +246,14 @@ void vmx_pi_hooks_deassign(struct domain *d)
>      d->arch.hvm_domain.vmx.vcpu_block = NULL;
>      d->arch.hvm_domain.vmx.pi_switch_from = NULL;
>      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> -    d->arch.hvm_domain.vmx.pi_do_resume = NULL;
>  }
>  
>  static int vmx_domain_initialise(struct domain *d)
>  {
>      int rc;
>  
> +    d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
> +
>      if ( !has_vlapic(d) )
>          return 0;

Along the lines of what I said last for 1/3: there's no need to
always install the hook. For your purpose it ought to suffice to
simply not zap it upon device de-assign (which would still leave
all VMs without passed-through devices free of a hook that is
useless to them).
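
I.e., presumably the only change needed is on the deassign side, something
like (a sketch; the rest of the function stays as it is today):

void vmx_pi_hooks_deassign(struct domain *d)
{
    /* ... existing checks ... */

    d->arch.hvm_domain.vmx.vcpu_block = NULL;
    d->arch.hvm_domain.vmx.pi_switch_from = NULL;
    d->arch.hvm_domain.vmx.pi_switch_to = NULL;

    /*
     * pi_do_resume deliberately not cleared here, so vCPUs already on
     * a per-cpu blocking list can still be taken off it.
     */
}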

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-20  8:53 ` [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination Feng Wu
  2016-05-23  5:19   ` Tian, Kevin
  2016-05-23 12:30   ` Dario Faggioli
@ 2016-05-23 12:35   ` Jan Beulich
  2016-05-23 13:33     ` Wu, Feng
  2 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-05-23 12:35 UTC (permalink / raw)
  To: Feng Wu
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 20.05.16 at 10:53, <feng.wu@intel.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
>      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
>  }
>  
> +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> +{
> +    unsigned int cpu;
> +
> +    for_each_online_cpu ( cpu )
> +    {
> +        struct vcpu *v;
> +        unsigned long flags;
> +        struct arch_vmx_struct *vmx, *tmp;
> +        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> +        struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
> +
> +        spin_lock_irqsave(lock, flags);
> +
> +        list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
> +        {

Did you consider how long these two nested loops may take on a
large system?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-23  8:51         ` Jan Beulich
@ 2016-05-23 12:39           ` Dario Faggioli
  2016-05-24 10:07             ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2016-05-23 12:39 UTC (permalink / raw)
  To: Jan Beulich, Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3066 bytes --]

On Mon, 2016-05-23 at 02:51 -0600, Jan Beulich wrote:
> > > > On 23.05.16 at 10:44, <feng.wu@intel.com> wrote:
> > > 
> > > vCPU-s currently having their v->processor set to the pCPU being
> > > hot removed would simply get migrated elsewhere. If that's not
> > > accompanied by respective PI blocking list adjustments, then I
> > > can
> > > only refer you back to the original series' review process, where
> > > it was (iirc) more than once that it was pointed out that getting
> > > out
> > > of sync v->processor and the PI list a vCPU sits on may casue
> > > issues. And if there is an issue here, I'm not sure what a good
> > > solution would be.
> > Okay, here is my understanding about the blocking and vCPU
> > migration things, if vCPU is blocking and its v->processor is pCPU
> > 0,
> > at some point, pCPU0 is removed from the system, it will also
> > migrate the vCPU to another pCPU, say, pCPU1 as the response
> > to pCPU0 being unplugged, hence, v->processor = pCPU1, right?
>
> Yes.
> 
See, for instance, cpu_disable_scheduler() in schedule.c. What we do is
go over all the vcpus of all domains of either the system or the
cpupool, and force the ones that we found with v->processor set to the
pCPU that is going down, to perform migration (system_state will be
different than SYS_STATE_suspend, and we hence take the 'else' branch).

Considering that the pCPU is no longer part of the relevant bitmask-s
during the migration, the vCPUs will figure out that they just can't
stay there, and move somewhere else.

Note, however, that this happens for running and runnable vCPUs. If a
vCPU is blocked, there is nothing to do, and in fact, nothing happens
(as vcpu_sleep_nosync() and vcpu_wake() are NOP in that case). For
those vCPUs, as soon as they wake up, they'll figure out that their own
v->processor is not there any longer, and will move somewhere else.

> > If that is the case, I think the current code may have issues in
> > the
> > above case, since 'NDST' filed in PI descriptor is still vCPU0 even
> > it is removed from the system. I need to consider how to handle
> > this case. 
>
Exactly. I don't think you can do that, for instance, from within
cpu_disable_scheduler(), or anything that gets called from there, as it
basically deals with running vCPUs, while you're interested in blocked
ones.

I wonder whether you can register your own notifier and, inside it,
scan the blocked list and fix things up... (just the first idea that
came to mind)
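
Just to sketch that idea (all names, and the choice of CPU_DEAD as the
notifier phase, are assumptions; how/whether NDST gets updated for the
affected vCPUs is the open question):

static int vmx_pi_cpu_callback(
    struct notifier_block *nfb, unsigned long action, void *hcpu)
{
    unsigned int cpu = (unsigned long)hcpu;
    unsigned long flags;
    struct arch_vmx_struct *vmx, *tmp;
    spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
    struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;

    if ( action != CPU_DEAD )
        return NOTIFY_DONE;

    spin_lock_irqsave(lock, flags);

    list_for_each_entry_safe ( vmx, tmp, blocked_vcpus, pi_blocking.list )
    {
        /*
         * Unlink the vCPU from the dead pCPU's list; it gets queued
         * again (against its new v->processor) the next time it blocks.
         */
        list_del(&vmx->pi_blocking.list);
        vmx->pi_blocking.lock = NULL;
    }

    spin_unlock_irqrestore(lock, flags);

    return NOTIFY_DONE;
}

static struct notifier_block vmx_pi_cpu_nfb = {
    .notifier_call = vmx_pi_cpu_callback
};

/* registered once at boot, e.g. via register_cpu_notifier(&vmx_pi_cpu_nfb) */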

> > But this is not an issue  in non pCPU hotplug scenario.
> > 
It's probably an issue even if you remove a cpu from a cpupool (and
even a more "interesting" one, if you also manage to add it to another
pool, in the meantime) isn't it?

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-23 12:24               ` Wu, Feng
@ 2016-05-23 12:46                 ` Jan Beulich
  2016-05-23 13:41                   ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-05-23 12:46 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 23.05.16 at 14:24, <feng.wu@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Monday, May 23, 2016 7:11 PM
>> To: Wu, Feng <feng.wu@intel.com>
>> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
>> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
>> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org 
>> Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
>> after domain termination
>> 
>> >>> On 23.05.16 at 12:35, <feng.wu@intel.com> wrote:
>> >> From: Wu, Feng
>> >> Sent: Monday, May 23, 2016 5:18 PM
>> >> > From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> > Sent: Monday, May 23, 2016 5:08 PM
>> >> > To: Wu, Feng <feng.wu@intel.com>
>> >> > >>> On 23.05.16 at 07:48, <feng.wu@intel.com> wrote:
>> >> > > Yes, indeed it is more natural to add this function when vcpu is 
> destroyed,
>> >> > > however, the reason I add it here is I still have some concerns about the
>> >> > > timing.
>> >> > > When the VM is destroyed, here is the calling path:
>> >> > >
>> >> > > - vmx_pi_hooks_deassign() -->
>> >> > > ......
>> >> > > - vmx_vcpu_destroy() -->
>> >> > > ......
>> >> > > - vmx_domain_destroy()
>> >> > > ......
>> >> > >
>> >> > > As I replied in the previous mail, when we remove the vcpus from the
>> >> > > blocking
>> >> > > list, there might be some _in-flight_ call to the hooks, so I put the 
> cleanup
>> >> > > code in the vmx_domain_destroy(), which is a bit more far from
>> >> > > vmx_pi_hooks_deassign,
>> >> > > and hence safer. If you have any other good ideas, I am all ears!:)
>> >> >
>> >> > Well, either there is a possible race (then moving the addition
>> >> > later just reduces the chances, but doesn't eliminate it), or there
>> >> > isn't (in which case Kevin's suggestion should probably be followed).
>> >>
>> >> Yes, I agree, adding the cleanup code in domain destroy other than
>> >> vcpu destroy point just reduces the risk, but not eliminate. So far I don't
>> >> get a perfect solution to solve this possible race condition.
>> >
>> > After more thinking about this, I think this race condition can be resolve
>> > in the following way:
>> > 1. Define a per-vCPU flag, say, 'v->arch.hvm_vmx.pi_back_from_hotplug'
>> > 2. In vmx_pi_blocking_list_cleanup(), when we find the vCPU from an
>> > blocking list, after removing it, set the flag to 1
>> > 3. In vmx_vcpu_block(), add the following check:
>> >
>> >      spin_lock_irqsave(pi_blocking_list_lock, flags);
>> > +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up == 1) )
>> > +    {
>> > +        /*
>> > +         * The vcpu is to be destroyed and it has already been removed
>> > +         * from the per-CPU list if it is blocking, we shouldn't add
>> > +         * new vCPUs to the list.
>> > +         */
>> > +        spin_unlock_irqrestore(pi_blocking_list_lock, flags);
>> > +        return;
>> > +    }
>> > +
>> >      old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
>> >                         pi_blocking_list_lock);
>> >
>> > Then we can following Kevin's suggestion to move the addition
>> > to vmx_vcpu_destory().
>> 
>> Before adding yet another PI-related field, I'd really like to see other
>> alternatives explored. In particular - can determination be based on
>> some other state (considering the subject, e.g. per-domain one)?
> 
> I think the point is we need to set some flag inside the
> spin_lock_irqsave()/spin_unlock_irqrestore() section in
> vmx_pi_blocking_list_cleanup() and check it after acquiring the lock
> in vmx_vcpu_block(), so the case condition can be eliminated, right?
> If that is the case, I am not sure how we can use other state.

Since you only need this during domain shutdown, I'm not sure. For
example, can't you simply use d->is_dying or d->is_shutting_down?
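
I.e., in place of the new per-vCPU flag, something like the following in
vmx_vcpu_block() (a sketch; whether is_dying is already guaranteed to be set
by the time the hook can still be invoked is what would need confirming):

    spin_lock_irqsave(pi_blocking_list_lock, flags);

    if ( unlikely(v->domain->is_dying) )
    {
        /* Domain is being destroyed; don't add the vCPU to the list. */
        spin_unlock_irqrestore(pi_blocking_list_lock, flags);
        return;
    }

    old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
                       pi_blocking_list_lock);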

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/3] VMX: Make hook pi_do_resume always available
  2016-05-23 12:32   ` Jan Beulich
@ 2016-05-23 12:51     ` Dario Faggioli
  0 siblings, 0 replies; 49+ messages in thread
From: Dario Faggioli @ 2016-05-23 12:51 UTC (permalink / raw)
  To: Jan Beulich, Feng Wu
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1272 bytes --]

On Mon, 2016-05-23 at 06:32 -0600, Jan Beulich wrote:
> > > > On 20.05.16 at 10:53, <feng.wu@intel.com> wrote:
> > @@ -247,13 +246,14 @@ void vmx_pi_hooks_deassign(struct domain *d)
> >      d->arch.hvm_domain.vmx.vcpu_block = NULL;
> >      d->arch.hvm_domain.vmx.pi_switch_from = NULL;
> >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> > -    d->arch.hvm_domain.vmx.pi_do_resume = NULL;
> >  }
> >  
> >  static int vmx_domain_initialise(struct domain *d)
> >  {
> >      int rc;
> >  
> > +    d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
> > +
> >      if ( !has_vlapic(d) )
> >          return 0;
> Along the lines of what I said last in for 1/3: There's no need to
> always install the hook. For your purpose it ought to suffice to
> simply not zap it upon device de-assign (which would still leave
> all VMs without passed through devices without such useless to
> them hook in place).
> 
FWIW, +1 to this.

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-23 12:30   ` Dario Faggioli
@ 2016-05-23 13:32     ` Wu, Feng
  2016-05-23 14:45       ` Dario Faggioli
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2016-05-23 13:32 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, jbeulich, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Monday, May 23, 2016 8:31 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: keir@xen.org; Tian, Kevin <kevin.tian@intel.com>; jbeulich@suse.com;
> andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com;
> konrad.wilk@oracle.com
> Subject: Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> after domain termination
> 
> On Fri, 2016-05-20 at 16:53 +0800, Feng Wu wrote:
> > We need to make sure the bocking vcpu is not in any per-cpu blocking
> > list
> > when the associated domain is going to be destroyed.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
> >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> >  }
> >
> > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> > +{
> > +    unsigned int cpu;
> > +
> > +    for_each_online_cpu ( cpu )
> > +    {
> > +        struct vcpu *v;
> > +        unsigned long flags;
> > +        struct arch_vmx_struct *vmx, *tmp;
> > +        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> > +        struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking,
> > cpu).list;
> > +
> > +        spin_lock_irqsave(lock, flags);
> > +
> > +        list_for_each_entry_safe(vmx, tmp, blocked_vcpus,
> > pi_blocking.list)
> > +        {
> > +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> > +
> > +            if (v->domain == d)
> > +            {
> > +                list_del(&vmx->pi_blocking.list);
> > +                ASSERT(vmx->pi_blocking.lock == lock);
> > +                vmx->pi_blocking.lock = NULL;
> > +            }
> > +        }
> > +
> > +        spin_unlock_irqrestore(lock, flags);
> > +    }
> >
> So, I'm probably missing something very ver basic, but I don't see
> what's the reason why we need this loop... can't we arrange for
> checking
> 
>  list_empty(&v->arch.hvm_vmx.pi_blocking.list)

Yes, I also cannot find a reason why we can't use this good
suggestion, except that we need to use list_del_init() instead of
list_del() in the current code. Or we can just check whether
'vmx->pi_blocking.lock' is NULL? I totally don't know why I
missed it! :)

Thanks,
Feng

> 
> ?
> 
> :-O
> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-23 12:35   ` Jan Beulich
@ 2016-05-23 13:33     ` Wu, Feng
  0 siblings, 0 replies; 49+ messages in thread
From: Wu, Feng @ 2016-05-23 13:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, May 23, 2016 8:36 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> after domain termination
> 
> >>> On 20.05.16 at 10:53, <feng.wu@intel.com> wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
> >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> >  }
> >
> > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> > +{
> > +    unsigned int cpu;
> > +
> > +    for_each_online_cpu ( cpu )
> > +    {
> > +        struct vcpu *v;
> > +        unsigned long flags;
> > +        struct arch_vmx_struct *vmx, *tmp;
> > +        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> > +        struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
> > +
> > +        spin_lock_irqsave(lock, flags);
> > +
> > +        list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
> > +        {
> 
> Did you consider how long these two nested loops may take on a
> large system?
> 

As Dario just mentioned, we may not need this loop at all.

Thanks,
Feng

> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-23 12:46                 ` Jan Beulich
@ 2016-05-23 13:41                   ` Wu, Feng
  0 siblings, 0 replies; 49+ messages in thread
From: Wu, Feng @ 2016-05-23 13:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, May 23, 2016 8:47 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list
> after domain termination
> 
> >>> On 23.05.16 at 14:24, <feng.wu@intel.com> wrote:
> 
> >
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Monday, May 23, 2016 7:11 PM
> >> To: Wu, Feng <feng.wu@intel.com>
> >> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> >> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> >> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> >> Subject: RE: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking
> list
> >> after domain termination
> >>
> >> >>> On 23.05.16 at 12:35, <feng.wu@intel.com> wrote:
> >> >> From: Wu, Feng
> >> >> Sent: Monday, May 23, 2016 5:18 PM
> >> >> > From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> > Sent: Monday, May 23, 2016 5:08 PM
> >> >> > To: Wu, Feng <feng.wu@intel.com>
> >> >> > >>> On 23.05.16 at 07:48, <feng.wu@intel.com> wrote:
> >> >> > > Yes, indeed it is more natural to add this function when vcpu is
> > destroyed,
> >> >> > > however, the reason I add it here is I still have some concerns about
> the
> >> >> > > timing.
> >> >> > > When the VM is destroyed, here is the calling path:
> >> >> > >
> >> >> > > - vmx_pi_hooks_deassign() -->
> >> >> > > ......
> >> >> > > - vmx_vcpu_destroy() -->
> >> >> > > ......
> >> >> > > - vmx_domain_destroy()
> >> >> > > ......
> >> >> > >
> >> >> > > As I replied in the previous mail, when we remove the vcpus from the
> >> >> > > blocking
> >> >> > > list, there might be some _in-flight_ call to the hooks, so I put the
> > cleanup
> >> >> > > code in the vmx_domain_destroy(), which is a bit more far from
> >> >> > > vmx_pi_hooks_deassign,
> >> >> > > and hence safer. If you have any other good ideas, I am all ears!:)
> >> >> >
> >> >> > Well, either there is a possible race (then moving the addition
> >> >> > later just reduces the chances, but doesn't eliminate it), or there
> >> >> > isn't (in which case Kevin's suggestion should probably be followed).
> >> >>
> >> >> Yes, I agree, adding the cleanup code in domain destroy other than
> >> >> vcpu destroy point just reduces the risk, but not eliminate. So far I don't
> >> >> get a perfect solution to solve this possible race condition.
> >> >
> >> > After more thinking about this, I think this race condition can be resolve
> >> > in the following way:
> >> > 1. Define a per-vCPU flag, say, 'v->arch.hvm_vmx.pi_back_from_hotplug'
> >> > 2. In vmx_pi_blocking_list_cleanup(), when we find the vCPU from an
> >> > blocking list, after removing it, set the flag to 1
> >> > 3. In vmx_vcpu_block(), add the following check:
> >> >
> >> >      spin_lock_irqsave(pi_blocking_list_lock, flags);
> >> > +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up == 1) )
> >> > +    {
> >> > +        /*
> >> > +         * The vcpu is to be destroyed and it has already been removed
> >> > +         * from the per-CPU list if it is blocking, we shouldn't add
> >> > +         * new vCPUs to the list.
> >> > +         */
> >> > +        spin_unlock_irqrestore(pi_blocking_list_lock, flags);
> >> > +        return;
> >> > +    }
> >> > +
> >> >      old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
> >> >                         pi_blocking_list_lock);
> >> >
> >> > Then we can following Kevin's suggestion to move the addition
> >> > to vmx_vcpu_destory().
> >>
> >> Before adding yet another PI-related field, I'd really like to see other
> >> alternatives explored. In particular - can determination be based on
> >> some other state (considering the subject, e.g. per-domain one)?
> >
> > I think the point is we need to set some flag inside the
> > spin_lock_irqsave()/spin_unlock_irqrestore() section in
> > vmx_pi_blocking_list_cleanup() and check it after acquiring the lock
> > in vmx_vcpu_block(), so the case condition can be eliminated, right?
> > If that is the case, I am not sure how we can use other state.
> 
> Since you only need this during domain shutdown, I'm not sure. For
> example, can't you simply use d->is_dying or d->is_shutting_down?

I am not sure those flags can be used for this case, since I think we
should set the flag inside the spin-locked region, as mentioned above.
Anyway, I will think more about this.

Thanks,
Feng

> 
> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination
  2016-05-23 13:32     ` Wu, Feng
@ 2016-05-23 14:45       ` Dario Faggioli
  0 siblings, 0 replies; 49+ messages in thread
From: Dario Faggioli @ 2016-05-23 14:45 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, jbeulich


[-- Attachment #1.1: Type: text/plain, Size: 2552 bytes --]

On Mon, 2016-05-23 at 13:32 +0000, Wu, Feng wrote:
> 
> > > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > > @@ -248,6 +248,36 @@ void vmx_pi_hooks_deassign(struct domain *d)
> > >      d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> > >  }
> > > 
> > > +static void vmx_pi_blocking_list_cleanup(struct domain *d)
> > > +{
> > > +    unsigned int cpu;
> > > +
> > > +    for_each_online_cpu ( cpu )
> > > +    {
> > > +        struct vcpu *v;
> > > +        unsigned long flags;
> > > +        struct arch_vmx_struct *vmx, *tmp;
> > > +        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> > > +        struct list_head *blocked_vcpus =
> > > &per_cpu(vmx_pi_blocking,
> > > cpu).list;
> > > +
> > > +        spin_lock_irqsave(lock, flags);
> > > +
> > > +        list_for_each_entry_safe(vmx, tmp, blocked_vcpus,
> > > pi_blocking.list)
> > > +        {
> > > +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> > > +
> > > +            if (v->domain == d)
> > > +            {
> > > +                list_del(&vmx->pi_blocking.list);
> > > +                ASSERT(vmx->pi_blocking.lock == lock);
> > > +                vmx->pi_blocking.lock = NULL;
> > > +            }
> > > +        }
> > > +
> > > +        spin_unlock_irqrestore(lock, flags);
> > > +    }
> > > 
> > So, I'm probably missing something very ver basic, but I don't see
> > what's the reason why we need this loop... can't we arrange for
> > checking
> > 
> >  list_empty(&v->arch.hvm_vmx.pi_blocking.list)
> Yes, I also cannot find the reason why can't we use this good
> suggestion, Except we need use list_del_init() instead of
> list_del() in the current code. 
>
Yes, I saw that, and it's well worth doing that, to get rid of the
loop. :-)

> Or we can just check whether
> ' vmx->pi_blocking.lock ' is NULL? 
>
I guess that will work as well. Still, if it were me doing this, I'd go
for the list_del_init()/list_empty() approach.
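
For what it's worth, the per-vCPU cleanup (called from, say,
vmx_vcpu_destroy()) might then look roughly like this (a sketch; it assumes
the entry is set up with INIT_LIST_HEAD() and that list_del_init() is used
everywhere else, and it ignores the race with a concurrent wakeup discussed
elsewhere in the thread):

static void vmx_pi_blocking_cleanup(struct vcpu *v)
{
    unsigned long flags;
    spinlock_t *lock = v->arch.hvm_vmx.pi_blocking.lock;

    if ( !lock )
        return;

    spin_lock_irqsave(lock, flags);

    if ( !list_empty(&v->arch.hvm_vmx.pi_blocking.list) )
        list_del_init(&v->arch.hvm_vmx.pi_blocking.list);
    v->arch.hvm_vmx.pi_blocking.lock = NULL;

    spin_unlock_irqrestore(lock, flags);
}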

> I total don't know why I
> missed it! :)
> 
:-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-23 12:39           ` Dario Faggioli
@ 2016-05-24 10:07             ` Wu, Feng
  2016-05-24 13:33               ` Wu, Feng
  2016-05-24 14:02               ` Dario Faggioli
  0 siblings, 2 replies; 49+ messages in thread
From: Wu, Feng @ 2016-05-24 10:07 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, xen-devel, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Monday, May 23, 2016 8:39 PM
> To: Jan Beulich <JBeulich@suse.com>; Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com; Tian, Kevin
> <kevin.tian@intel.com>; xen-devel@lists.xen.org; konrad.wilk@oracle.com;
> keir@xen.org
> Subject: Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu
> blocking list
> 
> On Mon, 2016-05-23 at 02:51 -0600, Jan Beulich wrote:
> > > > > On 23.05.16 at 10:44, <feng.wu@intel.com> wrote:
> > > >
> > > > vCPU-s currently having their v->processor set to the pCPU being
> > > > hot removed would simply get migrated elsewhere. If that's not
> > > > accompanied by respective PI blocking list adjustments, then I
> > > > can
> > > > only refer you back to the original series' review process, where
> > > > it was (iirc) more than once that it was pointed out that getting
> > > > out
> > > > of sync v->processor and the PI list a vCPU sits on may casue
> > > > issues. And if there is an issue here, I'm not sure what a good
> > > > solution would be.
> > > Okay, here is my understanding about the blocking and vCPU
> > > migration things, if vCPU is blocking and its v->processor is pCPU
> > > 0,
> > > at some point, pCPU0 is removed from the system, it will also
> > > migrate the vCPU to another pCPU, say, pCPU1 as the response
> > > to pCPU0 being unplugged, hence, v->processor = pCPU1, right?
> >
> > Yes.
> >
> See, for instance, cpu_disable_scheduler() in schedule.c. What we do is
> go over all the vcpus of all domains of either the system or the
> cpupool, and force the ones that we found with v->processor set to the
> pCPU that is going down, to perform migration (system_state will be
> different than SYS_STATE_suspend, and we hence take the 'else' branch).
> 
> Considering that the pCPU is no longer part of the relevant bitmask-s
> during the migration, the vCPUs will figure out that they just can't
> stay there, and move somewhere else.

Thanks a lot for the elaboration, it is really helpful.

> 
> Note, however, that this happens for running and runnable vCPUs. 

I don't quite understand this. Do you mean cpu_disable_scheduler()
only handles running and runnable vCPUs? I tried to find some hints
in the code, but I didn't get it. Could you please give some more
information about this?

> If a
> vCPU is blocker, there is nothing to do, and in fact, nothing happens
> (as vcpu_sleep_nosync() and vcpu_wake() are NOP in that case). 

What do you mean by saying ' vcpu_sleep_nosync() and vcpu_wake()
are NOP '?

> For
> those vCPUs, as soon as they wake up, they'll figure out that their own
> v->processor is not there any longer, and will move somewhere else.

So basically, when a vCPU is blocking, removing its 'v->processor' has no
impact on that blocked vcpu. When the vCPU is woken up, it will find
another pCPU to run on, since the original 'v->processor' is down and no
longer in the cpu bitmask, right?

> 
> > > If that is the case, I think the current code may have issues in
> > > the
> > > above case, since 'NDST' filed in PI descriptor is still vCPU0 even
> > > it is removed from the system. I need to consider how to handle
> > > this case.
> >
> Exactly. I don't think you can do that, for instance, from within
> cpu_disable_scheduler(), or anythign that gets called from there, as it
> basically deals with running vCPUs, while you're interested in blocked
> ones.
> 
> I wonder whether you can register your own notifier, and, inside it,
> scan the blocked list and fixup things... (just the first idea that
> came up in my mind)
> 
> > > But this is not an issue  in non pCPU hotplug scenario.
> > >
> It's probably an issue even if you remove a cpu from a cpupool (and
> even a more "interesting" one, if you also manage to add it to another
> pool, in the meantime) isn't it?

Yes, things become more complex in that case ....

Thanks,
Feng

> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-24 10:07             ` Wu, Feng
@ 2016-05-24 13:33               ` Wu, Feng
  2016-05-24 14:46                 ` Dario Faggioli
  2016-05-24 14:02               ` Dario Faggioli
  1 sibling, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2016-05-24 13:33 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, xen-devel, Wu, Feng



> -----Original Message-----
> From: Wu, Feng
> Sent: Tuesday, May 24, 2016 6:08 PM
> To: Dario Faggioli <dario.faggioli@citrix.com>; Jan Beulich <JBeulich@suse.com>
> Cc: andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com; Tian, Kevin
> <kevin.tian@intel.com>; xen-devel@lists.xen.org; konrad.wilk@oracle.com;
> keir@xen.org; Wu, Feng <feng.wu@intel.com>
> Subject: RE: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu
> blocking list
> 
> 
> 
> > -----Original Message-----
> > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > Sent: Monday, May 23, 2016 8:39 PM
> > To: Jan Beulich <JBeulich@suse.com>; Wu, Feng <feng.wu@intel.com>
> > Cc: andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com; Tian, Kevin
> > <kevin.tian@intel.com>; xen-devel@lists.xen.org; konrad.wilk@oracle.com;
> > keir@xen.org
> > Subject: Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu
> > blocking list
> >
> > On Mon, 2016-05-23 at 02:51 -0600, Jan Beulich wrote:
> > > > > > On 23.05.16 at 10:44, <feng.wu@intel.com> wrote:
> > > > >
> > > > > vCPU-s currently having their v->processor set to the pCPU being
> > > > > hot removed would simply get migrated elsewhere. If that's not
> > > > > accompanied by respective PI blocking list adjustments, then I
> > > > > can
> > > > > only refer you back to the original series' review process, where
> > > > > it was (iirc) more than once that it was pointed out that getting
> > > > > out
> > > > > of sync v->processor and the PI list a vCPU sits on may casue
> > > > > issues. And if there is an issue here, I'm not sure what a good
> > > > > solution would be.
> > > > Okay, here is my understanding about the blocking and vCPU
> > > > migration things, if vCPU is blocking and its v->processor is pCPU
> > > > 0,
> > > > at some point, pCPU0 is removed from the system, it will also
> > > > migrate the vCPU to another pCPU, say, pCPU1 as the response
> > > > to pCPU0 being unplugged, hence, v->processor = pCPU1, right?
> > >
> > > Yes.
> > >
> > See, for instance, cpu_disable_scheduler() in schedule.c. What we do is
> > go over all the vcpus of all domains of either the system or the
> > cpupool, and force the ones that we found with v->processor set to the
> > pCPU that is going down, to perform migration (system_state will be
> > different than SYS_STATE_suspend, and we hence take the 'else' branch).
> >
> > Considering that the pCPU is no longer part of the relevant bitmask-s
> > during the migration, the vCPUs will figure out that they just can't
> > stay there, and move somewhere else.
> 
> Thanks a lot for the elaboration, it is really helpful.
> 
> >
> > Note, however, that this happens for running and runnable vCPUs.
> 
> I don't quite understand this, do you mean cpu_disable_scheduler()
> only handle running and runnable vCPUs, I tried to find some hints
> from the code, but I didn't get it. Could you please give some more
> information about this?
> 
> > If a
> > vCPU is blocker, there is nothing to do, and in fact, nothing happens
> > (as vcpu_sleep_nosync() and vcpu_wake() are NOP in that case).
> 
> What do you mean by saying ' vcpu_sleep_nosync() and vcpu_wake()
> are NOP '?

I think I understand what you meant above now.

Do you think the following idea makes sense?

When a pCPU is unplugged, we can just remove the vcpus from the
associated per-cpu blocking list, then choose another online
cpu, set the right NDST value, and put the vCPU on the new per-cpu list.
Meanwhile, we may need some special treatment for the case where
vmx_vcpu_block() runs concurrently with cpu hotplug.

Thanks,
Feng

> 
> > For
> > those vCPUs, as soon as they wake up, they'll figure out that their own
> > v->processor is not there any longer, and will move somewhere else.
> 
> So basically, when vCPU is blocking, it has no impact to the blocking vcpu
> when 'v->processor' is removed. When the vCPU is waken up, it will find
> another pCPU to run, since the original 'v->processor' is down and no
> longer in the cpu bitmask, right?
> 
> >
> > > > If that is the case, I think the current code may have issues in
> > > > the
> > > > above case, since 'NDST' filed in PI descriptor is still vCPU0 even
> > > > it is removed from the system. I need to consider how to handle
> > > > this case.
> > >
> > Exactly. I don't think you can do that, for instance, from within
> > cpu_disable_scheduler(), or anythign that gets called from there, as it
> > basically deals with running vCPUs, while you're interested in blocked
> > ones.
> >
> > I wonder whether you can register your own notifier, and, inside it,
> > scan the blocked list and fixup things... (just the first idea that
> > came up in my mind)
> >
> > > > But this is not an issue  in non pCPU hotplug scenario.
> > > >
> > It's probably an issue even if you remove a cpu from a cpupool (and
> > even a more "interesting" one, if you also manage to add it to another
> > pool, in the meantime) isn't it?
> 
> Yes, things become more complex in that case ....
> 
> Thanks,
> Feng
> 
> >
> > Regards,
> > Dario
> > --
> > <<This happens because I choose it to happen!>> (Raistlin Majere)
> > -----------------------------------------------------------------
> > Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> > Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-24 10:07             ` Wu, Feng
  2016-05-24 13:33               ` Wu, Feng
@ 2016-05-24 14:02               ` Dario Faggioli
  2016-05-25 12:39                 ` Wu, Feng
  2016-06-23 12:33                 ` Wu, Feng
  1 sibling, 2 replies; 49+ messages in thread
From: Dario Faggioli @ 2016-05-24 14:02 UTC (permalink / raw)
  To: Wu, Feng, Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 5746 bytes --]

On Tue, 2016-05-24 at 10:07 +0000, Wu, Feng wrote:
> > See, for instance, cpu_disable_scheduler() in schedule.c. What we
> > do is
> > go over all the vcpus of all domains of either the system or the
> > cpupool, and force the ones that we found with v->processor set to
> > the
> > pCPU that is going down, to perform migration (system_state will be
> > different than SYS_STATE_suspend, and we hence take the 'else'
> > branch).
> > 
> > Considering that the pCPU is no longer part of the relevant
> > bitmask-s
> > during the migration, the vCPUs will figure out that they just
> > can't
> > stay there, and move somewhere else.
>
> Thanks a lot for the elaboration, it is really helpful.
> 
NP :-)

> > Note, however, that this happens for running and runnable vCPUs. 
>
> I don't quite understand this, do you mean cpu_disable_scheduler()
> only handle running and runnable vCPUs, I tried to find some hints
> from the code, but I didn't get it. Could you please give some more
> information about this?
> 
It goes through all the vcpus of all domains, and does not check or
care whether they are running, runnable or blocked.

Let's look at this in some more details. So, let's assume that
processor 5 is going away, and that you have the following vcpus
around:

 d0v0 : v->processor = 5, running on cpu 5
 d0v1 : v->processor = 4, running on cpu 4
 d1v0 : v->processor = 5, runnable but not running
 d2v3 : v->processor = 5, blocked

for d0v0, we do:
  cpu_disable_scheduler(5)
    set_bit(_VPF_migrating, d0v0->pause_flags);
    vcpu_sleep_nosync(d0v0);
      SCHED_OP(sleep, d0v0);
        csched_vcpu_sleep(d0v0)
          cpu_raise_softirq(5, SCHEDULE_SOFTIRQ);
    vcpu_migrate(d0v0);
      if ( v->is_running || ...) // assume v->is_running is true
        return
    ...
    ... <--- scheduling occurs on processor 5
    ...
    context_saved(d0v0)
      vcpu_migrate(d0v0);
          //is_running is 0, so _VPF_migrating gets cleared
        vcpu_move_locked(d0v0, new_cpu);
        vcpu_wake(d0v0);
          SCHED_OP(wake, d0v0);
            csched_vcpu_wake(d0v0)
              __runq_insert(d0v0);
              __runq_tickle(d0v0);

for d0v1, we do:
  cpu_disable_scheduler(5)
    if ( d0v1->processor != 5 )
      continue

for d1v0, we do:
  cpu_disable_scheduler(5)
    set_bit(_VPF_migrating, d1v0->pause_flags);
    vcpu_sleep_nosync(d1v0);
      SCHED_OP(sleep, d1v0);
        csched_vcpu_sleep(d1v0)
          __runq_remove(d1v0);
    vcpu_migrate(d1v0);
      if ( d1v0->is_running ||
           !test_and_clear_bit(_VPF_migrating, d1v0->pause_flags)
          // false, but clears the _VPF_migrating flag
      vcpu_move_locked(d1v0, new_cpu);
      vcpu_wake(v);
        SCHED_OP(wake, d1v0);
          csched_vcpu_wake(d1v0)
            __runq_insert(d1v0);
            __runq_tickle(d1v0);

for d2v3, we do:
  cpu_disable_scheduler(5)
    set_bit(_VPF_migrating, d2v3->pause_flags);
    vcpu_sleep_nosync(d2v3);
      SCHED_OP(sleep, d2v3);
        csched_vcpu_sleep(d2v3)
[1]       // Nothing!
    vcpu_migrate(d2v3);
      if ( d2v3->is_running ||
           !test_and_clear_bit(_VPF_migrating, d2v3->pause_flags)
          // false, but clears the _VPF_migrating flag
[*]   vcpu_move_locked(d2v3, new_cpu);
      vcpu_wake(d2v3);
[2]     // Nothing!

> > If a
> > vCPU is blocker, there is nothing to do, and in fact, nothing
> > happens
> > (as vcpu_sleep_nosync() and vcpu_wake() are NOP in that case). 
>
> What do you mean by saying ' vcpu_sleep_nosync() and vcpu_wake()
> are NOP '?
> 
See [1] and [2] above.

> > For
> > those vCPUs, as soon as they wake up, they'll figure out that their
> > own
> > v->processor is not there any longer, and will move somewhere else.
>
> So basically, when vCPU is blocking, it has no impact to the blocking
> vcpu
> when 'v->processor' is removed. When the vCPU is waken up, it will
> find
> another pCPU to run, since the original 'v->processor' is down and no
> longer in the cpu bitmask, right?
> 
Yes, that was my point.

_However_, as you can see at [*] above, it must be noted that even
those vcpus that blocked while running on a certain processor (5 in the
example), indeed have a chance to have their
v->processor changed to something that is still online (something
different than 5), as a consequence of that processor going away.

Whether this is useful/enough for you, I can't tell right now, off
the top of my head.

> > > > But this is not an issue  in non pCPU hotplug scenario.
> > > > 
> > It's probably an issue even if you remove a cpu from a cpupool (and
> > even a more "interesting" one, if you also manage to add it to
> > another
> > pool, in the meantime) isn't it?
>
> Yes, things become more complex in that case ....
> 
Well, but can you confirm that we also have an issue there, and test
and report what happens if you move a cpu from pool A to pool B, while
it still has vcpus from a domain that stays in pool A?

If there's transient suboptimal behavior, well, we probably can live
with that (depending on the specific characteristics of the transient
phase, I'd say). If things crash, we certainly want a fix!

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-24 13:33               ` Wu, Feng
@ 2016-05-24 14:46                 ` Dario Faggioli
  2016-05-25 13:28                   ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2016-05-24 14:46 UTC (permalink / raw)
  To: Wu, Feng, Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1500 bytes --]

On Tue, 2016-05-24 at 13:33 +0000, Wu, Feng wrote:
> > From: Wu, Feng
> > > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > > 
> > > If a
> > > vCPU is blocker, there is nothing to do, and in fact, nothing
> > > happens
> > > (as vcpu_sleep_nosync() and vcpu_wake() are NOP in that case).
> > What do you mean by saying ' vcpu_sleep_nosync() and vcpu_wake()
> > are NOP '?
>
> I think I understand what you meant above now.
> 
Right. In any case, see the email I've just sent, with a detailed
breakdown of the situation and of what actually happens.

> Do you think the following idea makes sense?
> 
> When a pCPU is unplugged, we can just remove the vcpus on the
> associated per-cpu blocking list, then we can choose another online
> cpu, set the right NDST value, and put the vCPU the new per-cpu list?
>
Well, this does make sense to me, but the point is how you do that. I
mean, how do you get to execute some PI adjustment code, from cpu-
teardown code?

Right now, for what has seemed necessary until now, we have
arch_vcpu_block(). Do we need more? If yes, where?

From looking only at schedule.c, we already have arch_move_irqs(); can
we take advantage of it?

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-24 14:02               ` Dario Faggioli
@ 2016-05-25 12:39                 ` Wu, Feng
  2016-06-23 12:33                 ` Wu, Feng
  1 sibling, 0 replies; 49+ messages in thread
From: Wu, Feng @ 2016-05-25 12:39 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, xen-devel, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Tuesday, May 24, 2016 10:02 PM
> To: Wu, Feng <feng.wu@intel.com>; Jan Beulich <JBeulich@suse.com>
> Cc: andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com; Tian, Kevin
> <kevin.tian@intel.com>; xen-devel@lists.xen.org; konrad.wilk@oracle.com;
> keir@xen.org
> Subject: Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu
> blocking list
> 
> On Tue, 2016-05-24 at 10:07 +0000, Wu, Feng wrote:
> > > See, for instance, cpu_disable_scheduler() in schedule.c. What we
> > > do is
> > > go over all the vcpus of all domains of either the system or the
> > > cpupool, and force the ones that we found with v->processor set to
> > > the
> > > pCPU that is going down, to perform migration (system_state will be
> > > different than SYS_STATE_suspend, and we hence take the 'else'
> > > branch).
> > >
> > > Considering that the pCPU is no longer part of the relevant
> > > bitmask-s
> > > during the migration, the vCPUs will figure out that they just
> > > can't
> > > stay there, and move somewhere else.
> >
> > Thanks a lot for the elaboration, it is really helpful.
> >
> NP :-)
> 
> > > Note, however, that this happens for running and runnable vCPUs.
> >
> > I don't quite understand this, do you mean cpu_disable_scheduler()
> > only handle running and runnable vCPUs, I tried to find some hints
> > from the code, but I didn't get it. Could you please give some more
> > information about this?
> >
> It goes through all the vcpus of all domains, and does not check or
> care whether they are running, runnable or blocked.
> 
> Let's look at this in some more details. So, let's assume that
> processor 5 is going away, and that you have the following vcpus
> around:
> 
>  d0v0 : v->processor = 5, running on cpu 5
>  d0v1 : v->processor = 4, running on cpu 4
>  d1v0 : v->processor = 5, runnable but not running
>  d2v3 : v->processor = 5, blocked
> 
> for d0v0, we do:
>   cpu_disable_scheduler(5)
>     set_bit(_VPF_migrating, d0v0->pause_flags);
>     vcpu_sleep_nosync(d0v0);
>       SCHED_OP(sleep, d0v0);
>         csched_vcpu_sleep(d0v0)
>           cpu_raise_softirq(5, SCHEDULE_SOFTIRQ);
>     vcpu_migrate(d0v0);
>       if ( v->is_running || ...) // assume v->is_running is true
>         return
>     ...
>     ... <--- scheduling occurs on processor 5
>     ...
>     context_saved(d0v0)
>       vcpu_migrate(d0v0);
>           //is_running is 0, so _VPF_migrating gets cleared
>         vcpu_move_locked(d0v0, new_cpu);
>         vcpu_wake(d0v0);
>           SCHED_OP(wake, d0v0);
>             csched_vcpu_wake(d0v0)
>               __runq_insert(d0v0);
>               __runq_tickle(d0v0);
> 
> for d0v1, we do:
>   cpu_disable_scheduler(5)
>     if ( d0v1->processor != 5 )
>       continue
> 
> for d1v0, we do:
>   cpu_disable_scheduler(5)
>     set_bit(_VPF_migrating, d1v0->pause_flags);
>     vcpu_sleep_nosync(d1v0);
>       SCHED_OP(sleep, d1v0);
>         csched_vcpu_sleep(d1v0)
>           __runq_remove(d1v0);
>     vcpu_migrate(d1v0);
>       if ( d1v0->is_running ||
>            !test_and_clear_bit(_VPF_migrating, d1v0->pause_flags)
>           // false, but clears the _VPF_migrating flag
>       vcpu_move_locked(d1v0, new_cpu);
>       vcpu_wake(v);
>         SCHED_OP(wake, d1v0);
>           csched_vcpu_wake(d1v0)
>             __runq_insert(d1v0);
>             __runq_tickle(d1v0);
> 
> for d2v3, we do:
>   cpu_disable_scheduler(5)
>     set_bit(_VPF_migrating, d2v3->pause_flags);
>     vcpu_sleep_nosync(d2v3);
>       SCHED_OP(sleep, d2v3);
>         csched_vcpu_sleep(d2v3)
> [1]       // Nothing!
>     vcpu_migrate(d2v3);
>       if ( d2v3->is_running ||
>            !test_and_clear_bit(_VPF_migrating, d2v3->pause_flags)
>           // false, but clears the _VPF_migrating flag
> [*]   vcpu_move_locked(d2v3, new_cpu);
>       vcpu_wake(d2v3);
> [2]     // Nothing!
> 
> > > If a
> > > vCPU is blocker, there is nothing to do, and in fact, nothing
> > > happens
> > > (as vcpu_sleep_nosync() and vcpu_wake() are NOP in that case).
> >
> > What do you mean by saying ' vcpu_sleep_nosync() and vcpu_wake()
> > are NOP '?
> >
> See [1] and [2] above.
> 
> > > For
> > > those vCPUs, as soon as they wake up, they'll figure out that their
> > > own
> > > v->processor is not there any longer, and will move somewhere else.
> >
> > So basically, when vCPU is blocking, it has no impact to the blocking
> > vcpu
> > when 'v->processor' is removed. When the vCPU is waken up, it will
> > find
> > another pCPU to run, since the original 'v->processor' is down and no
> > longer in the cpu bitmask, right?
> >
> Yes, that was my point.
> 
> _However_, as you can see at [*] above, it must be noted that even
> those vcpus that blocked while running on a certain processor (5 in the
> example), indeed have a chance to have their
> v->processor changed to something that is still online (something
> different than 5), as a consequence of that processor going away.
> 
> Whether this is useful/enough for you, I can't tell right now, out of
> the top of my head.

Thank you so much, Dario! This detailed info is really helpful for me!!

> 
> > > > > But this is not an issue  in non pCPU hotplug scenario.
> > > > >
> > > It's probably an issue even if you remove a cpu from a cpupool (and
> > > even a more "interesting" one, if you also manage to add it to
> > > another
> > > pool, in the meantime) isn't it?
> >
> > Yes, things become more complex in that case ....
> >
> Well, but can you confirm that we also have an issue there, and test
> and report what happens if you move a cpu from pool A to pool B, while
> it still has vcpus from a domain that stays in pool A.
> 
> If there's transient suboptimal behavior, well, we probably can live
> with that (depending on the specific characteristics of the transitory,
> I'd say). If things crash, we certainly want a fix!

Thinking about this a bit more, it seems there is no issue with this case;
let's consider the following scenario:
cpupool0: cpu0, cpu1, cpu2, cpu3
cpupool1: cpu4, cpu5, cpu6, cpu7

1. vcpu0 is blocking and it is on the blocking list of cpu0
2. cpu0 is removed from cpupool0, no matter whether it is added
to cpupool1 or not
3. As long as the cpu is online (we need to handle the case where the
cpu goes offline, which is the cpu hotplug case), sometime later, when
a notification event happens, we will wake up the blocking
vcpu and the scheduler will choose a proper pCPU for it
to run on.

I also tested this case and didn't find anything unusual.

Thanks,
Feng

> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-24 14:46                 ` Dario Faggioli
@ 2016-05-25 13:28                   ` Wu, Feng
  0 siblings, 0 replies; 49+ messages in thread
From: Wu, Feng @ 2016-05-25 13:28 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, xen-devel, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Tuesday, May 24, 2016 10:47 PM
> To: Wu, Feng <feng.wu@intel.com>; Jan Beulich <JBeulich@suse.com>
> Cc: andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com; Tian, Kevin
> <kevin.tian@intel.com>; xen-devel@lists.xen.org; konrad.wilk@oracle.com;
> keir@xen.org
> Subject: Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu
> blocking list
> 
> On Tue, 2016-05-24 at 13:33 +0000, Wu, Feng wrote:
> > > From: Wu, Feng
> > > > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > > >
> > > > If a
> > > > vCPU is blocker, there is nothing to do, and in fact, nothing
> > > > happens
> > > > (as vcpu_sleep_nosync() and vcpu_wake() are NOP in that case).
> > > What do you mean by saying ' vcpu_sleep_nosync() and vcpu_wake()
> > > are NOP '?
> >
> > I think I understand what you meant above now.
> >
> Right. In any case, see the email I've just sent, with a detailed
> breakdown of the situation and of what actually happens.
> 
> > Do you think the following idea makes sense?
> >
> > When a pCPU is unplugged, we can just remove the vcpus on the
> > associated per-cpu blocking list, then we can choose another online
> > cpu, set the right NDST value, and put the vCPU the new per-cpu list?
> >
> Well, this does make sense to me, but the point is how you do that. I
> mean, how do you get to execute some PI adjustment code, from cpu-
> teardown code?
> 
> Right now, for what seemed to be necessary until now, we have the
> arch_vcpu_block(). Do we need more? If yes where?
> 
> From looking only at schedule.c, we already have arch_move_irqs(), can
> we take advantage of it?

I think we can add the logic in vmx_cpu_dead(). I already have a draft
patch, and I will send it out to you guys for review later!
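
The direction is roughly like below (only a rough sketch to show the idea,
not the draft patch itself; it reuses the existing per-cpu vmx_pi_blocking
list/lock, the NDST update mirrors what vmx_vcpu_block() does if I recall
correctly, and the interaction with a concurrent vmx_vcpu_block() is
glossed over):

/* Sketch: to be called from vmx_cpu_dead(cpu) when pCPU 'cpu' goes away. */
static void vmx_pi_desc_fixup(unsigned int cpu)
{
    unsigned int new_cpu, dest;
    unsigned long flags;
    struct arch_vmx_struct *vmx, *tmp;
    spinlock_t *new_lock, *old_lock = &per_cpu(vmx_pi_blocking, cpu).lock;
    struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;

    spin_lock_irqsave(old_lock, flags);

    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
    {
        /*
         * Move the blocked vCPU to the blocking list of another (online)
         * pCPU, and redirect the wakeup notification there via NDST.
         */
        list_del(&vmx->pi_blocking.list);

        new_cpu = cpumask_any(&cpu_online_map);
        new_lock = &per_cpu(vmx_pi_blocking, new_cpu).lock;

        spin_lock(new_lock);
        list_add_tail(&vmx->pi_blocking.list,
                      &per_cpu(vmx_pi_blocking, new_cpu).list);
        vmx->pi_blocking.lock = new_lock;
        spin_unlock(new_lock);

        dest = cpu_physical_id(new_cpu);
        write_atomic(&vmx->pi_desc.ndst,
                     x2apic_enabled ? dest
                                    : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
    }

    spin_unlock_irqrestore(old_lock, flags);
}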

Thanks,
Feng

> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-24 14:02               ` Dario Faggioli
  2016-05-25 12:39                 ` Wu, Feng
@ 2016-06-23 12:33                 ` Wu, Feng
  2016-06-23 15:11                   ` Dario Faggioli
  1 sibling, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2016-06-23 12:33 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, xen-devel, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Tuesday, May 24, 2016 10:02 PM
> To: Wu, Feng <feng.wu@intel.com>; Jan Beulich <JBeulich@suse.com>
> Cc: andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com; Tian, Kevin
> <kevin.tian@intel.com>; xen-devel@lists.xen.org; konrad.wilk@oracle.com;
> keir@xen.org
> Subject: Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu
> blocking list
> 
> On Tue, 2016-05-24 at 10:07 +0000, Wu, Feng wrote:
> > > See, for instance, cpu_disable_scheduler() in schedule.c. What we
> > > do is
> > > go over all the vcpus of all domains of either the system or the
> > > cpupool, and force the ones that we found with v->processor set to
> > > the
> > > pCPU that is going down, to perform migration (system_state will be
> > > different than SYS_STATE_suspend, and we hence take the 'else'
> > > branch).
> > >
> > > Considering that the pCPU is no longer part of the relevant
> > > bitmask-s
> > > during the migration, the vCPUs will figure out that they just
> > > can't
> > > stay there, and move somewhere else.
> >
> > Thanks a lot for the elaboration, it is really helpful.
> >
> NP :-)
> 
> > > Note, however, that this happens for running and runnable vCPUs.
> >
> > I don't quite understand this, do you mean cpu_disable_scheduler()
> > only handle running and runnable vCPUs, I tried to find some hints
> > from the code, but I didn't get it. Could you please give some more
> > information about this?
> >
> It goes through all the vcpus of all domains, and does not check or
> care whether they are running, runnable or blocked.
> 
> Let's look at this in some more details. So, let's assume that
> processor 5 is going away, and that you have the following vcpus
> around:
> 
>  d0v0 : v->processor = 5, running on cpu 5
>  d0v1 : v->processor = 4, running on cpu 4
>  d1v0 : v->processor = 5, runnable but not running
>  d2v3 : v->processor = 5, blocked
> 
> for d0v0, we do:
>   cpu_disable_scheduler(5)
>     set_bit(_VPF_migrating, d0v0->pause_flags);
>     vcpu_sleep_nosync(d0v0);
>       SCHED_OP(sleep, d0v0);
>         csched_vcpu_sleep(d0v0)
>           cpu_raise_softirq(5, SCHEDULE_SOFTIRQ);
>     vcpu_migrate(d0v0);
>       if ( v->is_running || ...) // assume v->is_running is true
>         return

Hi Dario, after reading this mail again I have another question;
could you please help me out?

In the above code flow, we return from vcpu_migrate(d0v0) because
v->is_running == 1; after vcpu_migrate() returns, we check:

    if ( v->processor == cpu )
        ret = -EAGAIN; 

In my understanding, in the above case 'v->processor' is likely equal to
'cpu', hence we return -EAGAIN. However, in __cpu_disable(), there is
a check as below:

    if ( cpu_disable_scheduler(cpu) )
        BUG();

Might we hit the BUG() in the above case? Or am I missing something?
Thanks a lot!

Thanks,
Feng

>     ...
>     ... <--- scheduling occurs on processor 5
>     ...
>     context_saved(d0v0)
>       vcpu_migrate(d0v0);
>           //is_running is 0, so _VPF_migrating gets cleared
>         vcpu_move_locked(d0v0, new_cpu);
>         vcpu_wake(d0v0);
>           SCHED_OP(wake, d0v0);
>             csched_vcpu_wake(d0v0)
>               __runq_insert(d0v0);
>               __runq_tickle(d0v0);
> 
> for d0v1, we do:
>   cpu_disable_scheduler(5)
>     if ( d0v1->processor != 5 )
>       continue
> 
> for d1v0, we do:
>   cpu_disable_scheduler(5)
>     set_bit(_VPF_migrating, d1v0->pause_flags);
>     vcpu_sleep_nosync(d1v0);
>       SCHED_OP(sleep, d1v0);
>         csched_vcpu_sleep(d1v0)
>           __runq_remove(d1v0);
>     vcpu_migrate(d1v0);
>       if ( d1v0->is_running ||
>            !test_and_clear_bit(_VPF_migrating, d1v0->pause_flags)
>           // false, but clears the _VPF_migrating flag
>       vcpu_move_locked(d1v0, new_cpu);
>       vcpu_wake(v);
>         SCHED_OP(wake, d1v0);
>           csched_vcpu_wake(d1v0)
>             __runq_insert(d1v0);
>             __runq_tickle(d1v0);
> 
> for d2v3, we do:
>   cpu_disable_scheduler(5)
>     set_bit(_VPF_migrating, d2v3->pause_flags);
>     vcpu_sleep_nosync(d2v3);
>       SCHED_OP(sleep, d2v3);
>         csched_vcpu_sleep(d2v3)
> [1]       // Nothing!
>     vcpu_migrate(d2v3);
>       if ( d2v3->is_running ||
>            !test_and_clear_bit(_VPF_migrating, d2v3->pause_flags)
>           // false, but clears the _VPF_migrating flag
> [*]   vcpu_move_locked(d2v3, new_cpu);
>       vcpu_wake(d2v3);
> [2]     // Nothing!
> 
> > > If a
> > > vCPU is blocker, there is nothing to do, and in fact, nothing
> > > happens
> > > (as vcpu_sleep_nosync() and vcpu_wake() are NOP in that case).
> >
> > What do you mean by saying ' vcpu_sleep_nosync() and vcpu_wake()
> > are NOP '?
> >
> See [1] and [2] above.
> 
> > > For
> > > those vCPUs, as soon as they wake up, they'll figure out that their
> > > own
> > > v->processor is not there any longer, and will move somewhere else.
> >
> > So basically, when vCPU is blocking, it has no impact to the blocking
> > vcpu
> > when 'v->processor' is removed. When the vCPU is waken up, it will
> > find
> > another pCPU to run, since the original 'v->processor' is down and no
> > longer in the cpu bitmask, right?
> >
> Yes, that was my point.
> 
> _However_, as you can see at [*] above, it must be noted that even
> those vcpus that blocked while running on a certain processor (5 in the
> example), indeed have a chance to have their
> v->processor changed to something that is still online (something
> different than 5), as a consequence of that processor going away.
> 
> Whether this is useful/enough for you, I can't tell right now, out of
> the top of my head.
> 
> > > > > But this is not an issue  in non pCPU hotplug scenario.
> > > > >
> > > It's probably an issue even if you remove a cpu from a cpupool (and
> > > even a more "interesting" one, if you also manage to add it to
> > > another
> > > pool, in the meantime) isn't it?
> >
> > Yes, things become more complex in that case ....
> >
> Well, but can you confirm that we also have an issue there, and test
> and report what happens if you move a cpu from pool A to pool B, while
> it still has vcpus from a domain that stays in pool A.
> 
> If there's transient suboptimal behavior, well, we probably can live
> with that (depending on the specific characteristics of the transitory,
> I'd say). If things crash, we certainly want a fix!
> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-06-23 12:33                 ` Wu, Feng
@ 2016-06-23 15:11                   ` Dario Faggioli
  2016-06-24  6:11                     ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2016-06-23 15:11 UTC (permalink / raw)
  To: Wu, Feng, Jan Beulich
  Cc: george.dunlap, andrew.cooper3, Tian, Kevin, keir, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2577 bytes --]

On Thu, 2016-06-23 at 12:33 +0000, Wu, Feng wrote:
> > -----Original Message-----
> > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > 
> > It goes through all the vcpus of all domains, and does not check or
> > care whether they are running, runnable or blocked.
> > 
> > Let's look at this in some more details. So, let's assume that
> > processor 5 is going away, and that you have the following vcpus
> > around:
> > 
> >  d0v0 : v->processor = 5, running on cpu 5
> >  d0v1 : v->processor = 4, running on cpu 4
> >  d1v0 : v->processor = 5, runnable but not running
> >  d2v3 : v->processor = 5, blocked
> > 
> > for d0v0, we do:
> >   cpu_disable_scheduler(5)
> >     set_bit(_VPF_migrating, d0v0->pause_flags);
> >     vcpu_sleep_nosync(d0v0);
> >       SCHED_OP(sleep, d0v0);
> >         csched_vcpu_sleep(d0v0)
> >           cpu_raise_softirq(5, SCHEDULE_SOFTIRQ);
> >     vcpu_migrate(d0v0);
> >       if ( v->is_running || ...) // assume v->is_running is true
> >         return
> Hi Dario, after read this mail again, I get another question,
> could you please help me out?
> 
> In the above code flow, we return in vcpu_migrate(d0v0) because
> v->is_running == 1, after vcpu_migrate() return, we check:
> 
>     if ( v->processor == cpu )
>         ret = -EAGAIN; 
> 
> In my understand in the above case, 'v->processor' is likely equal to
> 'cpu', hence return -EAGAIN. However, in __cpu_disable(), there is
> some check as below:
> 
>     if ( cpu_disable_scheduler(cpu) )
>         BUG();
> 
Right. But, as the comment inside cpu_disable_scheduler() itself says,
we only return -EAGAIN in case we are calling cpu_disable_scheduler for
removing a pCPU from a cpupool.

In that case, we do not use __cpu_disable(), and hence we can safely
return an error value. In that case, in fact, the caller of
cpu_disable_scheduler() is cpupool_unassign_cpu_helper(), which does
what's necessary to deal with such an error.

> Might we hit the BUG() in the above case? 
>
No, because we call cpu_disable_scheduler() from __cpu_disable() only
when the system state is already SYS_STATE_suspend, and hence we take the
'then' branch of the 'if', which never returns an error.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-06-23 15:11                   ` Dario Faggioli
@ 2016-06-24  6:11                     ` Wu, Feng
  2016-06-24  7:22                       ` Dario Faggioli
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2016-06-24  6:11 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, xen-devel, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Thursday, June 23, 2016 11:12 PM
> To: Wu, Feng <feng.wu@intel.com>; Jan Beulich <JBeulich@suse.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; keir@xen.org;
> george.dunlap@eu.citrix.com; andrew.cooper3@citrix.com; xen-
> devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH 0/3] VMX: Properly handle pi descriptor and
> per-cpu blocking list
> 
> On Thu, 2016-06-23 at 12:33 +0000, Wu, Feng wrote:
> > > -----Original Message-----
> > > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > >
> > > It goes through all the vcpus of all domains, and does not check or
> > > care whether they are running, runnable or blocked.
> > >
> > > Let's look at this in some more details. So, let's assume that
> > > processor 5 is going away, and that you have the following vcpus
> > > around:
> > >
> > >  d0v0 : v->processor = 5, running on cpu 5
> > >  d0v1 : v->processor = 4, running on cpu 4
> > >  d1v0 : v->processor = 5, runnable but not running
> > >  d2v3 : v->processor = 5, blocked
> > >
> > > for d0v0, we do:
> > >   cpu_disable_scheduler(5)
> > >     set_bit(_VPF_migrating, d0v0->pause_flags);
> > >     vcpu_sleep_nosync(d0v0);
> > >       SCHED_OP(sleep, d0v0);
> > >         csched_vcpu_sleep(d0v0)
> > >           cpu_raise_softirq(5, SCHEDULE_SOFTIRQ);
> > >     vcpu_migrate(d0v0);
> > >       if ( v->is_running || ...) // assume v->is_running is true
> > >         return
> > Hi Dario, after read this mail again, I get another question,
> > could you please help me out?
> >
> > In the above code flow, we return in vcpu_migrate(d0v0) because
> > v->is_running == 1, after vcpu_migrate() return, we check:
> >
> >     if ( v->processor == cpu )
> >         ret = -EAGAIN;
> >
> > In my understand in the above case, 'v->processor' is likely equal to
> > 'cpu', hence return -EAGAIN. However, in __cpu_disable(), there is
> > some check as below:
> >
> >     if ( cpu_disable_scheduler(cpu) )
> >         BUG();
> >
> Right. But, as the comment inside cpu_disable_scheduler() itself says,
> we only return -EAGAIN in case we are calling cpu_disable_scheduler for
> removing a pCPU from a cpupool.
> 
> In that case, we do not use __cpu_disable(), and hence we can safely
> return an error value. In that case, in fact, the caller of
> cpu_disable_scheduler() is cpupool_unassign_cpu_helprer(), which does
> what's necessary to deal with such error.
> 
> > Might we hit the BUG() in the above case?
> >
> No, because we call cpu_disable_scheduler() from __cpu_disable(), only
> when system state is SYS_STATE_suspend already, and hence we take the
> then branch of the 'if', which does never return an error.

Thanks for the elaboration. I find that __cpu_disable() can be called when
the system state is not SYS_STATE_suspend. Here is my experiment:

1. Launch a guest and pin vCPU 3 to pCPU 3 (which makes the experiment simpler)
2. offline pCPU 3 via "xen-hptool cpu-offline 3"

The call path of the above steps is:
arch_do_sysctl()
  -> cpu_down_helper()
    -> cpu_down()
      -> take_cpu_down()
        -> __cpu_disable()
          -> cpu_disable_scheduler() (enter the 'else'  part)

I ran an infinite loop on guest vCPU3 (hence pCPU3) to increase the possibility
of hitting 'ret = -EAGAIN' in this function, and I thought it would return -EAGAIN
if the vCPU is running. However, I tested it many times and it didn't happen, and
I find that after vcpu_migrate() returns, vCPU3->runstate.state: 1,
vCPU3->is_running: 0, which means the vCPU is currently not running. Then I went
back in the call path and found that 'v->runstate.state' gets changed during the
call to stop_machine_run(); here are the findings:

int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu)
{

......

point 1

    for_each_cpu ( i, &allbutself )
        tasklet_schedule_on_cpu(&per_cpu(stopmachine_tasklet, i), i);

point 2
......

}

at point 1 above, 
vCPU3->runstate.state: 0, vCPU3->is_running: 1
while at point 2 above:
vCPU3->runstate.state: 1, vCPU3->is_running: 0

I tested it many times and got the same result. I am not sure whether the vcpu
state transition just happens to occur here or not. If the transition didn't
happen and is_running were still 1 when we got to vcpu_migrate() in
cpu_disable_scheduler() in the above case, would it be a problem?
Thanks a lot!

Thanks,
Feng

> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-06-24  6:11                     ` Wu, Feng
@ 2016-06-24  7:22                       ` Dario Faggioli
  2016-06-24  7:59                         ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2016-06-24  7:22 UTC (permalink / raw)
  To: Wu, Feng, Jan Beulich
  Cc: george.dunlap, andrew.cooper3, Tian, Kevin, keir, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3527 bytes --]

On Fri, 2016-06-24 at 06:11 +0000, Wu, Feng wrote:
> > -----Original Message-----
> > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > No, because we call cpu_disable_scheduler() from __cpu_disable(),
> > only
> > when system state is SYS_STATE_suspend already, and hence we take
> > the
> > then branch of the 'if', which does never return an error.
> Thanks for the elaboration. I find __cpu_disable() can be called with
> system state not being SYS_STATE_suspend. Here is my experiment:
> 
> 1. Launch a guest and pin vCPU 3 to pCPU 3 (which makes the
> experiment simpler)
> 2. offline pCPU 3 via "xen-hptool cpu-offline 3"
> 
Ah, yes, of course. I should have included cpu-hot(un)plug in my
analysis in the first place, sorry for not doing so.

> The call path of the above steps is:
> arch_do_sysctl()
>   -> cpu_down_helper()
>     -> cpu_down()
>       -> take_cpu_down()
>         -> __cpu_disable()
>           -> cpu_disable_scheduler() (enter the 'else'  part)
> 
Right, and the important part is this one:

> int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu)
> {
> 
> ......
> 
> point 1
> 
>     for_each_cpu ( i, &allbutself )
>         tasklet_schedule_on_cpu(&per_cpu(stopmachine_tasklet, i), i);
> 
> point 2
> ......
> 
> }
> 
> at point 1 above, 
> vCPU3->runstate.state: 0, vCPU3->is_running: 1
> while at point 2 above:
> vCPU3->runstate.state: 1, vCPU3->is_running: 0
> 
This is exactly as you describe. cpu hotplug is done in stop machine
context. Check the comment close to the definition of stop_machine_run:

/**
 * stop_machine_run: freeze the machine on all CPUs and run this function
 * @fn: the function to run
 * @data: the data ptr for the @fn()
 * @cpu: the cpu to run @fn() on (or all, if @cpu == NR_CPUS).
 *
 * Description: This causes every other cpu to enter a safe point, with
 * each of which disables interrupts, and finally interrupts are disabled
 * on the current CPU.  The result is that none is holding a spinlock
 * or inside any other preempt-disabled region when @fn() runs.
 *
 * This can be thought of as a very heavy write lock, equivalent to
 * grabbing every spinlock in the kernel. */

As you discovered yourself, this is achieved by forcing the execution
of a tasklet on all the pcpus, which includes pCPU 3 of your example.

So, vCPU 3 was running, but then something called stop_machine_run(), which
causes the descheduling of vCPU 3 and the execution of the stopmachine
tasklet.

That's why you find is_running to be 0, and that's why we never return
-EAGAIN.

> I tested it for many times and got the same result. I am not sure the
> vcpu
> state transition just happens to occurs here or not? If the
> transition doesn't
> happen and is_running is still 1 when we get vcpu_migrate() in
> cpu_disable_scheduler() in the above case, should it be a problem?
>
I'm not sure what you mean here (in particular with "just happen to
occur"). If you're wondering whether the fact that vCPU 3 gets
descheduled happens by chance or by design, it's indeed the latter, and
so, no, we don't have a problem with this code path.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-06-24  7:22                       ` Dario Faggioli
@ 2016-06-24  7:59                         ` Wu, Feng
  2016-06-24 10:27                           ` Dario Faggioli
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2016-06-24  7:59 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, xen-devel, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Friday, June 24, 2016 3:23 PM
> To: Wu, Feng <feng.wu@intel.com>; Jan Beulich <JBeulich@suse.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; keir@xen.org;
> george.dunlap@eu.citrix.com; andrew.cooper3@citrix.com; xen-
> devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH 0/3] VMX: Properly handle pi descriptor and
> per-cpu blocking list
> 
> On Fri, 2016-06-24 at 06:11 +0000, Wu, Feng wrote:
> > > -----Original Message-----
> > > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > > No, because we call cpu_disable_scheduler() from __cpu_disable(),
> > > only
> > > when system state is SYS_STATE_suspend already, and hence we take
> > > the
> > > then branch of the 'if', which does never return an error.
> > Thanks for the elaboration. I find __cpu_disable() can be called with
> > system state not being SYS_STATE_suspend. Here is my experiment:
> >
> > 1. Launch a guest and pin vCPU 3 to pCPU 3 (which makes the
> > experiment simpler)
> > 2. offline pCPU 3 via "xen-hptool cpu-offline 3"
> >
> Ah, yes, of course. I should have included cpu-hot(un)plug in my
> analysis in the first place, sorry for not doing so.
> 
> > The call path of the above steps is:
> > arch_do_sysctl()
> >   -> cpu_down_helper()
> >     -> cpu_down()
> >       -> take_cpu_down()
> >         -> __cpu_disable()
> >           -> cpu_disable_scheduler() (enter the 'else'  part)
> >
> Right, and the important part is this one:
> 
> > int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu)
> > {
> >
> > ......
> >
> > point 1
> >
> >     for_each_cpu ( i, &allbutself )
> >         tasklet_schedule_on_cpu(&per_cpu(stopmachine_tasklet, i), i);
> >
> > point 2
> > ......
> >
> > }
> >
> > at point 1 above,
> > vCPU3->runstate.state: 0, vCPU3->is_running: 1
> > while at point 2 above:
> > vCPU3->runstate.state: 1, vCPU3->is_running: 0
> >
> This is exactly as you describe. cpu hotplug is done in stop machine
> context. Check the comment close to the definition of stop_machine_run:
> 
> /**
>  * stop_machine_run: freeze the machine on all CPUs and run this function
>  * @fn: the function to run
>  * @data: the data ptr for the @fn()
>  * @cpu: the cpu to run @fn() on (or all, if @cpu == NR_CPUS).
>  *
>  * Description: This causes every other cpu to enter a safe point, with
>  * each of which disables interrupts, and finally interrupts are disabled
>  * on the current CPU.  The result is that none is holding a spinlock
>  * or inside any other preempt-disabled region when @fn() runs.
>  *
>  * This can be thought of as a very heavy write lock, equivalent to
>  * grabbing every spinlock in the kernel. */
> 
> As you discovered yourself, this is achieved by forcing the execution
> of a tasklet on all the pcpus, which include pCPU 3 of your example.
> 
> So, vCPU 3 was running, but then some called stop_machine_run(), which
> causes the descheduling of vCPU 3, and the execution of the stopmachine
> tasklet.

Thanks for your reply. Yes, I think this is the point. Here the descheduling of vCPU3
happens, and the reason we will choose the tasklet as the next running
unit for sure (rather than choosing another vCPU or vCPU3 itself as the next
running unit) is that the tasklet overrides all other choices, as
stated in csched_schedule() below, right?

static struct task_slice
csched_schedule(
    const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled) 
{
......

    snext = __runq_elem(runq->next);
    ret.migrated = 0;

    /* Tasklet work (which runs in idle VCPU context) overrides all else. */
    if ( tasklet_work_scheduled )
    {
        TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
        snext = CSCHED_VCPU(idle_vcpu[cpu]);
        snext->pri = CSCHED_PRI_TS_BOOST;
    }


......
}

Thanks,
Feng
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-06-24  7:59                         ` Wu, Feng
@ 2016-06-24 10:27                           ` Dario Faggioli
  2016-06-24 13:25                             ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2016-06-24 10:27 UTC (permalink / raw)
  To: Wu, Feng, Jan Beulich
  Cc: george.dunlap, andrew.cooper3, Tian, Kevin, keir, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1376 bytes --]

On Fri, 2016-06-24 at 07:59 +0000, Wu, Feng wrote:
> > -----Original Message-----
> > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > So, vCPU 3 was running, but then some called stop_machine_run(),
> > which
> > causes the descheduling of vCPU 3, and the execution of the
> > stopmachine
> > tasklet.
> Thanks for your replay. Yes, I think this is point. Here descheduling
> of vCPU3
> happens, and the reason we will choose the tasklet as the next
> running
> unit for sure (not choosing another vCPU or vCPU3 itself as the next
> running unit) is because tasklet will overrides all other choose as
> stated in csched_schedule() as below, right?
> 
Exactly, tasklets preempt the running vcpu and take precedence over
runnable vcpus.

Then, in this case, the reason why we are sure that all the pcpus are
executing the body of the tasklet is really the structure of
stop_machine_run() and stopmachine_action() themselves, which are built
to make sure of that, rather than just the fact that tasklets have
higher priority.

But yes, this is what happens.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-06-24 10:27                           ` Dario Faggioli
@ 2016-06-24 13:25                             ` Wu, Feng
  2016-06-24 23:43                               ` Dario Faggioli
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2016-06-24 13:25 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, xen-devel, Wu, Feng

> > Thanks for your replay. Yes, I think this is point. Here descheduling
> > of vCPU3
> > happens, and the reason we will choose the tasklet as the next
> > running
> > unit for sure (not choosing another vCPU or vCPU3 itself as the next
> > running unit) is because tasklet will overrides all other choose as
> > stated in csched_schedule() as below, right?
> >
> Exactly, tasklets preempt the running vcpu and take precedence of
> runnable vcpus.
> 
> Then, in this case, the reason why we are sure that all the pcpus are
> executing the body of the tasklet, is indeed the structure of
> stop_machine_run() and stopmachine_action() themselves, which are built
> to make sure of that,

Thanks for the reply. I am sorry, I don't quite understand the above
comment. In my understanding, the tasklet has higher priority, so
stopmachine_action(), as the body of the tasklet, preempts vCPU3.
Is this the case?

Thanks,
Feng

> rather than just the fact that tasklets have
> higher priority.
> 
> But yes, this is what happens.
> 
Thanks for the clarification!

Thanks,
Feng

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-06-24 13:25                             ` Wu, Feng
@ 2016-06-24 23:43                               ` Dario Faggioli
  0 siblings, 0 replies; 49+ messages in thread
From: Dario Faggioli @ 2016-06-24 23:43 UTC (permalink / raw)
  To: Wu, Feng, Jan Beulich
  Cc: george.dunlap, andrew.cooper3, Tian, Kevin, keir, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1275 bytes --]

On Fri, 2016-06-24 at 13:25 +0000, Wu, Feng wrote:
> > 
> > Then, in this case, the reason why we are sure that all the pcpus are
> > executing the body of the tasklet is indeed the structure of
> > stop_machine_run() and stopmachine_action() themselves, which are
> > built to make sure of that,
> Thanks for the reply. I am sorry, but I don't quite understand the above
> comment. In my understanding, the tasklet has higher priority, so
> stopmachine_action(), as the body of the tasklet, preempts vCPU3.
> Is this the case?
> 
It is the case. What I was trying to say is that, even if tasklets did
not have higher priority than regular vCPUs (as long as there were some
mechanism ensuring that tasklets themselves did get run and did not
starve), things would still work.
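
Sketching it (again with made-up names, not the real code, which is more
careful about the ordering of these steps): the initiating cpu only
advances the rendezvous once every cpu has acknowledged the current
step, so a cpu that picks the tasklet up a bit later only makes the
others spin longer; the outcome does not depend on tasklet priority:

    /*
     * Illustrative only (hypothetical names).  The initiating cpu sets
     * a new step and then waits for all participating cpus to ack it.
     * Whether a given cpu entered the tasklet immediately (because
     * tasklets have higher priority) or merely "eventually" (because it
     * is guaranteed to run at some point) only changes how long this
     * loop spins, not what happens afterwards.
     */
    static atomic_t acked;
    static int current_step;

    static void set_step_and_wait(int step, unsigned int nr_cpus)
    {
        atomic_set(&acked, 0);
        current_step = step;
        smp_wmb();                       /* publish the new step */

        while ( atomic_read(&acked) < nr_cpus )
            cpu_relax();                 /* wait for every cpu to ack */
    }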

It's not that important; it's just that it seemed you wanted to
summarize things in order to better understand the situation, and I
thought it was worth pointing this out.

Just that. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2016-06-24 23:43 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-20  8:53 [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list Feng Wu
2016-05-20  8:53 ` [PATCH 1/3] VMX: Properly adjuest the status of pi descriptor Feng Wu
2016-05-23  5:15   ` Tian, Kevin
2016-05-23  5:27     ` Wu, Feng
2016-05-23  6:52       ` Tian, Kevin
2016-05-23  7:16         ` Wu, Feng
2016-05-23  9:03           ` Jan Beulich
2016-05-23  9:21             ` Wu, Feng
2016-05-23 11:04               ` Jan Beulich
2016-05-23 12:30   ` Jan Beulich
2016-05-20  8:53 ` [PATCH 2/3] VMX: Make hook pi_do_resume always available Feng Wu
2016-05-23 12:32   ` Jan Beulich
2016-05-23 12:51     ` Dario Faggioli
2016-05-20  8:53 ` [PATCH 3/3] VMX: Remove the vcpu from the per-cpu blocking list after domain termination Feng Wu
2016-05-23  5:19   ` Tian, Kevin
2016-05-23  5:48     ` Wu, Feng
2016-05-23  6:54       ` Tian, Kevin
2016-05-23  9:08       ` Jan Beulich
2016-05-23  9:17         ` Wu, Feng
2016-05-23 10:35           ` Wu, Feng
2016-05-23 11:11             ` Jan Beulich
2016-05-23 12:24               ` Wu, Feng
2016-05-23 12:46                 ` Jan Beulich
2016-05-23 13:41                   ` Wu, Feng
2016-05-23 12:30   ` Dario Faggioli
2016-05-23 13:32     ` Wu, Feng
2016-05-23 14:45       ` Dario Faggioli
2016-05-23 12:35   ` Jan Beulich
2016-05-23 13:33     ` Wu, Feng
2016-05-20 10:27 ` [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list Jan Beulich
2016-05-20 10:46   ` Wu, Feng
2016-05-23  8:08     ` Jan Beulich
2016-05-23  8:44       ` Wu, Feng
2016-05-23  8:51         ` Jan Beulich
2016-05-23 12:39           ` Dario Faggioli
2016-05-24 10:07             ` Wu, Feng
2016-05-24 13:33               ` Wu, Feng
2016-05-24 14:46                 ` Dario Faggioli
2016-05-25 13:28                   ` Wu, Feng
2016-05-24 14:02               ` Dario Faggioli
2016-05-25 12:39                 ` Wu, Feng
2016-06-23 12:33                 ` Wu, Feng
2016-06-23 15:11                   ` Dario Faggioli
2016-06-24  6:11                     ` Wu, Feng
2016-06-24  7:22                       ` Dario Faggioli
2016-06-24  7:59                         ` Wu, Feng
2016-06-24 10:27                           ` Dario Faggioli
2016-06-24 13:25                             ` Wu, Feng
2016-06-24 23:43                               ` Dario Faggioli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).