xen-devel.lists.xenproject.org archive mirror
* [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list
@ 2016-05-26 13:39 Feng Wu
  2016-05-26 13:39 ` [PATCH v2 1/4] VMX: Properly handle pi when all the assigned devices are removed Feng Wu
                   ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: Feng Wu @ 2016-05-26 13:39 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	jbeulich, Feng Wu

The current VT-d PI related code may operate incorrectly in the
following scenarios:
1. When the last assigned device is detached from the domain, all
the PI related hooks are removed at that point. However, the vCPU
can still be blocked, switched to another pCPU, etc., all without
PI being aware of it. When another device is attached to the domain
later on, which makes the PI related hooks available again, the
state of the PI descriptor is no longer correct. Besides that,
blocked vCPUs may still remain on the per-cpu blocking lists in
this case.

Solution for this case: [patch v2 1/4]
- Don't zap the PI hooks after the last assigned device is detached.
- Remove all the vCPUs of the domain from the per-cpu blocking lists
  when the last assigned device is detached.

2. After the domain is destroyed, blocked vCPUs may likewise remain
on the per-cpu blocking lists.

Solution for this case: [patch v2 2/4]
- Remove the vCPU from the per-cpu blocking list when it is going
  to be destroyed.

3. When a pCPU is unplugged, there might still be vCPUs on its
blocking list. Since the pCPU is offline, those vCPUs would never
be woken up again.

Solution for this case: [patch v2 4/4]
- When the pCPU is unplugged, move those vCPUs to the blocking list
  of an online pCPU, and update the 'NDST' field of the PI
  descriptor accordingly (see the sketch below).
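
As a rough illustration of that flow, here is a minimal sketch; the
helper name pi_move_blocked_vcpus() is made up for illustration, the
remaining names are taken from patch 4/4, and locking as well as
corner cases (e.g. the chosen pCPU itself going down) are omitted:

    /* Illustrative sketch only -- the real logic lives in vmx_pi_desc_fixup(). */
    static void pi_move_blocked_vcpus(unsigned int dead_cpu)
    {
        struct arch_vmx_struct *vmx, *tmp;
        unsigned int new_cpu = cpumask_any(&cpu_online_map);
        unsigned int dest = cpu_physical_id(new_cpu);

        list_for_each_entry_safe(vmx, tmp,
                                 &per_cpu(vmx_pi_blocking, dead_cpu).list,
                                 pi_blocking.list)
        {
            /* Redirect the notification destination to an online pCPU. */
            write_atomic(&vmx->pi_desc.ndst,
                         x2apic_enabled ? dest
                                        : MASK_INSR(dest, PI_xAPIC_NDST_MASK));

            /* Move the vCPU onto that pCPU's blocking list. */
            list_move(&vmx->pi_blocking.list,
                      &per_cpu(vmx_pi_blocking, new_cpu).list);
            vmx->pi_blocking.lock = &per_cpu(vmx_pi_blocking, new_cpu).lock;
        }
    }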

Feng Wu (4):
  VMX: Properly handle pi when all the assigned devices are removed
  VMX: Cleanup PI per-cpu blocking list when vcpu is destroyed
  VMX: Assign the right value to 'NDST' field in a concern case
  VMX: fixup PI descritpor when cpu is offline

 xen/arch/x86/hvm/vmx/vmcs.c        |   1 +
 xen/arch/x86/hvm/vmx/vmx.c         | 179 +++++++++++++++++++++++++++++++++----
 xen/common/schedule.c              |   8 +-
 xen/include/asm-x86/hvm/hvm.h      |   4 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h |   5 +-
 xen/include/asm-x86/hvm/vmx/vmx.h  |   1 +
 6 files changed, 174 insertions(+), 24 deletions(-)

-- 
2.1.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 1/4] VMX: Properly handle pi when all the assigned devices are removed
  2016-05-26 13:39 [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list Feng Wu
@ 2016-05-26 13:39 ` Feng Wu
  2016-05-27 13:43   ` Jan Beulich
  2016-05-26 13:39 ` [PATCH v2 2/4] VMX: Cleanup PI per-cpu blocking list when vcpu is destroyed Feng Wu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 32+ messages in thread
From: Feng Wu @ 2016-05-26 13:39 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	jbeulich, Feng Wu

This patch handles some concern cases when the last assigned device
is removed from the domain. In this case we should carefully handle
the PI descriptor and the per-cpu blocking lists, to make sure that:
- all the PI descriptors are in the right state the next time a
device is assigned to the domain again. This is achieved by always
keeping all the PI hooks available, so the PI descriptor keeps being
updated during scheduling, which keeps it up-to-date.
- no vCPUs of the domain remain on the per-cpu blocking lists.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c         | 75 +++++++++++++++++++++++++++++++-------
 xen/include/asm-x86/hvm/vmx/vmcs.h |  3 ++
 2 files changed, 65 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index bc4410f..65f5288 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -113,7 +113,19 @@ static void vmx_vcpu_block(struct vcpu *v)
 		&per_cpu(vmx_pi_blocking, v->processor).lock;
     struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
 
-    spin_lock_irqsave(pi_blocking_list_lock, flags);
+    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
+    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up) )
+    {
+        /*
+         * The vcpu is to be destroyed and it has already been removed
+         * from the per-CPU list if it is blocking, we shouldn't add
+         * new vCPU to the list.
+         */
+        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
+        return;
+    }
+
+    spin_lock(pi_blocking_list_lock);
     old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
                        pi_blocking_list_lock);
 
@@ -126,7 +138,9 @@ static void vmx_vcpu_block(struct vcpu *v)
 
     list_add_tail(&v->arch.hvm_vmx.pi_blocking.list,
                   &per_cpu(vmx_pi_blocking, v->processor).list);
-    spin_unlock_irqrestore(pi_blocking_list_lock, flags);
+    spin_unlock(pi_blocking_list_lock);
+
+    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
 
     ASSERT(!pi_test_sn(pi_desc));
 
@@ -199,32 +213,65 @@ static void vmx_pi_do_resume(struct vcpu *v)
     spin_unlock_irqrestore(pi_blocking_list_lock, flags);
 }
 
+static void vmx_pi_blocking_cleanup(struct vcpu *v)
+{
+    unsigned long flags;
+    spinlock_t *pi_blocking_list_lock;
+
+    if ( !iommu_intpost )
+        return;
+
+    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
+    v->arch.hvm_vmx.pi_blocking_cleaned_up = 1;
+
+    pi_blocking_list_lock = v->arch.hvm_vmx.pi_blocking.lock;
+    if (pi_blocking_list_lock == NULL)
+    {
+        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
+        return;
+    }
+
+    spin_lock(pi_blocking_list_lock);
+    if ( v->arch.hvm_vmx.pi_blocking.lock != NULL )
+    {
+        ASSERT(v->arch.hvm_vmx.pi_blocking.lock == pi_blocking_list_lock);
+        list_del(&v->arch.hvm_vmx.pi_blocking.list);
+        v->arch.hvm_vmx.pi_blocking.lock = NULL;
+    }
+    spin_unlock(pi_blocking_list_lock);
+    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
+}
+
 /* This function is called when pcidevs_lock is held */
 void vmx_pi_hooks_assign(struct domain *d)
 {
+    struct vcpu *v;
+
     if ( !iommu_intpost || !has_hvm_container_domain(d) )
         return;
 
-    ASSERT(!d->arch.hvm_domain.vmx.vcpu_block);
+    for_each_vcpu ( d, v )
+        v->arch.hvm_vmx.pi_blocking_cleaned_up = 0;
 
-    d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
-    d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
-    d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;
-    d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
+    if ( !d->arch.hvm_domain.vmx.vcpu_block )
+    {
+        d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
+        d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
+        d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;
+        d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
+    }
 }
 
 /* This function is called when pcidevs_lock is held */
 void vmx_pi_hooks_deassign(struct domain *d)
 {
+    struct vcpu *v;
+
     if ( !iommu_intpost || !has_hvm_container_domain(d) )
         return;
 
-    ASSERT(d->arch.hvm_domain.vmx.vcpu_block);
-
-    d->arch.hvm_domain.vmx.vcpu_block = NULL;
-    d->arch.hvm_domain.vmx.pi_switch_from = NULL;
-    d->arch.hvm_domain.vmx.pi_switch_to = NULL;
-    d->arch.hvm_domain.vmx.pi_do_resume = NULL;
+    for_each_vcpu ( d, v )
+        vmx_pi_blocking_cleanup(v);
 }
 
 static int vmx_domain_initialise(struct domain *d)
@@ -256,6 +303,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 
     INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocking.list);
 
+    spin_lock_init(&v->arch.hvm_vmx.pi_hotplug_lock);
+
     v->arch.schedule_tail    = vmx_do_resume;
     v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
     v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index b54f52f..3834f49 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -231,6 +231,9 @@ struct arch_vmx_struct {
      * pCPU and wakeup the related vCPU.
      */
     struct pi_blocking_vcpu pi_blocking;
+
+    spinlock_t            pi_hotplug_lock;
+    bool_t                pi_blocking_cleaned_up;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
-- 
2.1.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 2/4] VMX: Cleanup PI per-cpu blocking list when vcpu is destroyed
  2016-05-26 13:39 [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list Feng Wu
  2016-05-26 13:39 ` [PATCH v2 1/4] VMX: Properly handle pi when all the assigned devices are removed Feng Wu
@ 2016-05-26 13:39 ` Feng Wu
  2016-05-27 13:49   ` Jan Beulich
  2016-05-26 13:39 ` [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a concern case Feng Wu
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 32+ messages in thread
From: Feng Wu @ 2016-05-26 13:39 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	jbeulich, Feng Wu

We should remove the vCPU from the per-cpu blocking list
if it is going to be destroyed.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 65f5288..b01128a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -366,6 +366,7 @@ static void vmx_vcpu_destroy(struct vcpu *v)
     vmx_destroy_vmcs(v);
     vpmu_destroy(v);
     passive_domain_destroy(v);
+    vmx_pi_blocking_cleanup(v);
 }
 
 static DEFINE_PER_CPU(struct vmx_msr_state, host_msr_state);
-- 
2.1.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a concern case
  2016-05-26 13:39 [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list Feng Wu
  2016-05-26 13:39 ` [PATCH v2 1/4] VMX: Properly handle pi when all the assigned devices are removed Feng Wu
  2016-05-26 13:39 ` [PATCH v2 2/4] VMX: Cleanup PI per-cpu blocking list when vcpu is destroyed Feng Wu
@ 2016-05-26 13:39 ` Feng Wu
  2016-05-27 14:00   ` Jan Beulich
  2016-06-22 18:00   ` George Dunlap
  2016-05-26 13:39 ` [PATCH v2 4/4] VMX: fixup PI descritpor when cpu is offline Feng Wu
  2016-05-26 17:20 ` [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list Dario Faggioli
  4 siblings, 2 replies; 32+ messages in thread
From: Feng Wu @ 2016-05-26 13:39 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	jbeulich, Feng Wu

Normally, in vmx_vcpu_block() the 'NDST' field should hold the same
value as 'dest' or 'MASK_INSR(dest, PI_xAPIC_NDST_MASK)', depending
on whether x2apic is enabled. However, in the following scenario,
'NDST' holds a different value:

The 'vcpu_block' hook gets assigned in vmx_pi_hooks_assign(), but
the other three PI hooks have not been assigned, or have not been
executed yet. If we are running in vmx_vcpu_block() during this
interval, 'NDST' may hold a different value.

This patch fixes this concern case.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b01128a..662b38d 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -146,8 +146,19 @@ static void vmx_vcpu_block(struct vcpu *v)
 
     dest = cpu_physical_id(v->processor);
 
-    ASSERT(pi_desc->ndst ==
-           (x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK)));
+    /*
+     * Normally, 'NDST' filed should have the same value with dest or
+     * MASK_INSR(dest, PI_xAPIC_NDST_MASK) depending on whether x2apic is
+     * enabled. However, in the following scenario, 'NDST' has different
+     * value:
+     *
+     * 'vcpu_block' hook gets assigned in vmx_pi_hooks_assign(), but all
+     * other three PI hooks have not been assigned or not been excuted yet.
+     * And during this interval, we get here in this function, then 'NDST'
+     * may have different value.
+     */
+    write_atomic(&pi_desc->ndst,
+                 x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
 
     write_atomic(&pi_desc->nv, pi_wakeup_vector);
 }
-- 
2.1.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 4/4] VMX: fixup PI descritpor when cpu is offline
  2016-05-26 13:39 [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list Feng Wu
                   ` (2 preceding siblings ...)
  2016-05-26 13:39 ` [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a concern case Feng Wu
@ 2016-05-26 13:39 ` Feng Wu
  2016-05-27 14:56   ` Jan Beulich
  2016-05-26 17:20 ` [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list Dario Faggioli
  4 siblings, 1 reply; 32+ messages in thread
From: Feng Wu @ 2016-05-26 13:39 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	jbeulich, Feng Wu

When a cpu goes offline, we need to move all the vCPUs on its
blocking list to another online cpu; this patch handles that. We
also need to carefully handle the situation where a call to
vmx_vcpu_block() and the cpu going offline occur concurrently.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        |  1 +
 xen/arch/x86/hvm/vmx/vmx.c         | 90 +++++++++++++++++++++++++++++++++++++-
 xen/common/schedule.c              |  8 +---
 xen/include/asm-x86/hvm/hvm.h      |  4 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h |  2 +-
 xen/include/asm-x86/hvm/vmx/vmx.h  |  1 +
 6 files changed, 96 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 8284281..aa25788 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -590,6 +590,7 @@ void vmx_cpu_dead(unsigned int cpu)
     vmx_free_vmcs(per_cpu(vmxon_region, cpu));
     per_cpu(vmxon_region, cpu) = 0;
     nvmx_cpu_dead(cpu);
+    vmx_pi_desc_fixup(cpu);
 }
 
 int vmx_cpu_up(void)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 662b38d..b56082e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -87,6 +87,7 @@ static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
 struct vmx_pi_blocking_vcpu {
     struct list_head     list;
     spinlock_t           lock;
+    bool_t               down;
 };
 
 /*
@@ -102,9 +103,10 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
 {
     INIT_LIST_HEAD(&per_cpu(vmx_pi_blocking, cpu).list);
     spin_lock_init(&per_cpu(vmx_pi_blocking, cpu).lock);
+    per_cpu(vmx_pi_blocking, cpu).down = 0;
 }
 
-static void vmx_vcpu_block(struct vcpu *v)
+static bool_t vmx_vcpu_block(struct vcpu *v)
 {
     unsigned long flags;
     unsigned int dest;
@@ -122,10 +124,25 @@ static void vmx_vcpu_block(struct vcpu *v)
          * new vCPU to the list.
          */
         spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
-        return;
+        return 1;
     }
 
     spin_lock(pi_blocking_list_lock);
+    if ( unlikely(per_cpu(vmx_pi_blocking, v->processor).down) )
+    {
+        /*
+         * We being here means that the v->processor is going away, and all
+         * the vcpus on its blocking list were removed from it. Hence we
+         * cannot add new vcpu to it. Besides that, we return -1 to
+         * prevent the vcpu from being blocked. This is needed because
+         * if the vCPU is continue to block and here we don't put it
+         * in a per-cpu blocking list, it might not be woken up by the
+         * notification event.
+         */
+        spin_unlock(pi_blocking_list_lock);
+        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
+        return 0;
+    }
     old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
                        pi_blocking_list_lock);
 
@@ -161,6 +178,8 @@ static void vmx_vcpu_block(struct vcpu *v)
                  x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
 
     write_atomic(&pi_desc->nv, pi_wakeup_vector);
+
+    return 1;
 }
 
 static void vmx_pi_switch_from(struct vcpu *v)
@@ -253,6 +272,73 @@ static void vmx_pi_blocking_cleanup(struct vcpu *v)
     spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
 }
 
+void vmx_pi_desc_fixup(int cpu)
+{
+    unsigned int new_cpu, dest;
+    unsigned long flags;
+    struct arch_vmx_struct *vmx, *tmp;
+    spinlock_t *new_lock, *old_lock = &per_cpu(vmx_pi_blocking, cpu).lock;
+    struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
+
+    if ( !iommu_intpost )
+        return;
+
+    spin_lock_irqsave(old_lock, flags);
+    per_cpu(vmx_pi_blocking, cpu).down = 1;
+
+    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
+    {
+        /*
+         * We need to find an online cpu as the NDST of the PI descriptor, it
+         * doesn't matter whether it is within the cpupool of the domain or
+         * not. As long as it is online, the vCPU will be woken up once the
+         * notification event arrives.
+         */
+        new_cpu = cpu;
+restart:
+        while ( 1 )
+        {
+            new_cpu = (new_cpu + 1) % nr_cpu_ids;
+            if ( cpu_online(new_cpu) )
+                break;
+        }
+        new_lock = &per_cpu(vmx_pi_blocking, cpu).lock;
+
+        spin_lock(new_lock);
+        /*
+         * After acquiring the blocking list lock for the new cpu, we need
+         * to check whether new_cpu is still online.
+         *
+         * If '.down' is true, it mean 'new_cpu' is also going to be offline,
+         * so just go back to find another one, otherwise, there are two
+         * possibilities:
+         *   case 1 - 'new_cpu' is online.
+         *   case 2 - 'new_cpu' is about to be offline, but doesn't get to
+         *            the point where '.down' is set.
+         * In either case above, we can just set 'new_cpu' to 'NDST' field.
+         * For case 2 the 'NDST' field will be set to another online cpu when
+         * we get to this function for 'new_cpu' some time later.
+         */
+        if ( per_cpu(vmx_pi_blocking, cpu).down )
+        {
+            spin_unlock(new_lock);
+            goto restart;
+        }
+
+        ASSERT(vmx->pi_blocking.lock == old_lock);
+
+        dest = cpu_physical_id(new_cpu);
+        write_atomic(&vmx->pi_desc.ndst,
+                     x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
+
+        list_move(&vmx->pi_blocking.list,
+                  &per_cpu(vmx_pi_blocking, new_cpu).list);
+        vmx->pi_blocking.lock = new_lock;
+        spin_unlock(new_lock);
+    }
+    spin_unlock_irqrestore(old_lock, flags);
+}
+
 /* This function is called when pcidevs_lock is held */
 void vmx_pi_hooks_assign(struct domain *d)
 {
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5546999..b60491d 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -833,10 +833,8 @@ void vcpu_block(void)
 
     set_bit(_VPF_blocked, &v->pause_flags);
 
-    arch_vcpu_block(v);
-
     /* Check for events /after/ blocking: avoids wakeup waiting race. */
-    if ( local_events_need_delivery() )
+    if ( arch_vcpu_block(v) || local_events_need_delivery() )
     {
         clear_bit(_VPF_blocked, &v->pause_flags);
     }
@@ -872,8 +870,6 @@ static long do_poll(struct sched_poll *sched_poll)
     v->poll_evtchn = -1;
     set_bit(v->vcpu_id, d->poll_mask);
 
-    arch_vcpu_block(v);
-
 #ifndef CONFIG_X86 /* set_bit() implies mb() on x86 */
     /* Check for events /after/ setting flags: avoids wakeup waiting race. */
     smp_mb();
@@ -891,7 +887,7 @@ static long do_poll(struct sched_poll *sched_poll)
 #endif
 
     rc = 0;
-    if ( local_events_need_delivery() )
+    if ( arch_vcpu_block(v) || local_events_need_delivery() )
         goto out;
 
     for ( i = 0; i < sched_poll->nr_ports; i++ )
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 7b7ff3f..725dce7 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -608,11 +608,13 @@ unsigned long hvm_cr4_guest_reserved_bits(const struct vcpu *v, bool_t restore);
  * not been defined yet.
  */
 #define arch_vcpu_block(v) ({                                   \
+    bool_t rc = 0;                                              \
     struct vcpu *v_ = (v);                                      \
     struct domain *d_ = v_->domain;                             \
     if ( has_hvm_container_domain(d_) &&                        \
          d_->arch.hvm_domain.vmx.vcpu_block )                   \
-        d_->arch.hvm_domain.vmx.vcpu_block(v_);                 \
+        rc = d_->arch.hvm_domain.vmx.vcpu_block(v_);            \
+    rc;                                                         \
 })
 
 #endif /* __ASM_X86_HVM_HVM_H__ */
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 3834f49..e1834f7 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -132,7 +132,7 @@ struct vmx_domain {
      * has a physical device assigned to it, so we set and clear the callbacks
      * as appropriate when device assignment changes.
      */
-    void (*vcpu_block) (struct vcpu *);
+    bool_t (*vcpu_block) (struct vcpu *);
     void (*pi_switch_from) (struct vcpu *v);
     void (*pi_switch_to) (struct vcpu *v);
     void (*pi_do_resume) (struct vcpu *v);
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index a85d488..b6ba123 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -565,6 +565,7 @@ void free_p2m_hap_data(struct p2m_domain *p2m);
 void p2m_init_hap_data(struct p2m_domain *p2m);
 
 void vmx_pi_per_cpu_init(unsigned int cpu);
+void vmx_pi_desc_fixup(int cpu);
 
 void vmx_pi_hooks_assign(struct domain *d);
 void vmx_pi_hooks_deassign(struct domain *d);
-- 
2.1.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-26 13:39 [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list Feng Wu
                   ` (3 preceding siblings ...)
  2016-05-26 13:39 ` [PATCH v2 4/4] VMX: fixup PI descritpor when cpu is offline Feng Wu
@ 2016-05-26 17:20 ` Dario Faggioli
  2016-05-31 10:19   ` Wu, Feng
  4 siblings, 1 reply; 32+ messages in thread
From: Dario Faggioli @ 2016-05-26 17:20 UTC (permalink / raw)
  To: Feng Wu, xen-devel
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, jbeulich



Hi Feng,

On Thu, 2016-05-26 at 21:39 +0800, Feng Wu wrote:
> Feng Wu (4):
>   VMX: Properly handle pi when all the assigned devices are removed
>   VMX: Cleanup PI per-cpu blocking list when vcpu is destroyed
>   VMX: Assign the right value to 'NDST' field in a concern case
>   VMX: fixup PI descritpor when cpu is offline
> 
I need some time to carefully look at this series, and I don't have
such amount of time right now. I'll try to look at it and send my
comments either tomorrow or on Monday.

However, allow me to just say that, from a quick glance, the various
solutions do not look really convincing to me. Basically, you've
added:
 - a couple of flags (pi_blocking_cleaned_up, down)
 - a new spinlock (pi_hotplug_lock)

And yet, neither the various changelogs nor the code comments explain
much about what the lock serializes and protects, and/or what the
flags represent and how they should be used.

So, if you want to try explaining a bit what your line of reasoning
was when doing things this way, that would be helpful (at least to
me).

I'm also not convinced that, in patch 4, the right thing to do is to
just pick up a random online pCPU. At some point, during v1 review,
I mentioned arch_move_irqs(), because it seemed to me the place
where you actually have the proper information available (i.e., where
the vcpu is actually going, e.g. because the pCPU it is on is going
away!). It may well be the case that I'm wrong about it being suitable,
and I'll look better at what you actually have implemented, but at a
first glance, it looks cumbersome.

For instance, now arch_vcpu_block() returns a value and, as you say
yourself in a comment, that is for (potentially) preventing a vcpu
from blocking. So the behavior of schedule.c:vcpu_block() now depends
on your new flag, per_cpu(vmx_pi_blocking, v->processor).down. Again,
I'll look more closely, but this has few chances of being right (at
least logically).

Finally, you're calling *per-vCPU* things foo_hotplug_bar, which is
rather confusing, as it makes one think that they're related to
*pCPU* hotplug.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/4] VMX: Properly handle pi when all the assigned devices are removed
  2016-05-26 13:39 ` [PATCH v2 1/4] VMX: Properly handle pi when all the assigned devices are removed Feng Wu
@ 2016-05-27 13:43   ` Jan Beulich
  2016-05-31 10:22     ` Wu, Feng
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2016-05-27 13:43 UTC (permalink / raw)
  To: Feng Wu
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -113,7 +113,19 @@ static void vmx_vcpu_block(struct vcpu *v)
>  		&per_cpu(vmx_pi_blocking, v->processor).lock;
>      struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>  
> -    spin_lock_irqsave(pi_blocking_list_lock, flags);
> +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up) )
> +    {
> +        /*
> +         * The vcpu is to be destroyed and it has already been removed
> +         * from the per-CPU list if it is blocking, we shouldn't add
> +         * new vCPU to the list.
> +         */

This comment appears to be stale wrt the implementation further
down. But see there for whether it's the comment or the code
that needs to change.

> +        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> +        return;
> +    }
> +
> +    spin_lock(pi_blocking_list_lock);

There doesn't appear to be an active problem with this, but
taking a global lock inside a per-vCPU one feels bad. Both here
and in vmx_pi_blocking_cleanup() I can't see why the global
lock alone won't do - you acquire it anyway.

> +static void vmx_pi_blocking_cleanup(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    spinlock_t *pi_blocking_list_lock;
> +
> +    if ( !iommu_intpost )
> +        return;

If the function is to remain to be called from just the body of a loop
over all vCPU-s, please make that loop conditional upon iommu_intpost
instead of checking it here (and bailing) for every vCPU.

> +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> +    v->arch.hvm_vmx.pi_blocking_cleaned_up = 1;
> +
> +    pi_blocking_list_lock = v->arch.hvm_vmx.pi_blocking.lock;
> +    if (pi_blocking_list_lock == NULL)

Coding style.

> +    {
> +        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> +        return;
> +    }
> +
> +    spin_lock(pi_blocking_list_lock);
> +    if ( v->arch.hvm_vmx.pi_blocking.lock != NULL )
> +    {
> +        ASSERT(v->arch.hvm_vmx.pi_blocking.lock == pi_blocking_list_lock);
> +        list_del(&v->arch.hvm_vmx.pi_blocking.list);
> +        v->arch.hvm_vmx.pi_blocking.lock = NULL;
> +    }
> +    spin_unlock(pi_blocking_list_lock);
> +    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> +}
> +
>  /* This function is called when pcidevs_lock is held */
>  void vmx_pi_hooks_assign(struct domain *d)
>  {
> +    struct vcpu *v;
> +
>      if ( !iommu_intpost || !has_hvm_container_domain(d) )
>          return;
>  
> -    ASSERT(!d->arch.hvm_domain.vmx.vcpu_block);
> +    for_each_vcpu ( d, v )
> +        v->arch.hvm_vmx.pi_blocking_cleaned_up = 0;
>  
> -    d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
> -    d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
> -    d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;
> -    d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
> +    if ( !d->arch.hvm_domain.vmx.vcpu_block )
> +    {
> +        d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
> +        d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
> +        d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;
> +        d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
> +    }
>  }
>  
>  /* This function is called when pcidevs_lock is held */
>  void vmx_pi_hooks_deassign(struct domain *d)
>  {
> +    struct vcpu *v;
> +
>      if ( !iommu_intpost || !has_hvm_container_domain(d) )
>          return;
>  
> -    ASSERT(d->arch.hvm_domain.vmx.vcpu_block);
> -
> -    d->arch.hvm_domain.vmx.vcpu_block = NULL;
> -    d->arch.hvm_domain.vmx.pi_switch_from = NULL;
> -    d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> -    d->arch.hvm_domain.vmx.pi_do_resume = NULL;
> +    for_each_vcpu ( d, v )
> +        vmx_pi_blocking_cleanup(v);

If you keep the hooks in place, why is it relevant to do the cleanup
here instead of just at domain death? As suggested before, if you
did it there, you'd likely get away without adding the new per-vCPU
flag.

Furthermore, if things remain the way they are now, a per-domain
flag would suffice.

And finally - do you really need to retain all four hooks? Since if not,
one that gets zapped here could be used in place of such a per-
domain flag.

> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -231,6 +231,9 @@ struct arch_vmx_struct {
>       * pCPU and wakeup the related vCPU.
>       */
>      struct pi_blocking_vcpu pi_blocking;
> +
> +    spinlock_t            pi_hotplug_lock;

Having gone through all of the patch, I have a problem seeing the
hotplug nature of this.

> +    bool_t                pi_blocking_cleaned_up;

If this flag is to remain, it should move into its designated structure
(right above your addition).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/4] VMX: Cleanup PI per-cpu blocking list when vcpu is destroyed
  2016-05-26 13:39 ` [PATCH v2 2/4] VMX: Cleanup PI per-cpu blocking list when vcpu is destroyed Feng Wu
@ 2016-05-27 13:49   ` Jan Beulich
  2016-05-31 10:22     ` Wu, Feng
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2016-05-27 13:49 UTC (permalink / raw)
  To: Feng Wu
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -366,6 +366,7 @@ static void vmx_vcpu_destroy(struct vcpu *v)
>      vmx_destroy_vmcs(v);
>      vpmu_destroy(v);
>      passive_domain_destroy(v);
> +    vmx_pi_blocking_cleanup(v);
>  }

Isn't this redundant with the cleanup done when the last device
gets removed (via pci_release_devices()) during domain cleanup?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a concern case
  2016-05-26 13:39 ` [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a concern case Feng Wu
@ 2016-05-27 14:00   ` Jan Beulich
  2016-05-31 10:27     ` Wu, Feng
  2016-06-22 18:00   ` George Dunlap
  1 sibling, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2016-05-27 14:00 UTC (permalink / raw)
  To: Feng Wu
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
> Normally, in vmx_cpu_block() 'NDST' filed should have the same
> value with 'dest' or 'MASK_INSR(dest, PI_xAPIC_NDST_MASK)' depending
> on whether x2apic is enabled. However, in the following scenario,
> 'NDST' has different value:
> 
> 'vcpu_block' hook gets assigned in vmx_pi_hooks_assign(), but all
> other three PI hooks have not been assigned or not been excuted yet.
> And during this interval, we are running in vmx_vcpu_block(), then
> 'NDST' may have different value.

How about simply assigning vcpu_block last? Or alternatively
holding pi_blocking_list_lock around all four assignments? Or
(maybe in addition to one of these) writing something sensible in
vmx_pi_hooks_assign() before doing the hook assignments? Of
course much depends on whether the ASSERT() you replace was
meant just as a sanity check, or instead logic elsewhere relies on
NDST having the right value already before getting into
vmx_vcpu_block().
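
For illustration, a minimal sketch of the first alternative above,
using the hook names from this series (the smp_wmb() and the exact
ordering are only a suggestion, not something the current patch
does):

    /* Install the context-switch/resume hooks first ... */
    d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
    d->arch.hvm_domain.vmx.pi_switch_to   = vmx_pi_switch_to;
    d->arch.hvm_domain.vmx.pi_do_resume   = vmx_pi_do_resume;

    /* ... make them visible ... */
    smp_wmb();

    /* ... and only then allow vmx_vcpu_block() to be reached. */
    d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;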

Also, rather than saying twice that NDST has a different value in
that case, how about stating what exact value it has then and
why that's not what we want/need? (Same goes for the code
comment then.)

> This patch fix this concern case.

Here as well as in earlier patches - do you perhaps mean "corner
case"?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 4/4] VMX: fixup PI descritpor when cpu is offline
  2016-05-26 13:39 ` [PATCH v2 4/4] VMX: fixup PI descritpor when cpu is offline Feng Wu
@ 2016-05-27 14:56   ` Jan Beulich
  2016-05-31 10:31     ` Wu, Feng
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2016-05-27 14:56 UTC (permalink / raw)
  To: Feng Wu
  Cc: kevin.tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
> @@ -102,9 +103,10 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
>  {
>      INIT_LIST_HEAD(&per_cpu(vmx_pi_blocking, cpu).list);
>      spin_lock_init(&per_cpu(vmx_pi_blocking, cpu).lock);
> +    per_cpu(vmx_pi_blocking, cpu).down = 0;

This seems pointless - per-CPU data starts out all zero (and there
are various places already which rely on that).

> @@ -122,10 +124,25 @@ static void vmx_vcpu_block(struct vcpu *v)
>           * new vCPU to the list.
>           */
>          spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> -        return;
> +        return 1;
>      }
>  
>      spin_lock(pi_blocking_list_lock);
> +    if ( unlikely(per_cpu(vmx_pi_blocking, v->processor).down) )

Is this something that can actually happen? vmx_pi_desc_fixup()
runs in stop-machine context, i.e. no CPU can actively be here (or
anywhere near the arch_vcpu_block() call sites).

> +    {
> +        /*
> +         * We being here means that the v->processor is going away, and all
> +         * the vcpus on its blocking list were removed from it. Hence we
> +         * cannot add new vcpu to it. Besides that, we return -1 to
> +         * prevent the vcpu from being blocked. This is needed because
> +         * if the vCPU is continue to block and here we don't put it
> +         * in a per-cpu blocking list, it might not be woken up by the
> +         * notification event.
> +         */
> +        spin_unlock(pi_blocking_list_lock);
> +        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> +        return 0;

The comment says you mean to return -1 here...

> +void vmx_pi_desc_fixup(int cpu)

unsigned int

> +{
> +    unsigned int new_cpu, dest;
> +    unsigned long flags;
> +    struct arch_vmx_struct *vmx, *tmp;
> +    spinlock_t *new_lock, *old_lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> +    struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
> +
> +    if ( !iommu_intpost )
> +        return;
> +
> +    spin_lock_irqsave(old_lock, flags);
> +    per_cpu(vmx_pi_blocking, cpu).down = 1;
> +
> +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
> +    {
> +        /*
> +         * We need to find an online cpu as the NDST of the PI descriptor, it
> +         * doesn't matter whether it is within the cpupool of the domain or
> +         * not. As long as it is online, the vCPU will be woken up once the
> +         * notification event arrives.
> +         */
> +        new_cpu = cpu;
> +restart:

Labels indented by at least one blank please. Or even better, get
things done without goto.

> +        while ( 1 )
> +        {
> +            new_cpu = (new_cpu + 1) % nr_cpu_ids;
> +            if ( cpu_online(new_cpu) )
> +                break;
> +        }

Please don't open code things like cpumask_cycle(). But with the
restart logic likely unnecessary (see below), this would probably
better become cpumask_any() then.
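
For reference, a sketch of how the pick could look using the existing
cpumask helpers instead of the open-coded scan (assuming the restart
logic can indeed go away):

    /* Next online pCPU after 'cpu', wrapping around the online map ... */
    unsigned int new_cpu = cpumask_cycle(cpu, &cpu_online_map);

    /* ... or simply any online pCPU, if the position doesn't matter. */
    unsigned int any_cpu = cpumask_any(&cpu_online_map);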

> +        new_lock = &per_cpu(vmx_pi_blocking, cpu).lock;

DYM new_cpu here? In fact with ...

> +        spin_lock(new_lock);

... this I can't see how you would have successfully tested this
new code path, as I can't see how this would end in other than
a deadlock (as you hold this very lock already).

> +        /*
> +         * After acquiring the blocking list lock for the new cpu, we need
> +         * to check whether new_cpu is still online.

How could it have gone offline? As mentioned, CPUs get brought
down in stop-machine context (and btw for the very reason of
avoiding complexity like this).

> +         * If '.down' is true, it mean 'new_cpu' is also going to be offline,
> +         * so just go back to find another one, otherwise, there are two
> +         * possibilities:
> +         *   case 1 - 'new_cpu' is online.
> +         *   case 2 - 'new_cpu' is about to be offline, but doesn't get to
> +         *            the point where '.down' is set.
> +         * In either case above, we can just set 'new_cpu' to 'NDST' field.
> +         * For case 2 the 'NDST' field will be set to another online cpu when
> +         * we get to this function for 'new_cpu' some time later.
> +         */
> +        if ( per_cpu(vmx_pi_blocking, cpu).down )

And again I suspect you mean new_cpu here.

> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -833,10 +833,8 @@ void vcpu_block(void)
>  
>      set_bit(_VPF_blocked, &v->pause_flags);
>  
> -    arch_vcpu_block(v);
> -
>      /* Check for events /after/ blocking: avoids wakeup waiting race. */
> -    if ( local_events_need_delivery() )
> +    if ( arch_vcpu_block(v) || local_events_need_delivery() )

Here as well as below I'm getting the impression that you have things
backwards: vmx_vcpu_block() returns true for the two pre-existing
return paths (in which case you previously did not enter this if()'s
body), and false on the one new return path. Plus ...

> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -608,11 +608,13 @@ unsigned long hvm_cr4_guest_reserved_bits(const struct vcpu *v, bool_t restore);
>   * not been defined yet.
>   */
>  #define arch_vcpu_block(v) ({                                   \
> +    bool_t rc = 0;                                              \
>      struct vcpu *v_ = (v);                                      \
>      struct domain *d_ = v_->domain;                             \
>      if ( has_hvm_container_domain(d_) &&                        \
>           d_->arch.hvm_domain.vmx.vcpu_block )                   \
> -        d_->arch.hvm_domain.vmx.vcpu_block(v_);                 \
> +        rc = d_->arch.hvm_domain.vmx.vcpu_block(v_);            \
> +    rc;                                                         \
>  })

... rc defaulting to zero here supports my suspicion of something
having got mixed up.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-26 17:20 ` [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list Dario Faggioli
@ 2016-05-31 10:19   ` Wu, Feng
  2016-06-22 21:33     ` Dario Faggioli
  0 siblings, 1 reply; 32+ messages in thread
From: Wu, Feng @ 2016-05-31 10:19 UTC (permalink / raw)
  To: 'Dario Faggioli', xen-devel
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, jbeulich, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Friday, May 27, 2016 1:21 AM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: keir@xen.org; Tian, Kevin <kevin.tian@intel.com>; jbeulich@suse.com;
> andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com;
> konrad.wilk@oracle.com
> Subject: Re: [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu
> blocking list
> 
> Hi Feng,
> 
> On Thu, 2016-05-26 at 21:39 +0800, Feng Wu wrote:
> > Feng Wu (4):
> >   VMX: Properly handle pi when all the assigned devices are removed
> >   VMX: Cleanup PI per-cpu blocking list when vcpu is destroyed
> >   VMX: Assign the right value to 'NDST' field in a concern case
> >   VMX: fixup PI descritpor when cpu is offline
> >
> I need some time to carefully look at this series, and I don't have
> such amount of time right now. I'll try to look at it and send my
> comments either tomorrow or on Monday.

Thanks for your time!

> 
> However, allow me to just say that, from a quick glance, the various
> solutions does not look really convincing to me. Basically, you've
> added:
>  - a couple of flags (pi_blocking_cleaned_up, down)
>  - a new spinlock (pi_hotplug_lock)
> 
> And yet, neither the various changelogs, nor code comments explain much
> about what the lock serializes and protects, and/or what the flags
> represent ad how they should be used.
> 
> So, if you want try argumenting a bit on what was your line of
> reasoning when doing things this way, that would be helpful (at least
> to me).

'pi_hotplug_lock' is trying to protect the following scenario:
vmx_pi_blocking_cleanup() gets called either when the last assigned
device is detached or when the vCPU is about to be destroyed, while
vmx_vcpu_block() is running at the same time. We need to make sure
that, after a blocked vCPU has been cleaned up, it is not added to
the per-cpu blocking list again. That is why I introduce
'pi_blocking_cleaned_up' for each vCPU: setting it to 1 means the
vCPU must not be added to the blocking list in vmx_vcpu_block().

For the other flag, 'down', it is used for the following scenario:
when a pCPU is going away and vmx_vcpu_block() is called meanwhile,
we should not put the vCPU on that pCPU's blocking list, since the
list is going away. 'down' is also used to ensure that the online
cpu we choose in vmx_pi_desc_fixup() is not itself going away, as
mentioned in the comments in that function.
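
Condensed into code form, the serialization described above looks
roughly like this (field names as introduced in patch 1/4; the actual
list manipulation and the nested blocking-list lock are omitted from
this sketch):

    /* Cleanup side (last device detached / vCPU about to be destroyed). */
    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
    v->arch.hvm_vmx.pi_blocking_cleaned_up = 1;   /* forbid new additions */
    /* ... remove v from its per-cpu blocking list, if it is on one ... */
    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);

    /* vmx_vcpu_block() side. */
    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
    if ( v->arch.hvm_vmx.pi_blocking_cleaned_up )
    {
        /* Cleanup already ran: must not (re)add v to any blocking list. */
        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
        return;
    }
    /* ... safe to add v to per_cpu(vmx_pi_blocking, v->processor).list ... */
    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);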

> 
> I'm also non-convinced that, in patch 4, the right thing to do is to
> just to pick-up a random online pCPU. At some point, during v1 review,
> I mentioned arch_move_irqs(), because it seemed to me the place from
> where you actually have the proper information available (i.e., where
> the vcpu is actually going, e.g. because the pCPU is on is going
> away!). It may well be the case that I'm wrong about it being suitable,
> and I'll look better at what you actually have implemented, but at a
> first glance, it looks cumbersome.
> 
> For instance, now arch_vcpu_block() returns a value and, as you say
> yourself in a comment, that is for (potentially) preventing a vcpu to
> block. So the behavior of schedule.c:vcpu_block(), now depends on your
> new flag per_cpu(vmx_pi_blocking, v->processor).down. Again, I'll look
> better, but this has few chances of being right (at least logically).

Just like vcpu_block() checks for pending events before actually
blocking the vCPU, here we simply introduce another case in which
the vCPU cannot be blocked. I don't see why you think this is
problematic.

> 
> Finally, you're calling *per-vCPU* things foo_hotplug_bar, which is
> rather confusing, as it makes one thinking that they're related to
> *pCPU* hotplug.

The purpose is that we need to remove the vCPUs from the blocking
list of the pCPU which is about to go away. How do you think we
should handle this?

Thanks,
Feng

> 
> Thanks and Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/4] VMX: Properly handle pi when all the assigned devices are removed
  2016-05-27 13:43   ` Jan Beulich
@ 2016-05-31 10:22     ` Wu, Feng
  2016-05-31 11:52       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Wu, Feng @ 2016-05-31 10:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, May 27, 2016 9:43 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: Re: [PATCH v2 1/4] VMX: Properly handle pi when all the assigned
> devices are removed
> 
> >>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -113,7 +113,19 @@ static void vmx_vcpu_block(struct vcpu *v)
> >  		&per_cpu(vmx_pi_blocking, v->processor).lock;
> >      struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> >
> > -    spin_lock_irqsave(pi_blocking_list_lock, flags);
> > +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> > +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up) )
> > +    {
> > +        /*
> > +         * The vcpu is to be destroyed and it has already been removed
> > +         * from the per-CPU list if it is blocking, we shouldn't add
> > +         * new vCPU to the list.
> > +         */
> 
> This comment appears to be stale wrt the implementation further
> down. But see there for whether it's the comment or the code
> that need to change.
> 
> > +        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> > +        return;
> > +    }
> > +
> > +    spin_lock(pi_blocking_list_lock);
> 
> There doesn't appear to be an active problem with this, but
> taking a global lock inside a per-vCPU one feels bad. Both here
> and in vmx_pi_blocking_cleanup() I can't see why the global
> lock alone won't do - you acquire it anyway.

The purpose of 'v->arch.hvm_vmx.pi_hotplug_lock' is to make
sure things go right when vmx_pi_blocking_cleanup() and
vmx_vcpu_block() are running concurrently. However, the lock
'pi_blocking_list_lock' in vmx_pi_blocking_cleanup() refers to
'v->arch.hvm_vmx.pi_blocking.lock', while 'pi_blocking_list_lock'
in vmx_vcpu_block() is '&per_cpu(vmx_pi_blocking, v->processor).lock'.
These two can be different; consider the following scenario:

vmx_pi_blocking_cleanup() gets called while the vCPU is in blocking
state and 'v->arch.hvm_vmx.pi_blocking.lock' points to the per-cpu
lock of pCPU0. When acquiring that lock in this function, someone
else wins the race (such as vmx_pi_do_resume() or
pi_wakeup_interrupt()), so we are left spinning in
vmx_pi_blocking_cleanup(). After the vCPU is woken up, it runs on
pCPU1, and some time later it blocks again and enters
vmx_vcpu_block() while the previous vmx_pi_blocking_cleanup() still
hasn't finished its job. We then acquire a different lock in
vmx_vcpu_block() (pCPU1's this time) than the one used in
vmx_pi_blocking_cleanup(). This can break things, since we might
still put the vCPU on a per-cpu blocking list after the vCPU is
destroyed or the last assigned device is detached.

> 
> > +static void vmx_pi_blocking_cleanup(struct vcpu *v)
> > +{
> > +    unsigned long flags;
> > +    spinlock_t *pi_blocking_list_lock;
> > +
> > +    if ( !iommu_intpost )
> > +        return;
> 
> If the function is to remain to be called from just the body of a loop
> over all vCPU-s, please make that loop conditional upon iommu_intpost
> instead of checking it here (and bailing) for every vCPU.
> 
> > +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> > +    v->arch.hvm_vmx.pi_blocking_cleaned_up = 1;
> > +
> > +    pi_blocking_list_lock = v->arch.hvm_vmx.pi_blocking.lock;
> > +    if (pi_blocking_list_lock == NULL)
> 
> Coding style.
> 
> > +    {
> > +        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> > +        return;
> > +    }
> > +
> > +    spin_lock(pi_blocking_list_lock);
> > +    if ( v->arch.hvm_vmx.pi_blocking.lock != NULL )
> > +    {
> > +        ASSERT(v->arch.hvm_vmx.pi_blocking.lock == pi_blocking_list_lock);
> > +        list_del(&v->arch.hvm_vmx.pi_blocking.list);
> > +        v->arch.hvm_vmx.pi_blocking.lock = NULL;
> > +    }
> > +    spin_unlock(pi_blocking_list_lock);
> > +    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> > +}
> > +
> >  /* This function is called when pcidevs_lock is held */
> >  void vmx_pi_hooks_assign(struct domain *d)
> >  {
> > +    struct vcpu *v;
> > +
> >      if ( !iommu_intpost || !has_hvm_container_domain(d) )
> >          return;
> >
> > -    ASSERT(!d->arch.hvm_domain.vmx.vcpu_block);
> > +    for_each_vcpu ( d, v )
> > +        v->arch.hvm_vmx.pi_blocking_cleaned_up = 0;
> >
> > -    d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
> > -    d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
> > -    d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;
> > -    d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
> > +    if ( !d->arch.hvm_domain.vmx.vcpu_block )
> > +    {
> > +        d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
> > +        d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
> > +        d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;
> > +        d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
> > +    }
> >  }
> >
> >  /* This function is called when pcidevs_lock is held */
> >  void vmx_pi_hooks_deassign(struct domain *d)
> >  {
> > +    struct vcpu *v;
> > +
> >      if ( !iommu_intpost || !has_hvm_container_domain(d) )
> >          return;
> >
> > -    ASSERT(d->arch.hvm_domain.vmx.vcpu_block);
> > -
> > -    d->arch.hvm_domain.vmx.vcpu_block = NULL;
> > -    d->arch.hvm_domain.vmx.pi_switch_from = NULL;
> > -    d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> > -    d->arch.hvm_domain.vmx.pi_do_resume = NULL;
> > +    for_each_vcpu ( d, v )
> > +        vmx_pi_blocking_cleanup(v);
> 
> If you keep the hooks in place, why is it relevant to do the cleanup
> here instead of just at domain death? As suggested before, if you
> did it there, you'd likely get away without adding the new per-vCPU
> flag.

I don't quite understand this. Adding the cleanup here is to handle
the corner case when the last assigned device is detached from the
domain. Why do you think we don't need the per-vCPU flag? We need
to set it here to indicate that the vCPU has been cleaned up, and in
vmx_vcpu_block() we must not put vCPUs with this flag set onto the
per-cpu blocking list. Am I missing something?

> 
> Furthermore, if things remain the way they are now, a per-domain
> flag would suffice.

vmx_pi_blocking_cleanup() is called from vmx_vcpu_destroy(), which
can be called during domain destruction, or from vcpu_initialise()
if it hits an error. For the latter case, if we used a per-domain
flag and set it in vmx_pi_blocking_cleanup(), that would be
problematic, since it would affect other vCPUs which should keep
running normally.

> 
> And finally - do you really need to retain all four hooks? Since if not,
> one that gets zapped here could be used in place of such a per-
> domain flag.

Can you elaborate a bit more on this? Thanks a lot!

> 
> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> > @@ -231,6 +231,9 @@ struct arch_vmx_struct {
> >       * pCPU and wakeup the related vCPU.
> >       */
> >      struct pi_blocking_vcpu pi_blocking;
> > +
> > +    spinlock_t            pi_hotplug_lock;
> 
> Being through all of the patch, I have a problem seeing the hotplug
> nature of this.

This is related to the assigned device hotplug.

Thanks,
Feng

> 
> > +    bool_t                pi_blocking_cleaned_up;
> 
> If this flag is to remain, it should move into its designated structure
> (right above your addition).
> 
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/4] VMX: Cleanup PI per-cpu blocking list when vcpu is destroyed
  2016-05-27 13:49   ` Jan Beulich
@ 2016-05-31 10:22     ` Wu, Feng
  2016-05-31 11:54       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Wu, Feng @ 2016-05-31 10:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, May 27, 2016 9:49 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: Re: [PATCH v2 2/4] VMX: Cleanup PI per-cpu blocking list when vcpu is
> destroyed
> 
> >>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -366,6 +366,7 @@ static void vmx_vcpu_destroy(struct vcpu *v)
> >      vmx_destroy_vmcs(v);
> >      vpmu_destroy(v);
> >      passive_domain_destroy(v);
> > +    vmx_pi_blocking_cleanup(v);
> >  }
> 
> Isn't this redundant with the cleanup done when the last device
> gets removed (via pci_release_devices()) during domain cleanup?

We need to handle the following two cases:
- the last device gets removed (via 'xl pci-detach ...') and the
guest keeps running after that. The logic in vmx_pi_hooks_deassign()
covers this case.
- the guest is shutting down. That is covered here.

Thanks,
Feng

> 
> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a concern case
  2016-05-27 14:00   ` Jan Beulich
@ 2016-05-31 10:27     ` Wu, Feng
  2016-05-31 11:58       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Wu, Feng @ 2016-05-31 10:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, May 27, 2016 10:00 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: Re: [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a
> concern case
> 
> >>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
> > Normally, in vmx_cpu_block() 'NDST' filed should have the same
> > value with 'dest' or 'MASK_INSR(dest, PI_xAPIC_NDST_MASK)' depending
> > on whether x2apic is enabled. However, in the following scenario,
> > 'NDST' has different value:
> >
> > 'vcpu_block' hook gets assigned in vmx_pi_hooks_assign(), but all
> > other three PI hooks have not been assigned or not been excuted yet.
> > And during this interval, we are running in vmx_vcpu_block(), then
> > 'NDST' may have different value.
> 
> How about simply assigning vcpu_block last? Or alternatively
> holding pi_blocking_list_lock around all four assignments? Or
> (maybe in addition to one of these) writing something sensible in
> vmx_pi_hooks_assign() before doing the hook assignments? 

The problem is that vmx_vcpu_block() can still get called first, since
the other hooks might not have been called yet.

> Of course much depends on whether the ASSERT() you replace was
> meant just as a sanity check, or instead logic elsewhere relies on
> NDST having the right value already before getting into
> vmx_vcpu_block().

The point is that we need to make sure 'NDST' has the right value
after leaving vmx_vcpu_block().
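
In code terms, something along these lines in vmx_vcpu_block(), in
place of the ASSERT() (just an untested sketch; 'dest' and 'pi_desc'
are the locals that function already has):

    unsigned int ndst = x2apic_enabled ? dest
                                       : MASK_INSR(dest, PI_xAPIC_NDST_MASK);

    /* Fix up a possibly stale 'NDST' rather than asserting on it. */
    if ( pi_desc->ndst != ndst )
        write_atomic(&pi_desc->ndst, ndst);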

> 
> Also, rather than saying twice that NDST has a different value in
> that case, how about stating what exact value it has then and
> why that's not what we want/need? (Same goes for the code
> comment then.)

Sure, will add this in the next version.

> 
> > This patch fix this concern case.
> 
> Here as well as in earlier patches - do you perhaps mean "corner
> case"?

Sorry for the typo.

Thanks,
Feng

> 
> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 4/4] VMX: fixup PI descritpor when cpu is offline
  2016-05-27 14:56   ` Jan Beulich
@ 2016-05-31 10:31     ` Wu, Feng
  2016-06-22 18:33       ` George Dunlap
  0 siblings, 1 reply; 32+ messages in thread
From: Wu, Feng @ 2016-05-31 10:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, May 27, 2016 10:57 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: Re: [PATCH v2 4/4] VMX: fixup PI descritpor when cpu is offline
> 
> >>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
> > @@ -102,9 +103,10 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
> >  {
> >      INIT_LIST_HEAD(&per_cpu(vmx_pi_blocking, cpu).list);
> >      spin_lock_init(&per_cpu(vmx_pi_blocking, cpu).lock);
> > +    per_cpu(vmx_pi_blocking, cpu).down = 0;
> 
> This seems pointless - per-CPU data starts out all zero (and there
> are various places already which rely on that).
> 
> > @@ -122,10 +124,25 @@ static void vmx_vcpu_block(struct vcpu *v)
> >           * new vCPU to the list.
> >           */
> >          spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> > -        return;
> > +        return 1;
> >      }
> >
> >      spin_lock(pi_blocking_list_lock);
> > +    if ( unlikely(per_cpu(vmx_pi_blocking, v->processor).down) )
> 
> Is this something that can actually happen? vmx_pi_desc_fixup()
> runs in stop-machine context, i.e. no CPU can actively be here (or
> anywhere near the arch_vcpu_block() call sites).

This is related to the scheduler; maybe Dario can give some input on
this. Dario?

> 
> > +    {
> > +        /*
> > +         * We being here means that the v->processor is going away, and all
> > +         * the vcpus on its blocking list were removed from it. Hence we
> > +         * cannot add new vcpu to it. Besides that, we return -1 to
> > +         * prevent the vcpu from being blocked. This is needed because
> > +         * if the vCPU is continue to block and here we don't put it
> > +         * in a per-cpu blocking list, it might not be woken up by the
> > +         * notification event.
> > +         */
> > +        spin_unlock(pi_blocking_list_lock);
> > +        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> > +        return 0;
> 
> The comment says you mean to return -1 here...
> 
> > +void vmx_pi_desc_fixup(int cpu)
> 
> unsigned int
> 
> > +{
> > +    unsigned int new_cpu, dest;
> > +    unsigned long flags;
> > +    struct arch_vmx_struct *vmx, *tmp;
> > +    spinlock_t *new_lock, *old_lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> > +    struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
> > +
> > +    if ( !iommu_intpost )
> > +        return;
> > +
> > +    spin_lock_irqsave(old_lock, flags);
> > +    per_cpu(vmx_pi_blocking, cpu).down = 1;
> > +
> > +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
> > +    {
> > +        /*
> > +         * We need to find an online cpu as the NDST of the PI descriptor, it
> > +         * doesn't matter whether it is within the cpupool of the domain or
> > +         * not. As long as it is online, the vCPU will be woken up once the
> > +         * notification event arrives.
> > +         */
> > +        new_cpu = cpu;
> > +restart:
> 
> Labels indented by at least one blank please. Or even better, get
> things done without goto.
> 
> > +        while ( 1 )
> > +        {
> > +            new_cpu = (new_cpu + 1) % nr_cpu_ids;
> > +            if ( cpu_online(new_cpu) )
> > +                break;
> > +        }
> 
> Please don't open code things like cpumask_cycle(). But with the
> restart logic likely unnecessary (see below), this would probably
> better become cpumask_any() then.
> 
> > +        new_lock = &per_cpu(vmx_pi_blocking, cpu).lock;
> 
> DYM new_cpu here? In fact with ...
> 
> > +        spin_lock(new_lock);
> 
> ... this I can't see how you would have successfully tested this
> new code path, as I can't see how this would end in other than
> a deadlock (as you hold this very lock already).
> 
> > +        /*
> > +         * After acquiring the blocking list lock for the new cpu, we need
> > +         * to check whether new_cpu is still online.
> 
> How could it have gone offline? 

And I think this also needs Dario's confirmation, as he is the scheduler expert.

> As mentioned, CPUs get brought
> down in stop-machine context (and btw for the very reason of
> avoiding complexity like this).
> 
> > +         * If '.down' is true, it mean 'new_cpu' is also going to be offline,
> > +         * so just go back to find another one, otherwise, there are two
> > +         * possibilities:
> > +         *   case 1 - 'new_cpu' is online.
> > +         *   case 2 - 'new_cpu' is about to be offline, but doesn't get to
> > +         *            the point where '.down' is set.
> > +         * In either case above, we can just set 'new_cpu' to 'NDST' field.
> > +         * For case 2 the 'NDST' field will be set to another online cpu when
> > +         * we get to this function for 'new_cpu' some time later.
> > +         */
> > +        if ( per_cpu(vmx_pi_blocking, cpu).down )
> 
> And again I suspect you mean new_cpu here.
> 
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -833,10 +833,8 @@ void vcpu_block(void)
> >
> >      set_bit(_VPF_blocked, &v->pause_flags);
> >
> > -    arch_vcpu_block(v);
> > -
> >      /* Check for events /after/ blocking: avoids wakeup waiting race. */
> > -    if ( local_events_need_delivery() )
> > +    if ( arch_vcpu_block(v) || local_events_need_delivery() )
> 
> Here as well as below I'm getting the impression that you have things
> backwards: vmx_vcpu_block() returns true for the two pre-existing
> return paths (in which case you previously did not enter this if()'s
> body), and false on the one new return path. Plus ...
> 
> > --- a/xen/include/asm-x86/hvm/hvm.h
> > +++ b/xen/include/asm-x86/hvm/hvm.h
> > @@ -608,11 +608,13 @@ unsigned long hvm_cr4_guest_reserved_bits(const
> struct vcpu *v, bool_t restore);
> >   * not been defined yet.
> >   */
> >  #define arch_vcpu_block(v) ({                                   \
> > +    bool_t rc = 0;                                              \
> >      struct vcpu *v_ = (v);                                      \
> >      struct domain *d_ = v_->domain;                             \
> >      if ( has_hvm_container_domain(d_) &&                        \
> >           d_->arch.hvm_domain.vmx.vcpu_block )                   \
> > -        d_->arch.hvm_domain.vmx.vcpu_block(v_);                 \
> > +        rc = d_->arch.hvm_domain.vmx.vcpu_block(v_);            \
> > +    rc;                                                         \
> >  })
> 
> ... rc defaulting to zero here supports my suspicion of something
> having got mixed up.

Oh, yes, it should be like this:

-    if ( local_events_need_delivery() )
+    if ( !arch_vcpu_block(v) || local_events_need_delivery() )

Thanks,
Feng

> 
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/4] VMX: Properly handle pi when all the assigned devices are removed
  2016-05-31 10:22     ` Wu, Feng
@ 2016-05-31 11:52       ` Jan Beulich
  2016-06-03  5:12         ` Wu, Feng
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2016-05-31 11:52 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 31.05.16 at 12:22, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, May 27, 2016 9:43 PM
>> >>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
>> > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> > @@ -113,7 +113,19 @@ static void vmx_vcpu_block(struct vcpu *v)
>> >  		&per_cpu(vmx_pi_blocking, v->processor).lock;
>> >      struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> >
>> > -    spin_lock_irqsave(pi_blocking_list_lock, flags);
>> > +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
>> > +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up) )
>> > +    {
>> > +        /*
>> > +         * The vcpu is to be destroyed and it has already been removed
>> > +         * from the per-CPU list if it is blocking, we shouldn't add
>> > +         * new vCPU to the list.
>> > +         */
>> 
>> This comment appears to be stale wrt the implementation further
>> down. But see there for whether it's the comment or the code
>> that need to change.
>> 
>> > +        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
>> > +        return;
>> > +    }
>> > +
>> > +    spin_lock(pi_blocking_list_lock);
>> 
>> There doesn't appear to be an active problem with this, but
>> taking a global lock inside a per-vCPU one feels bad. Both here
>> and in vmx_pi_blocking_cleanup() I can't see why the global
>> lock alone won't do - you acquire it anyway.
> 
> The purpose of ' v->arch.hvm_vmx.pi_hotplug_lock ' is to make
> sure things go right when vmx_pi_blocking_cleanup() and
> vmx_vcpu_block() are running concurrently. However, the lock
> 'pi_blocking_list_lock' in vmx_pi_blocking_cleanup() refers to
> ' v->arch.hvm_vmx.pi_blocking.lock ' while 'pi_blocking_list_lock'
> in vmx_vcpu_block() is '&per_cpu(vmx_pi_blocking, v->processor).lock'.
> These two can be different, please consider the following scenario:
> 
> vmx_pi_blocking_cleanup() gets called when the vcpu is in blocking
> state and 'v->arch.hvm_vmx.pi_blocking.lock' points to the per-cpu
> lock of pCPU0, and when acquiring the lock in this function, another
> one wins, (such as vmx_pi_do_resume() or pi_wakeup_interrupt()),
> so we are spinning in vmx_pi_blocking_cleanup(). After the vCPU
> is woken up, it is running on pCPU1, then sometime later, the vCPU
> is blocking again and in vmx_vcpu_block() and if the previous
> vmx_pi_blocking_cleanup() still doesn't finish its job. Then we
> acquire a different lock (it is pCPU1 this time)in vmx_vcpu_block()
> compared to the one in vmx_pi_blocking_cleanup. This can break
> things, since we might still put the vCPU to the per-cpu blocking list
> after vCPU is destroyed or the last assigned device is detached.

Makes sense. Then the next question is whether you really need
to hold both locks at the same time. Please understand that I'd
really prefer for this to have a simpler solution - the original code
was already more complicated than one would really think it should
be, and now it's getting worse. While I can't immediately suggest
alternatives, it feels like the solution to the current problem may
rather be simplification instead of making things even more
complicated.

>> >  void vmx_pi_hooks_deassign(struct domain *d)
>> >  {
>> > +    struct vcpu *v;
>> > +
>> >      if ( !iommu_intpost || !has_hvm_container_domain(d) )
>> >          return;
>> >
>> > -    ASSERT(d->arch.hvm_domain.vmx.vcpu_block);
>> > -
>> > -    d->arch.hvm_domain.vmx.vcpu_block = NULL;
>> > -    d->arch.hvm_domain.vmx.pi_switch_from = NULL;
>> > -    d->arch.hvm_domain.vmx.pi_switch_to = NULL;
>> > -    d->arch.hvm_domain.vmx.pi_do_resume = NULL;
>> > +    for_each_vcpu ( d, v )
>> > +        vmx_pi_blocking_cleanup(v);
>> 
>> If you keep the hooks in place, why is it relevant to do the cleanup
>> here instead of just at domain death? As suggested before, if you
>> did it there, you'd likely get away without adding the new per-vCPU
>> flag.
> 
> I don't quite understand this. Adding the cleanup here is to handle
> the corner case when the last assigned device is detached from the
> domain.

There's nothing to be cleaned up really if that de-assign isn't a
result of the domain being brought down.

> Why do you think we don't need to per-vCPU flag, we need
> to set it here to indicate that the vCPU is cleaned up, and in
> vmx_vcpu_block(), we cannot put the vCPUs with this flag set to the
> per-cpu blocking list. Do I miss something?

When clean up is done only at domain destruction time, there's
per-domain state already that can be checked instead of this
per-vCPU flag.

>> Furthermore, if things remain the way they are now, a per-domain
>> flag would suffice.
> 
> vmx_pi_blocking_cleanup() is called in vmx_cpu_destroy(), which can
> be called during domain destroy or vcpu_initialise() if it meets some
> errors. For the latter case, if we use per-domain flag and set it in
> vmx_pi_blocking_clean(), that should be problematic, since it will
> affect another vcpus which should be running normally.

I don't see how the error case of vcpu_initialise() can be problematic.
Any such vCPU would never run, and hence never want to block.

>> And finally - do you really need to retain all four hooks? Since if not,
>> one that gets zapped here could be used in place of such a per-
>> domain flag.
> 
> Can you elaborate a bit more about this? thanks a lot!

I have a really hard time seeing what more I can say here. I dislike
redundant information, and hence if the flag value can be
deduced from some other field, I'd rather see that other field
used instead of yet another flag getting added.

>> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> > @@ -231,6 +231,9 @@ struct arch_vmx_struct {
>> >       * pCPU and wakeup the related vCPU.
>> >       */
>> >      struct pi_blocking_vcpu pi_blocking;
>> > +
>> > +    spinlock_t            pi_hotplug_lock;
>> 
>> Being through all of the patch, I have a problem seeing the hotplug
>> nature of this.
> 
> This is related to the assigned device hotplug.

This reply means nothing to me. As said - I'm missing the connection
between this name and the rest of the patch (where there is no
further talk of hotplug afair).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/4] VMX: Cleanup PI per-cpu blocking list when vcpu is destroyed
  2016-05-31 10:22     ` Wu, Feng
@ 2016-05-31 11:54       ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2016-05-31 11:54 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 31.05.16 at 12:22, <feng.wu@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, May 27, 2016 9:49 PM
>> To: Wu, Feng <feng.wu@intel.com>
>> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
>> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
>> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org 
>> Subject: Re: [PATCH v2 2/4] VMX: Cleanup PI per-cpu blocking list when vcpu 
> is
>> destroyed
>> 
>> >>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
>> > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> > @@ -366,6 +366,7 @@ static void vmx_vcpu_destroy(struct vcpu *v)
>> >      vmx_destroy_vmcs(v);
>> >      vpmu_destroy(v);
>> >      passive_domain_destroy(v);
>> > +    vmx_pi_blocking_cleanup(v);
>> >  }
>> 
>> Isn't this redundant with the cleanup done when the last device
>> gets removed (via pci_release_devices()) during domain cleanup?
> 
> We need to handle the following two cases:
> - the last device gets removed (via 'xl pci-detach ...'), and the guest
> is running after that. The logical in vmx_pi_hooks_deassign() cover
> this case.
> - the guest is shutting down. It is covered here.

Exactly. Yet that latter case is already being taken care of by the
former: When the guest is shutting down, its last device will get
removed anyway.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a concern case
  2016-05-31 10:27     ` Wu, Feng
@ 2016-05-31 11:58       ` Jan Beulich
  2016-06-03  5:23         ` Wu, Feng
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2016-05-31 11:58 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 31.05.16 at 12:27, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, May 27, 2016 10:00 PM
>> >>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
>> > Normally, in vmx_cpu_block() 'NDST' filed should have the same
>> > value with 'dest' or 'MASK_INSR(dest, PI_xAPIC_NDST_MASK)' depending
>> > on whether x2apic is enabled. However, in the following scenario,
>> > 'NDST' has different value:
>> >
>> > 'vcpu_block' hook gets assigned in vmx_pi_hooks_assign(), but all
>> > other three PI hooks have not been assigned or not been excuted yet.
>> > And during this interval, we are running in vmx_vcpu_block(), then
>> > 'NDST' may have different value.
>> 
>> How about simply assigning vcpu_block last? Or alternatively
>> holding pi_blocking_list_lock around all four assignments? Or
>> (maybe in addition to one of these) writing something sensible in
>> vmx_pi_hooks_assign() before doing the hook assignments? 
> 
> The problem is vmx_vcpu_block() can still get called first, since
> the other ones might not get called yet.

That refers to the first two questions, yet the third (unanswered
one) was added by me because I anticipated that response. So
what's wrong with that last option?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/4] VMX: Properly handle pi when all the assigned devices are removed
  2016-05-31 11:52       ` Jan Beulich
@ 2016-06-03  5:12         ` Wu, Feng
  2016-06-03  9:45           ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Wu, Feng @ 2016-06-03  5:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, May 31, 2016 7:52 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: RE: [PATCH v2 1/4] VMX: Properly handle pi when all the assigned
> devices are removed
> 
> >>> On 31.05.16 at 12:22, <feng.wu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Friday, May 27, 2016 9:43 PM
> >> >>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >> > @@ -113,7 +113,19 @@ static void vmx_vcpu_block(struct vcpu *v)
> >> >  		&per_cpu(vmx_pi_blocking, v->processor).lock;
> >> >      struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> >> >
> >> > -    spin_lock_irqsave(pi_blocking_list_lock, flags);
> >> > +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> >> > +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up) )
> >> > +    {
> >> > +        /*
> >> > +         * The vcpu is to be destroyed and it has already been removed
> >> > +         * from the per-CPU list if it is blocking, we shouldn't add
> >> > +         * new vCPU to the list.
> >> > +         */
> >>
> >> This comment appears to be stale wrt the implementation further
> >> down. But see there for whether it's the comment or the code
> >> that need to change.
> >>
> >> > +        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> >> > +        return;
> >> > +    }
> >> > +
> >> > +    spin_lock(pi_blocking_list_lock);
> >>
> >> There doesn't appear to be an active problem with this, but
> >> taking a global lock inside a per-vCPU one feels bad. Both here
> >> and in vmx_pi_blocking_cleanup() I can't see why the global
> >> lock alone won't do - you acquire it anyway.
> >
> > The purpose of ' v->arch.hvm_vmx.pi_hotplug_lock ' is to make
> > sure things go right when vmx_pi_blocking_cleanup() and
> > vmx_vcpu_block() are running concurrently. However, the lock
> > 'pi_blocking_list_lock' in vmx_pi_blocking_cleanup() refers to
> > ' v->arch.hvm_vmx.pi_blocking.lock ' while 'pi_blocking_list_lock'
> > in vmx_vcpu_block() is '&per_cpu(vmx_pi_blocking, v->processor).lock'.
> > These two can be different, please consider the following scenario:
> >
> > vmx_pi_blocking_cleanup() gets called when the vcpu is in blocking
> > state and 'v->arch.hvm_vmx.pi_blocking.lock' points to the per-cpu
> > lock of pCPU0, and when acquiring the lock in this function, another
> > one wins, (such as vmx_pi_do_resume() or pi_wakeup_interrupt()),
> > so we are spinning in vmx_pi_blocking_cleanup(). After the vCPU
> > is woken up, it is running on pCPU1, then sometime later, the vCPU
> > is blocking again and in vmx_vcpu_block() and if the previous
> > vmx_pi_blocking_cleanup() still doesn't finish its job. Then we
> > acquire a different lock (it is pCPU1 this time)in vmx_vcpu_block()
> > compared to the one in vmx_pi_blocking_cleanup. This can break
> > things, since we might still put the vCPU to the per-cpu blocking list
> > after vCPU is destroyed or the last assigned device is detached.
> 
> Makes sense. Then the next question is whether you really need
> to hold both locks at the same time.

I think we need to hold both. The two spinlocks have different purposes:
'v->arch.hvm_vmx.pi_hotplug_lock' protects against the races that can
occur during device hotplug or while the domain is being brought down,
while the other one is the normal protection for accesses to the
blocking list.

> Please understand that I'd
> really prefer for this to have a simpler solution - the original code
> was already more complicated than one would really think it should
> be, and now it's getting worse. While I can't immediately suggest
> alternatives, it feels like the solution to the current problem may
> rather be simplification instead of making things even more
> complicated.

To be honest, comments like this make one frustrated. If you have
any better solution, we can discuss it here. However, just saying
the current solution is too complicated is not a good way to speed
up the process.

> 
> >> >  void vmx_pi_hooks_deassign(struct domain *d)
> >> >  {
> >> > +    struct vcpu *v;
> >> > +
> >> >      if ( !iommu_intpost || !has_hvm_container_domain(d) )
> >> >          return;
> >> >
> >> > -    ASSERT(d->arch.hvm_domain.vmx.vcpu_block);
> >> > -
> >> > -    d->arch.hvm_domain.vmx.vcpu_block = NULL;
> >> > -    d->arch.hvm_domain.vmx.pi_switch_from = NULL;
> >> > -    d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> >> > -    d->arch.hvm_domain.vmx.pi_do_resume = NULL;
> >> > +    for_each_vcpu ( d, v )
> >> > +        vmx_pi_blocking_cleanup(v);
> >>
> >> If you keep the hooks in place, why is it relevant to do the cleanup
> >> here instead of just at domain death? As suggested before, if you
> >> did it there, you'd likely get away without adding the new per-vCPU
> >> flag.
> >
> > I don't quite understand this. Adding the cleanup here is to handle
> > the corner case when the last assigned device is detached from the
> > domain.
> 
> There's nothing to be cleaned up really if that de-assign isn't a
> result of the domain being brought down.

I mentioned it clearly in [0/4] of this series. We _need_ to clean up
the blocking list when removing the device while the domain is
running, since vmx_vcpu_block() might be running at the same time.
If that is the case, there might be some stale vCPUs left in the
blocking list.

> 
> > Why do you think we don't need to per-vCPU flag, we need
> > to set it here to indicate that the vCPU is cleaned up, and in
> > vmx_vcpu_block(), we cannot put the vCPUs with this flag set to the
> > per-cpu blocking list. Do I miss something?
> 
> When clean up is done only at domain destruction time, 

No.

Thanks,
Feng

> there's per-domain state already that can be checked instead of this
> per-vCPU flag.
> 
> >> Furthermore, if things remain the way they are now, a per-domain
> >> flag would suffice.
> >
> > vmx_pi_blocking_cleanup() is called in vmx_cpu_destroy(), which can
> > be called during domain destroy or vcpu_initialise() if it meets some
> > errors. For the latter case, if we use per-domain flag and set it in
> > vmx_pi_blocking_clean(), that should be problematic, since it will
> > affect another vcpus which should be running normally.
> 
> I don't see how the error case of vcpu_initialise() can be problematic.
> Any such vCPU would never run, and hence never want to block.
> 
> >> And finally - do you really need to retain all four hooks? Since if not,
> >> one that gets zapped here could be used in place of such a per-
> >> domain flag.
> >
> > Can you elaborate a bit more about this? thanks a lot!
> 
> I have a really hard time what more I can say here. I dislike
> redundant information, and hence if the flag value can be
> deduced from some other field, I'd rather see that other field
> used instead of yet another flag getting added.
> 
> >> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> >> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> >> > @@ -231,6 +231,9 @@ struct arch_vmx_struct {
> >> >       * pCPU and wakeup the related vCPU.
> >> >       */
> >> >      struct pi_blocking_vcpu pi_blocking;
> >> > +
> >> > +    spinlock_t            pi_hotplug_lock;
> >>
> >> Being through all of the patch, I have a problem seeing the hotplug
> >> nature of this.
> >
> > This is related to the assigned device hotplug.
> 
> This reply means nothing to me. As said - I'm missing the connection
> between this name and the rest of the patch (where there is no
> further talk of hotplug afair).
> 
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a concern case
  2016-05-31 11:58       ` Jan Beulich
@ 2016-06-03  5:23         ` Wu, Feng
  2016-06-03  9:57           ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Wu, Feng @ 2016-06-03  5:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, May 31, 2016 7:58 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
> Subject: RE: [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a
> concern case
> 
> >>> On 31.05.16 at 12:27, <feng.wu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Friday, May 27, 2016 10:00 PM
> >> >>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
> >> > Normally, in vmx_cpu_block() 'NDST' filed should have the same
> >> > value with 'dest' or 'MASK_INSR(dest, PI_xAPIC_NDST_MASK)' depending
> >> > on whether x2apic is enabled. However, in the following scenario,
> >> > 'NDST' has different value:
> >> >
> >> > 'vcpu_block' hook gets assigned in vmx_pi_hooks_assign(), but all
> >> > other three PI hooks have not been assigned or not been excuted yet.
> >> > And during this interval, we are running in vmx_vcpu_block(), then
> >> > 'NDST' may have different value.
> >>
> >> How about simply assigning vcpu_block last? Or alternatively
> >> holding pi_blocking_list_lock around all four assignments? Or
> >> (maybe in addition to one of these) writing something sensible in
> >> vmx_pi_hooks_assign() before doing the hook assignments?
> >
> > The problem is vmx_vcpu_block() can still get called first, since
> > the other ones might not get called yet.
> 
> That refers to the first two questions, yet the third (unanswered
> one) was added by me because I anticipated that response. So
> what's wrong with that last option?

Do you mean " Or (maybe in addition to one of these) writing
something sensible in vmx_pi_hooks_assign() before doing the
hook assignments?"?

The problem is that you cannot know whether vmx_vcpu_block() is
called before or after the other hooks. And I am wondering why
you think the current solution is not good.

Thanks,
Feng

> 
> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/4] VMX: Properly handle pi when all the assigned devices are removed
  2016-06-03  5:12         ` Wu, Feng
@ 2016-06-03  9:45           ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2016-06-03  9:45 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 03.06.16 at 07:12, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Tuesday, May 31, 2016 7:52 PM
>> >>> On 31.05.16 at 12:22, <feng.wu@intel.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> Sent: Friday, May 27, 2016 9:43 PM
>> >> >>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
>> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> >> > @@ -113,7 +113,19 @@ static void vmx_vcpu_block(struct vcpu *v)
>> >> >  		&per_cpu(vmx_pi_blocking, v->processor).lock;
>> >> >      struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> >> >
>> >> > -    spin_lock_irqsave(pi_blocking_list_lock, flags);
>> >> > +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
>> >> > +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up) )
>> >> > +    {
>> >> > +        /*
>> >> > +         * The vcpu is to be destroyed and it has already been removed
>> >> > +         * from the per-CPU list if it is blocking, we shouldn't add
>> >> > +         * new vCPU to the list.
>> >> > +         */
>> >> > +        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
>> >> > +        return;
>> >> > +    }
>> >> > +
>> >> > +    spin_lock(pi_blocking_list_lock);
>> >>
>> >> There doesn't appear to be an active problem with this, but
>> >> taking a global lock inside a per-vCPU one feels bad. Both here
>> >> and in vmx_pi_blocking_cleanup() I can't see why the global
>> >> lock alone won't do - you acquire it anyway.
>> >
>> > The purpose of ' v->arch.hvm_vmx.pi_hotplug_lock ' is to make
>> > sure things go right when vmx_pi_blocking_cleanup() and
>> > vmx_vcpu_block() are running concurrently. However, the lock
>> > 'pi_blocking_list_lock' in vmx_pi_blocking_cleanup() refers to
>> > ' v->arch.hvm_vmx.pi_blocking.lock ' while 'pi_blocking_list_lock'
>> > in vmx_vcpu_block() is '&per_cpu(vmx_pi_blocking, v->processor).lock'.
>> > These two can be different, please consider the following scenario:
>> >
>> > vmx_pi_blocking_cleanup() gets called when the vcpu is in blocking
>> > state and 'v->arch.hvm_vmx.pi_blocking.lock' points to the per-cpu
>> > lock of pCPU0, and when acquiring the lock in this function, another
>> > one wins, (such as vmx_pi_do_resume() or pi_wakeup_interrupt()),
>> > so we are spinning in vmx_pi_blocking_cleanup(). After the vCPU
>> > is woken up, it is running on pCPU1, then sometime later, the vCPU
>> > is blocking again and in vmx_vcpu_block() and if the previous
>> > vmx_pi_blocking_cleanup() still doesn't finish its job. Then we
>> > acquire a different lock (it is pCPU1 this time)in vmx_vcpu_block()
>> > compared to the one in vmx_pi_blocking_cleanup. This can break
>> > things, since we might still put the vCPU to the per-cpu blocking list
>> > after vCPU is destroyed or the last assigned device is detached.
>> 
>> Makes sense. Then the next question is whether you really need
>> to hold both locks at the same time.
> 
> I think we need to hold both. The two spinlocks have different purpose:
> 'v->arch.hvm_vmx.pi_hotplug_lock' is to protect some race case while
> device hotplug or the domain is being down, while the other one is
> used to normally protect accesses to the blocking list.

I understand they have different purposes, but that doesn't mean
they need to be held at the same time. In particular (looking back
at the full patch), the newly added lock frames not only the newly
added code, but also the code inside the other locked region. The
natural thing to expect would be that you need one lock for that
new code, and the other for the pre-existing one. I.e. the function
would still acquire both locks, but not one inside the other.

>> Please understand that I'd
>> really prefer for this to have a simpler solution - the original code
>> was already more complicated than one would really think it should
>> be, and now it's getting worse. While I can't immediately suggest
>> alternatives, it feels like the solution to the current problem may
>> rather be simplification instead of making things even more
>> complicated.
> 
> To be honest, comments like this make one frustrated. If you have
> any other better solutions, we can discuss it here. However, just
> saying the current solution is too complicated is not a good way
> to speed up the process.

I can understand your frustration. But please understand that I'm
also getting frustrated - by more and more difficult to maintain code
getting added. Please understand that addressing the immediate
problem is only one aspect; another is the long term maintainability
of whatever gets added for that purpose. But see below.

It is a more general observation of mine (also, just fyi, in my own
code) that if complicated things need even more complicated fixes,
then quite often something is wrong with the original complicated
arrangements / design.

>> >> >  void vmx_pi_hooks_deassign(struct domain *d)
>> >> >  {
>> >> > +    struct vcpu *v;
>> >> > +
>> >> >      if ( !iommu_intpost || !has_hvm_container_domain(d) )
>> >> >          return;
>> >> >
>> >> > -    ASSERT(d->arch.hvm_domain.vmx.vcpu_block);
>> >> > -
>> >> > -    d->arch.hvm_domain.vmx.vcpu_block = NULL;
>> >> > -    d->arch.hvm_domain.vmx.pi_switch_from = NULL;
>> >> > -    d->arch.hvm_domain.vmx.pi_switch_to = NULL;
>> >> > -    d->arch.hvm_domain.vmx.pi_do_resume = NULL;
>> >> > +    for_each_vcpu ( d, v )
>> >> > +        vmx_pi_blocking_cleanup(v);
>> >>
>> >> If you keep the hooks in place, why is it relevant to do the cleanup
>> >> here instead of just at domain death? As suggested before, if you
>> >> did it there, you'd likely get away without adding the new per-vCPU
>> >> flag.
>> >
>> > I don't quite understand this. Adding the cleanup here is to handle
>> > the corner case when the last assigned device is detached from the
>> > domain.
>> 
>> There's nothing to be cleaned up really if that de-assign isn't a
>> result of the domain being brought down.
> 
> I mentioned it clearly in [0/4] of this series. We _need_ to cleanup
> the blocking list when removing the device while the domain
> is running, since vmx_vcpu_block() might be running at the same
> time. If that is the case, there might be some stale vcpus in the
> blocking list.

I have to admit that the description there is far from clear to me,
even more so in the specific regard being discussed here.

Anyway - if there are problems here, did you consider simply
broadcasting a PI wakeup interrupt after device removal (and
after having "manually" set the condition to satisfy the
pi_test_on() in pi_wakeup_interrupt())? It would seem to me that
this would leverage existing code without adding new complicated
logic. But of course I may be overlooking something.
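
Something along these lines, say (completely untested, and the exact
helper names are from memory, so take it just as an illustration of
the idea):

    unsigned int cpu;
    struct vcpu *v;

    /* After the last device has been de-assigned from d ... */
    for_each_vcpu ( d, v )
        /* ... make pi_test_on() in pi_wakeup_interrupt() see the vCPU ... */
        pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc);

    /* ... and let the existing handler drain the per-CPU blocking lists. */
    for_each_online_cpu ( cpu )
        send_IPI_mask(cpumask_of(cpu), pi_wakeup_vector);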

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a concern case
  2016-06-03  5:23         ` Wu, Feng
@ 2016-06-03  9:57           ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2016-06-03  9:57 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 03.06.16 at 07:23, <feng.wu@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Tuesday, May 31, 2016 7:58 PM
>> To: Wu, Feng <feng.wu@intel.com>
>> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
>> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
>> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org 
>> Subject: RE: [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a
>> concern case
>> 
>> >>> On 31.05.16 at 12:27, <feng.wu@intel.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> Sent: Friday, May 27, 2016 10:00 PM
>> >> >>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
>> >> > Normally, in vmx_cpu_block() 'NDST' filed should have the same
>> >> > value with 'dest' or 'MASK_INSR(dest, PI_xAPIC_NDST_MASK)' depending
>> >> > on whether x2apic is enabled. However, in the following scenario,
>> >> > 'NDST' has different value:
>> >> >
>> >> > 'vcpu_block' hook gets assigned in vmx_pi_hooks_assign(), but all
>> >> > other three PI hooks have not been assigned or not been excuted yet.
>> >> > And during this interval, we are running in vmx_vcpu_block(), then
>> >> > 'NDST' may have different value.
>> >>
>> >> How about simply assigning vcpu_block last? Or alternatively
>> >> holding pi_blocking_list_lock around all four assignments? Or
>> >> (maybe in addition to one of these) writing something sensible in
>> >> vmx_pi_hooks_assign() before doing the hook assignments?
>> >
>> > The problem is vmx_vcpu_block() can still get called first, since
>> > the other ones might not get called yet.
>> 
>> That refers to the first two questions, yet the third (unanswered
>> one) was added by me because I anticipated that response. So
>> what's wrong with that last option?
> 
> Do you mean " Or (maybe in addition to one of these) writing
> something sensible in vmx_pi_hooks_assign() before doing the
> hook assignments?"?

Yes.

> The problem is you cannot know whether vmx_vcpu_block() is
> called before or after the other hooks.

Well - it for sure can't get called before getting installed as a hook.
I.e. if you set up proper state before installing it, you should be
fine.
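
E.g. something like this in vmx_pi_hooks_assign(), before any of the
hooks get installed (an untested sketch only):

    struct vcpu *v;

    for_each_vcpu ( d, v )
    {
        unsigned int dest = cpu_physical_id(v->processor);

        write_atomic(&v->arch.hvm_vmx.pi_desc.ndst,
                     x2apic_enabled ? dest
                                    : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
    }

    /* Only then assign the four hooks, vcpu_block last. */

Whether v->processor can still change between this write and the hooks
taking effect is of course something that would need to be checked.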

> And I am wondering
> why do you think the current solution is not good.

See the other thread (on patch 1). When dealing with corner
cases, as a first option (leading to less complication, or even
simplification) it should be understood why they are corner
cases in the first place, i.e. why the normal code flow doesn't
take care of them. It's only the second best option to add
extra logic / conditionals in order to deal with such.

For the case at hand that first question translates to "Why
does the ASSERT() not hold, despite us having thought it
always would?" And yes, the change here isn't complicated,
so may well be fine to go in, but together with the other
patches I can't help getting the feeling that the overall
situation wasn't really fully understood (and thus explained in
the patch overview and descriptions). In particular, this part
of the source comment

+     * 'vcpu_block' hook gets assigned in vmx_pi_hooks_assign(), but all
+     * other three PI hooks have not been assigned or not been excuted yet.
+     * And during this interval, we get here in this function, then 'NDST'
+     * may have different value.

suggests to me that my consideration above (about setting up
state correctly _before_ assigning the hooks) provides a viable
alternative solution.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a concern case
  2016-05-26 13:39 ` [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a concern case Feng Wu
  2016-05-27 14:00   ` Jan Beulich
@ 2016-06-22 18:00   ` George Dunlap
  2016-06-24  9:08     ` Wu, Feng
  1 sibling, 1 reply; 32+ messages in thread
From: George Dunlap @ 2016-06-22 18:00 UTC (permalink / raw)
  To: Feng Wu
  Cc: Tian, Kevin, Keir Fraser, Andrew Cooper, Dario Faggioli,
	xen-devel, Jan Beulich

On Thu, May 26, 2016 at 2:39 PM, Feng Wu <feng.wu@intel.com> wrote:
> Normally, in vmx_cpu_block() 'NDST' filed should have the same
> value with 'dest' or 'MASK_INSR(dest, PI_xAPIC_NDST_MASK)' depending
> on whether x2apic is enabled. However, in the following scenario,
> 'NDST' has different value:
>
> 'vcpu_block' hook gets assigned in vmx_pi_hooks_assign(), but all
> other three PI hooks have not been assigned or not been excuted yet.
> And during this interval, we are running in vmx_vcpu_block(), then
> 'NDST' may have different value.
>
> This patch fix this concern case.
>
> Signed-off-by: Feng Wu <feng.wu@intel.com>

I agree with Jan that a cleaner solution here would be making sure
that all the appropriate state is actually set up for all vcpus before
leaving vmx_pi_hooks_assign().  With the patch you propose, the
following sequence of events is possible:

* vcpu 0 starts running on a pcpu
* a device is assigned, causing the hooks to be set
* an interrupt from the device is routed to vcpu 0, but it is not
actually delivered properly, since ndst is not pointing to the right
processor.

One option would be to pause all vcpus before setting the hooks and
then un-pause them; this would force all the vcpus to go through
vmx_pi_switch_to() before vmx_vcpu_block().  Another would be to grab
the scheduler lock for each pcpu and write the vcpu's ndst with the
appropriate value before setting the hooks.
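
The first option could look roughly like this (untested sketch; hook
and handler names as in the existing code, from memory):

    void vmx_pi_hooks_assign(struct domain *d)
    {
        if ( !iommu_intpost || !has_hvm_container_domain(d) )
            return;

        /* No vCPU of d can run, and hence none can block, while paused. */
        domain_pause(d);

        d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
        d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;
        d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
        d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;

        /*
         * On unpause each vCPU goes through vmx_pi_switch_to() - which
         * writes 'NDST' - before it can possibly reach vmx_vcpu_block().
         */
        domain_unpause(d);
    }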

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 4/4] VMX: fixup PI descritpor when cpu is offline
  2016-05-31 10:31     ` Wu, Feng
@ 2016-06-22 18:33       ` George Dunlap
  2016-06-24  6:34         ` Wu, Feng
  0 siblings, 1 reply; 32+ messages in thread
From: George Dunlap @ 2016-06-22 18:33 UTC (permalink / raw)
  To: Wu, Feng
  Cc: Tian, Kevin, keir, andrew.cooper3, dario.faggioli, xen-devel,
	Jan Beulich

On Tue, May 31, 2016 at 11:31 AM, Wu, Feng <feng.wu@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, May 27, 2016 10:57 PM
>> To: Wu, Feng <feng.wu@intel.com>
>> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
>> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
>> devel@lists.xen.org; konrad.wilk@oracle.com; keir@xen.org
>> Subject: Re: [PATCH v2 4/4] VMX: fixup PI descritpor when cpu is offline
>>
>> >>> On 26.05.16 at 15:39, <feng.wu@intel.com> wrote:
>> > @@ -102,9 +103,10 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
>> >  {
>> >      INIT_LIST_HEAD(&per_cpu(vmx_pi_blocking, cpu).list);
>> >      spin_lock_init(&per_cpu(vmx_pi_blocking, cpu).lock);
>> > +    per_cpu(vmx_pi_blocking, cpu).down = 0;
>>
>> This seems pointless - per-CPU data starts out all zero (and there
>> are various places already which rely on that).
>>
>> > @@ -122,10 +124,25 @@ static void vmx_vcpu_block(struct vcpu *v)
>> >           * new vCPU to the list.
>> >           */
>> >          spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
>> > -        return;
>> > +        return 1;
>> >      }
>> >
>> >      spin_lock(pi_blocking_list_lock);
>> > +    if ( unlikely(per_cpu(vmx_pi_blocking, v->processor).down) )
>>
>> Is this something that can actually happen? vmx_pi_desc_fixup()
>> runs in stop-machine context, i.e. no CPU can actively be here (or
>> anywhere near the arch_vcpu_block() call sites).
>
> This is related to scheduler, maybe Dario can give some input about
> this. Dario?

Yeah, I was going to say just looking through this -- there's no way
that vmx_cpu_dead() will be called while there are vcpus *still
running* on that pcpu.  vcpus will be migrated to other pcpus before
that happens.  All of your changes around vmx_vcpu_block() are
unnecessary.

>> > +        new_lock = &per_cpu(vmx_pi_blocking, cpu).lock;
>>
>> DYM new_cpu here? In fact with ...
>>
>> > +        spin_lock(new_lock);
>>
>> ... this I can't see how you would have successfully tested this
>> new code path, as I can't see how this would end in other than
>> a deadlock (as you hold this very lock already).

Indeed, it's not very nice to send us complicated code that obviously
can't even run.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-05-31 10:19   ` Wu, Feng
@ 2016-06-22 21:33     ` Dario Faggioli
  2016-06-24  6:33       ` Wu, Feng
  0 siblings, 1 reply; 32+ messages in thread
From: Dario Faggioli @ 2016-06-22 21:33 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, jbeulich



On Tue, 2016-05-31 at 10:19 +0000, Wu, Feng wrote:
> > -----Original Message-----
> > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > 
> > So, if you want try argumenting a bit on what was your line of
> > reasoning when doing things this way, that would be helpful (at
> > least
> > to me).
> 'pi_hotplug_lock' is trying to protect the following scenario:
> vmx_pi_blocking_cleanup() gets called either when the last assigned
> device is detached or the vCPU is going to be destroyed, and at the
> same time vmx_vcpu_block() is running. We need to make sure
> after all the blocking vCPU is cleaned up, we should not add new
> vCPU to the per-cpu blocking list. And that is why I introduce
> ' pi_blocking_cleaned_up' for each vCPU, which being set to
> 1 means that we cannot add it to the blocking list in
> vmx_vcpu_block().
> 
By "the vCPU is going to be destroyed", to what code path do you refer?
Because, for instance, there's this:

  domain_kill() --> domain_destroy() --> complete_domain_destroy() --
  --> vcpu_destroy() --> hvm_vcpu_destroy()

in which case, the vCPUs are not running --and hence can't block--
during their own destruction.

I take that you've found a path where that does not hold, and hence
requires this kind of protection?

For the other race (last device being unassigned), I'll look more into
it, but, in general, I share George's and Jan's views that we need
simpler, more consistent and easier to maintain solutions.

> For the other flag 'down', it is used for the following scenario:
> When a pCPU is going away and meanwhile vmx_vcpu_block() is
> called, we should not put the vCPU to a per-cpu blocking list, which
> is going away. 
> 
But, in this case, as George basically said already, if a pCPU is being
destroyed, there should be no vCPU running on it, and hence no vCPU
that, if blocking, would need to be added to that pCPU's blocking list.

> > For instance, now arch_vcpu_block() returns a value and, as you say
> > yourself in a comment, that is for (potentially) preventing a vcpu
> > to
> > block. So the behavior of schedule.c:vcpu_block(), now depends on
> > your
> > new flag per_cpu(vmx_pi_blocking, v->processor).down. Again, I'll
> > look
> > better, but this has few chances of being right (at least
> > logically).
> Like in vcpu_block(),it will check events before actually blocking
> the vcpu,
> here we just introduce another case in which the vCPU cannot be
> blocked.
> I don't know why you think this is problematic?
> 
Well, but, right now, it's like this:
 - the vCPU should block, waiting for an event
 - it turns out the event has already arrived
 - we can avoid blocking

In your case, AFAICUI, it's:
 - the vCPU should block, waiting for an event
 - the event has _not_ arrived, so we indeed should block
 - we do *not* block, due to some other reason

That does not look right to me... what about the fact that we wanted to
actually wait for the event? :-O

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-06-22 21:33     ` Dario Faggioli
@ 2016-06-24  6:33       ` Wu, Feng
  2016-06-24 10:29         ` Dario Faggioli
  0 siblings, 1 reply; 32+ messages in thread
From: Wu, Feng @ 2016-06-24  6:33 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, jbeulich, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Thursday, June 23, 2016 5:33 AM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: keir@xen.org; Tian, Kevin <kevin.tian@intel.com>; jbeulich@suse.com;
> andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com;
> konrad.wilk@oracle.com
> Subject: Re: [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu
> blocking list
> 
> On Tue, 2016-05-31 at 10:19 +0000, Wu, Feng wrote:
> > > -----Original Message-----
> > > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > >
> > > So, if you want try argumenting a bit on what was your line of
> > > reasoning when doing things this way, that would be helpful (at
> > > least
> > > to me).
> > 'pi_hotplug_lock' is trying to protect the following scenario:
> > vmx_pi_blocking_cleanup() gets called either when the last assigned
> > device is detached or the vCPU is going to be destroyed, and at the
> > same time vmx_vcpu_block() is running. We need to make sure
> > after all the blocking vCPU is cleaned up, we should not add new
> > vCPU to the per-cpu blocking list. And that is why I introduce
> > ' pi_blocking_cleaned_up' for each vCPU, which being set to
> > 1 means that we cannot add it to the blocking list in
> > vmx_vcpu_block().
> >
> By "the vCPU is going to be destroyed", to what code path do you refer?
> Because, for instance, there's this:
> 
>   domain_kill() --> domain_destroy() --> complete_domain_destroy() --
>   --> vcpu_destroy() --> hvm_vcpu_destroy()

Yes, this is the case I was thinking of.

> 
> in which case, the vCPUs are not running --and hence can't block--
> during their own destruction.

Thanks for the clarification. It is good that I don't need to consider
this case. :)

> 
> I take that you've found a path where that does not hold, and hence
> requires this kind of protection?
> 
> For the other race (last device being unassigned), I'll look more into
> it, but, in general, I share George's and Jan's views that we need
> simpler, more consistent and easier to maintain solutions.

Thanks for your time and looking forward to your comments!

> 
> > For the other flag 'down', it is used for the following scenario:
> > When a pCPU is going away and meanwhile vmx_vcpu_block() is
> > called, we should not put the vCPU to a per-cpu blocking list, which
> > is going away.
> >
> But, in this case, as George basically said already, if a pCPU is being
> destroyed, there should be no vCPU running on it, and hence no vCPU
> that, if blocking, would need being added to the pCPU blocking list.
> 
> > > For instance, now arch_vcpu_block() returns a value and, as you say
> > > yourself in a comment, that is for (potentially) preventing a vcpu
> > > to
> > > block. So the behavior of schedule.c:vcpu_block(), now depends on
> > > your
> > > new flag per_cpu(vmx_pi_blocking, v->processor).down. Again, I'll
> > > look
> > > better, but this has few chances of being right (at least
> > > logically).
> > Like in vcpu_block(),it will check events before actually blocking
> > the vcpu,
> > here we just introduce another case in which the vCPU cannot be
> > blocked.
> > I don't know why you think this is problematic?
> >
> Well, but, right now, it's like this:
>  - the vCPU should block, waiting for an event
>  - it turns out the event is already arrive
>  - we can avoid blocking
> 
> In your case, AFAICUI, it's:
>  - the vCPU should block, waiting for an event
>  - the event is _not_ arrived, so we indeed should block
>  - we do *not* block, due to some other reason
> 
> That does not look right to me... what about the fact that we wanted to
> actually wait for the event? :-O

I understand your point. In my understanding, vcpu_block() is currently
used for the guest's HLT operation, which means the guest has nothing
to do. In this case, even if we return without blocking, the function
does not seem to be broken.

Thanks,
Feng

> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 4/4] VMX: fixup PI descritpor when cpu is offline
  2016-06-22 18:33       ` George Dunlap
@ 2016-06-24  6:34         ` Wu, Feng
  0 siblings, 0 replies; 32+ messages in thread
From: Wu, Feng @ 2016-06-24  6:34 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tian, Kevin, keir, andrew.cooper3, dario.faggioli, xen-devel,
	Jan Beulich, Wu, Feng

> >> > @@ -122,10 +124,25 @@ static void vmx_vcpu_block(struct vcpu *v)
> >> >           * new vCPU to the list.
> >> >           */
> >> >          spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> >> > -        return;
> >> > +        return 1;
> >> >      }
> >> >
> >> >      spin_lock(pi_blocking_list_lock);
> >> > +    if ( unlikely(per_cpu(vmx_pi_blocking, v->processor).down) )
> >>
> >> Is this something that can actually happen? vmx_pi_desc_fixup()
> >> runs in stop-machine context, i.e. no CPU can actively be here (or
> >> anywhere near the arch_vcpu_block() call sites).
> >
> > This is related to the scheduler; maybe Dario can give some input on
> > this. Dario?
> 
> Yeah, I was going to say just looking through this -- there's no way
> that vmx_cpu_dead() will be called while there are vcpus *still
> running* on that pcpu.  vcpus will be migrated to other pcpus before
> that happens.  All of your changes around vmx_vcpu_block() are
> unnecessary.

Thanks for the clarification, George!

Thanks,
Feng

* Re: [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a concern case
  2016-06-22 18:00   ` George Dunlap
@ 2016-06-24  9:08     ` Wu, Feng
  0 siblings, 0 replies; 32+ messages in thread
From: Wu, Feng @ 2016-06-24  9:08 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tian, Kevin, Keir Fraser, Andrew Cooper, Dario Faggioli,
	xen-devel, Jan Beulich, Wu, Feng



> -----Original Message-----
> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of George
> Dunlap
> Sent: Thursday, June 23, 2016 2:01 AM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: xen-devel@lists.xen.org; Tian, Kevin <kevin.tian@intel.com>; Keir Fraser
> <keir@xen.org>; Andrew Cooper <andrew.cooper3@citrix.com>; Dario Faggioli
> <dario.faggioli@citrix.com>; Jan Beulich <jbeulich@suse.com>
> Subject: Re: [Xen-devel] [PATCH v2 3/4] VMX: Assign the right value to 'NDST'
> field in a concern case
> 
> On Thu, May 26, 2016 at 2:39 PM, Feng Wu <feng.wu@intel.com> wrote:
> > Normally, in vmx_vcpu_block() the 'NDST' field should have the same
> > value as 'dest' or 'MASK_INSR(dest, PI_xAPIC_NDST_MASK)', depending
> > on whether x2apic is enabled. However, in the following scenario,
> > 'NDST' may have a different value:
> >
> > The 'vcpu_block' hook gets assigned in vmx_pi_hooks_assign(), but the
> > other three PI hooks have not been assigned, or have not been executed
> > yet. If we are running in vmx_vcpu_block() during this interval, 'NDST'
> > may hold a stale value.
> >
> > This patch fixes this corner case.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> 
> I agree with Jan that a cleaner solution here would be making sure
> that all the appropriate state is actually set up for all vcpus before
> leaving vmx_pi_hooks_assign().  With the patch you propose, the
> following sequence of events is possible:
> 
> * vcpu 0 starts running on a pcpu
> * a device is assigned, causing the hooks to be set
> * an interrupt from the device is routed to vcpu 0, but it is not
> actually delivered properly, since ndst is not pointing to the right
> processor.
> 
> One option would be to pause all vcpus before setting the hooks and
> then un-pause them; this would force all the vcpus to go through
> vmx_pi_switch_to() before vmx_vcpu_block().  Another would be to grab
> the scheduler lock for each pcpu and write the vcpu's ndst with the
> appropriate value before setting the hooks.

That sounds like a great idea. Besides that, maybe we can also pause/unpause
the domain before/after unsetting the hooks; then we don't need to worry
about the race between vmx_pi_hooks_deassign() and vmx_vcpu_block() being
called at the same time. After unpausing the domain, we can safely remove
the vCPUs from the per-cpu blocking list if needed.
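
Just to illustrate the idea (a rough sketch only, not a real patch; the
field names follow the existing vmx_pi_hooks_*() code as far as I
remember them, and vmx_pi_unblock_vcpu() is a hypothetical helper for
the list removal):

void vmx_pi_hooks_deassign(struct domain *d)
{
    struct vcpu *v;

    if ( !iommu_intpost || !has_hvm_container_domain(d) )
        return;

    /*
     * With the whole domain paused, no vCPU can be inside
     * vmx_vcpu_block(), so clearing the hooks cannot race with it.
     */
    domain_pause(d);

    d->arch.hvm_domain.vmx.vcpu_block = NULL;
    d->arch.hvm_domain.vmx.pi_switch_from = NULL;
    d->arch.hvm_domain.vmx.pi_switch_to = NULL;
    d->arch.hvm_domain.vmx.pi_do_resume = NULL;

    /* Drop any vCPU of this domain still queued on a per-cpu blocking list. */
    for_each_vcpu ( d, v )
        vmx_pi_unblock_vcpu(v);    /* hypothetical helper doing the list removal */

    domain_unpause(d);
}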

Thanks,
Feng

> 
>  -George

* Re: [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-06-24  6:33       ` Wu, Feng
@ 2016-06-24 10:29         ` Dario Faggioli
  2016-06-24 13:42           ` Wu, Feng
  0 siblings, 1 reply; 32+ messages in thread
From: Dario Faggioli @ 2016-06-24 10:29 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, jbeulich



On Fri, 2016-06-24 at 06:33 +0000, Wu, Feng wrote:
> > -----Original Message-----
> > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > In your case, AFAIUI, it's:
> >  - the vCPU should block, waiting for an event
> >  - the event has _not_ arrived, so we indeed should block
> >  - we do *not* block, due to some other reason
> > 
> > That does not look right to me... what about the fact that we wanted
> > to actually wait for the event? :-O
> I understand your point. In my understanding, vcpu_block() is currently
> used for the guest's HLT operation, which means the guest has nothing to
> do. In that case, even if we return without blocking, the function does
> not seem to be broken.
>
So, basically, you're saying that the vcpu has nothing to do, and in
fact it executed an HLT instruction, and you want to let it continue to
run? :-O

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-06-24 10:29         ` Dario Faggioli
@ 2016-06-24 13:42           ` Wu, Feng
  2016-06-24 13:49             ` George Dunlap
  0 siblings, 1 reply; 32+ messages in thread
From: Wu, Feng @ 2016-06-24 13:42 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, jbeulich, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Friday, June 24, 2016 6:29 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: keir@xen.org; Tian, Kevin <kevin.tian@intel.com>; jbeulich@suse.com;
> andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com;
> konrad.wilk@oracle.com
> Subject: Re: [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu
> blocking list
> 
> On Fri, 2016-06-24 at 06:33 +0000, Wu, Feng wrote:
> > > -----Original Message-----
> > > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > > In your case, AFAIUI, it's:
> > >  - the vCPU should block, waiting for an event
> > >  - the event has _not_ arrived, so we indeed should block
> > >  - we do *not* block, due to some other reason
> > >
> > > That does not look right to me... what about the fact that we wanted
> > > to actually wait for the event? :-O
> > I understand your point. In my understanding, vcpu_block() is currently
> > used for the guest's HLT operation, which means the guest has nothing to
> > do. In that case, even if we return without blocking, the function does
> > not seem to be broken.
> >
> So, basically, you're saying that the vcpu has nothing to do, and in
> fact it executed an HLT instruction, and you want to let it continue to
> run? :-O

Here is my understanding: if the guest has nothing to do, it will
execute HLT and the Xen hypervisor will call vcpu_block(). If we don't
block the vCPU and return to the guest, the guest will simply execute
HLT again if it still has nothing to do. If that is the case, I don't
see a problem with it. Anyway, I have replied to George's mail and,
based on his suggestion, proposed a new method to handle the case where
the last assigned device is detached. Could you please have a look at it
when you have time? If that works, I think we don't need the above
changes anymore. Thanks in advance!

Thanks,
Feng

* Re: [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-06-24 13:42           ` Wu, Feng
@ 2016-06-24 13:49             ` George Dunlap
  2016-06-24 14:36               ` Dario Faggioli
  0 siblings, 1 reply; 32+ messages in thread
From: George Dunlap @ 2016-06-24 13:49 UTC (permalink / raw)
  To: Wu, Feng, Dario Faggioli, xen-devel
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, jbeulich

On 24/06/16 14:42, Wu, Feng wrote:
> 
> 
>> -----Original Message-----
>> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
>> Sent: Friday, June 24, 2016 6:29 PM
>> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
>> Cc: keir@xen.org; Tian, Kevin <kevin.tian@intel.com>; jbeulich@suse.com;
>> andrew.cooper3@citrix.com; george.dunlap@eu.citrix.com;
>> konrad.wilk@oracle.com
>> Subject: Re: [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu
>> blocking list
>>
>> On Fri, 2016-06-24 at 06:33 +0000, Wu, Feng wrote:
>>>> -----Original Message-----
>>>> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
>>>> In your case, AFAIUI, it's:
>>>>  - the vCPU should block, waiting for an event
>>>>  - the event has _not_ arrived, so we indeed should block
>>>>  - we do *not* block, due to some other reason
>>>>
>>>> That does not look right to me... what about the fact that we wanted
>>>> to actually wait for the event? :-O
>>> I understand your point. In my understanding, vcpu_block() is currently
>>> used for the guest's HLT operation, which means the guest has nothing to
>>> do. In that case, even if we return without blocking, the function does
>>> not seem to be broken.
>>>
>> So, basically, you're saying that the vcpu has nothing to do, and in
>> fact it executed an HLT instruction, and you want to let it continue to
>> run? :-O
> 
> Here is my understanding: if the guest has nothing to do, it will
> execute HLT and the Xen hypervisor will call vcpu_block(). If we don't
> block the vCPU and return to the guest, the guest will simply execute
> HLT again if it still has nothing to do.

As it happens, most operating systems will handle this situation just
fine; but I'm not sure this is something we want to rely on.

 -George


* Re: [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list
  2016-06-24 13:49             ` George Dunlap
@ 2016-06-24 14:36               ` Dario Faggioli
  0 siblings, 0 replies; 32+ messages in thread
From: Dario Faggioli @ 2016-06-24 14:36 UTC (permalink / raw)
  To: George Dunlap, Wu, Feng, xen-devel
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, jbeulich



On Fri, 2016-06-24 at 14:49 +0100, George Dunlap wrote:
> On 24/06/16 14:42, Wu, Feng wrote:
> > Here is my understanding: if the guest has nothing to do, it will
> > execute HLT and the Xen hypervisor will call vcpu_block(). If we don't
> > block the vCPU and return to the guest, the guest will simply execute
> > HLT again if it still has nothing to do.
> As it happens, most operating systems will handle this situation just
> fine; but I'm not sure this is something we want to rely on.
> 
Exactly! I indeed don't think it would be wise to rely on that. But
more than that, I don't think we want to waste pCPU cycles considering
runnable, and hence potentially scheduling, vCPUs that have HLT-ed!

That may be acceptable in emulators like Bochs or "plain" Qemu... Xen
is a hypervisor, it does virtualization, and this is one of the points
of virtualization, IMO: when a virtual CPU has nothing to do, use the
physical CPU to execute _another_ virtual CPU.

And even if that would be just for the sake of handling a corner case,
I still think that:
 - we really want to try as hard as possible to avoid it,
 - if we really can't, we want a big comment in the code:
  * explaining what actually happens,
  * proving that it will be something limited in time (and preferably
    rather short),
  * providing an explanation of why we could not do better.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



end of thread, other threads:[~2016-06-24 14:36 UTC | newest]

Thread overview: 32+ messages
2016-05-26 13:39 [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list Feng Wu
2016-05-26 13:39 ` [PATCH v2 1/4] VMX: Properly handle pi when all the assigned devices are removed Feng Wu
2016-05-27 13:43   ` Jan Beulich
2016-05-31 10:22     ` Wu, Feng
2016-05-31 11:52       ` Jan Beulich
2016-06-03  5:12         ` Wu, Feng
2016-06-03  9:45           ` Jan Beulich
2016-05-26 13:39 ` [PATCH v2 2/4] VMX: Cleanup PI per-cpu blocking list when vcpu is destroyed Feng Wu
2016-05-27 13:49   ` Jan Beulich
2016-05-31 10:22     ` Wu, Feng
2016-05-31 11:54       ` Jan Beulich
2016-05-26 13:39 ` [PATCH v2 3/4] VMX: Assign the right value to 'NDST' field in a concern case Feng Wu
2016-05-27 14:00   ` Jan Beulich
2016-05-31 10:27     ` Wu, Feng
2016-05-31 11:58       ` Jan Beulich
2016-06-03  5:23         ` Wu, Feng
2016-06-03  9:57           ` Jan Beulich
2016-06-22 18:00   ` George Dunlap
2016-06-24  9:08     ` Wu, Feng
2016-05-26 13:39 ` [PATCH v2 4/4] VMX: fixup PI descritpor when cpu is offline Feng Wu
2016-05-27 14:56   ` Jan Beulich
2016-05-31 10:31     ` Wu, Feng
2016-06-22 18:33       ` George Dunlap
2016-06-24  6:34         ` Wu, Feng
2016-05-26 17:20 ` [PATCH v2 0/4] VMX: Properly handle pi descriptor and per-cpu blocking list Dario Faggioli
2016-05-31 10:19   ` Wu, Feng
2016-06-22 21:33     ` Dario Faggioli
2016-06-24  6:33       ` Wu, Feng
2016-06-24 10:29         ` Dario Faggioli
2016-06-24 13:42           ` Wu, Feng
2016-06-24 13:49             ` George Dunlap
2016-06-24 14:36               ` Dario Faggioli
