* [PATCH v2 0/3] add hypercall option to temporarily pin a vcpu
@ 2016-03-01  9:02 Juergen Gross
  2016-03-01  9:02 ` [PATCH v2 1/3] xen: silence affinity messages on suspend/resume Juergen Gross
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Juergen Gross @ 2016-03-01  9:02 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, wei.liu2, stefano.stabellini, george.dunlap,
	andrew.cooper3, dario.faggioli, ian.jackson, david.vrabel,
	jbeulich

Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be
called on physical cpu 0 only. Linux drivers like dcdbas or i8k try
to achieve this by pinning the running thread to cpu 0, but in Dom0
this is not enough: the vcpu must be pinned to physical cpu 0 via
Xen, too.

This patch series adds a stable hypercall option to achieve this.

Changes in V2:
- add patch 1 to silence messages on suspend/resume
- add patch 3 to handle EBUSY case when removing cpu from cpupool
- limit operation to hardware domain as suggested by Jan Beulich
- some style issues corrected as requested by Jan Beulich
- use fixed width types in interface as requested by Jan Beulich
- add compat layer checking as requested by Jan Beulich


Juergen Gross (3):
  xen: silence affinity messages on suspend/resume
  xen: add hypercall option to temporarily pin a vcpu
  libxc: do some retries in xc_cpupool_removecpu() for EBUSY case

 tools/libxc/xc_cpupool.c     | 13 +++++-
 xen/common/compat/schedule.c |  4 ++
 xen/common/schedule.c        | 95 ++++++++++++++++++++++++++++++++++++++++----
 xen/include/public/sched.h   | 17 ++++++++
 xen/include/xlat.lst         |  1 +
 5 files changed, 122 insertions(+), 8 deletions(-)

-- 
2.6.2


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v2 1/3] xen: silence affinity messages on suspend/resume
  2016-03-01  9:02 [PATCH v2 0/3] add hypercall option to temporarily pin a vcpu Juergen Gross
@ 2016-03-01  9:02 ` Juergen Gross
  2016-03-02 11:11   ` Dario Faggioli
  2016-03-01  9:02 ` [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu Juergen Gross
  2016-03-01  9:02 ` [PATCH v2 3/3] libxc: do some retries in xc_cpupool_removecpu() for EBUSY case Juergen Gross
  2 siblings, 1 reply; 26+ messages in thread
From: Juergen Gross @ 2016-03-01  9:02 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, wei.liu2, stefano.stabellini, george.dunlap,
	andrew.cooper3, dario.faggioli, ian.jackson, david.vrabel,
	jbeulich

When taking cpus offline for suspend, or bringing them online again on
resume, the scheduler might issue debug messages when temporarily
breaking vcpu affinity or when restoring the original affinity settings.

The resume message can be removed completely, while the message about
breaking affinity should only be issued if the breakage is permanent.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 434dcfc..b0d4b18 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -615,7 +615,6 @@ void restore_vcpu_affinity(struct domain *d)
 
         if ( v->affinity_broken )
         {
-            printk(XENLOG_DEBUG "Restoring affinity for %pv\n", v);
             cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
             v->affinity_broken = 0;
         }
@@ -670,14 +669,14 @@ int cpu_disable_scheduler(unsigned int cpu)
             if ( cpumask_empty(&online_affinity) &&
                  cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
             {
-                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
-
                 if (system_state == SYS_STATE_suspend)
                 {
                     cpumask_copy(v->cpu_hard_affinity_saved,
                                  v->cpu_hard_affinity);
                     v->affinity_broken = 1;
                 }
+                else
+                    printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
 
                 cpumask_setall(v->cpu_hard_affinity);
             }
-- 
2.6.2



* [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-01  9:02 [PATCH v2 0/3] add hypercall option to temporarily pin a vcpu Juergen Gross
  2016-03-01  9:02 ` [PATCH v2 1/3] xen: silence affinity messages on suspend/resume Juergen Gross
@ 2016-03-01  9:02 ` Juergen Gross
  2016-03-01 11:27   ` Jan Beulich
                     ` (3 more replies)
  2016-03-01  9:02 ` [PATCH v2 3/3] libxc: do some retries in xc_cpupool_removecpu() for EBUSY case Juergen Gross
  2 siblings, 4 replies; 26+ messages in thread
From: Juergen Gross @ 2016-03-01  9:02 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, wei.liu2, stefano.stabellini, george.dunlap,
	andrew.cooper3, dario.faggioli, ian.jackson, david.vrabel,
	jbeulich

Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be
called on physical cpu 0 only. Linux drivers like dcdbas or i8k try
to achieve this by pinning the running thread to cpu 0, but in Dom0
this is not enough: the vcpu must be pinned to physical cpu 0 via
Xen, too.

Add a stable hypercall option SCHEDOP_pin_temp to the sched_op
hypercall to achieve this. It takes a physical cpu number as a
parameter. If pinning is possible (the calling domain has the
privilege to make the call and the cpu is available in the domain's
cpupool), the calling vcpu is pinned to the specified cpu. The old
cpu affinity is saved. To undo the temporary pinning, a cpu value of
-1 is specified; this restores the original cpu affinity for the vcpu.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2: - limit operation to hardware domain as suggested by Jan Beulich
    - some style issues corrected as requested by Jan Beulich
    - use fixed width types in interface as requested by Jan Beulich
    - add compat layer checking as requested by Jan Beulich
---
 xen/common/compat/schedule.c |  4 ++
 xen/common/schedule.c        | 92 +++++++++++++++++++++++++++++++++++++++++---
 xen/include/public/sched.h   | 17 ++++++++
 xen/include/xlat.lst         |  1 +
 4 files changed, 109 insertions(+), 5 deletions(-)

diff --git a/xen/common/compat/schedule.c b/xen/common/compat/schedule.c
index 812c550..73b0f01 100644
--- a/xen/common/compat/schedule.c
+++ b/xen/common/compat/schedule.c
@@ -10,6 +10,10 @@
 
 #define do_sched_op compat_sched_op
 
+#define xen_sched_pin_temp sched_pin_temp
+CHECK_sched_pin_temp;
+#undef xen_sched_pin_temp
+
 #define xen_sched_shutdown sched_shutdown
 CHECK_sched_shutdown;
 #undef xen_sched_shutdown
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index b0d4b18..653f852 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -271,6 +271,12 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     struct scheduler *old_ops;
     void *old_domdata;
 
+    for_each_vcpu ( d, v )
+    {
+        if ( v->affinity_broken )
+            return -EBUSY;
+    }
+
     domdata = SCHED_OP(c->sched, alloc_domdata, d);
     if ( domdata == NULL )
         return -ENOMEM;
@@ -669,6 +675,14 @@ int cpu_disable_scheduler(unsigned int cpu)
             if ( cpumask_empty(&online_affinity) &&
                  cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
             {
+                if ( v->affinity_broken )
+                {
+                    /* The vcpu is temporarily pinned, can't move it. */
+                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
+                    ret = -EBUSY;
+                    break;
+                }
+
                 if (system_state == SYS_STATE_suspend)
                 {
                     cpumask_copy(v->cpu_hard_affinity_saved,
@@ -752,14 +766,20 @@ static int vcpu_set_affinity(
     struct vcpu *v, const cpumask_t *affinity, cpumask_t *which)
 {
     spinlock_t *lock;
+    int ret = 0;
 
     lock = vcpu_schedule_lock_irq(v);
 
-    cpumask_copy(which, affinity);
+    if ( v->affinity_broken )
+        ret = -EBUSY;
+    else
+    {
+        cpumask_copy(which, affinity);
 
-    /* Always ask the scheduler to re-evaluate placement
-     * when changing the affinity */
-    set_bit(_VPF_migrating, &v->pause_flags);
+        /* Always ask the scheduler to re-evaluate placement
+         * when changing the affinity */
+        set_bit(_VPF_migrating, &v->pause_flags);
+    }
 
     vcpu_schedule_unlock_irq(lock, v);
 
@@ -771,7 +791,7 @@ static int vcpu_set_affinity(
         vcpu_migrate(v);
     }
 
-    return 0;
+    return ret;
 }
 
 int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity)
@@ -978,6 +998,51 @@ void watchdog_domain_destroy(struct domain *d)
         kill_timer(&d->watchdog_timer[i]);
 }
 
+static long do_pin_temp(int cpu)
+{
+    struct vcpu *v = current;
+    spinlock_t *lock;
+    long ret = -EINVAL;
+
+    lock = vcpu_schedule_lock_irq(v);
+
+    if ( cpu < 0 )
+    {
+        if ( v->affinity_broken )
+        {
+            cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
+            v->affinity_broken = 0;
+            set_bit(_VPF_migrating, &v->pause_flags);
+            ret = 0;
+        }
+    }
+    else if ( cpu < nr_cpu_ids )
+    {
+        if ( v->affinity_broken )
+            ret = -EBUSY;
+        else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) )
+        {
+            cpumask_copy(v->cpu_hard_affinity_saved, v->cpu_hard_affinity);
+            v->affinity_broken = 1;
+            cpumask_copy(v->cpu_hard_affinity, cpumask_of(cpu));
+            set_bit(_VPF_migrating, &v->pause_flags);
+            ret = 0;
+        }
+    }
+
+    vcpu_schedule_unlock_irq(lock, v);
+
+    domain_update_node_affinity(v->domain);
+
+    if ( v->pause_flags & VPF_migrating )
+    {
+        vcpu_sleep_nosync(v);
+        vcpu_migrate(v);
+    }
+
+    return ret;
+}
+
 typedef long ret_t;
 
 #endif /* !COMPAT */
@@ -1087,6 +1152,23 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case SCHEDOP_pin_temp:
+    {
+        struct sched_pin_temp sched_pin_temp;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&sched_pin_temp, arg, 1) )
+            break;
+
+        ret = -EPERM;
+        if ( !is_hardware_domain(current->domain) )
+            break;
+
+        ret = do_pin_temp(sched_pin_temp.pcpu);
+
+        break;
+    }
+
     default:
         ret = -ENOSYS;
     }
diff --git a/xen/include/public/sched.h b/xen/include/public/sched.h
index 2219696..a0ce5a6 100644
--- a/xen/include/public/sched.h
+++ b/xen/include/public/sched.h
@@ -118,6 +118,17 @@
  * With id != 0 and timeout != 0, poke watchdog timer and set new timeout.
  */
 #define SCHEDOP_watchdog    6
+
+/*
+ * Temporarily pin the current vcpu to one physical cpu or undo that pinning.
+ * @arg == pointer to sched_pin_temp_t structure.
+ *
+ * Setting pcpu to -1 will undo a previous temporary pinning and restore the
+ * previous cpu affinity. The temporary aspect of the pinning isn't enforced
+ * by the hypervisor.
+ * This call is allowed for the hardware domain only.
+ */
+#define SCHEDOP_pin_temp    7
 /* ` } */
 
 struct sched_shutdown {
@@ -148,6 +159,12 @@ struct sched_watchdog {
 typedef struct sched_watchdog sched_watchdog_t;
 DEFINE_XEN_GUEST_HANDLE(sched_watchdog_t);
 
+struct sched_pin_temp {
+    int32_t pcpu;
+};
+typedef struct sched_pin_temp sched_pin_temp_t;
+DEFINE_XEN_GUEST_HANDLE(sched_pin_temp_t);
+
 /*
  * Reason codes for SCHEDOP_shutdown. These may be interpreted by control
  * software to determine the appropriate action. For the most part, Xen does
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index fda1137..52c7233 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -104,6 +104,7 @@
 ?	pmu_data			pmu.h
 ?	pmu_params			pmu.h
 !	sched_poll			sched.h
+?	sched_pin_temp			sched.h
 ?	sched_remote_shutdown		sched.h
 ?	sched_shutdown			sched.h
 ?	tmem_oid			tmem.h
-- 
2.6.2



* [PATCH v2 3/3] libxc: do some retries in xc_cpupool_removecpu() for EBUSY case
  2016-03-01  9:02 [PATCH v2 0/3] add hypercall option to temporarily pin a vcpu Juergen Gross
  2016-03-01  9:02 ` [PATCH v2 1/3] xen: silence affinity messages on suspend/resume Juergen Gross
  2016-03-01  9:02 ` [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu Juergen Gross
@ 2016-03-01  9:02 ` Juergen Gross
  2016-03-01 11:58   ` Wei Liu
  2 siblings, 1 reply; 26+ messages in thread
From: Juergen Gross @ 2016-03-01  9:02 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, wei.liu2, stefano.stabellini, george.dunlap,
	andrew.cooper3, dario.faggioli, ian.jackson, david.vrabel,
	jbeulich

The hypervisor might return EBUSY when trying to remove a cpu from a
cpupool while a domain running in this cpupool has temporarily pinned
a vcpu. Retry a few times in this case, as the situation is likely to
resolve itself.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/libxc/xc_cpupool.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_cpupool.c b/tools/libxc/xc_cpupool.c
index c42273e..9f2f95c 100644
--- a/tools/libxc/xc_cpupool.c
+++ b/tools/libxc/xc_cpupool.c
@@ -20,8 +20,11 @@
  */
 
 #include <stdarg.h>
+#include <unistd.h>
 #include "xc_private.h"
 
+#define LIBXC_BUSY_RETRIES 5
+
 static int do_sysctl_save(xc_interface *xch, struct xen_sysctl *sysctl)
 {
     int ret;
@@ -141,13 +144,21 @@ int xc_cpupool_removecpu(xc_interface *xch,
                          uint32_t poolid,
                          int cpu)
 {
+    unsigned retries;
+    int err;
     DECLARE_SYSCTL;
 
     sysctl.cmd = XEN_SYSCTL_cpupool_op;
     sysctl.u.cpupool_op.op = XEN_SYSCTL_CPUPOOL_OP_RMCPU;
     sysctl.u.cpupool_op.cpupool_id = poolid;
     sysctl.u.cpupool_op.cpu = (cpu < 0) ? XEN_SYSCTL_CPUPOOL_PAR_ANY : cpu;
-    return do_sysctl_save(xch, &sysctl);
+    for (retries = 0; retries < LIBXC_BUSY_RETRIES; retries++) {
+        err = do_sysctl_save(xch, &sysctl);
+        if (err >= 0 || errno != EBUSY)
+            break;
+        sleep(1);
+    }
+    return err;
 }
 
 int xc_cpupool_movedomain(xc_interface *xch,
-- 
2.6.2



* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-01  9:02 ` [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu Juergen Gross
@ 2016-03-01 11:27   ` Jan Beulich
  2016-03-01 11:55   ` David Vrabel
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2016-03-01 11:27 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, david.vrabel

>>> On 01.03.16 at 10:02, <JGross@suse.com> wrote:
> @@ -752,14 +766,20 @@ static int vcpu_set_affinity(
>      struct vcpu *v, const cpumask_t *affinity, cpumask_t *which)
>  {
>      spinlock_t *lock;
> +    int ret = 0;
>  
>      lock = vcpu_schedule_lock_irq(v);
>  
> -    cpumask_copy(which, affinity);
> +    if ( v->affinity_broken )
> +        ret = -EBUSY;
> +    else
> +    {
> +        cpumask_copy(which, affinity);
>  
> -    /* Always ask the scheduler to re-evaluate placement
> -     * when changing the affinity */
> -    set_bit(_VPF_migrating, &v->pause_flags);
> +        /* Always ask the scheduler to re-evaluate placement
> +         * when changing the affinity */
> +        set_bit(_VPF_migrating, &v->pause_flags);

When you touch code like this, would it be possible to at once fix
the coding style issues it (the comment in this case) has?

> @@ -978,6 +998,51 @@ void watchdog_domain_destroy(struct domain *d)
>          kill_timer(&d->watchdog_timer[i]);
>  }
>  
> +static long do_pin_temp(int cpu)

As expressed before, throughout this patch I dislike the "temp"
naming, when the temporary nature of this operation isn't being
enforced by anything.

Apart from that I (vaguely) recall there having been previous
suggestions in the direction of (temporary), which have got
rejected.

On both points I think we need to have input from the scheduler
maintainers.

> +{
> +    struct vcpu *v = current;
> +    spinlock_t *lock;
> +    long ret = -EINVAL;

"int" seems completely sufficient for both the variable and the
function return type.

> @@ -1087,6 +1152,23 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +    case SCHEDOP_pin_temp:
> +    {
> +        struct sched_pin_temp sched_pin_temp;
> +
> +        ret = -EFAULT;
> +        if ( copy_from_guest(&sched_pin_temp, arg, 1) )
> +            break;
> +
> +        ret = -EPERM;
> +        if ( !is_hardware_domain(current->domain) )
> +            break;

I'd generally suggest swapping these two.

> --- a/xen/include/public/sched.h
> +++ b/xen/include/public/sched.h
> @@ -118,6 +118,17 @@
>   * With id != 0 and timeout != 0, poke watchdog timer and set new timeout.
>   */
>  #define SCHEDOP_watchdog    6
> +
> +/*
> + * Temporarily pin the current vcpu to one physical cpu or undo that pinning.
> + * @arg == pointer to sched_pin_temp_t structure.
> + *
> + * Setting pcpu to -1 will undo a previous temporary pinning and restore the
> + * previous cpu affinity. The temporary aspect of the pinning isn't enforced
> + * by the hypervisor.

This comment is now out of sync with the code, since you now
accept any negative CPU number as "undo" request.

Jan



* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-01  9:02 ` [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu Juergen Gross
  2016-03-01 11:27   ` Jan Beulich
@ 2016-03-01 11:55   ` David Vrabel
  2016-03-01 11:58     ` Juergen Gross
       [not found]   ` <56D58ABF02000078000D7C46@suse.com>
  2016-03-01 15:52   ` George Dunlap
  3 siblings, 1 reply; 26+ messages in thread
From: David Vrabel @ 2016-03-01 11:55 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, david.vrabel, jbeulich

On 01/03/16 09:02, Juergen Gross wrote:
> Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be
> called on physical cpu 0 only. Linux drivers like dcdbas or i8k try
> to achieve this by pinning the running thread to cpu 0, but in Dom0
> this is not enough: the vcpu must be pinned to physical cpu 0 via
> Xen, too.
> 
> Add a stable hypercall option SCHEDOP_pin_temp to the sched_op
> hypercall to achieve this. It is taking a physical cpu number as
> parameter. If pinning is possible (the calling domain has the
> privilege to make the call and the cpu is available in the domain's
> cpupool) the calling vcpu is pinned to the specified cpu. The old
> cpu affinity is saved. To undo the temporary pinning a cpu -1 is
> specified. This will restore the original cpu affinity for the vcpu.

I suggest SCHEDOP_pin_override as a name.

David


* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
       [not found]   ` <56D58ABF02000078000D7C46@suse.com>
@ 2016-03-01 11:58     ` Juergen Gross
  0 siblings, 0 replies; 26+ messages in thread
From: Juergen Gross @ 2016-03-01 11:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, david.vrabel

On 01/03/16 12:27, Jan Beulich wrote:
>>>> On 01.03.16 at 10:02, <JGross@suse.com> wrote:
>> @@ -752,14 +766,20 @@ static int vcpu_set_affinity(
>>      struct vcpu *v, const cpumask_t *affinity, cpumask_t *which)
>>  {
>>      spinlock_t *lock;
>> +    int ret = 0;
>>  
>>      lock = vcpu_schedule_lock_irq(v);
>>  
>> -    cpumask_copy(which, affinity);
>> +    if ( v->affinity_broken )
>> +        ret = -EBUSY;
>> +    else
>> +    {
>> +        cpumask_copy(which, affinity);
>>  
>> -    /* Always ask the scheduler to re-evaluate placement
>> -     * when changing the affinity */
>> -    set_bit(_VPF_migrating, &v->pause_flags);
>> +        /* Always ask the scheduler to re-evaluate placement
>> +         * when changing the affinity */
>> +        set_bit(_VPF_migrating, &v->pause_flags);
> 
> When you touch code like this, would it be possible to at once fix
> the coding style issues it (the comment in this case) has?

Sure, NP.

> 
>> @@ -978,6 +998,51 @@ void watchdog_domain_destroy(struct domain *d)
>>          kill_timer(&d->watchdog_timer[i]);
>>  }
>>  
>> +static long do_pin_temp(int cpu)
> 
> As expressed before, throughout this patch I dislike the "temp"
> naming, when the temporary nature of this operation isn't being
> enforced by anything.
> 
> Apart from that I (vaguely) recall there having been previous
> suggestions in the direction of (temporary), which have got
> rejected.
> 
> On both points I think we need to have input from the scheduler
> maintainers.

Okay. I don't mind changing the name. We should just agree on one.

> 
>> +{
>> +    struct vcpu *v = current;
>> +    spinlock_t *lock;
>> +    long ret = -EINVAL;
> 
> "int" seems completely sufficient for both the variable and the
> function return type.

Hmm, yes.

> 
>> @@ -1087,6 +1152,23 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          break;
>>      }
>>  
>> +    case SCHEDOP_pin_temp:
>> +    {
>> +        struct sched_pin_temp sched_pin_temp;
>> +
>> +        ret = -EFAULT;
>> +        if ( copy_from_guest(&sched_pin_temp, arg, 1) )
>> +            break;
>> +
>> +        ret = -EPERM;
>> +        if ( !is_hardware_domain(current->domain) )
>> +            break;
> 
> I'd generally suggest swapping these two.

Will do.

> 
>> --- a/xen/include/public/sched.h
>> +++ b/xen/include/public/sched.h
>> @@ -118,6 +118,17 @@
>>   * With id != 0 and timeout != 0, poke watchdog timer and set new timeout.
>>   */
>>  #define SCHEDOP_watchdog    6
>> +
>> +/*
>> + * Temporarily pin the current vcpu to one physical cpu or undo that pinning.
>> + * @arg == pointer to sched_pin_temp_t structure.
>> + *
>> + * Setting pcpu to -1 will undo a previous temporary pinning and restore the
>> + * previous cpu affinity. The temporary aspect of the pinning isn't enforced
>> + * by the hypervisor.
> 
> This comment is now out of sync with the code, since you now
> accept any negative CPU number as "undo" request.

Will change it.


Juergen


* Re: [PATCH v2 3/3] libxc: do some retries in xc_cpupool_removecpu() for EBUSY case
  2016-03-01  9:02 ` [PATCH v2 3/3] libxc: do some retries in xc_cpupool_removecpu() for EBUSY case Juergen Gross
@ 2016-03-01 11:58   ` Wei Liu
  2016-03-01 11:59     ` Juergen Gross
  0 siblings, 1 reply; 26+ messages in thread
From: Wei Liu @ 2016-03-01 11:58 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, david.vrabel, jbeulich

On Tue, Mar 01, 2016 at 10:02:13AM +0100, Juergen Gross wrote:
> The hypervisor might return EBUSY when trying to remove a cpu from a
> cpupool when a domain running in this cpupool has pinned a vcpu
> temporarily. Do some retries in this case, perhaps the situation
> cleans up.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  tools/libxc/xc_cpupool.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxc/xc_cpupool.c b/tools/libxc/xc_cpupool.c
> index c42273e..9f2f95c 100644
> --- a/tools/libxc/xc_cpupool.c
> +++ b/tools/libxc/xc_cpupool.c
> @@ -20,8 +20,11 @@
>   */
>  
>  #include <stdarg.h>
> +#include <unistd.h>
>  #include "xc_private.h"
>  
> +#define LIBXC_BUSY_RETRIES 5
> +
>  static int do_sysctl_save(xc_interface *xch, struct xen_sysctl *sysctl)
>  {
>      int ret;
> @@ -141,13 +144,21 @@ int xc_cpupool_removecpu(xc_interface *xch,
>                           uint32_t poolid,
>                           int cpu)
>  {
> +    unsigned retries;
> +    int err;
>      DECLARE_SYSCTL;
>  
>      sysctl.cmd = XEN_SYSCTL_cpupool_op;
>      sysctl.u.cpupool_op.op = XEN_SYSCTL_CPUPOOL_OP_RMCPU;
>      sysctl.u.cpupool_op.cpupool_id = poolid;
>      sysctl.u.cpupool_op.cpu = (cpu < 0) ? XEN_SYSCTL_CPUPOOL_PAR_ANY : cpu;
> -    return do_sysctl_save(xch, &sysctl);
> +    for (retries = 0; retries < LIBXC_BUSY_RETRIES; retries++) {

Libxc coding style requires spaces between ().

> +        err = do_sysctl_save(xch, &sysctl);
> +        if (err >= 0 || errno != EBUSY)

Ditto.

> +            break;
> +        sleep(1);
> +    }
> +    return err;
>  }
>  
>  int xc_cpupool_movedomain(xc_interface *xch,
> -- 
> 2.6.2
> 


* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-01 11:55   ` David Vrabel
@ 2016-03-01 11:58     ` Juergen Gross
  2016-03-01 12:15       ` Dario Faggioli
  0 siblings, 1 reply; 26+ messages in thread
From: Juergen Gross @ 2016-03-01 11:58 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, jbeulich

On 01/03/16 12:55, David Vrabel wrote:
> On 01/03/16 09:02, Juergen Gross wrote:
>> Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be
>> called on physical cpu 0 only. Linux drivers like dcdbas or i8k try
>> to achieve this by pinning the running thread to cpu 0, but in Dom0
>> this is not enough: the vcpu must be pinned to physical cpu 0 via
>> Xen, too.
>>
>> Add a stable hypercall option SCHEDOP_pin_temp to the sched_op
>> hypercall to achieve this. It is taking a physical cpu number as
>> parameter. If pinning is possible (the calling domain has the
>> privilege to make the call and the cpu is available in the domain's
>> cpupool) the calling vcpu is pinned to the specified cpu. The old
>> cpu affinity is saved. To undo the temporary pinning a cpu -1 is
>> specified. This will restore the original cpu affinity for the vcpu.
> 
> I suggest SCHEDOP_pin_override as a name.

I'm fine with that. Any objections?


Juergen



* Re: [PATCH v2 3/3] libxc: do some retries in xc_cpupool_removecpu() for EBUSY case
  2016-03-01 11:58   ` Wei Liu
@ 2016-03-01 11:59     ` Juergen Gross
  0 siblings, 0 replies; 26+ messages in thread
From: Juergen Gross @ 2016-03-01 11:59 UTC (permalink / raw)
  To: Wei Liu
  Cc: stefano.stabellini, george.dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, david.vrabel, jbeulich

On 01/03/16 12:58, Wei Liu wrote:
> On Tue, Mar 01, 2016 at 10:02:13AM +0100, Juergen Gross wrote:
>> The hypervisor might return EBUSY when trying to remove a cpu from a
>> cpupool when a domain running in this cpupool has pinned a vcpu
>> temporarily. Do some retries in this case, perhaps the situation
>> cleans up.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>  tools/libxc/xc_cpupool.c | 13 ++++++++++++-
>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/libxc/xc_cpupool.c b/tools/libxc/xc_cpupool.c
>> index c42273e..9f2f95c 100644
>> --- a/tools/libxc/xc_cpupool.c
>> +++ b/tools/libxc/xc_cpupool.c
>> @@ -20,8 +20,11 @@
>>   */
>>  
>>  #include <stdarg.h>
>> +#include <unistd.h>
>>  #include "xc_private.h"
>>  
>> +#define LIBXC_BUSY_RETRIES 5
>> +
>>  static int do_sysctl_save(xc_interface *xch, struct xen_sysctl *sysctl)
>>  {
>>      int ret;
>> @@ -141,13 +144,21 @@ int xc_cpupool_removecpu(xc_interface *xch,
>>                           uint32_t poolid,
>>                           int cpu)
>>  {
>> +    unsigned retries;
>> +    int err;
>>      DECLARE_SYSCTL;
>>  
>>      sysctl.cmd = XEN_SYSCTL_cpupool_op;
>>      sysctl.u.cpupool_op.op = XEN_SYSCTL_CPUPOOL_OP_RMCPU;
>>      sysctl.u.cpupool_op.cpupool_id = poolid;
>>      sysctl.u.cpupool_op.cpu = (cpu < 0) ? XEN_SYSCTL_CPUPOOL_PAR_ANY : cpu;
>> -    return do_sysctl_save(xch, &sysctl);
>> +    for (retries = 0; retries < LIBXC_BUSY_RETRIES; retries++) {
> 
> Libxc coding style requires spaces between ().

Oops, sorry.

> 
>> +        err = do_sysctl_save(xch, &sysctl);
>> +        if (err >= 0 || errno != EBUSY)
> 
> Ditto.
> 
>> +            break;
>> +        sleep(1);
>> +    }
>> +    return err;
>>  }
>>  
>>  int xc_cpupool_movedomain(xc_interface *xch,


Juergen



* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-01 11:58     ` Juergen Gross
@ 2016-03-01 12:15       ` Dario Faggioli
  2016-03-01 14:02         ` George Dunlap
  0 siblings, 1 reply; 26+ messages in thread
From: Dario Faggioli @ 2016-03-01 12:15 UTC (permalink / raw)
  To: Juergen Gross, David Vrabel, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	ian.jackson, jbeulich



On Tue, 2016-03-01 at 12:58 +0100, Juergen Gross wrote:
> On 01/03/16 12:55, David Vrabel wrote:
> > 
> > On 01/03/16 09:02, Juergen Gross wrote:
> > > 
> > > Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be
> > > called on physical cpu 0 only. Linux drivers like dcdbas or i8k
> > > try
> > > to achieve this by pinning the running thread to cpu 0, but in
> > > Dom0
> > > this is not enough: the vcpu must be pinned to physical cpu 0 via
> > > Xen, too.
> > > 
> > > Add a stable hypercall option SCHEDOP_pin_temp to the sched_op
> > > hypercall to achieve this. It is taking a physical cpu number as
> > > parameter. If pinning is possible (the calling domain has the
> > > privilege to make the call and the cpu is available in the
> > > domain's
> > > cpupool) the calling vcpu is pinned to the specified cpu. The old
> > > cpu affinity is saved. To undo the temporary pinning a cpu -1 is
> > > specified. This will restore the original cpu affinity for the
> > > vcpu.
> > I suggest SCHEDOP_pin_override as a name.
>
> I'm fine with that. Any objections?
> 
Not at all. I actually like it a lot.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-01 12:15       ` Dario Faggioli
@ 2016-03-01 14:02         ` George Dunlap
  0 siblings, 0 replies; 26+ messages in thread
From: George Dunlap @ 2016-03-01 14:02 UTC (permalink / raw)
  To: Dario Faggioli, Juergen Gross, David Vrabel, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	ian.jackson, jbeulich

On 01/03/16 12:15, Dario Faggioli wrote:
> On Tue, 2016-03-01 at 12:58 +0100, Juergen Gross wrote:
>> On 01/03/16 12:55, David Vrabel wrote:
>>>
>>> On 01/03/16 09:02, Juergen Gross wrote:
>>>>
>>>> Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be
>>>> called on physical cpu 0 only. Linux drivers like dcdbas or i8k
>>>> try
>>>> to achieve this by pinning the running thread to cpu 0, but in
>>>> Dom0
>>>> this is not enough: the vcpu must be pinned to physical cpu 0 via
>>>> Xen, too.
>>>>
>>>> Add a stable hypercall option SCHEDOP_pin_temp to the sched_op
>>>> hypercall to achieve this. It takes a physical cpu number as a
>>>> parameter. If pinning is possible (the calling domain has the
>>>> privilege to make the call and the cpu is available in the
>>>> domain's
>>>> cpupool) the calling vcpu is pinned to the specified cpu. The old
>>>> cpu affinity is saved. To undo the temporary pinning, a cpu of -1 is
>>>> specified. This will restore the original cpu affinity for the
>>>> vcpu.
>>> I suggest SCHEDOP_pin_override as a name.
>>
>> I'm fine with that. Any objections?
>>
> Not at all. I actually like it a lot.

+1 to the name.

 -George


* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-01  9:02 ` [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu Juergen Gross
                     ` (2 preceding siblings ...)
       [not found]   ` <56D58ABF02000078000D7C46@suse.com>
@ 2016-03-01 15:52   ` George Dunlap
  2016-03-01 15:55     ` George Dunlap
                       ` (2 more replies)
  3 siblings, 3 replies; 26+ messages in thread
From: George Dunlap @ 2016-03-01 15:52 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, david.vrabel, jbeulich

On 01/03/16 09:02, Juergen Gross wrote:
> Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be
> called on physical cpu 0 only. Linux drivers like dcdbas or i8k try
> to achieve this by pinning the running thread to cpu 0, but in Dom0
> this is not enough: the vcpu must be pinned to physical cpu 0 via
> Xen, too.
> 
> Add a stable hypercall option SCHEDOP_pin_temp to the sched_op
> hypercall to achieve this. It takes a physical cpu number as a
> parameter. If pinning is possible (the calling domain has the
> privilege to make the call and the cpu is available in the domain's
> cpupool) the calling vcpu is pinned to the specified cpu. The old
> cpu affinity is saved. To undo the temporary pinning, a cpu of -1 is
> specified. This will restore the original cpu affinity for the vcpu.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V2: - limit operation to hardware domain as suggested by Jan Beulich
>     - some style issues corrected as requested by Jan Beulich
>     - use fixed width types in interface as requested by Jan Beulich
>     - add compat layer checking as requested by Jan Beulich
> ---
>  xen/common/compat/schedule.c |  4 ++
>  xen/common/schedule.c        | 92 +++++++++++++++++++++++++++++++++++++++++---
>  xen/include/public/sched.h   | 17 ++++++++
>  xen/include/xlat.lst         |  1 +
>  4 files changed, 109 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/common/compat/schedule.c b/xen/common/compat/schedule.c
> index 812c550..73b0f01 100644
> --- a/xen/common/compat/schedule.c
> +++ b/xen/common/compat/schedule.c
> @@ -10,6 +10,10 @@
>  
>  #define do_sched_op compat_sched_op
>  
> +#define xen_sched_pin_temp sched_pin_temp
> +CHECK_sched_pin_temp;
> +#undef xen_sched_pin_temp
> +
>  #define xen_sched_shutdown sched_shutdown
>  CHECK_sched_shutdown;
>  #undef xen_sched_shutdown
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index b0d4b18..653f852 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -271,6 +271,12 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>      struct scheduler *old_ops;
>      void *old_domdata;
>  
> +    for_each_vcpu ( d, v )
> +    {
> +        if ( v->affinity_broken )
> +            return -EBUSY;
> +    }
> +
>      domdata = SCHED_OP(c->sched, alloc_domdata, d);
>      if ( domdata == NULL )
>          return -ENOMEM;
> @@ -669,6 +675,14 @@ int cpu_disable_scheduler(unsigned int cpu)
>              if ( cpumask_empty(&online_affinity) &&
>                   cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
>              {
> +                if ( v->affinity_broken )
> +                {
> +                    /* The vcpu is temporarily pinned, can't move it. */
> +                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
> +                    ret = -EBUSY;
> +                    break;
> +                }

Does this mean that if the user closes the laptop lid while one of these
drivers has vcpu0 pinned, that Xen will crash (see
xen/arch/x86/smpboot.c:__cpu_disable())?  Or is it the OS's job to make
sure that all temporary pins are removed before suspending?

Also -- have you actually tested the "cpupool move while pinned"
functionality to make sure it actually works?  There's a weird bit in
cpupool_unassign_cpu_helper() where after calling
cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in the
cpupool_free_cpus mask, even if it returns an error.  That can't be
right, even for the existing -EAGAIN case, can it?

I see that you have a loop to retry this call several times in the next
patch; but what if it fails every time -- what state is the system in?

And, in general, what happens if the device driver gets mixed up and
forgets to unpin the vcpu?  Is the only recourse to reboot your host (or
deal with the fact that you can't reconfigure your cpupools)?

 -George



* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-01 15:52   ` George Dunlap
@ 2016-03-01 15:55     ` George Dunlap
  2016-03-01 16:11       ` Jan Beulich
  2016-03-02  7:14     ` Juergen Gross
  2016-03-02 17:21     ` Anshul Makkar
  2 siblings, 1 reply; 26+ messages in thread
From: George Dunlap @ 2016-03-01 15:55 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, david.vrabel, jbeulich

On 01/03/16 15:52, George Dunlap wrote:
> On 01/03/16 09:02, Juergen Gross wrote:
>> Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be
>> called on physical cpu 0 only. Linux drivers like dcdbas or i8k try
>> to achieve this by pinning the running thread to cpu 0, but in Dom0
>> this is not enough: the vcpu must be pinned to physical cpu 0 via
>> Xen, too.
>>
>> Add a stable hypercall option SCHEDOP_pin_temp to the sched_op
>> hypercall to achieve this. It takes a physical cpu number as a
>> parameter. If pinning is possible (the calling domain has the
>> privilege to make the call and the cpu is available in the domain's
>> cpupool) the calling vcpu is pinned to the specified cpu. The old
>> cpu affinity is saved. To undo the temporary pinning, a cpu of -1 is
>> specified. This will restore the original cpu affinity for the vcpu.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2: - limit operation to hardware domain as suggested by Jan Beulich
>>     - some style issues corrected as requested by Jan Beulich
>>     - use fixed width types in interface as requested by Jan Beulich
>>     - add compat layer checking as requested by Jan Beulich
>> ---
>>  xen/common/compat/schedule.c |  4 ++
>>  xen/common/schedule.c        | 92 +++++++++++++++++++++++++++++++++++++++++---
>>  xen/include/public/sched.h   | 17 ++++++++
>>  xen/include/xlat.lst         |  1 +
>>  4 files changed, 109 insertions(+), 5 deletions(-)
>>
>> diff --git a/xen/common/compat/schedule.c b/xen/common/compat/schedule.c
>> index 812c550..73b0f01 100644
>> --- a/xen/common/compat/schedule.c
>> +++ b/xen/common/compat/schedule.c
>> @@ -10,6 +10,10 @@
>>  
>>  #define do_sched_op compat_sched_op
>>  
>> +#define xen_sched_pin_temp sched_pin_temp
>> +CHECK_sched_pin_temp;
>> +#undef xen_sched_pin_temp
>> +
>>  #define xen_sched_shutdown sched_shutdown
>>  CHECK_sched_shutdown;
>>  #undef xen_sched_shutdown
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index b0d4b18..653f852 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -271,6 +271,12 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>      struct scheduler *old_ops;
>>      void *old_domdata;
>>  
>> +    for_each_vcpu ( d, v )
>> +    {
>> +        if ( v->affinity_broken )
>> +            return -EBUSY;
>> +    }
>> +
>>      domdata = SCHED_OP(c->sched, alloc_domdata, d);
>>      if ( domdata == NULL )
>>          return -ENOMEM;
>> @@ -669,6 +675,14 @@ int cpu_disable_scheduler(unsigned int cpu)
>>              if ( cpumask_empty(&online_affinity) &&
>>                   cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
>>              {
>> +                if ( v->affinity_broken )
>> +                {
>> +                    /* The vcpu is temporarily pinned, can't move it. */
>> +                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
>> +                    ret = -EBUSY;
>> +                    break;
>> +                }
> 
> Does this mean that if the user closes the laptop lid while one of these
> drivers has vcpu0 pinned, that Xen will crash (see
> xen/arch/x86/smpboot.c:__cpu_disable())?  Or is it the OS's job to make
> sure that all temporary pins are removed before suspending?
> 
> Also -- have you actually tested the "cpupool move while pinned"
> functionality to make sure it actually works?  There's a weird bit in
> cpupool_unassign_cpu_helper() where after calling
> cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in the
> cpupool_free_cpus mask, even if it returns an error.  That can't be
> right, even for the existing -EAGAIN case, can it?
> 
> I see that you have a loop to retry this call several times in the next
> patch; but what if it fails every time -- what state is the system in?
> 
> And, in general, what happens if the device driver gets mixed up and
> forgets to unpin the vcpu?  Is the only recourse to reboot your host (or
> deal with the fact that you can't reconfigure your cpupools)?

(I should say, I think this probably is the best solution to this
problem; I just want to make sure we think about the error cases carefully.)

 -George


* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-01 15:55     ` George Dunlap
@ 2016-03-01 16:11       ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2016-03-01 16:11 UTC (permalink / raw)
  To: George Dunlap, Juergen Gross
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, david.vrabel

>>> On 01.03.16 at 16:55, <george.dunlap@citrix.com> wrote:
> On 01/03/16 15:52, George Dunlap wrote:
>> On 01/03/16 09:02, Juergen Gross wrote:
>>> --- a/xen/common/schedule.c
>>> +++ b/xen/common/schedule.c
>>> @@ -271,6 +271,12 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>>      struct scheduler *old_ops;
>>>      void *old_domdata;
>>>  
>>> +    for_each_vcpu ( d, v )
>>> +    {
>>> +        if ( v->affinity_broken )
>>> +            return -EBUSY;
>>> +    }
>>> +
>>>      domdata = SCHED_OP(c->sched, alloc_domdata, d);
>>>      if ( domdata == NULL )
>>>          return -ENOMEM;
>>> @@ -669,6 +675,14 @@ int cpu_disable_scheduler(unsigned int cpu)
>>>              if ( cpumask_empty(&online_affinity) &&
>>>                   cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
>>>              {
>>> +                if ( v->affinity_broken )
>>> +                {
>>> +                    /* The vcpu is temporarily pinned, can't move it. */
>>> +                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
>>> +                    ret = -EBUSY;
>>> +                    break;
>>> +                }
>> 
>> Does this mean that if the user closes the laptop lid while one of these
>> drivers has vcpu0 pinned, that Xen will crash (see
>> xen/arch/x86/smpboot.c:__cpu_disable())?  Or is it the OS's job to make
>> sure that all temporary pins are removed before suspending?
>> 
>> Also -- have you actually tested the "cpupool move while pinned"
>> functionality to make sure it actually works?  There's a weird bit in
>> cpupool_unassign_cpu_helper() where after calling
>> cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in the
>> cpupool_free_cpus mask, even if it returns an error.  That can't be
>> right, even for the existing -EAGAIN case, can it?
>> 
>> I see that you have a loop to retry this call several times in the next
>> patch; but what if it fails every time -- what state is the system in?
>> 
>> And, in general, what happens if the device driver gets mixed up and
>> forgets to unpin the vcpu?  Is the only recourse to reboot your host (or
>> deal with the fact that you can't reconfigure your cpupools)?
> 
> (I should say, I think this probably is the best solution to this
> problem; I just want to make sure we think about the error cases carefully.)

I guess in the worst case there could be a utility or xl command
doing the missing unpin in such an emergency?

Jan



* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-01 15:52   ` George Dunlap
  2016-03-01 15:55     ` George Dunlap
@ 2016-03-02  7:14     ` Juergen Gross
  2016-03-02  9:27       ` Dario Faggioli
  2016-03-02 17:21     ` Anshul Makkar
  2 siblings, 1 reply; 26+ messages in thread
From: Juergen Gross @ 2016-03-02  7:14 UTC (permalink / raw)
  To: George Dunlap, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, david.vrabel, jbeulich

On 01/03/16 16:52, George Dunlap wrote:
> On 01/03/16 09:02, Juergen Gross wrote:
>> Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be
>> called on physical cpu 0 only. Linux drivers like dcdbas or i8k try
>> to achieve this by pinning the running thread to cpu 0, but in Dom0
>> this is not enough: the vcpu must be pinned to physical cpu 0 via
>> Xen, too.
>>
>> Add a stable hypercall option SCHEDOP_pin_temp to the sched_op
>> hypercall to achieve this. It takes a physical cpu number as a
>> parameter. If pinning is possible (the calling domain has the
>> privilege to make the call and the cpu is available in the domain's
>> cpupool) the calling vcpu is pinned to the specified cpu. The old
>> cpu affinity is saved. To undo the temporary pinning, a cpu of -1 is
>> specified. This will restore the original cpu affinity for the vcpu.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2: - limit operation to hardware domain as suggested by Jan Beulich
>>     - some style issues corrected as requested by Jan Beulich
>>     - use fixed width types in interface as requested by Jan Beulich
>>     - add compat layer checking as requested by Jan Beulich
>> ---
>>  xen/common/compat/schedule.c |  4 ++
>>  xen/common/schedule.c        | 92 +++++++++++++++++++++++++++++++++++++++++---
>>  xen/include/public/sched.h   | 17 ++++++++
>>  xen/include/xlat.lst         |  1 +
>>  4 files changed, 109 insertions(+), 5 deletions(-)
>>
>> diff --git a/xen/common/compat/schedule.c b/xen/common/compat/schedule.c
>> index 812c550..73b0f01 100644
>> --- a/xen/common/compat/schedule.c
>> +++ b/xen/common/compat/schedule.c
>> @@ -10,6 +10,10 @@
>>  
>>  #define do_sched_op compat_sched_op
>>  
>> +#define xen_sched_pin_temp sched_pin_temp
>> +CHECK_sched_pin_temp;
>> +#undef xen_sched_pin_temp
>> +
>>  #define xen_sched_shutdown sched_shutdown
>>  CHECK_sched_shutdown;
>>  #undef xen_sched_shutdown
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index b0d4b18..653f852 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -271,6 +271,12 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>      struct scheduler *old_ops;
>>      void *old_domdata;
>>  
>> +    for_each_vcpu ( d, v )
>> +    {
>> +        if ( v->affinity_broken )
>> +            return -EBUSY;
>> +    }
>> +
>>      domdata = SCHED_OP(c->sched, alloc_domdata, d);
>>      if ( domdata == NULL )
>>          return -ENOMEM;
>> @@ -669,6 +675,14 @@ int cpu_disable_scheduler(unsigned int cpu)
>>              if ( cpumask_empty(&online_affinity) &&
>>                   cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
>>              {
>> +                if ( v->affinity_broken )
>> +                {
>> +                    /* The vcpu is temporarily pinned, can't move it. */
>> +                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
>> +                    ret = -EBUSY;
>> +                    break;
>> +                }
> 
> Does this mean that if the user closes the laptop lid while one of these
> drivers has vcpu0 pinned, that Xen will crash (see
> xen/arch/x86/smpboot.c:__cpu_disable())?  Or is it the OS's job to make
> sure that all temporary pins are removed before suspending?

Yes, this must be ensured by the OS.
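For illustration, the contract discussed here (the OS pins, does its SMI-dependent work, and must undo the pin itself, e.g. before suspend) might look like the sketch below. The op number, the `pcpu` field name and the hypercall stub are assumptions of this sketch, not the interface from the patch:

```c
#include <stdint.h>

#define SCHEDOP_pin_temp 99          /* hypothetical op number for this sketch */

struct sched_pin_temp {
    int32_t pcpu;                    /* assumed field name; -1 undoes the pin */
};

/* Stub standing in for the real HYPERVISOR_sched_op, so the
 * pin/work/unpin pattern is demonstrable outside a Xen guest. */
int32_t current_pin = -1;            /* -1: original affinity in effect */

int HYPERVISOR_sched_op(int cmd, void *arg)
{
    struct sched_pin_temp *p = arg;

    if (cmd != SCHEDOP_pin_temp)
        return -1;
    current_pin = p->pcpu;           /* pin, or restore the old affinity on -1 */
    return 0;
}

int work_ran;

void smi_work(void) { work_ran = 1; /* SMI-dependent work, now on pcpu 0 */ }

/* Pin to physical cpu 0, run the work, and always undo the pin:
 * per the discussion above, the OS must unpin before suspend. */
int run_on_pcpu0(void (*work)(void))
{
    struct sched_pin_temp pin = { .pcpu = 0 };
    struct sched_pin_temp unpin = { .pcpu = -1 };
    int rc = HYPERVISOR_sched_op(SCHEDOP_pin_temp, &pin);

    if (rc)
        return rc;
    work();
    return HYPERVISOR_sched_op(SCHEDOP_pin_temp, &unpin);
}
```

The important property is the unconditional unpin on the success path; a driver that bails out between pin and unpin is exactly the "forgets to unpin" case raised below.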

> Also -- have you actually tested the "cpupool move while pinned"
> functionality to make sure it actually works?  There's a weird bit in
> cpupool_unassign_cpu_helper() where after calling
> cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in the
> cpupool_free_cpus mask, even if it returns an error.  That can't be
> right, even for the existing -EAGAIN case, can it?

That should be no problem. Such a failure can be repaired easily by
adding the cpu to the cpupool again. Adding a comment seems to be a
good idea. :-)

What is wrong, and even worse: schedule_cpu_switch() returning an error
will leak domlist_read_lock. I'll write another patch to correct this
issue.

> I see that you have a loop to retry this call several times in the next
> patch; but what if it fails every time -- what state is the system in?

The cpu can be added to the original cpupool via "xl cpupool-add" again.
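The retry loop of patch 3 can be sketched as below; `removecpu` stands in for `xc_cpupool_removecpu()`, and the retry bound and the demonstration stub are illustrative, not the patch's actual values:

```c
#include <errno.h>

#define NUM_TRIES 5   /* illustrative bound, not necessarily the patch's value */

/* Retry a cpu removal that can transiently fail with EBUSY while a
 * vcpu is temporarily pinned to the cpu being removed. */
int removecpu_retry(int (*removecpu)(int cpu), int cpu)
{
    int i, rc = -1;

    for (i = 0; i < NUM_TRIES; i++) {
        rc = removecpu(cpu);
        if (rc == 0 || errno != EBUSY)
            break;            /* success, or a non-transient error */
        /* a real caller would back off briefly before retrying */
    }
    return rc;
}

/* Demonstration stub: fails twice with EBUSY, then succeeds. */
int fake_calls;

int fake_removecpu(int cpu)
{
    (void)cpu;
    if (++fake_calls < 3) {
        errno = EBUSY;
        return -1;
    }
    return 0;
}
```

If every attempt fails, the caller is left in the state described above: the cpu must be added back to the original cpupool by hand.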

> And, in general, what happens if the device driver gets mixed up and
> forgets to unpin the vcpu?  Is the only recourse to reboot your host (or
> deal with the fact that you can't reconfigure your cpupools)?

Unless we add a "forced" option to "xl vcpu-pin", yes.

Thanks for the thorough review,

Juergen


* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-02  7:14     ` Juergen Gross
@ 2016-03-02  9:27       ` Dario Faggioli
  2016-03-02 11:19         ` Juergen Gross
  2016-03-02 15:34         ` Juergen Gross
  0 siblings, 2 replies; 26+ messages in thread
From: Dario Faggioli @ 2016-03-02  9:27 UTC (permalink / raw)
  To: Juergen Gross, George Dunlap, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	ian.jackson, david.vrabel, jbeulich



On Wed, 2016-03-02 at 08:14 +0100, Juergen Gross wrote:
> On 01/03/16 16:52, George Dunlap wrote:
> > 
> > 
> > Also -- have you actually tested the "cpupool move while pinned"
> > functionality to make sure it actually works?  There's a weird bit
> > in
> > cpupool_unassign_cpu_helper() where after calling
> > cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in
> > the
> > cpupool_free_cpus mask, even if it returns an error.  That can't be
> > right, even for the existing -EAGAIN case, can it?
> That should be no problem. Such a failure can be repaired easily by
> adding the cpu to the cpupool again. 
>
And there's not much else one can do, I would say. When we are in
cpu_disable_scheduler(), coming from
cpupool_unassign_cpu()-->cpupool_unassign_cpu_helper() we're already halfway
through removing the cpu from the pool (e.g., we already cleared the
relevant bit from the cpupool's cpu_valid mask).

And we don't actually want to revert that, as doing so would allow the
scheduler to start again moving vcpus to that cpu (and the following
attempts will risk failing with EAGAIN again :-D).

FWIW, I've also found that part rather weird for quite some time... But
it does indeed make sense, IMO.

> Adding a comment seems to be a
> good idea. :-)
> 
Yep. Should we also add an error message for the user to be able to see
it, even if she can't read the comment in the source code? (Not
necessarily right there, if that would make it trigger too much... just
in a place where it can be seen in the case the user actually needs to
do something).

> What is wrong and even worse, schedule_cpu_switch() returning an
> error
> will leak domlist_read_lock. 
>
Indeed, good catch. :-)

> > And, in general, what happens if the device driver gets mixed up
> > and
> > forgets to unpin the vcpu?  Is the only recourse to reboot your
> > host (or
> > deal with the fact that you can't reconfigure your cpupools)?
> Unless we add a "forced" option to "xl vcpu-pin", yes.
> 
Which would be fine to have, IMO. I'm not sure if it would be better as an
`xl vcpu-pin' flag, or a separate utility (as Jan is also saying).

A separate utility would fit better the "emergency nature" of the
thing, avoiding having to clobber xl for that (as this will be the
only, pretty uncommon, case where such flag would be needed).

However, an xl flag is easier to add, easier to document and easier and
more natural to find, from the point of view of a user who really
needs it. And perhaps it could turn out useful for other situations in
future. So, I guess I'd say:
 - yes, let's add that
 - let's do it as a "force flag" of `xl vcpu-pin'.
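For reference, the affinity state being debated (save on pin, -EBUSY while broken, and what a hypothetical "force" unpin would reset) can be modelled as below; only `affinity_broken` mirrors the patch, everything else is an assumption of this sketch:

```c
#include <errno.h>
#include <stdint.h>

/* Minimal model of the vcpu affinity state discussed in this thread. */
struct vcpu_model {
    int32_t hard_affinity;           /* stand-in for the cpumask */
    int32_t saved_affinity;
    int     affinity_broken;         /* field name from the patch */
};

int pin_temp(struct vcpu_model *v, int32_t cpu)
{
    if (cpu == -1) {                 /* undo the temporary pinning */
        if (!v->affinity_broken)
            return -EINVAL;
        v->hard_affinity = v->saved_affinity;
        v->affinity_broken = 0;
        return 0;
    }
    if (!v->affinity_broken)
        v->saved_affinity = v->hard_affinity;   /* save the old affinity */
    v->hard_affinity = cpu;
    v->affinity_broken = 1;
    return 0;
}

/* While the pin is active, removing that cpu fails with -EBUSY,
 * as in the patch's cpu_disable_scheduler() hunk. */
int try_remove_cpu(struct vcpu_model *v, int32_t cpu)
{
    if (v->affinity_broken && v->hard_affinity == cpu)
        return -EBUSY;
    return 0;
}

/* What the proposed "force" unpin would amount to. */
void force_unpin(struct vcpu_model *v)
{
    if (v->affinity_broken) {
        v->hard_affinity = v->saved_affinity;
        v->affinity_broken = 0;
    }
}
```

The force path is just the -1 path without the error check, which is why it fits naturally as a flag on the existing `xl vcpu-pin` plumbing.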

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v2 1/3] xen: silence affinity messages on suspend/resume
  2016-03-01  9:02 ` [PATCH v2 1/3] xen: silence affinity messages on suspend/resume Juergen Gross
@ 2016-03-02 11:11   ` Dario Faggioli
  0 siblings, 0 replies; 26+ messages in thread
From: Dario Faggioli @ 2016-03-02 11:11 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	ian.jackson, david.vrabel, jbeulich



On Tue, 2016-03-01 at 10:02 +0100, Juergen Gross wrote:
> When taking cpus offline for suspend or bringing them online on
> resume
> again the scheduler might issue debug messages when temporarily
> breaking vcpu affinity or restoring the original affinity settings.
> 
> The resume message can be removed completely, while the message when
> breaking affinity should only be issued if the breakage is permanent.
> 
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-02  9:27       ` Dario Faggioli
@ 2016-03-02 11:19         ` Juergen Gross
  2016-03-02 11:49           ` Dario Faggioli
  2016-03-02 15:34         ` Juergen Gross
  1 sibling, 1 reply; 26+ messages in thread
From: Juergen Gross @ 2016-03-02 11:19 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	ian.jackson, david.vrabel, jbeulich

On 02/03/16 10:27, Dario Faggioli wrote:
> On Wed, 2016-03-02 at 08:14 +0100, Juergen Gross wrote:
>> On 01/03/16 16:52, George Dunlap wrote:
>>>  
>>>
>>> Also -- have you actually tested the "cpupool move while pinned"
>>> functionality to make sure it actually works?  There's a weird bit
>>> in
>>> cpupool_unassign_cpu_helper() where after calling
>>> cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in
>>> the
>>> cpupool_free_cpus mask, even if it returns an error.  That can't be
>>> right, even for the existing -EAGAIN case, can it?
>> That should be no problem. Such a failure can be repaired easily by
>> adding the cpu to the cpupool again. 
>>
> And there's not much else one can do, I would say. When we are in
> cpu_disable_scheduler(), coming from
> cpupool_unassign_cpu()-->cpupool_unassign_cpu_helper() we're already halfway
> through removing the cpu from the pool (e.g., we already cleared the
> relevant bit from the cpupool's cpu_valid mask).
> 
> And we don't actually want to revert that, as doing so would allow the
> scheduler to start again moving vcpus to that cpu (and the following
> attempts will risk failing with EAGAIN again :-D).

Yep.

> 
> FWIW, I've also found that part rather weird for quite some time... But
> it does indeed make sense, IMO.
> 
>> Adding a comment seems to be a
>> good idea. :-)
>>
> Yep. Should we also add an error message for the user to be able to see
> it, even if she can't read the comment in the source code? (Not
> necessarily right there, if that would make it trigger too much... just
> in a place where it can be seen in the case the user actually needs
> to do something).

I'd rather add the error message to xl. That's where the user will see
it and where he can react at once. The message can even tell the user
the correct command, which would be a very strange thing to do in the
hypervisor.

Another patch, I guess. :-)

> 
>> What is wrong and even worse, schedule_cpu_switch() returning an
>> error
>> will leak domlist_read_lock. 
>>
> Indeed, good catch. :-)
> 
>>> And, in general, what happens if the device driver gets mixed up
>>> and
>>> forgets to unpin the vcpu?  Is the only recourse to reboot your
>>> host (or
>>> deal with the fact that you can't reconfigure your cpupools)?
>> Unless we add a "forced" option to "xl vcpu-pin", yes.
>>
> Which would be fine to have, IMO. I'm not sure if it would be better as an
> `xl vcpu-pin' flag, or a separate utility (as Jan is also saying).
> 
> A separate utility would fit better the "emergency nature" of the
> thing, avoiding having to clobber xl for that (as this will be the
> only, pretty uncommon, case where such flag would be needed).
> 
> However, an xl flag is easier to add, easier to document and easier and
> more natural to find, from the point of view of a user who really
> needs it. And perhaps it could turn out useful for other situations in
> future. So, I guess I'd say:
>  - yes, let's add that
>  - let's do it as a "force flag" of `xl vcpu-pin'.

Okay, patch will follow...


Juergen


* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-02 11:19         ` Juergen Gross
@ 2016-03-02 11:49           ` Dario Faggioli
  2016-03-02 12:12             ` Juergen Gross
  0 siblings, 1 reply; 26+ messages in thread
From: Dario Faggioli @ 2016-03-02 11:49 UTC (permalink / raw)
  To: Juergen Gross, George Dunlap, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	ian.jackson, david.vrabel, jbeulich



On Wed, 2016-03-02 at 12:19 +0100, Juergen Gross wrote:
> On 02/03/16 10:27, Dario Faggioli wrote:
> > 
> > Yep. Should we also add an error message for the user to be able to
> > see
> > it, even if she can't read the comment in the source code? (Not
> > necessarily right there, if that would make it trigger too much...
> > just
> > in a place where it can be seen in the case the user actually needs
> > to
> > do something).
> I'd rather add the error message to xl. That's where the user will
> see
> it and where he can react at once. The message can even tell the user
> the correct command, which would be a very strange thing to do in the
> hypervisor.
> 
Sure, wherever it's most useful.

> Another patch, I guess. :-)
> 
Yeah, sorry. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-02 11:49           ` Dario Faggioli
@ 2016-03-02 12:12             ` Juergen Gross
  0 siblings, 0 replies; 26+ messages in thread
From: Juergen Gross @ 2016-03-02 12:12 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	ian.jackson, david.vrabel, jbeulich

On 02/03/16 12:49, Dario Faggioli wrote:
> On Wed, 2016-03-02 at 12:19 +0100, Juergen Gross wrote:
>> On 02/03/16 10:27, Dario Faggioli wrote:
>>>  
>>> Yep. Should we also add an error message for the user to be able to
>>> see
>>> it, even if she can't read the comment in the source code? (Not
>>> necessarily right there, if that would make it trigger too much...
>>> just
>>> in a place where it can be seen in the case the user actually needs
>>> to
>>> do something).
>> I'd rather add the error message to xl. That's where the user will
>> see
>> it and where he can react at once. The message can even tell the user
>> the correct command, which would be a very strange thing to do in the
>> hypervisor.
>>
> Sure, wherever it's most useful.
> 
>> Another patch, I guess. :-)
>>
> Yeah, sorry. :-)

Sarcio ergo sum! :-)


Juergen


* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-02  9:27       ` Dario Faggioli
  2016-03-02 11:19         ` Juergen Gross
@ 2016-03-02 15:34         ` Juergen Gross
  2016-03-02 16:03           ` Dario Faggioli
  1 sibling, 1 reply; 26+ messages in thread
From: Juergen Gross @ 2016-03-02 15:34 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	ian.jackson, david.vrabel, jbeulich

On 02/03/16 10:27, Dario Faggioli wrote:
> On Wed, 2016-03-02 at 08:14 +0100, Juergen Gross wrote:
>> On 01/03/16 16:52, George Dunlap wrote:
>>>  
>>>
>>> Also -- have you actually tested the "cpupool move while pinned"
>>> functionality to make sure it actually works?  There's a weird bit
>>> in
>>> cpupool_unassign_cpu_helper() where after calling
>>> cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in
>>> the
>>> cpupool_free_cpus mask, even if it returns an error.  That can't be
>>> right, even for the existing -EAGAIN case, can it?
>> That should be no problem. Such a failure can be repaired easily by
>> adding the cpu to the cpupool again. 
>>
> And there's not much else one can do, I would say. When we are in
> cpu_disable_scheduler(), coming from
> cpupool_unassign_cpu()-->cpupool_unassign_cpu_helper() we're already halfway
> through removing the cpu from the pool (e.g., we already cleared the
> relevant bit from the cpupool's cpu_valid mask).
> 
> And we don't actually want to revert that, as doing so would allow the
> scheduler to start again moving vcpus to that cpu (and the following
> attempts will risk failing with EAGAIN again :-D).
> 
> FWIW, I've also found that part rather weird for quite some time... But
> it does indeed make sense, IMO.
> 
>> Adding a comment seems to be a
>> good idea. :-)
>>
> Yep. Should we also add an error message for the user to be able to see
> it, even if she can't read the comment in the source code? (Not
> necessarily right there, if that would make it trigger too much... just
> in a place where it can be seen in the case the user actually needs to
> do something).
> 
>> What is wrong and even worse, schedule_cpu_switch() returning an
>> error
>> will leak domlist_read_lock. 
>>
> Indeed, good catch. :-)
> 
>>> And, in general, what happens if the device driver gets mixed up
>>> and
>>> forgets to unpin the vcpu?  Is the only recourse to reboot your
>>> host (or
>>> deal with the fact that you can't reconfigure your cpupools)?
>> Unless we add a "forced" option to "xl vcpu-pin", yes.
>>
> Which would be fine to have, IMO. I'm not sure if it would better be an
> `xl vcpu-pin' flag, or a separate utility (as Jan is also saying).
> 
> A separate utility would fit better the "emergency nature" of the
> thing, avoiding having to clobber xl for that (as this will be the
> only, pretty uncommon, case where such flag would be needed).
> 
> However, an xl flag is easier to add, easier to document and easier and
> more natural to find, from the point of view of a user that really
> needs it. And perhaps it could turn out useful for other situations in
> future. So, I guess I'd say:
>  - yes, let's add that
>  - let's do it as a "force flag" of `xl vcpu-pin'.

Which raises the question: how to do that on the libxl level?

a) expand libxl_set_vcpuaffinity() with another parameter (is this even
   possible? I could do some ifdeffery, but the API would change...)

b) add a libxl_set_vcpuaffinity_force() variant

c) imply the force flag by specifying both hard and soft maps as NULL
   (it _is_ basically just that: keep both affinity sets), implying that
   it makes no sense to specify any affinities with the -f flag (which
   renders the "force" meaning rather strange, would be more a "restore"
   now).


Juergen

> 
> Regards,
> Dario
> 



* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-02 15:34         ` Juergen Gross
@ 2016-03-02 16:03           ` Dario Faggioli
  2016-03-02 17:15             ` Juergen Gross
  0 siblings, 1 reply; 26+ messages in thread
From: Dario Faggioli @ 2016-03-02 16:03 UTC (permalink / raw)
  To: Juergen Gross, George Dunlap, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	ian.jackson, david.vrabel, jbeulich



On Wed, 2016-03-02 at 16:34 +0100, Juergen Gross wrote:
> On 02/03/16 10:27, Dario Faggioli wrote:
> > 
> > However, an xl flag is easier to add, easier to document and easier
> > and
> > more natural to find, from the point of view of a user that really
> > needs it. And perhaps it could turn out useful for other situations
> > in
> > future. So, I guess I'd say:
> >  - yes, let's add that
> >  - let's do it as a "force flag" of `xl vcpu-pin'.
> Which raises the question: how to do that on the libxl level?
> 
Ah, right.

> a) expand libxl_set_vcpuaffinity() with another parameter (is this
> even
>    possible? I could do some ifdeffery, but the API would change...)
> 
> b) add a libxl_set_vcpuaffinity_force() variant
> 
> c) imply the force flag by specifying both hard and soft maps as NULL
>    (it _is_ basically just that: keep both affinity sets), implying
> that
>    it makes no sense to specify any affinities with the -f flag
> (which
>    renders the "force" meaning rather strange, would be more a
> "restore"
>    now).
> 
Eheh, tools' maintainers' call. My preference would be b).

I don't like a), mostly because that would mean everyone will need to
specify a parameter that it is really only necessary in special cases.

I could live with c), but it indeed makes the semantic too convoluted
for my taste.

I guess, however, that even if going for b), we need to decide whether
to require a cpumask or not, and what to do if one passes NULL. Maybe
we can have a cpumask parameter and,
 - if it is not NULL, force affinity to that,
 - if it is NULL, just 'restore';
what do you think?

Actually, at Xen level, the override only acts on hard affinity...
should libxl take only one cpumask (for hard affinity only), or both
hard and soft?
I'd say just one for hard is enough, unless we want to make space for a
potential future situation where we will want to break and restore soft
affinity as well...

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-02 16:03           ` Dario Faggioli
@ 2016-03-02 17:15             ` Juergen Gross
  0 siblings, 0 replies; 26+ messages in thread
From: Juergen Gross @ 2016-03-02 17:15 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap, xen-devel
  Cc: wei.liu2, stefano.stabellini, george.dunlap, andrew.cooper3,
	ian.jackson, david.vrabel, jbeulich

On 02/03/16 17:03, Dario Faggioli wrote:
> On Wed, 2016-03-02 at 16:34 +0100, Juergen Gross wrote:
>> On 02/03/16 10:27, Dario Faggioli wrote:
>>>  
>>> However, an xl flag is easier to add, easier to document and easier
>>> and
>>> more natural to find, from the point of view of a user that really
>>> needs it. And perhaps it could turn out useful for other situations
>>> in
>>> future. So, I guess I'd say:
>>>  - yes, let's add that
>>>  - let's do it as a "force flag" of `xl vcpu-pin'.
>> Which raises the question: how to do that on the libxl level?
>>
> Ah, right.
> 
>> a) expand libxl_set_vcpuaffinity() with another parameter (is this
>> even
>>    possible? I could do some ifdeffery, but the API would change...)
>>
>> b) add a libxl_set_vcpuaffinity_force() variant
>>
>> c) imply the force flag by specifying both hard and soft maps as NULL
>>    (it _is_ basically just that: keep both affinity sets), implying
>> that
>>    it makes no sense to specify any affinities with the -f flag
>> (which
>>    renders the "force" meaning rather strange, would be more a
>> "restore"
>>    now).
>>
> Eheh, tools' maintainers' call. My preference would be b).
> 
> I don't like a), mostly because that would mean everyone will need to
> specify a parameter that it is really only necessary in special cases.
> 
> I could live with c), but it indeed makes the semantic too convoluted
> for my taste.
> 
> I guess, however, that even if going for b), we need to decide whether
> to require a cpumask or not, and what to do if one passes NULL. Maybe
> we can have a cpumask parameter and,
>  - if it is not NULL, force affinity to that,
>  - if it is NULL, just 'restore';
> what do you think?

I would just let the force flag restore the old setting (thus clearing
the affinity_broken flag) and then apply the normal affinity settings.
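[For concreteness, the restore-then-apply semantics described above can be sketched as a small C model. This is illustrative only, not Xen code: the toy_* names are invented, and cpumasks are modeled as plain 64-bit bitmaps instead of Xen's cpumask_t.]

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (not Xen code) of the semantics under discussion:
 * a temporary pin saves the old hard-affinity mask and sets
 * affinity_broken; a forced affinity change first restores the
 * saved mask (clearing affinity_broken) and then applies the new
 * mask as a normal request. */
struct toy_vcpu {
    uint64_t hard_affinity;
    uint64_t saved_affinity;
    int affinity_broken;
};

/* SCHEDOP_pin_temp-like operation: pin to one cpu, or cpu == -1 to undo. */
static void toy_pin_temp(struct toy_vcpu *v, int cpu)
{
    if (cpu == -1) {
        if (v->affinity_broken) {
            v->hard_affinity = v->saved_affinity;
            v->affinity_broken = 0;
        }
    } else {
        if (!v->affinity_broken) {
            v->saved_affinity = v->hard_affinity;
            v->affinity_broken = 1;
        }
        v->hard_affinity = 1ULL << cpu;
    }
}

/* Forced affinity set: drop any temporary pin first, then apply. */
static void toy_set_affinity_force(struct toy_vcpu *v, uint64_t mask)
{
    toy_pin_temp(v, -1);          /* restore + clear affinity_broken */
    v->hard_affinity = mask;      /* normal affinity update */
}
```

Note that re-pinning while already broken keeps the originally saved mask, so the eventual restore always returns to the pre-pin affinity.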

> Actually, at Xen level, the override only acts on hard affinity...
> should libxl take only one cpumask (for hard affinity only), or both
> hard and soft?

Just as many as the user specifies: 0, 1 or 2.

> I'd say just one for hard is enough, unless we want to make space for a
> potential future situation where we will want to break and restore soft
> affinity as well...

The force flag would be just an add-on. That's rather easy in the
hypervisor and in the tools.


Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-01 15:52   ` George Dunlap
  2016-03-01 15:55     ` George Dunlap
  2016-03-02  7:14     ` Juergen Gross
@ 2016-03-02 17:21     ` Anshul Makkar
  2016-03-03  5:31       ` Juergen Gross
  2 siblings, 1 reply; 26+ messages in thread
From: Anshul Makkar @ 2016-03-02 17:21 UTC (permalink / raw)
  To: Juergen Gross, xen-devel

Hi,


-----Original Message-----
From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of George Dunlap
Sent: 01 March 2016 15:53
To: Juergen Gross <jgross@suse.com>; xen-devel@lists.xen.org
Cc: Wei Liu <wei.liu2@citrix.com>; Stefano Stabellini <Stefano.Stabellini@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Dario Faggioli <dario.faggioli@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; David Vrabel <david.vrabel@citrix.com>; jbeulich@suse.com
Subject: Re: [Xen-devel] [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu

On 01/03/16 09:02, Juergen Gross wrote:
> Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be 
> called on physical cpu 0 only. Linux drivers like dcdbas or i8k try to 
> achieve this by pinning the running thread to cpu 0, but in Dom0 this 
> is not enough: the vcpu must be pinned to physical cpu 0 via Xen, too.
> 
> Add a stable hypercall option SCHEDOP_pin_temp to the sched_op 
> hypercall to achieve this. It is taking a physical cpu number as 
> parameter. If pinning is possible (the calling domain has the 
> privilege to make the call and the cpu is available in the domain's
> cpupool) the calling vcpu is pinned to the specified cpu. The old cpu 
> affinity is saved. To undo the temporary pinning a cpu -1 is 
> specified. This will restore the original cpu affinity for the vcpu.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V2: - limit operation to hardware domain as suggested by Jan Beulich
>     - some style issues corrected as requested by Jan Beulich
>     - use fixed width types in interface as requested by Jan Beulich
>     - add compat layer checking as requested by Jan Beulich
> ---
>  xen/common/compat/schedule.c |  4 ++
>  xen/common/schedule.c        | 92 +++++++++++++++++++++++++++++++++++++++++---
>  xen/include/public/sched.h   | 17 ++++++++
>  xen/include/xlat.lst         |  1 +
>  4 files changed, 109 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/common/compat/schedule.c 
> b/xen/common/compat/schedule.c index 812c550..73b0f01 100644
> --- a/xen/common/compat/schedule.c
> +++ b/xen/common/compat/schedule.c
> @@ -10,6 +10,10 @@
>  
>  #define do_sched_op compat_sched_op
>  
> +#define xen_sched_pin_temp sched_pin_temp CHECK_sched_pin_temp; 
> +#undef xen_sched_pin_temp
> +
>  #define xen_sched_shutdown sched_shutdown  CHECK_sched_shutdown;  
> #undef xen_sched_shutdown diff --git a/xen/common/schedule.c 
> b/xen/common/schedule.c index b0d4b18..653f852 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -271,6 +271,12 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>      struct scheduler *old_ops;
>      void *old_domdata;
>  
> +    for_each_vcpu ( d, v )
> +    {
> +        if ( v->affinity_broken )
> +            return -EBUSY;
> +    }
> +
>      domdata = SCHED_OP(c->sched, alloc_domdata, d);
>      if ( domdata == NULL )
>          return -ENOMEM;
> @@ -669,6 +675,14 @@ int cpu_disable_scheduler(unsigned int cpu)
>              if ( cpumask_empty(&online_affinity) &&
>                   cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
>              {
> +                if ( v->affinity_broken )
> +                {
> +                    /* The vcpu is temporarily pinned, can't move it. */
> +                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
> +                    ret = -EBUSY;
> +                    break;
> +                }

Does this mean that if the user closes the laptop lid while one of these drivers has vcpu0 pinned, that Xen will crash (see xen/arch/x86/smpboot.c:__cpu_disable())?  Or is it the OS's job to make sure that all temporary pins are removed before suspending?

Also -- have you actually tested the "cpupool move while pinned"
functionality to make sure it actually works?  There's a weird bit in
cpupool_unassign_cpu_helper() where after calling cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in the cpupool_free_cpus mask, even if it returns an error.  That can't be right, even for the existing -EAGAIN case, can it?

I see that you have a loop to retry this call several times in the next patch; but what if it fails every time -- what state is the system in?
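[The retry pattern in question is essentially a bounded loop of the following shape. This is a generic sketch, not the actual xc_cpupool_removecpu() code; the toy_* names, the retry count and the demo callback are invented for illustration.]

```c
#include <assert.h>
#include <errno.h>

#define TOY_NUM_RETRIES 10

/* Generic bounded-retry helper (illustrative, not libxc code):
 * retry the operation while it reports EBUSY, up to a fixed number
 * of attempts, and hand the last result back to the caller. A real
 * implementation would typically also sleep between attempts. */
static int retry_on_ebusy(int (*op)(void *), void *arg)
{
    int rc = -1;
    int i;

    for (i = 0; i < TOY_NUM_RETRIES; i++) {
        rc = op(arg);
        if (!(rc < 0 && errno == EBUSY))
            break;
    }
    return rc;  /* still the failing rc/EBUSY if every attempt failed */
}

/* Demo operation: fails with EBUSY twice, then succeeds. */
static int toy_calls;
static int toy_flaky_op(void *arg)
{
    (void)arg;
    if (++toy_calls < 3) {
        errno = EBUSY;
        return -1;
    }
    return 0;
}
```

If every attempt fails, the caller simply gets the last EBUSY back, which is exactly why the question of what state the system is left in matters.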

And, in general, what happens if the device driver gets mixed up and forgets to unpin the vcpu?  Is the only recourse to reboot your host (or deal with the fact that you can't reconfigure your cpupools)?

 -George

Sorry, lost the original thread so replying at the top of mail chain.

+static XSM_INLINE int xsm_schedop_pin_temp(XSM_DEFAULT_VOID) 
+{ 
+ XSM_ASSERT_ACTION(XSM_PRIV); 
+ return xsm_default_action(action, current->domain, NULL); 
+}

Is the intention to restrict the hypercall usage to dom0 only?

Anshul Makkar


* Re: [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
  2016-03-02 17:21     ` Anshul Makkar
@ 2016-03-03  5:31       ` Juergen Gross
  0 siblings, 0 replies; 26+ messages in thread
From: Juergen Gross @ 2016-03-03  5:31 UTC (permalink / raw)
  To: Anshul Makkar, xen-devel

On 02/03/16 18:21, Anshul Makkar wrote:
> Hi,
> 
> 
> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of George Dunlap
> Sent: 01 March 2016 15:53
> To: Juergen Gross <jgross@suse.com>; xen-devel@lists.xen.org
> Cc: Wei Liu <wei.liu2@citrix.com>; Stefano Stabellini <Stefano.Stabellini@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Dario Faggioli <dario.faggioli@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; David Vrabel <david.vrabel@citrix.com>; jbeulich@suse.com
> Subject: Re: [Xen-devel] [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
> 
> On 01/03/16 09:02, Juergen Gross wrote:
>> Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be 
>> called on physical cpu 0 only. Linux drivers like dcdbas or i8k try to 
>> achieve this by pinning the running thread to cpu 0, but in Dom0 this 
>> is not enough: the vcpu must be pinned to physical cpu 0 via Xen, too.
>>
>> Add a stable hypercall option SCHEDOP_pin_temp to the sched_op 
>> hypercall to achieve this. It is taking a physical cpu number as 
>> parameter. If pinning is possible (the calling domain has the 
>> privilege to make the call and the cpu is available in the domain's
>> cpupool) the calling vcpu is pinned to the specified cpu. The old cpu 
>> affinity is saved. To undo the temporary pinning a cpu -1 is 
>> specified. This will restore the original cpu affinity for the vcpu.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2: - limit operation to hardware domain as suggested by Jan Beulich
>>     - some style issues corrected as requested by Jan Beulich
>>     - use fixed width types in interface as requested by Jan Beulich
>>     - add compat layer checking as requested by Jan Beulich
>> ---
>>  xen/common/compat/schedule.c |  4 ++
>>  xen/common/schedule.c        | 92 +++++++++++++++++++++++++++++++++++++++++---
>>  xen/include/public/sched.h   | 17 ++++++++
>>  xen/include/xlat.lst         |  1 +
>>  4 files changed, 109 insertions(+), 5 deletions(-)
>>
>> diff --git a/xen/common/compat/schedule.c 
>> b/xen/common/compat/schedule.c index 812c550..73b0f01 100644
>> --- a/xen/common/compat/schedule.c
>> +++ b/xen/common/compat/schedule.c
>> @@ -10,6 +10,10 @@
>>  
>>  #define do_sched_op compat_sched_op
>>  
>> +#define xen_sched_pin_temp sched_pin_temp CHECK_sched_pin_temp; 
>> +#undef xen_sched_pin_temp
>> +
>>  #define xen_sched_shutdown sched_shutdown  CHECK_sched_shutdown;  
>> #undef xen_sched_shutdown diff --git a/xen/common/schedule.c 
>> b/xen/common/schedule.c index b0d4b18..653f852 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -271,6 +271,12 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>      struct scheduler *old_ops;
>>      void *old_domdata;
>>  
>> +    for_each_vcpu ( d, v )
>> +    {
>> +        if ( v->affinity_broken )
>> +            return -EBUSY;
>> +    }
>> +
>>      domdata = SCHED_OP(c->sched, alloc_domdata, d);
>>      if ( domdata == NULL )
>>          return -ENOMEM;
>> @@ -669,6 +675,14 @@ int cpu_disable_scheduler(unsigned int cpu)
>>              if ( cpumask_empty(&online_affinity) &&
>>                   cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
>>              {
>> +                if ( v->affinity_broken )
>> +                {
>> +                    /* The vcpu is temporarily pinned, can't move it. */
>> +                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
>> +                    ret = -EBUSY;
>> +                    break;
>> +                }
> 
> Does this mean that if the user closes the laptop lid while one of these drivers has vcpu0 pinned, that Xen will crash (see xen/arch/x86/smpboot.c:__cpu_disable())?  Or is it the OS's job to make sure that all temporary pins are removed before suspending?
> 
> Also -- have you actually tested the "cpupool move while pinned"
> functionality to make sure it actually works?  There's a weird bit in
> cpupool_unassign_cpu_helper() where after calling cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in the cpupool_free_cpus mask, even if it returns an error.  That can't be right, even for the existing -EAGAIN case, can it?
> 
> I see that you have a loop to retry this call several times in the next patch; but what if it fails every time -- what state is the system in?
> 
> And, in general, what happens if the device driver gets mixed up and forgets to unpin the vcpu?  Is the only recourse to reboot your host (or deal with the fact that you can't reconfigure your cpupools)?
> 
>  -George
> 
> Sorry, lost the original thread so replying at the top of mail chain.
> 
> +static XSM_INLINE int xsm_schedop_pin_temp(XSM_DEFAULT_VOID) 
> +{ 
> + XSM_ASSERT_ACTION(XSM_PRIV); 
> + return xsm_default_action(action, current->domain, NULL); 
> +}
> 
> Is the intention to restrict the hypercall usage to dom0 only?

To be more precise: to the hardware domain (the patch snippet you are
referencing was part of V1 of the series; it no longer exists in V2).


Juergen


end of thread, other threads:[~2016-03-03  5:31 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-01  9:02 [PATCH v2 0/3] add hypercall option to temporarily pin a vcpu Juergen Gross
2016-03-01  9:02 ` [PATCH v2 1/3] xen: silence affinity messages on suspend/resume Juergen Gross
2016-03-02 11:11   ` Dario Faggioli
2016-03-01  9:02 ` [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu Juergen Gross
2016-03-01 11:27   ` Jan Beulich
2016-03-01 11:55   ` David Vrabel
2016-03-01 11:58     ` Juergen Gross
2016-03-01 12:15       ` Dario Faggioli
2016-03-01 14:02         ` George Dunlap
     [not found]   ` <56D58ABF02000078000D7C46@suse.com>
2016-03-01 11:58     ` Juergen Gross
2016-03-01 15:52   ` George Dunlap
2016-03-01 15:55     ` George Dunlap
2016-03-01 16:11       ` Jan Beulich
2016-03-02  7:14     ` Juergen Gross
2016-03-02  9:27       ` Dario Faggioli
2016-03-02 11:19         ` Juergen Gross
2016-03-02 11:49           ` Dario Faggioli
2016-03-02 12:12             ` Juergen Gross
2016-03-02 15:34         ` Juergen Gross
2016-03-02 16:03           ` Dario Faggioli
2016-03-02 17:15             ` Juergen Gross
2016-03-02 17:21     ` Anshul Makkar
2016-03-03  5:31       ` Juergen Gross
2016-03-01  9:02 ` [PATCH v2 3/3] libxc: do some retries in xc_cpupool_removecpu() for EBUSY case Juergen Gross
2016-03-01 11:58   ` Wei Liu
2016-03-01 11:59     ` Juergen Gross
