* [Xen-devel] [PATCH v2 0/2] xen: enhance temporary vcpu pinning
@ 2019-07-23 18:25 Juergen Gross
  2019-07-23 18:25 ` [Xen-devel] [PATCH v2 1/2] xen/x86: cleanup unused NMI/MCE code Juergen Gross
  2019-07-23 18:25 ` [Xen-devel] [PATCH v2 2/2] xen: merge temporary vcpu pinning scenarios Juergen Gross
  0 siblings, 2 replies; 9+ messages in thread
From: Juergen Gross @ 2019-07-23 18:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

While trying to handle temporary vcpu pinnings in a sane way in my
core scheduling series I found a nice way to simplify the temporary
pinning cases.

I'm sending the two patches independently from my core scheduling
series as they should be considered even without core scheduling.

Changes in V2:
- original patch 1 dropped, as already applied
- new patch 1 removing dead code and unneeded pinning
- addressed various comments in patch 2

Juergen Gross (2):
  xen/x86: cleanup unused NMI/MCE code
  xen: merge temporary vcpu pinning scenarios

 xen/arch/x86/pv/traps.c        | 88 ++++++++----------------------------------
 xen/arch/x86/traps.c           | 10 +----
 xen/common/domain.c            |  4 +-
 xen/common/domctl.c            |  2 +-
 xen/common/schedule.c          | 46 +++++++++++++++-------
 xen/common/wait.c              | 30 +++++---------
 xen/include/asm-x86/pv/traps.h |  8 ++--
 xen/include/asm-x86/softirq.h  |  2 +-
 xen/include/xen/sched.h        | 10 ++---
 9 files changed, 72 insertions(+), 128 deletions(-)

-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [Xen-devel] [PATCH v2 1/2] xen/x86: cleanup unused NMI/MCE code
  2019-07-23 18:25 [Xen-devel] [PATCH v2 0/2] xen: enhance temporary vcpu pinning Juergen Gross
@ 2019-07-23 18:25 ` Juergen Gross
  2019-07-23 18:48   ` Andrew Cooper
  2019-07-23 18:25 ` [Xen-devel] [PATCH v2 2/2] xen: merge temporary vcpu pinning scenarios Juergen Gross
  1 sibling, 1 reply; 9+ messages in thread
From: Juergen Gross @ 2019-07-23 18:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Roger Pau Monné

pv_raise_interrupt() is only called for NMIs these days, so the MCE
specific part can be removed. Rename pv_raise_interrupt() to
pv_raise_nmi() and NMI_MCE_SOFTIRQ to NMI_SOFTIRQ.

Additionally there is no need to pin the vcpu the NMI is delivered
to, that is a leftover of (already removed) MCE handling. So remove
the pinning, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
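For readers following the slot handling in pv_raise_nmi() and nmi_softirq(),
here is a minimal user-space sketch of the same claim/release pattern. It is
an illustration only, not the Xen code: C11 atomics stand in for cmpxchgptr()
and test_and_set_bool(), a single global stands in for the per-CPU
softirq_nmi_vcpu variable, and the names nmi_vcpu_model, nmi_slot,
model_raise_nmi() and model_nmi_softirq() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vcpu; only the field used here. */
struct nmi_vcpu_model {
    atomic_bool nmi_pending;
};

/* Models one CPU's softirq_nmi_vcpu slot (per-CPU in the patch). */
static _Atomic(struct nmi_vcpu_model *) nmi_slot;

/* Mirrors the control flow of pv_raise_nmi(): returns 0, -EBUSY or -EIO. */
static int model_raise_nmi(struct nmi_vcpu_model *v)
{
    struct nmi_vcpu_model *expected = NULL;

    /* Claim the slot; fail if an earlier NMI has not been handled yet. */
    if ( !atomic_compare_exchange_strong(&nmi_slot, &expected, v) )
        return -EBUSY;

    /* Only the first raiser of a pending NMI defers the wakeup. */
    if ( !atomic_exchange(&v->nmi_pending, true) )
        return 0;   /* the patch raises NMI_SOFTIRQ at this point */

    /* NMI was already pending for this vcpu: release the slot again. */
    atomic_store(&nmi_slot, (struct nmi_vcpu_model *)NULL);
    return -EIO;
}

/* Models nmi_softirq(): empty the slot from a safe (non-NMI) context. */
static struct nmi_vcpu_model *model_nmi_softirq(void)
{
    /* The patch calls vcpu_kick() on the stored vcpu before clearing. */
    return atomic_exchange(&nmi_slot, (struct nmi_vcpu_model *)NULL);
}
```

In this model a second raise before the softirq has run returns -EBUSY, and
a raise for a vcpu whose NMI is still pending returns -EIO and leaves the
slot free for the next caller.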
 xen/arch/x86/pv/traps.c        | 88 ++++++++----------------------------------
 xen/arch/x86/traps.c           | 10 +----
 xen/common/domain.c            |  3 --
 xen/include/asm-x86/pv/traps.h |  8 ++--
 xen/include/asm-x86/softirq.h  |  2 +-
 xen/include/xen/sched.h        |  2 -
 6 files changed, 23 insertions(+), 90 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 1740784ff2..9436c80047 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -136,47 +136,21 @@ bool set_guest_nmi_trapbounce(void)
     return !null_trap_bounce(curr, tb);
 }
 
-struct softirq_trap {
-    struct domain *domain;   /* domain to inject trap */
-    struct vcpu *vcpu;       /* vcpu to inject trap */
-    unsigned int processor;  /* physical cpu to inject trap */
-};
+static DEFINE_PER_CPU(struct vcpu *, softirq_nmi_vcpu);
 
-static DEFINE_PER_CPU(struct softirq_trap, softirq_trap);
-
-static void nmi_mce_softirq(void)
+static void nmi_softirq(void)
 {
     unsigned int cpu = smp_processor_id();
-    struct softirq_trap *st = &per_cpu(softirq_trap, cpu);
-
-    BUG_ON(st->vcpu == NULL);
-
-    /*
-     * Set the tmp value unconditionally, so that the check in the iret
-     * hypercall works.
-     */
-    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
-                 st->vcpu->cpu_hard_affinity);
+    struct vcpu **v_ptr = &per_cpu(softirq_nmi_vcpu, cpu);
 
-    if ( (cpu != st->processor) ||
-         (st->processor != st->vcpu->processor) )
-    {
-
-        /*
-         * We are on a different physical cpu.  Make sure to wakeup the vcpu on
-         * the specified processor.
-         */
-        vcpu_set_hard_affinity(st->vcpu, cpumask_of(st->processor));
-
-        /* Affinity is restored in the iret hypercall. */
-    }
+    BUG_ON(*v_ptr == NULL);
 
     /*
-     * Only used to defer wakeup of domain/vcpu to a safe (non-NMI/MCE)
+     * Only used to defer wakeup of domain/vcpu to a safe (non-NMI)
      * context.
      */
-    vcpu_kick(st->vcpu);
-    st->vcpu = NULL;
+    vcpu_kick(*v_ptr);
+    *v_ptr = NULL;
 }
 
 void __init pv_trap_init(void)
@@ -189,50 +163,22 @@ void __init pv_trap_init(void)
     _set_gate(idt_table + LEGACY_SYSCALL_VECTOR, SYS_DESC_trap_gate, 3,
               &int80_direct_trap);
 
-    open_softirq(NMI_MCE_SOFTIRQ, nmi_mce_softirq);
+    open_softirq(NMI_SOFTIRQ, nmi_softirq);
 }
 
-int pv_raise_interrupt(struct vcpu *v, uint8_t vector)
+int pv_raise_nmi(struct vcpu *v)
 {
-    struct softirq_trap *st = &per_cpu(softirq_trap, smp_processor_id());
+    struct vcpu **v_ptr = &per_cpu(softirq_nmi_vcpu, smp_processor_id());
 
-    switch ( vector )
+    if ( cmpxchgptr(v_ptr, NULL, v) )
+        return -EBUSY;
+    if ( !test_and_set_bool(v->nmi_pending) )
     {
-    case TRAP_nmi:
-        if ( cmpxchgptr(&st->vcpu, NULL, v) )
-            return -EBUSY;
-        if ( !test_and_set_bool(v->nmi_pending) )
-        {
-            st->domain = v->domain;
-            st->processor = v->processor;
-
-            /* Not safe to wake up a vcpu here */
-            raise_softirq(NMI_MCE_SOFTIRQ);
-            return 0;
-        }
-        st->vcpu = NULL;
-        break;
-
-    case TRAP_machine_check:
-        if ( cmpxchgptr(&st->vcpu, NULL, v) )
-            return -EBUSY;
-
-        /*
-         * We are called by the machine check (exception or polling) handlers
-         * on the physical CPU that reported a machine check error.
-         */
-        if ( !test_and_set_bool(v->mce_pending) )
-        {
-            st->domain = v->domain;
-            st->processor = v->processor;
-
-            /* not safe to wake up a vcpu here */
-            raise_softirq(NMI_MCE_SOFTIRQ);
-            return 0;
-        }
-        st->vcpu = NULL;
-        break;
+        /* Not safe to wake up a vcpu here */
+        raise_softirq(NMI_SOFTIRQ);
+        return 0;
     }
+    *v_ptr = NULL;
 
     /* Delivery failed */
     return -EIO;
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 25b4b47e5e..08d7edc568 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1600,14 +1600,6 @@ void async_exception_cleanup(struct vcpu *curr)
     if ( !curr->async_exception_mask )
         return;
 
-    /* Restore affinity.  */
-    if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) &&
-         !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) )
-    {
-        vcpu_set_hard_affinity(curr, curr->cpu_hard_affinity_tmp);
-        cpumask_clear(curr->cpu_hard_affinity_tmp);
-    }
-
     if ( !(curr->async_exception_mask & (curr->async_exception_mask - 1)) )
         trap = __scanbit(curr->async_exception_mask, VCPU_TRAP_NONE);
     else
@@ -1634,7 +1626,7 @@ static void nmi_hwdom_report(unsigned int reason_idx)
 
     set_bit(reason_idx, nmi_reason(d));
 
-    pv_raise_interrupt(d->vcpu[0], TRAP_nmi);
+    pv_raise_nmi(d->vcpu[0]);
 }
 
 static void pci_serr_error(const struct cpu_user_regs *regs)
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 55aa759b75..bc56a51815 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -133,7 +133,6 @@ static void vcpu_info_reset(struct vcpu *v)
 static void vcpu_destroy(struct vcpu *v)
 {
     free_cpumask_var(v->cpu_hard_affinity);
-    free_cpumask_var(v->cpu_hard_affinity_tmp);
     free_cpumask_var(v->cpu_hard_affinity_saved);
     free_cpumask_var(v->cpu_soft_affinity);
 
@@ -161,7 +160,6 @@ struct vcpu *vcpu_create(
     grant_table_init_vcpu(v);
 
     if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
-         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
          !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
          !zalloc_cpumask_var(&v->cpu_soft_affinity) )
         goto fail;
@@ -1269,7 +1267,6 @@ int vcpu_reset(struct vcpu *v)
     v->async_exception_mask = 0;
     memset(v->async_exception_state, 0, sizeof(v->async_exception_state));
 #endif
-    cpumask_clear(v->cpu_hard_affinity_tmp);
     clear_bit(_VPF_blocked, &v->pause_flags);
     clear_bit(_VPF_in_reset, &v->pause_flags);
 
diff --git a/xen/include/asm-x86/pv/traps.h b/xen/include/asm-x86/pv/traps.h
index fcc75f5e9a..47d6cf5fc4 100644
--- a/xen/include/asm-x86/pv/traps.h
+++ b/xen/include/asm-x86/pv/traps.h
@@ -27,8 +27,8 @@
 
 void pv_trap_init(void);
 
-/* Deliver interrupt to PV guest. Return 0 on success. */
-int pv_raise_interrupt(struct vcpu *v, uint8_t vector);
+/* Deliver NMI to PV guest. Return 0 on success. */
+int pv_raise_nmi(struct vcpu *v);
 
 int pv_emulate_privileged_op(struct cpu_user_regs *regs);
 void pv_emulate_gate_op(struct cpu_user_regs *regs);
@@ -46,8 +46,8 @@ static inline bool pv_trap_callback_registered(const struct vcpu *v,
 
 static inline void pv_trap_init(void) {}
 
-/* Deliver interrupt to PV guest. Return 0 on success. */
-static inline int pv_raise_interrupt(struct vcpu *v, uint8_t vector) { return -EOPNOTSUPP; }
+/* Deliver NMI to PV guest. Return 0 on success. */
+static inline int pv_raise_nmi(struct vcpu *v) { return -EOPNOTSUPP; }
 
 static inline int pv_emulate_privileged_op(struct cpu_user_regs *regs) { return 0; }
 static inline void pv_emulate_gate_op(struct cpu_user_regs *regs) {}
diff --git a/xen/include/asm-x86/softirq.h b/xen/include/asm-x86/softirq.h
index 5c1a7db566..0b7a77f11f 100644
--- a/xen/include/asm-x86/softirq.h
+++ b/xen/include/asm-x86/softirq.h
@@ -1,7 +1,7 @@
 #ifndef __ASM_SOFTIRQ_H__
 #define __ASM_SOFTIRQ_H__
 
-#define NMI_MCE_SOFTIRQ        (NR_COMMON_SOFTIRQS + 0)
+#define NMI_SOFTIRQ            (NR_COMMON_SOFTIRQS + 0)
 #define TIME_CALIBRATE_SOFTIRQ (NR_COMMON_SOFTIRQS + 1)
 #define VCPU_KICK_SOFTIRQ      (NR_COMMON_SOFTIRQS + 2)
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index b40c8fd138..c197e93d73 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -245,8 +245,6 @@ struct vcpu
 
     /* Bitmask of CPUs on which this VCPU may run. */
     cpumask_var_t    cpu_hard_affinity;
-    /* Used to change affinity temporarily. */
-    cpumask_var_t    cpu_hard_affinity_tmp;
     /* Used to restore affinity across S3. */
     cpumask_var_t    cpu_hard_affinity_saved;
 
-- 
2.16.4



* [Xen-devel] [PATCH v2 2/2] xen: merge temporary vcpu pinning scenarios
  2019-07-23 18:25 [Xen-devel] [PATCH v2 0/2] xen: enhance temporary vcpu pinning Juergen Gross
  2019-07-23 18:25 ` [Xen-devel] [PATCH v2 1/2] xen/x86: cleanup unused NMI/MCE code Juergen Gross
@ 2019-07-23 18:25 ` Juergen Gross
  2019-07-23 18:53   ` Andrew Cooper
  2019-07-24 10:07   ` Jan Beulich
  1 sibling, 2 replies; 9+ messages in thread
From: Juergen Gross @ 2019-07-23 18:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli

Today there are two scenarios which are pinning vcpus temporarily to
a single physical cpu:

- wait_event() handling
- vcpu_pin_override() handling

Each of those cases is handled independently today using their own
temporary cpumask to save the old affinity settings.

The two cases can be combined as the first case will only pin a vcpu to
the physical cpu it is already running on, while vcpu_pin_override() is
allowed to fail.

So merge the two temporary pinning scenarios by only using one cpumask
and a per-vcpu bitmask for specifying which of the scenarios is
currently active (they are allowed to nest).

Note that we don't need to call domain_update_node_affinity() as we
are only pinning for a brief period of time.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- removed the NMI/MCE case
- rename vcpu_set_tmp_affinity() (Jan Beulich)
- remove vcpu_pin_override() wrapper (Andrew Cooper)
- current -> curr (Jan Beulich, Andrew Cooper)
- make cpu parameter unsigned int (Jan Beulich)
- add comment (Dario Faggioli)
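
The nesting rule described in the commit message can be modelled in plain C
as below. This is a simplified sketch under stated assumptions, not the
hypervisor code: a 64-bit word replaces the cpumask, the scheduler lock and
the cpupool (VCPU2ONLINE) check are left out, migration is assumed to
complete immediately, and the names pin_vcpu_model, MODEL_NO_CPU,
MODEL_AFF_* and model_temporary_affinity() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_NO_CPU        (~0u)  /* plays the role of NR_CPUS: "unpin" */
#define MODEL_AFF_OVERRIDE  0x01   /* models VCPU_AFFINITY_OVERRIDE */
#define MODEL_AFF_WAIT      0x02   /* models VCPU_AFFINITY_WAIT */

/* Hypothetical stand-in for the affinity-related parts of struct vcpu. */
struct pin_vcpu_model {
    unsigned int processor;    /* CPU the vcpu currently runs on */
    uint64_t hard_affinity;    /* one bit per CPU */
    uint64_t saved_affinity;   /* affinity to restore after unpinning */
    uint8_t affinity_broken;   /* bitmask of active pinning reasons */
};

static int model_temporary_affinity(struct pin_vcpu_model *v,
                                    unsigned int cpu, uint8_t reason)
{
    if ( cpu == MODEL_NO_CPU )
    {
        /* Undo one reason; restore only when no reason remains. */
        if ( !(v->affinity_broken & reason) )
            return -EINVAL;
        v->affinity_broken &= (uint8_t)~reason;
        if ( !v->affinity_broken )
            v->hard_affinity = v->saved_affinity;
        return 0;
    }

    if ( cpu >= 64 )
        return -EINVAL;

    /* Same reason twice, or a nested pin to a different CPU, conflicts. */
    if ( (v->affinity_broken & reason) ||
         (v->affinity_broken && v->processor != cpu) )
        return -EBUSY;

    if ( !v->affinity_broken )
    {
        /* First pin: save the old affinity once, then narrow it. */
        v->saved_affinity = v->hard_affinity;
        v->hard_affinity = UINT64_C(1) << cpu;
        v->processor = cpu;    /* model: migration completes at once */
    }
    v->affinity_broken |= reason;
    return 0;
}
```

The point of the design is that the saved mask is written only by the first
pin and read only by the last unpin, so the two reasons can be released in
either order.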
---
 xen/common/domain.c     |  1 +
 xen/common/domctl.c     |  2 +-
 xen/common/schedule.c   | 46 ++++++++++++++++++++++++++++++++--------------
 xen/common/wait.c       | 30 ++++++++++--------------------
 xen/include/xen/sched.h |  8 +++++---
 5 files changed, 49 insertions(+), 38 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index bc56a51815..e8e850796e 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1267,6 +1267,7 @@ int vcpu_reset(struct vcpu *v)
     v->async_exception_mask = 0;
     memset(v->async_exception_state, 0, sizeof(v->async_exception_state));
 #endif
+    v->affinity_broken = 0;
     clear_bit(_VPF_blocked, &v->pause_flags);
     clear_bit(_VPF_in_reset, &v->pause_flags);
 
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 72a44953d0..fa260ce5fb 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -654,7 +654,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 
             /* Undo a stuck SCHED_pin_override? */
             if ( vcpuaff->flags & XEN_VCPUAFFINITY_FORCE )
-                vcpu_pin_override(v, -1);
+                vcpu_temporary_affinity(v, NR_CPUS, VCPU_AFFINITY_OVERRIDE);
 
             ret = 0;
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 349f9624f5..508176a142 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1106,43 +1106,59 @@ void watchdog_domain_destroy(struct domain *d)
         kill_timer(&d->watchdog_timer[i]);
 }
 
-int vcpu_pin_override(struct vcpu *v, int cpu)
+/*
+ * Pin a vcpu temporarily to a specific CPU (or restore old pinning state if
+ * cpu is NR_CPUS).
+ * Temporary pinning can be done due to two reasons, which may be nested:
+ * - VCPU_AFFINITY_OVERRIDE (requested by guest): is allowed to fail in case
+ *   of a conflict (e.g. in case cpupool doesn't include requested CPU, or
+ *   another conflicting temporary pinning is already in effect.
+ * - VCPU_AFFINITY_WAIT (called by wait_event(): only used to pin vcpu to the
+ *   CPU it is just running on. Can't fail if used properly.
+ */
+int vcpu_temporary_affinity(struct vcpu *v, unsigned int cpu, uint8_t reason)
 {
     spinlock_t *lock;
     int ret = -EINVAL;
+    bool migrate;
 
     lock = vcpu_schedule_lock_irq(v);
 
-    if ( cpu < 0 )
+    if ( cpu == NR_CPUS )
     {
-        if ( v->affinity_broken )
+        if ( v->affinity_broken & reason )
         {
-            sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
-            v->affinity_broken = 0;
             ret = 0;
+            v->affinity_broken &= ~reason;
         }
+        if ( !ret && !v->affinity_broken )
+            sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
     }
     else if ( cpu < nr_cpu_ids )
     {
-        if ( v->affinity_broken )
+        if ( (v->affinity_broken & reason) ||
+             (v->affinity_broken && v->processor != cpu) )
             ret = -EBUSY;
         else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) )
         {
-            cpumask_copy(v->cpu_hard_affinity_saved, v->cpu_hard_affinity);
-            v->affinity_broken = 1;
-            sched_set_affinity(v, cpumask_of(cpu), NULL);
+            if ( !v->affinity_broken )
+            {
+                cpumask_copy(v->cpu_hard_affinity_saved, v->cpu_hard_affinity);
+                sched_set_affinity(v, cpumask_of(cpu), NULL);
+            }
+            v->affinity_broken |= reason;
             ret = 0;
         }
     }
 
-    if ( ret == 0 )
+    migrate = !ret && !cpumask_test_cpu(v->processor, v->cpu_hard_affinity);
+    if ( migrate )
         vcpu_migrate_start(v);
 
     vcpu_schedule_unlock_irq(lock, v);
 
-    domain_update_node_affinity(v->domain);
-
-    vcpu_migrate_finish(v);
+    if ( migrate )
+        vcpu_migrate_finish(v);
 
     return ret;
 }
@@ -1258,6 +1274,7 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case SCHEDOP_pin_override:
     {
         struct sched_pin_override sched_pin_override;
+        unsigned int cpu;
 
         ret = -EPERM;
         if ( !is_hardware_domain(current->domain) )
@@ -1267,7 +1284,8 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&sched_pin_override, arg, 1) )
             break;
 
-        ret = vcpu_pin_override(current, sched_pin_override.pcpu);
+        cpu = sched_pin_override.pcpu < 0 ? NR_CPUS : sched_pin_override.pcpu;
+        ret = vcpu_temporary_affinity(current, cpu, VCPU_AFFINITY_OVERRIDE);
 
         break;
     }
diff --git a/xen/common/wait.c b/xen/common/wait.c
index 4f830a14e8..3fc5f68611 100644
--- a/xen/common/wait.c
+++ b/xen/common/wait.c
@@ -34,8 +34,6 @@ struct waitqueue_vcpu {
      */
     void *esp;
     char *stack;
-    cpumask_t saved_affinity;
-    unsigned int wakeup_cpu;
 #endif
 };
 
@@ -131,12 +129,10 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
     ASSERT(wqv->esp == 0);
 
     /* Save current VCPU affinity; force wakeup on *this* CPU only. */
-    wqv->wakeup_cpu = smp_processor_id();
-    cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
-    if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
+    if ( vcpu_temporary_affinity(curr, smp_processor_id(), VCPU_AFFINITY_WAIT) )
     {
         gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
-        domain_crash(current->domain);
+        domain_crash(curr->domain);
 
         for ( ; ; )
             do_softirq();
@@ -170,7 +166,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
     if ( unlikely(wqv->esp == 0) )
     {
         gdprintk(XENLOG_ERR, "Stack too large in %s\n", __func__);
-        domain_crash(current->domain);
+        domain_crash(curr->domain);
 
         for ( ; ; )
             do_softirq();
@@ -182,30 +178,24 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
 static void __finish_wait(struct waitqueue_vcpu *wqv)
 {
     wqv->esp = NULL;
-    (void)vcpu_set_hard_affinity(current, &wqv->saved_affinity);
+    vcpu_temporary_affinity(current, NR_CPUS, VCPU_AFFINITY_WAIT);
 }
 
 void check_wakeup_from_wait(void)
 {
-    struct waitqueue_vcpu *wqv = current->waitqueue_vcpu;
+    struct vcpu *curr = current;
+    struct waitqueue_vcpu *wqv = curr->waitqueue_vcpu;
 
     ASSERT(list_empty(&wqv->list));
 
     if ( likely(wqv->esp == NULL) )
         return;
 
-    /* Check if we woke up on the wrong CPU. */
-    if ( unlikely(smp_processor_id() != wqv->wakeup_cpu) )
+    /* Check if we are still pinned. */
+    if ( unlikely(!(curr->affinity_broken & VCPU_AFFINITY_WAIT)) )
     {
-        /* Re-set VCPU affinity and re-enter the scheduler. */
-        struct vcpu *curr = current;
-        cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
-        if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
-        {
-            gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
-            domain_crash(current->domain);
-        }
-        wait(); /* takes us back into the scheduler */
+        gdprintk(XENLOG_ERR, "vcpu affinity lost\n");
+        domain_crash(curr->domain);
     }
 
     /*
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index c197e93d73..9578628c6a 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -200,7 +200,9 @@ struct vcpu
     /* VCPU is paused following shutdown request (d->is_shutting_down)? */
     bool             paused_for_shutdown;
     /* VCPU need affinity restored */
-    bool             affinity_broken;
+    uint8_t          affinity_broken;
+#define VCPU_AFFINITY_OVERRIDE    0x01
+#define VCPU_AFFINITY_WAIT        0x02
 
     /* A hypercall has been preempted. */
     bool             hcall_preempted;
@@ -245,7 +247,7 @@ struct vcpu
 
     /* Bitmask of CPUs on which this VCPU may run. */
     cpumask_var_t    cpu_hard_affinity;
-    /* Used to restore affinity across S3. */
+    /* Used to save affinity during temporary pinning. */
     cpumask_var_t    cpu_hard_affinity_saved;
 
     /* Bitmask of CPUs on which this VCPU prefers to run. */
@@ -873,10 +875,10 @@ int cpu_disable_scheduler(unsigned int cpu);
 /* We need it in dom0_setup_vcpu */
 void sched_set_affinity(struct vcpu *v, const cpumask_t *hard,
                         const cpumask_t *soft);
+int vcpu_temporary_affinity(struct vcpu *v, unsigned int cpu, uint8_t reason);
 int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity);
 int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity);
 void restore_vcpu_affinity(struct domain *d);
-int vcpu_pin_override(struct vcpu *v, int cpu);
 
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate);
 uint64_t get_cpu_idle_time(unsigned int cpu);
-- 
2.16.4



* Re: [Xen-devel] [PATCH v2 1/2] xen/x86: cleanup unused NMI/MCE code
  2019-07-23 18:25 ` [Xen-devel] [PATCH v2 1/2] xen/x86: cleanup unused NMI/MCE code Juergen Gross
@ 2019-07-23 18:48   ` Andrew Cooper
  2019-07-24  5:06     ` Juergen Gross
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2019-07-23 18:48 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Julien Grall,
	Jan Beulich, Roger Pau Monné

On 23/07/2019 19:25, Juergen Gross wrote:
> pv_raise_interrupt() is only called for NMIs these days, so the MCE
> specific part can be removed. Rename pv_raise_interrupt() to
> pv_raise_nmi() and NMI_MCE_SOFTIRQ to NMI_SOFTIRQ.

For posterity, it would be helpful to explicitly identify 355b0469a8
which introduced NMI and MCE pinning (there was no NMI pinning
beforehand AFAICT), and then 3a91769d6e which removed the MCE
pinning.

Stated like that, I doubt the NMI pinning was ever relevant in practice.

>
> Additionally there is no need to pin the vcpu the NMI is delivered
> to, that is a leftover of (already removed) MCE handling. So remove
> the pinning, too.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Everything LGTM.  A few trivial notes.

> diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
> index 1740784ff2..9436c80047 100644
> --- a/xen/arch/x86/pv/traps.c
> +++ b/xen/arch/x86/pv/traps.c
> @@ -136,47 +136,21 @@ bool set_guest_nmi_trapbounce(void)
>      return !null_trap_bounce(curr, tb);
>  }
>  
> -struct softirq_trap {
> -    struct domain *domain;   /* domain to inject trap */
> -    struct vcpu *vcpu;       /* vcpu to inject trap */
> -    unsigned int processor;  /* physical cpu to inject trap */
> -};
> +static DEFINE_PER_CPU(struct vcpu *, softirq_nmi_vcpu);
>  
> -static DEFINE_PER_CPU(struct softirq_trap, softirq_trap);
> -
> -static void nmi_mce_softirq(void)
> +static void nmi_softirq(void)
>  {
>      unsigned int cpu = smp_processor_id();
> -    struct softirq_trap *st = &per_cpu(softirq_trap, cpu);
> -
> -    BUG_ON(st->vcpu == NULL);
> -
> -    /*
> -     * Set the tmp value unconditionally, so that the check in the iret
> -     * hypercall works.
> -     */
> -    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
> -                 st->vcpu->cpu_hard_affinity);
> +    struct vcpu **v_ptr = &per_cpu(softirq_nmi_vcpu, cpu);

There is only a single use of 'cpu' here, so you can drop that and use
this_cpu(softirq_nmi_vcpu) instead.

> diff --git a/xen/include/asm-x86/pv/traps.h b/xen/include/asm-x86/pv/traps.h
> index fcc75f5e9a..47d6cf5fc4 100644
> --- a/xen/include/asm-x86/pv/traps.h
> +++ b/xen/include/asm-x86/pv/traps.h
> @@ -27,8 +27,8 @@
>  
>  void pv_trap_init(void);
>  
> -/* Deliver interrupt to PV guest. Return 0 on success. */
> -int pv_raise_interrupt(struct vcpu *v, uint8_t vector);
> +/* Deliver NMI to PV guest. Return 0 on success. */
> +int pv_raise_nmi(struct vcpu *v);
>  
>  int pv_emulate_privileged_op(struct cpu_user_regs *regs);
>  void pv_emulate_gate_op(struct cpu_user_regs *regs);
> @@ -46,8 +46,8 @@ static inline bool pv_trap_callback_registered(const struct vcpu *v,
>  
>  static inline void pv_trap_init(void) {}
>  
> -/* Deliver interrupt to PV guest. Return 0 on success. */
> -static inline int pv_raise_interrupt(struct vcpu *v, uint8_t vector) { return -EOPNOTSUPP; }
> +/* Deliver NMI to PV guest. Return 0 on success. */
> +static inline int pv_raise_nmi(struct vcpu *v) { return -EOPNOTSUPP; }

I don't think duplicating the function description here is useful.
Instead, I'd recommend dropping these lines, and commenting it once in
pv/traps.c.  That should include the fact that it is expected to be used
in NMI context, which means it's not safe to use printk() etc in there.

~Andrew


* Re: [Xen-devel] [PATCH v2 2/2] xen: merge temporary vcpu pinning scenarios
  2019-07-23 18:25 ` [Xen-devel] [PATCH v2 2/2] xen: merge temporary vcpu pinning scenarios Juergen Gross
@ 2019-07-23 18:53   ` Andrew Cooper
  2019-07-24  5:06     ` Juergen Gross
  2019-07-24 10:07   ` Jan Beulich
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2019-07-23 18:53 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Dario Faggioli,
	Julien Grall, Jan Beulich

On 23/07/2019 19:25, Juergen Gross wrote:
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 349f9624f5..508176a142 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1106,43 +1106,59 @@ void watchdog_domain_destroy(struct domain *d)
>          kill_timer(&d->watchdog_timer[i]);
>  }
>  
> -int vcpu_pin_override(struct vcpu *v, int cpu)
> +/*
> + * Pin a vcpu temporarily to a specific CPU (or restore old pinning state if
> + * cpu is NR_CPUS).
> + * Temporary pinning can be done due to two reasons, which may be nested:
> + * - VCPU_AFFINITY_OVERRIDE (requested by guest): is allowed to fail in case
> + *   of a conflict (e.g. in case cpupool doesn't include requested CPU, or
> + *   another conflicting temporary pinning is already in effect.
> + * - VCPU_AFFINITY_WAIT (called by wait_event(): only used to pin vcpu to the

Need an extra )

Otherwise, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [Xen-devel] [PATCH v2 1/2] xen/x86: cleanup unused NMI/MCE code
  2019-07-23 18:48   ` Andrew Cooper
@ 2019-07-24  5:06     ` Juergen Gross
  0 siblings, 0 replies; 9+ messages in thread
From: Juergen Gross @ 2019-07-24  5:06 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Julien Grall, Jan Beulich,
	Roger Pau Monné

On 23.07.19 20:48, Andrew Cooper wrote:
> On 23/07/2019 19:25, Juergen Gross wrote:
>> pv_raise_interrupt() is only called for NMIs these days, so the MCE
>> specific part can be removed. Rename pv_raise_interrupt() to
>> pv_raise_nmi() and NMI_MCE_SOFTIRQ to NMI_SOFTIRQ.
> 
> For posterity, it would be helpful to explicitly identify 355b0469a8
> which introduced NMI and MCE pinning (there was no NMI pinning
> beforehand AFAICT), and then 3a91769d6e which removed the MCE
> pinning.

Okay.

> 
> Stated like that, I doubt the NMI pinning was ever relevant in practice.

Indeed.

> 
>>
>> Additionally there is no need to pin the vcpu the NMI is delivered
>> to, that is a leftover of (already removed) MCE handling. So remove
>> the pinning, too.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> Everything LGTM.  A few trivial notes.
> 
>> diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
>> index 1740784ff2..9436c80047 100644
>> --- a/xen/arch/x86/pv/traps.c
>> +++ b/xen/arch/x86/pv/traps.c
>> @@ -136,47 +136,21 @@ bool set_guest_nmi_trapbounce(void)
>>       return !null_trap_bounce(curr, tb);
>>   }
>>   
>> -struct softirq_trap {
>> -    struct domain *domain;   /* domain to inject trap */
>> -    struct vcpu *vcpu;       /* vcpu to inject trap */
>> -    unsigned int processor;  /* physical cpu to inject trap */
>> -};
>> +static DEFINE_PER_CPU(struct vcpu *, softirq_nmi_vcpu);
>>   
>> -static DEFINE_PER_CPU(struct softirq_trap, softirq_trap);
>> -
>> -static void nmi_mce_softirq(void)
>> +static void nmi_softirq(void)
>>   {
>>       unsigned int cpu = smp_processor_id();
>> -    struct softirq_trap *st = &per_cpu(softirq_trap, cpu);
>> -
>> -    BUG_ON(st->vcpu == NULL);
>> -
>> -    /*
>> -     * Set the tmp value unconditionally, so that the check in the iret
>> -     * hypercall works.
>> -     */
>> -    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
>> -                 st->vcpu->cpu_hard_affinity);
>> +    struct vcpu **v_ptr = &per_cpu(softirq_nmi_vcpu, cpu);
> 
> There is only a single use of 'cpu' here, so you can drop that and use
> this_cpu(softirq_nmi_vcpu) instead.

Okay.

> 
>> diff --git a/xen/include/asm-x86/pv/traps.h b/xen/include/asm-x86/pv/traps.h
>> index fcc75f5e9a..47d6cf5fc4 100644
>> --- a/xen/include/asm-x86/pv/traps.h
>> +++ b/xen/include/asm-x86/pv/traps.h
>> @@ -27,8 +27,8 @@
>>   
>>   void pv_trap_init(void);
>>   
>> -/* Deliver interrupt to PV guest. Return 0 on success. */
>> -int pv_raise_interrupt(struct vcpu *v, uint8_t vector);
>> +/* Deliver NMI to PV guest. Return 0 on success. */
>> +int pv_raise_nmi(struct vcpu *v);
>>   
>>   int pv_emulate_privileged_op(struct cpu_user_regs *regs);
>>   void pv_emulate_gate_op(struct cpu_user_regs *regs);
>> @@ -46,8 +46,8 @@ static inline bool pv_trap_callback_registered(const struct vcpu *v,
>>   
>>   static inline void pv_trap_init(void) {}
>>   
>> -/* Deliver interrupt to PV guest. Return 0 on success. */
>> -static inline int pv_raise_interrupt(struct vcpu *v, uint8_t vector) { return -EOPNOTSUPP; }
>> +/* Deliver NMI to PV guest. Return 0 on success. */
>> +static inline int pv_raise_nmi(struct vcpu *v) { return -EOPNOTSUPP; }
> 
> I don't think duplicating the function description here is useful.
> Instead, I'd recommend dropping these lines, and commenting it once in
> pv/traps.c.  That should include the fact that it is expected to be used
> in NMI context, which means it's not safe to use printk() etc in there.

Will do.


Juergen


* Re: [Xen-devel] [PATCH v2 2/2] xen: merge temporary vcpu pinning scenarios
  2019-07-23 18:53   ` Andrew Cooper
@ 2019-07-24  5:06     ` Juergen Gross
  0 siblings, 0 replies; 9+ messages in thread
From: Juergen Gross @ 2019-07-24  5:06 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Tim Deegan, Ian Jackson, Dario Faggioli, Julien Grall,
	Jan Beulich

On 23.07.19 20:53, Andrew Cooper wrote:
> On 23/07/2019 19:25, Juergen Gross wrote:
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index 349f9624f5..508176a142 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -1106,43 +1106,59 @@ void watchdog_domain_destroy(struct domain *d)
>>           kill_timer(&d->watchdog_timer[i]);
>>   }
>>   
>> -int vcpu_pin_override(struct vcpu *v, int cpu)
>> +/*
>> + * Pin a vcpu temporarily to a specific CPU (or restore old pinning state if
>> + * cpu is NR_CPUS).
>> + * Temporary pinning can be done due to two reasons, which may be nested:
>> + * - VCPU_AFFINITY_OVERRIDE (requested by guest): is allowed to fail in case
>> + *   of a conflict (e.g. in case cpupool doesn't include requested CPU, or
>> + *   another conflicting temporary pinning is already in effect.
>> + * - VCPU_AFFINITY_WAIT (called by wait_event(): only used to pin vcpu to the
> 
> Need an extra )

Yes.

> 
> Otherwise, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 

Thanks,

Juergen


* Re: [Xen-devel] [PATCH v2 2/2] xen: merge temporary vcpu pinning scenarios
  2019-07-23 18:25 ` [Xen-devel] [PATCH v2 2/2] xen: merge temporary vcpu pinning scenarios Juergen Gross
  2019-07-23 18:53   ` Andrew Cooper
@ 2019-07-24 10:07   ` Jan Beulich
  2019-07-24 11:21     ` Juergen Gross
  1 sibling, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2019-07-24 10:07 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel

On 23.07.2019 20:25, Juergen Gross wrote:
> Today there are two scenarios which are pinning vcpus temporarily to
> a single physical cpu:
> 
> - wait_event() handling
> - vcpu_pin_override() handling
> 
> Each of those cases is handled independently today using their own
> temporary cpumask to save the old affinity settings.
> 
> The two cases can be combined as the first case will only pin a vcpu to
> the physical cpu it is already running on, while vcpu_pin_override() is
> allowed to fail.
> 
> So merge the two temporary pinning scenarios by only using one cpumask
> and a per-vcpu bitmask for specifying which of the scenarios is
> currently active (they are allowed to nest).

Hmm, "nest" to me means LIFO-like behavior, but the logic is more relaxed
afaict.

> @@ -1267,7 +1284,8 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          if ( copy_from_guest(&sched_pin_override, arg, 1) )
>              break;
>   
> -        ret = vcpu_pin_override(current, sched_pin_override.pcpu);
> +        cpu = sched_pin_override.pcpu < 0 ? NR_CPUS : sched_pin_override.pcpu;

I don't think you mean for the caller to achieve the same effect by
passing in either a negative value or NR_CPUS - it should remain just
negative values which clear the override.

Everything else looks fine to me, thanks.

Jan

* Re: [Xen-devel] [PATCH v2 2/2] xen: merge temporary vcpu pinning scenarios
  2019-07-24 10:07   ` Jan Beulich
@ 2019-07-24 11:21     ` Juergen Gross
  0 siblings, 0 replies; 9+ messages in thread
From: Juergen Gross @ 2019-07-24 11:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel

On 24.07.19 12:07, Jan Beulich wrote:
> On 23.07.2019 20:25, Juergen Gross wrote:
>> Today there are two scenarios which are pinning vcpus temporarily to
>> a single physical cpu:
>>
>> - wait_event() handling
>> - vcpu_pin_override() handling
>>
>> Each of those cases is handled independently today using their own
>> temporary cpumask to save the old affinity settings.
>>
>> The two cases can be combined as the first case will only pin a vcpu to
>> the physical cpu it is already running on, while vcpu_pin_override() is
>> allowed to fail.
>>
>> So merge the two temporary pinning scenarios by only using one cpumask
>> and a per-vcpu bitmask for specifying which of the scenarios is
>> currently active (they are allowed to nest).
> 
> Hmm, "nest" to me means LIFO-like behavior, but the logic is more relaxed
> afaict.

Okay, will rephrase.
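
To make the "more relaxed than LIFO" point concrete, here is a rough model of one saved mask plus a per-vcpu reason bitmask. It is only a sketch under stated assumptions: struct toy_vcpu, toy_temporary_affinity(), the 32-bit mask representation and the numeric flag values are all hypothetical, and cpupool checks are left out. Only the names VCPU_AFFINITY_OVERRIDE, VCPU_AFFINITY_WAIT and the "cpu == NR_CPUS restores" convention come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NR_CPUS 8u

/* Flag values are assumptions made for this sketch. */
#define VCPU_AFFINITY_OVERRIDE 0x01u
#define VCPU_AFFINITY_WAIT     0x02u

struct toy_vcpu {
    uint32_t hard_affinity;   /* bit n set: vcpu may run on CPU n */
    uint32_t saved_affinity;  /* the single shared save slot */
    uint8_t  active_reasons;  /* temporary pinnings currently in effect */
    unsigned int processor;   /* CPU the vcpu is currently on */
};

/* Pin v to cpu for 'reason', or drop that reason if cpu == NR_CPUS. */
int toy_temporary_affinity(struct toy_vcpu *v, unsigned int cpu, uint8_t reason)
{
    assert(reason == VCPU_AFFINITY_OVERRIDE || reason == VCPU_AFFINITY_WAIT);

    if (cpu == NR_CPUS) {
        if (!(v->active_reasons & reason))
            return -EINVAL;                 /* nothing to undo */
        v->active_reasons = (uint8_t)(v->active_reasons & ~reason);
        /* Restore only once the last reason is gone, in whatever order. */
        if (!v->active_reasons)
            v->hard_affinity = v->saved_affinity;
        return 0;
    }

    if (cpu > NR_CPUS)
        return -EINVAL;

    /* Same reason twice, or a second reason wanting a different CPU. */
    if ((v->active_reasons & reason) ||
        (v->active_reasons && v->processor != cpu))
        return -EBUSY;

    /* Save the original mask only when the first pinning is set up. */
    if (!v->active_reasons)
        v->saved_affinity = v->hard_affinity;
    v->hard_affinity = 1u << cpu;
    v->processor = cpu;
    v->active_reasons |= reason;
    return 0;
}
```

In this model the two reasons may be released in either order; the saved mask comes back only when the bitmask reaches zero, which is why "nest" overstates the ordering requirement.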

> 
>> @@ -1267,7 +1284,8 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>           if ( copy_from_guest(&sched_pin_override, arg, 1) )
>>               break;
>>    
>> -        ret = vcpu_pin_override(current, sched_pin_override.pcpu);
>> +        cpu = sched_pin_override.pcpu < 0 ? NR_CPUS : sched_pin_override.pcpu;
> 
> I don't think you mean for the caller to achieve the same effect by
> passing in either a negative value or NR_CPUS - it should remain just
> negative values which clear the override.

Okay.
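
One possible shape for this, as a sketch only: the helper name pcpu_to_internal(), the NR_CPUS value and the choice of -EINVAL for out-of-range input are assumptions, not what the next version of the patch necessarily does. The idea is to map only negative guest values onto the internal NR_CPUS "restore" sentinel and reject a guest-supplied NR_CPUS (or anything larger) up front.

```c
#include <assert.h>
#include <errno.h>

#define NR_CPUS 8   /* assumed build-time limit for this sketch */

/*
 * Hypothetical helper: translate the guest's signed pcpu into the internal
 * convention where NR_CPUS means "clear the override".
 */
int pcpu_to_internal(int pcpu, unsigned int *cpu)
{
    assert(cpu);

    if (pcpu < 0) {
        *cpu = NR_CPUS;          /* only negative values clear the override */
        return 0;
    }

    if (pcpu >= NR_CPUS)
        return -EINVAL;          /* guest must not reach the sentinel directly */

    *cpu = (unsigned int)pcpu;
    return 0;
}
```

With this split, the sentinel stays an internal detail and the guest-visible interface keeps its existing meaning.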


Juergen


