* [v7 PATCH 00/10] Implement vcpu soft affinity for credit1
@ 2014-06-10  0:44 Dario Faggioli
  2014-06-10  0:44 ` [v7 PATCH 01/10] xen: sched: rename v->cpu_affinity into v->cpu_hard_affinity Dario Faggioli
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Dario Faggioli @ 2014-06-10  0:44 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, JBeulich, Ian.Jackson

Hi Everyone,

This is v7 of the soft affinity series. For v6, and for a little history and
quick description of the feature, see here:

 http://lists.xen.org/archives/html/xen-devel/2014-05/msg03272.html

Wrt that, I mostly:

 - fixed a bug in patch 3, triggered when a domain is moved to a cpupool
   before its vcpus have been allocated (i.e., what happens when you
   create a domain outside the default cpupool);

 - introduced patch 5, to bump libxc and libxl SONAMEs;

 - changed the libxl bits (patch 7), as requested and agreed during review.

All the patches have the acks they need from the relevant people, with the
exception of:
 - patch 5, which is new (tools maintainer's Ack required)
 - patches 7, 8 and 9 (tools maintainer's Ack required)

A git branch with this version of the series can be found here:

 git://xenbits.xen.org/people/dariof/xen.git numa/per-vcpu-affinity-v7

Let me know if I need to change anything else.

Thanks and Regards,
Dario

---

Dario Faggioli (10):
      xen: sched: rename v->cpu_affinity into v->cpu_hard_affinity
      xen: sched: introduce soft-affinity and use it instead d->node-affinity
      xen: derive NUMA node affinity from hard and soft CPU affinity
      xen/libxc: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
      libxc/libxl: bump library SONAMEs
      libxc: get and set soft and hard affinity
      libxl: get and set soft affinity
      xl: enable getting and setting soft affinities
      xl: enable for specifying soft-affinity in the config file
      libxl: automatic NUMA placement affects soft affinity


 docs/man/xl.cfg.pod.5                |   42 +++--
 docs/man/xl.pod.1                    |   32 +++
 docs/misc/xl-numa-placement.markdown |  162 ++++++++++++------
 tools/libxc/Makefile                 |    2 
 tools/libxc/xc_domain.c              |   72 +++++---
 tools/libxc/xenctrl.h                |   55 ++++++
 tools/libxl/Makefile                 |    2 
 tools/libxl/libxl.c                  |   97 +++++++++--
 tools/libxl/libxl.h                  |   26 +++
 tools/libxl/libxl_create.c           |    6 +
 tools/libxl/libxl_dom.c              |   23 ++
 tools/libxl/libxl_types.idl          |    4 
 tools/libxl/libxl_utils.h            |   25 +++
 tools/libxl/xl_cmdimpl.c             |  313 ++++++++++++++++++++++------------
 tools/libxl/xl_cmdtable.c            |    2 
 tools/ocaml/libs/xc/xenctrl_stubs.c  |    8 +
 tools/python/xen/lowlevel/xc/xc.c    |    6 -
 xen/arch/x86/traps.c                 |   13 +
 xen/common/domain.c                  |   86 ++++++---
 xen/common/domctl.c                  |  107 +++++++++++-
 xen/common/keyhandler.c              |    4 
 xen/common/sched_credit.c            |  161 +++++++----------
 xen/common/sched_sedf.c              |    2 
 xen/common/schedule.c                |   66 ++++---
 xen/common/wait.c                    |   10 +
 xen/include/public/domctl.h          |   29 +++
 xen/include/xen/sched-if.h           |    2 
 xen/include/xen/sched.h              |   15 +-
 28 files changed, 953 insertions(+), 419 deletions(-)

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* [v7 PATCH 01/10] xen: sched: rename v->cpu_affinity into v->cpu_hard_affinity
  2014-06-10  0:44 [v7 PATCH 00/10] Implement vcpu soft affinity for credit1 Dario Faggioli
@ 2014-06-10  0:44 ` Dario Faggioli
  2014-06-10  0:44 ` [v7 PATCH 02/10] xen: sched: introduce soft-affinity and use it instead d->node-affinity Dario Faggioli
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Dario Faggioli @ 2014-06-10  0:44 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, JBeulich, Ian.Jackson

in order to distinguish it from the cpu_soft_affinity which will
be introduced in a later commit ("xen: sched: introduce soft-affinity
and use it instead d->node-affinity").

This patch does not imply any functional change; it is basically
the result of something like the following:

 s/cpu_affinity/cpu_hard_affinity/g
 s/cpu_affinity_tmp/cpu_hard_affinity_tmp/g
 s/cpu_affinity_saved/cpu_hard_affinity_saved/g
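
For reference, here is a minimal sketch of that recipe applied to one
sample line (note the first substitution alone already produces the
right result for the _tmp and _saved variants too, since it matches
them as a prefix; the other two expressions are listed for clarity):

```shell
# Run the rename recipe over a sample line from xen/arch/x86/traps.c.
echo 'cpumask_copy(st->vcpu->cpu_affinity_tmp, st->vcpu->cpu_affinity);' \
  | sed -e 's/cpu_affinity/cpu_hard_affinity/g' \
        -e 's/cpu_affinity_tmp/cpu_hard_affinity_tmp/g' \
        -e 's/cpu_affinity_saved/cpu_hard_affinity_saved/g'
```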

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes from v2:
 * patch has been moved one step up in the series.
---
 xen/arch/x86/traps.c      |   11 ++++++-----
 xen/common/domain.c       |   22 +++++++++++-----------
 xen/common/domctl.c       |    2 +-
 xen/common/keyhandler.c   |    2 +-
 xen/common/sched_credit.c |   12 ++++++------
 xen/common/sched_sedf.c   |    2 +-
 xen/common/schedule.c     |   21 +++++++++++----------
 xen/common/wait.c         |    4 ++--
 xen/include/xen/sched.h   |    8 ++++----
 9 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 8161585..3883f68 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -3173,7 +3173,8 @@ static void nmi_mce_softirq(void)
 
     /* Set the tmp value unconditionally, so that
      * the check in the iret hypercall works. */
-    cpumask_copy(st->vcpu->cpu_affinity_tmp, st->vcpu->cpu_affinity);
+    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
+                 st->vcpu->cpu_hard_affinity);
 
     if ((cpu != st->processor)
        || (st->processor != st->vcpu->processor))
@@ -3208,11 +3209,11 @@ void async_exception_cleanup(struct vcpu *curr)
         return;
 
     /* Restore affinity.  */
-    if ( !cpumask_empty(curr->cpu_affinity_tmp) &&
-         !cpumask_equal(curr->cpu_affinity_tmp, curr->cpu_affinity) )
+    if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) &&
+         !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) )
     {
-        vcpu_set_affinity(curr, curr->cpu_affinity_tmp);
-        cpumask_clear(curr->cpu_affinity_tmp);
+        vcpu_set_affinity(curr, curr->cpu_hard_affinity_tmp);
+        cpumask_clear(curr->cpu_hard_affinity_tmp);
     }
 
     if ( !(curr->async_exception_mask & (curr->async_exception_mask - 1)) )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index bc57174..141a5dc 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -125,9 +125,9 @@ struct vcpu *alloc_vcpu(
 
     tasklet_init(&v->continue_hypercall_tasklet, NULL, 0);
 
-    if ( !zalloc_cpumask_var(&v->cpu_affinity) ||
-         !zalloc_cpumask_var(&v->cpu_affinity_tmp) ||
-         !zalloc_cpumask_var(&v->cpu_affinity_saved) ||
+    if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
+         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
+         !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
          !zalloc_cpumask_var(&v->vcpu_dirty_cpumask) )
         goto fail_free;
 
@@ -156,9 +156,9 @@ struct vcpu *alloc_vcpu(
  fail_wq:
         destroy_waitqueue_vcpu(v);
  fail_free:
-        free_cpumask_var(v->cpu_affinity);
-        free_cpumask_var(v->cpu_affinity_tmp);
-        free_cpumask_var(v->cpu_affinity_saved);
+        free_cpumask_var(v->cpu_hard_affinity);
+        free_cpumask_var(v->cpu_hard_affinity_tmp);
+        free_cpumask_var(v->cpu_hard_affinity_saved);
         free_cpumask_var(v->vcpu_dirty_cpumask);
         free_vcpu_struct(v);
         return NULL;
@@ -427,7 +427,7 @@ void domain_update_node_affinity(struct domain *d)
 
     for_each_vcpu ( d, v )
     {
-        cpumask_and(online_affinity, v->cpu_affinity, online);
+        cpumask_and(online_affinity, v->cpu_hard_affinity, online);
         cpumask_or(cpumask, cpumask, online_affinity);
     }
 
@@ -792,9 +792,9 @@ static void complete_domain_destroy(struct rcu_head *head)
     for ( i = d->max_vcpus - 1; i >= 0; i-- )
         if ( (v = d->vcpu[i]) != NULL )
         {
-            free_cpumask_var(v->cpu_affinity);
-            free_cpumask_var(v->cpu_affinity_tmp);
-            free_cpumask_var(v->cpu_affinity_saved);
+            free_cpumask_var(v->cpu_hard_affinity);
+            free_cpumask_var(v->cpu_hard_affinity_tmp);
+            free_cpumask_var(v->cpu_hard_affinity_saved);
             free_cpumask_var(v->vcpu_dirty_cpumask);
             free_vcpu_struct(v);
         }
@@ -934,7 +934,7 @@ int vcpu_reset(struct vcpu *v)
     v->async_exception_mask = 0;
     memset(v->async_exception_state, 0, sizeof(v->async_exception_state));
 #endif
-    cpumask_clear(v->cpu_affinity_tmp);
+    cpumask_clear(v->cpu_hard_affinity_tmp);
     clear_bit(_VPF_blocked, &v->pause_flags);
     clear_bit(_VPF_in_reset, &v->pause_flags);
 
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 4774277..b5c5c6c 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -625,7 +625,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         else
         {
             ret = cpumask_to_xenctl_bitmap(
-                &op->u.vcpuaffinity.cpumap, v->cpu_affinity);
+                &op->u.vcpuaffinity.cpumap, v->cpu_hard_affinity);
         }
     }
     break;
diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index 5afcfef..d6eb026 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -295,7 +295,7 @@ static void dump_domains(unsigned char key)
                    !vcpu_event_delivery_is_enabled(v));
             cpuset_print(tmpstr, sizeof(tmpstr), v->vcpu_dirty_cpumask);
             printk("dirty_cpus=%s ", tmpstr);
-            cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_affinity);
+            cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_hard_affinity);
             printk("cpu_affinity=%s\n", tmpstr);
             printk("    pause_count=%d pause_flags=%lx\n",
                    atomic_read(&v->pause_count), v->pause_flags);
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index db5512e..c6a2560 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -332,13 +332,13 @@ csched_balance_cpumask(const struct vcpu *vc, int step, cpumask_t *mask)
     if ( step == CSCHED_BALANCE_NODE_AFFINITY )
     {
         cpumask_and(mask, CSCHED_DOM(vc->domain)->node_affinity_cpumask,
-                    vc->cpu_affinity);
+                    vc->cpu_hard_affinity);
 
         if ( unlikely(cpumask_empty(mask)) )
-            cpumask_copy(mask, vc->cpu_affinity);
+            cpumask_copy(mask, vc->cpu_hard_affinity);
     }
     else /* step == CSCHED_BALANCE_CPU_AFFINITY */
-        cpumask_copy(mask, vc->cpu_affinity);
+        cpumask_copy(mask, vc->cpu_hard_affinity);
 }
 
 static void burn_credits(struct csched_vcpu *svc, s_time_t now)
@@ -407,7 +407,7 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
 
             if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY
                  && !__vcpu_has_node_affinity(new->vcpu,
-                                              new->vcpu->cpu_affinity) )
+                                              new->vcpu->cpu_hard_affinity) )
                 continue;
 
             /* Are there idlers suitable for new (for this balance step)? */
@@ -642,7 +642,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
 
     /* Store in cpus the mask of online cpus on which the domain can run */
     online = cpupool_scheduler_cpumask(vc->domain->cpupool);
-    cpumask_and(&cpus, vc->cpu_affinity, online);
+    cpumask_and(&cpus, vc->cpu_hard_affinity, online);
 
     for_each_csched_balance_step( balance_step )
     {
@@ -1498,7 +1498,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
              * or counter.
              */
             if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY
-                 && !__vcpu_has_node_affinity(vc, vc->cpu_affinity) )
+                 && !__vcpu_has_node_affinity(vc, vc->cpu_hard_affinity) )
                 continue;
 
             csched_balance_cpumask(vc, balance_step, csched_balance_mask);
diff --git a/xen/common/sched_sedf.c b/xen/common/sched_sedf.c
index 0c9011a..7c80bad 100644
--- a/xen/common/sched_sedf.c
+++ b/xen/common/sched_sedf.c
@@ -384,7 +384,7 @@ static int sedf_pick_cpu(const struct scheduler *ops, struct vcpu *v)
     cpumask_t *online;
 
     online = cpupool_scheduler_cpumask(v->domain->cpupool);
-    cpumask_and(&online_affinity, v->cpu_affinity, online);
+    cpumask_and(&online_affinity, v->cpu_hard_affinity, online);
     return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1,
                          &online_affinity);
 }
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index c174c41..4c633da 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -194,9 +194,9 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
      */
     v->processor = processor;
     if ( is_idle_domain(d) || d->is_pinned )
-        cpumask_copy(v->cpu_affinity, cpumask_of(processor));
+        cpumask_copy(v->cpu_hard_affinity, cpumask_of(processor));
     else
-        cpumask_setall(v->cpu_affinity);
+        cpumask_setall(v->cpu_hard_affinity);
 
     /* Initialise the per-vcpu timers. */
     init_timer(&v->periodic_timer, vcpu_periodic_timer_fn,
@@ -285,7 +285,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         migrate_timer(&v->singleshot_timer, new_p);
         migrate_timer(&v->poll_timer, new_p);
 
-        cpumask_setall(v->cpu_affinity);
+        cpumask_setall(v->cpu_hard_affinity);
 
         lock = vcpu_schedule_lock_irq(v);
         v->processor = new_p;
@@ -457,7 +457,7 @@ static void vcpu_migrate(struct vcpu *v)
              */
             if ( pick_called &&
                  (new_lock == per_cpu(schedule_data, new_cpu).schedule_lock) &&
-                 cpumask_test_cpu(new_cpu, v->cpu_affinity) &&
+                 cpumask_test_cpu(new_cpu, v->cpu_hard_affinity) &&
                  cpumask_test_cpu(new_cpu, v->domain->cpupool->cpu_valid) )
                 break;
 
@@ -560,7 +560,7 @@ void restore_vcpu_affinity(struct domain *d)
         if ( v->affinity_broken )
         {
             printk(XENLOG_DEBUG "Restoring affinity for %pv\n", v);
-            cpumask_copy(v->cpu_affinity, v->cpu_affinity_saved);
+            cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
             v->affinity_broken = 0;
         }
 
@@ -603,19 +603,20 @@ int cpu_disable_scheduler(unsigned int cpu)
             unsigned long flags;
             spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags);
 
-            cpumask_and(&online_affinity, v->cpu_affinity, c->cpu_valid);
+            cpumask_and(&online_affinity, v->cpu_hard_affinity, c->cpu_valid);
             if ( cpumask_empty(&online_affinity) &&
-                 cpumask_test_cpu(cpu, v->cpu_affinity) )
+                 cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
             {
                 printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
 
                 if (system_state == SYS_STATE_suspend)
                 {
-                    cpumask_copy(v->cpu_affinity_saved, v->cpu_affinity);
+                    cpumask_copy(v->cpu_hard_affinity_saved,
+                                 v->cpu_hard_affinity);
                     v->affinity_broken = 1;
                 }
 
-                cpumask_setall(v->cpu_affinity);
+                cpumask_setall(v->cpu_hard_affinity);
             }
 
             if ( v->processor == cpu )
@@ -663,7 +664,7 @@ int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity)
 
     lock = vcpu_schedule_lock_irq(v);
 
-    cpumask_copy(v->cpu_affinity, affinity);
+    cpumask_copy(v->cpu_hard_affinity, affinity);
 
     /* Always ask the scheduler to re-evaluate placement
      * when changing the affinity */
diff --git a/xen/common/wait.c b/xen/common/wait.c
index 3c9366c..3f6ff41 100644
--- a/xen/common/wait.c
+++ b/xen/common/wait.c
@@ -134,7 +134,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
 
     /* Save current VCPU affinity; force wakeup on *this* CPU only. */
     wqv->wakeup_cpu = smp_processor_id();
-    cpumask_copy(&wqv->saved_affinity, curr->cpu_affinity);
+    cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
     if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
     {
         gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
@@ -183,7 +183,7 @@ void check_wakeup_from_wait(void)
     {
         /* Re-set VCPU affinity and re-enter the scheduler. */
         struct vcpu *curr = current;
-        cpumask_copy(&wqv->saved_affinity, curr->cpu_affinity);
+        cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
         if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
         {
             gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 44851ae..6f91abd 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -219,11 +219,11 @@ struct vcpu
     spinlock_t       virq_lock;
 
     /* Bitmask of CPUs on which this VCPU may run. */
-    cpumask_var_t    cpu_affinity;
+    cpumask_var_t    cpu_hard_affinity;
     /* Used to change affinity temporarily. */
-    cpumask_var_t    cpu_affinity_tmp;
+    cpumask_var_t    cpu_hard_affinity_tmp;
     /* Used to restore affinity across S3. */
-    cpumask_var_t    cpu_affinity_saved;
+    cpumask_var_t    cpu_hard_affinity_saved;
 
     /* Bitmask of CPUs which are holding onto this VCPU's state. */
     cpumask_var_t    vcpu_dirty_cpumask;
@@ -819,7 +819,7 @@ void watchdog_domain_destroy(struct domain *d);
 #define has_hvm_container_domain(d) ((d)->guest_type != guest_type_pv)
 #define has_hvm_container_vcpu(v)   (has_hvm_container_domain((v)->domain))
 #define is_pinned_vcpu(v) ((v)->domain->is_pinned || \
-                           cpumask_weight((v)->cpu_affinity) == 1)
+                           cpumask_weight((v)->cpu_hard_affinity) == 1)
 #ifdef HAS_PASSTHROUGH
 #define need_iommu(d)    ((d)->need_iommu)
 #else


* [v7 PATCH 02/10] xen: sched: introduce soft-affinity and use it instead d->node-affinity
  2014-06-10  0:44 [v7 PATCH 00/10] Implement vcpu soft affinity for credit1 Dario Faggioli
  2014-06-10  0:44 ` [v7 PATCH 01/10] xen: sched: rename v->cpu_affinity into v->cpu_hard_affinity Dario Faggioli
@ 2014-06-10  0:44 ` Dario Faggioli
  2014-06-10 11:26   ` George Dunlap
  2014-06-10  0:44 ` [v7 PATCH 03/10] xen: derive NUMA node affinity from hard and soft CPU affinity Dario Faggioli
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2014-06-10  0:44 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, JBeulich, Ian.Jackson

Before this change, each vcpu had its own vcpu-affinity
(in v->cpu_affinity), representing the set of pcpus where
the vcpu is allowed to run. Since NUMA-aware scheduling
was introduced, the (credit1 only, for now) scheduler also
tries as much as it can to run all the vcpus of a domain
on one of the nodes that constitute the domain's
node-affinity.

The idea here is to make the mechanism more general by:
  * allowing this 'preference' for some pcpus/nodes to be
    expressed on a per-vcpu basis, rather than for the domain
    as a whole. That is to say, each vcpu should have its own
    set of preferred pcpus/nodes, rather than it being the
    very same for all the vcpus of the domain;
  * generalizing the idea of 'preferred pcpus' beyond NUMA
    awareness and support. That is to say, independently of
    whether it is (mostly) useful on NUMA systems, it should
    be possible to specify, for each vcpu, a set of pcpus where
    it prefers to run (in addition, and possibly unrelated, to
    the set of pcpus where it is allowed to run).

We will be calling this set of *preferred* pcpus the vcpu's
soft affinity. This change introduces it, and starts using it
for scheduling, replacing the indirect use of the domain's NUMA
node-affinity. This is more general, as soft affinity does not
have to be related to NUMA. Nevertheless, it makes it possible
to achieve the same results as NUMA-aware scheduling, just by
making soft affinity equal to the domain's node affinity, for
all the vCPUs (e.g., from the toolstack).

This also means renaming most of the NUMA-aware scheduling related
functions in credit1 to something more generic, hinting at the
concept of soft affinity rather than directly at NUMA awareness.

As a side effect, this simplifies the code quite a bit. In fact,
prior to this change, we needed to cache the translation of
d->node_affinity (which is a nodemask_t) to a cpumask_t, since that
is what scheduling decisions require (we used to keep it in
node_affinity_cpumask). This, and all the complicated logic
required to keep it updated, is not necessary any longer.

The high level description of NUMA placement and scheduling in
docs/misc/xl-numa-placement.markdown is being updated too, to match
the new architecture.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes from v6:
 * set_node_affinity completely removed (I mean the field and
   the scheduling abstraction around it), as requested during
   review.

Changes from v2:
 * this patch folds patches 6 ("xen: sched: make space for
   cpu_soft_affinity") and 10 ("xen: sched: use soft-affinity
   instead of domain's node-affinity"), as suggested during
   review. 'Reviewed-by' from George is there since both patch
   6 and 10 had it, and I didn't do anything other than squashing
   them.

Changes from v1:
 * in v1, "7/12 xen: numa-sched: use per-vcpu node-affinity for
   actual scheduling" was doing something very similar to this
   patch.
---
 docs/misc/xl-numa-placement.markdown |  148 ++++++++++++++++++++++-----------
 xen/common/domain.c                  |    5 +
 xen/common/keyhandler.c              |    2 
 xen/common/sched_credit.c            |  153 +++++++++++++---------------------
 xen/common/schedule.c                |    8 +-
 xen/include/xen/sched-if.h           |    2 
 xen/include/xen/sched.h              |    4 +
 7 files changed, 168 insertions(+), 154 deletions(-)

diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
index caa3fec..9d64eae 100644
--- a/docs/misc/xl-numa-placement.markdown
+++ b/docs/misc/xl-numa-placement.markdown
@@ -12,13 +12,6 @@ is quite more complex and slow. On these machines, a NUMA node is usually
 defined as a set of processor cores (typically a physical CPU package) and
 the memory directly attached to the set of cores.
 
-The Xen hypervisor deals with NUMA machines by assigning to each domain
-a "node affinity", i.e., a set of NUMA nodes of the host from which they
-get their memory allocated. Also, even if the node affinity of a domain
-is allowed to change on-line, it is very important to "place" the domain
-correctly when it is fist created, as the most of its memory is allocated
-at that time and can not (for now) be moved easily.
-
 NUMA awareness becomes very important as soon as many domains start
 running memory-intensive workloads on a shared host. In fact, the cost
 of accessing non node-local memory locations is very high, and the
@@ -27,14 +20,37 @@ performance degradation is likely to be noticeable.
 For more information, have a look at the [Xen NUMA Introduction][numa_intro]
 page on the Wiki.
 
+## Xen and NUMA machines: the concept of _node-affinity_ ##
+
+The Xen hypervisor deals with NUMA machines through the concept of
+_node-affinity_. The node-affinity of a domain is the set of NUMA nodes
+of the host where the memory for the domain is being allocated (mostly,
+at domain creation time). This is, at least in principle, different
+from, and unrelated to, the vCPU (hard and soft, see below) scheduling
+affinity, which instead is the set of pCPUs where the vCPU is allowed
+(or prefers) to run.
+
+Of course, despite the fact that they belong to and affect different
+subsystems, the domain node-affinity and the vCPUs' affinity are not
+completely independent.
+In fact, if the domain node-affinity is not explicitly specified by the
+user, via the proper libxl calls or xl config item, it will be computed
+based on the vCPUs' scheduling affinity.
+
+Notice that, even if the node affinity of a domain may change on-line,
+it is very important to "place" the domain correctly when it is first
+created, as most of its memory is allocated at that time and cannot
+(for now) be moved easily.
+
 ### Placing via pinning and cpupools ###
 
-The simplest way of placing a domain on a NUMA node is statically pinning
-the domain's vCPUs to the pCPUs of the node. This goes under the name of
-CPU affinity and can be set through the "cpus=" option in the config file
-(more about this below). Another option is to pool together the pCPUs
-spanning the node and put the domain in such a cpupool with the "pool="
-config option (as documented in our [Wiki][cpupools_howto]).
+The simplest way of placing a domain on a NUMA node is setting the hard
+scheduling affinity of the domain's vCPUs to the pCPUs of the node. This
+also goes under the name of vCPU pinning, and can be done through the
+"cpus=" option in the config file (more about this below). Another option
+is to pool together the pCPUs spanning the node and put the domain in
+such a _cpupool_ with the "pool=" config option (as documented in our
+[Wiki][cpupools_howto]).
 
 In both the above cases, the domain will not be able to execute outside
 the specified set of pCPUs for any reasons, even if all those pCPUs are
@@ -45,24 +61,45 @@ may come at he cost of some load imbalances.
 
 ### NUMA aware scheduling ###
 
-If the credit scheduler is in use, the concept of node affinity defined
-above does not only apply to memory. In fact, starting from Xen 4.3, the
-scheduler always tries to run the domain's vCPUs on one of the nodes in
-its node affinity. Only if that turns out to be impossible, it will just
-pick any free pCPU.
-
-This is, therefore, something more flexible than CPU affinity, as a domain
-can still run everywhere, it just prefers some nodes rather than others.
-Locality of access is less guaranteed than in the pinning case, but that
-comes along with better chances to exploit all the host resources (e.g.,
-the pCPUs).
-
-In fact, if all the pCPUs in a domain's node affinity are busy, it is
-possible for the domain to run outside of there, but it is very likely that
-slower execution (due to remote memory accesses) is still better than no
-execution at all, as it would happen with pinning. For this reason, NUMA
-aware scheduling has the potential of bringing substantial performances
-benefits, although this will depend on the workload.
+If using the credit1 scheduler, and starting from Xen 4.3, the scheduler
+itself always tries to run the domain's vCPUs on one of the nodes in
+its node-affinity. Only if that turns out to be impossible, it will just
+pick any free pCPU. Locality of access is less guaranteed than in the
+pinning case, but that comes along with better chances to exploit all
+the host resources (e.g., the pCPUs).
+
+Starting from Xen 4.5, credit1 supports two forms of affinity: hard and
+soft, both on a per-vCPU basis. This means each vCPU can have its own
+soft affinity, stating on which pCPUs such a vCPU prefers to run. This
+is less strict than what (also starting from 4.5) is called hard
+affinity: the vCPU can potentially run everywhere, it just prefers
+some pCPUs rather than others.
+In Xen 4.5, therefore, NUMA-aware scheduling is achieved by matching the
+soft affinity of the vCPUs of a domain with its node-affinity.
+
+In fact, as it was for 4.3, if all the pCPUs in a vCPU's soft affinity
+are busy, it is possible for the domain to run outside of there. The
+idea is that slower execution (due to remote memory accesses) is still
+better than no execution at all (as it would happen with pinning). For
+this reason, NUMA aware scheduling has the potential of bringing
+substantial performance benefits, although this will depend on the
+workload.
+
+Notice that, for each vCPU, the following three scenarios are possible:
+
+  * a vCPU *is pinned* to some pCPUs and *does not have* any soft
+    affinity. In this case, the vCPU is always scheduled on one of the
+    pCPUs to which it is pinned, without any specific preference among
+    them;
+  * a vCPU *has* its own soft affinity and *is not* pinned to any particular
+    pCPU. In this case, the vCPU can run on every pCPU. Nevertheless, the
+    scheduler will try to have it running on one of the pCPUs in its soft
+    affinity;
+  * a vCPU *has* its own soft affinity and *is also* pinned to some
+    pCPUs. In this case, the vCPU is always scheduled on one of the pCPUs
+    onto which it is pinned, with, among them, a preference for the ones
+    that also form its soft affinity. In case pinning and soft affinity
+    form two disjoint sets of pCPUs, pinning "wins", and the soft affinity
+    is just ignored.
 
 ## Guest placement in xl ##
 
@@ -71,25 +108,23 @@ both manual or automatic placement of them across the host's NUMA nodes.
 
 Note that xm/xend does a very similar thing, the only differences being
 the details of the heuristics adopted for automatic placement (see below),
-and the lack of support (in both xm/xend and the Xen versions where that\
+and the lack of support (in both xm/xend and the Xen versions where that
 was the default toolstack) for NUMA aware scheduling.
 
 ### Placing the guest manually ###
 
 Thanks to the "cpus=" option, it is possible to specify where a domain
 should be created and scheduled on, directly in its config file. This
-affects NUMA placement and memory accesses as the hypervisor constructs
-the node affinity of a VM basing right on its CPU affinity when it is
-created.
+affects NUMA placement and memory accesses as, in this case, the
+hypervisor constructs the node-affinity of a VM based directly on its
+vCPU pinning when it is created.
 
 This is very simple and effective, but requires the user/system
-administrator to explicitly specify affinities for each and every domain,
+administrator to explicitly specify the pinning for each and every domain,
 or Xen won't be able to guarantee the locality for their memory accesses.
 
-Notice that this also pins the domain's vCPUs to the specified set of
-pCPUs, so it not only sets the domain's node affinity (its memory will
-come from the nodes to which the pCPUs belong), but at the same time
-forces the vCPUs of the domain to be scheduled on those same pCPUs.
+That, of course, also means the vCPUs of the domain will only be able to
+execute on those same pCPUs.
 
 ### Placing the guest automatically ###
 
@@ -97,7 +132,9 @@ If no "cpus=" option is specified in the config file, libxl tries
 to figure out on its own on which node(s) the domain could fit best.
 If it finds one (some), the domain's node affinity get set to there,
 and both memory allocations and NUMA aware scheduling (for the credit
-scheduler and starting from Xen 4.3) will comply with it.
+scheduler and starting from Xen 4.3) will comply with it. Starting from
+Xen 4.5, this also means that the mask resulting from this "fitting"
+procedure will become the soft affinity of all the vCPUs of the domain.
 
 It is worthwhile noting that optimally fitting a set of VMs on the NUMA
 nodes of an host is an incarnation of the Bin Packing Problem. In fact,
@@ -142,34 +179,43 @@ any placement from happening:
 
     libxl_defbool_set(&domain_build_info->numa_placement, false);
 
-Also, if `numa_placement` is set to `true`, the domain must not
-have any CPU affinity (i.e., `domain_build_info->cpumap` must
-have all its bits set, as it is by default), or domain creation
-will fail returning `ERROR_INVAL`.
+Also, if `numa_placement` is set to `true`, the domain's vCPUs must
+not be pinned (i.e., `domain_build_info->cpumap` must have all its
+bits set, as it is by default), or domain creation will fail with
+`ERROR_INVAL`.
 
 Starting from Xen 4.3, in case automatic placement happens (and is
-successful), it will affect the domain's node affinity and _not_ its
-CPU affinity. Namely, the domain's vCPUs will not be pinned to any
+successful), it will affect the domain's node-affinity and _not_ its
+vCPU pinning. Namely, the domain's vCPUs will not be pinned to any
 pCPU on the host, but the memory from the domain will come from the
 selected node(s) and the NUMA aware scheduling (if the credit scheduler
-is in use) will try to keep the domain there as much as possible.
+is in use) will try to keep the domain's vCPUs there as much as possible.
 
 Besides that, for looking at and/or tweaking the placement algorithm,
 search for "Automatic NUMA placement" in libxl\_internal.h.
 
 Note this may change in future versions of Xen/libxl.
 
+## Xen < 4.5 ##
+
+The concept of vCPU soft affinity has been introduced for the first time
+in Xen 4.5. In 4.3, it is the domain's node-affinity that drives the
+NUMA-aware scheduler. The main difference is that soft affinity is
+per-vCPU, so each vCPU can have its own mask of pCPUs, while
+node-affinity is per-domain, which is the equivalent of having all
+the vCPUs with the same soft affinity.
+
 ## Xen < 4.3 ##
 
 As NUMA aware scheduling is a new feature of Xen 4.3, things are a little
 bit different for earlier version of Xen. If no "cpus=" option is specified
 and Xen 4.2 is in use, the automatic placement algorithm still runs, but
the result is used to _pin_ the vCPUs of the domain to the output node(s).
-This is consistent with what was happening with xm/xend, which were also
-affecting the domain's CPU affinity.
+This is consistent with what was happening with xm/xend.
 
 On a version of Xen earlier than 4.2, there is no automatic placement at
-all in xl or libxl, and hence no node or CPU affinity being affected.
+all in xl or libxl, and hence no node-affinity, vCPU affinity or pinning
+being introduced/modified.
 
 ## Limitations ##
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 141a5dc..e20d3bf 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -128,6 +128,7 @@ struct vcpu *alloc_vcpu(
     if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
          !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
          !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
+         !zalloc_cpumask_var(&v->cpu_soft_affinity) ||
          !zalloc_cpumask_var(&v->vcpu_dirty_cpumask) )
         goto fail_free;
 
@@ -159,6 +160,7 @@ struct vcpu *alloc_vcpu(
         free_cpumask_var(v->cpu_hard_affinity);
         free_cpumask_var(v->cpu_hard_affinity_tmp);
         free_cpumask_var(v->cpu_hard_affinity_saved);
+        free_cpumask_var(v->cpu_soft_affinity);
         free_cpumask_var(v->vcpu_dirty_cpumask);
         free_vcpu_struct(v);
         return NULL;
@@ -446,8 +448,6 @@ void domain_update_node_affinity(struct domain *d)
                 node_set(node, d->node_affinity);
     }
 
-    sched_set_node_affinity(d, &d->node_affinity);
-
     spin_unlock(&d->node_affinity_lock);
 
     free_cpumask_var(online_affinity);
@@ -795,6 +795,7 @@ static void complete_domain_destroy(struct rcu_head *head)
             free_cpumask_var(v->cpu_hard_affinity);
             free_cpumask_var(v->cpu_hard_affinity_tmp);
             free_cpumask_var(v->cpu_hard_affinity_saved);
+            free_cpumask_var(v->cpu_soft_affinity);
             free_cpumask_var(v->vcpu_dirty_cpumask);
             free_vcpu_struct(v);
         }
diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index d6eb026..809378c 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -297,6 +297,8 @@ static void dump_domains(unsigned char key)
             printk("dirty_cpus=%s ", tmpstr);
             cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_hard_affinity);
             printk("cpu_affinity=%s\n", tmpstr);
+            cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_soft_affinity);
+            printk("cpu_soft_affinity=%s\n", tmpstr);
             printk("    pause_count=%d pause_flags=%lx\n",
                    atomic_read(&v->pause_count), v->pause_flags);
             arch_dump_vcpu_info(v);
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index c6a2560..8b02b7b 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -112,10 +112,24 @@
 
 
 /*
- * Node Balancing
+ * Hard and soft affinity load balancing.
+ *
+ * The idea is that each vcpu has some pcpus that it prefers, some that it
+ * does not prefer but is OK with, and some that it cannot run on at all.
+ * The first set of pcpus are the ones that are both in the soft affinity
+ * *and* in the hard affinity; the second set are the ones that are in the
+ * hard affinity but *not* in the soft affinity; the third set are the
+ * ones that are not in the hard affinity.
+ *
+ * We implement a two step balancing logic. Basically, every time there is
+ * the need to decide where to run a vcpu, we first check the soft affinity
+ * (well, actually, the && between soft and hard affinity), to see if we can
+ * send it where it prefers to (and can) run. However, if the first step
+ * does not find any suitable and free pcpu, we fall back to checking the
+ * hard affinity.
  */
-#define CSCHED_BALANCE_NODE_AFFINITY    0
-#define CSCHED_BALANCE_CPU_AFFINITY     1
+#define CSCHED_BALANCE_SOFT_AFFINITY    0
+#define CSCHED_BALANCE_HARD_AFFINITY    1
 
 /*
  * Boot parameters
@@ -138,7 +152,7 @@ struct csched_pcpu {
 
 /*
  * Convenience macro for accessing the per-PCPU cpumask we need for
- * implementing the two steps (vcpu and node affinity) balancing logic.
+ * implementing the two steps (soft and hard affinity) balancing logic.
  * It is stored in csched_pcpu so that serialization is not an issue,
  * as there is a csched_pcpu for each PCPU and we always hold the
  * runqueue spin-lock when using this.
@@ -178,9 +192,6 @@ struct csched_dom {
     struct list_head active_vcpu;
     struct list_head active_sdom_elem;
     struct domain *dom;
-    /* cpumask translated from the domain's node-affinity.
-     * Basically, the CPUs we prefer to be scheduled on. */
-    cpumask_var_t node_affinity_cpumask;
     uint16_t active_vcpu_count;
     uint16_t weight;
     uint16_t cap;
@@ -261,59 +272,28 @@ __runq_remove(struct csched_vcpu *svc)
     list_del_init(&svc->runq_elem);
 }
 
-/*
- * Translates node-affinity mask into a cpumask, so that we can use it during
- * actual scheduling. That of course will contain all the cpus from all the
- * set nodes in the original node-affinity mask.
- *
- * Note that any serialization needed to access mask safely is complete
- * responsibility of the caller of this function/hook.
- */
-static void csched_set_node_affinity(
-    const struct scheduler *ops,
-    struct domain *d,
-    nodemask_t *mask)
-{
-    struct csched_dom *sdom;
-    int node;
-
-    /* Skip idle domain since it doesn't even have a node_affinity_cpumask */
-    if ( unlikely(is_idle_domain(d)) )
-        return;
-
-    sdom = CSCHED_DOM(d);
-    cpumask_clear(sdom->node_affinity_cpumask);
-    for_each_node_mask( node, *mask )
-        cpumask_or(sdom->node_affinity_cpumask, sdom->node_affinity_cpumask,
-                   &node_to_cpumask(node));
-}
 
 #define for_each_csched_balance_step(step) \
-    for ( (step) = 0; (step) <= CSCHED_BALANCE_CPU_AFFINITY; (step)++ )
+    for ( (step) = 0; (step) <= CSCHED_BALANCE_HARD_AFFINITY; (step)++ )
 
 
 /*
- * vcpu-affinity balancing is always necessary and must never be skipped.
- * OTOH, if a domain's node-affinity is said to be automatically computed
- * (or if it just spans all the nodes), we can safely avoid dealing with
- * node-affinity entirely.
+ * Hard affinity balancing is always necessary and must never be skipped.
+ * OTOH, if the vcpu's soft affinity is full (it spans all the possible
+ * pcpus) we can safely avoid dealing with it entirely.
  *
- * Node-affinity is also deemed meaningless in case it has empty
- * intersection with mask, to cover the cases where using the node-affinity
+ * A vcpu's soft affinity is also deemed meaningless in case it has empty
+ * intersection with mask, to cover the cases where using the soft affinity
 * mask seems legit, but would instead lead to trying to schedule the vcpu
  * on _no_ pcpu! Typical use cases are for mask to be equal to the vcpu's
- * vcpu-affinity, or to the && of vcpu-affinity and the set of online cpus
+ * hard affinity, or to the && of hard affinity and the set of online cpus
  * in the domain's cpupool.
  */
-static inline int __vcpu_has_node_affinity(const struct vcpu *vc,
+static inline int __vcpu_has_soft_affinity(const struct vcpu *vc,
                                            const cpumask_t *mask)
 {
-    const struct domain *d = vc->domain;
-    const struct csched_dom *sdom = CSCHED_DOM(d);
-
-    if ( d->auto_node_affinity
-         || cpumask_full(sdom->node_affinity_cpumask)
-         || !cpumask_intersects(sdom->node_affinity_cpumask, mask) )
+    if ( cpumask_full(vc->cpu_soft_affinity)
+         || !cpumask_intersects(vc->cpu_soft_affinity, mask) )
         return 0;
 
     return 1;
@@ -321,23 +301,22 @@ static inline int __vcpu_has_node_affinity(const struct vcpu *vc,
 
 /*
  * Each csched-balance step uses its own cpumask. This function determines
- * which one (given the step) and copies it in mask. For the node-affinity
- * balancing step, the pcpus that are not part of vc's vcpu-affinity are
+ * which one (given the step) and copies it in mask. For the soft affinity
+ * balancing step, the pcpus that are not part of vc's hard affinity are
  * filtered out from the result, to avoid running a vcpu where it would
  * like, but is not allowed to!
  */
 static void
 csched_balance_cpumask(const struct vcpu *vc, int step, cpumask_t *mask)
 {
-    if ( step == CSCHED_BALANCE_NODE_AFFINITY )
+    if ( step == CSCHED_BALANCE_SOFT_AFFINITY )
     {
-        cpumask_and(mask, CSCHED_DOM(vc->domain)->node_affinity_cpumask,
-                    vc->cpu_hard_affinity);
+        cpumask_and(mask, vc->cpu_soft_affinity, vc->cpu_hard_affinity);
 
         if ( unlikely(cpumask_empty(mask)) )
             cpumask_copy(mask, vc->cpu_hard_affinity);
     }
-    else /* step == CSCHED_BALANCE_CPU_AFFINITY */
+    else /* step == CSCHED_BALANCE_HARD_AFFINITY */
         cpumask_copy(mask, vc->cpu_hard_affinity);
 }
 
@@ -398,15 +377,15 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
     else if ( !idlers_empty )
     {
         /*
-         * Node and vcpu-affinity balancing loop. For vcpus without
-         * a useful node-affinity, consider vcpu-affinity only.
+         * Soft and hard affinity balancing loop. For vcpus without
+         * a useful soft affinity, consider hard affinity only.
          */
         for_each_csched_balance_step( balance_step )
         {
             int new_idlers_empty;
 
-            if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY
-                 && !__vcpu_has_node_affinity(new->vcpu,
+            if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY
+                 && !__vcpu_has_soft_affinity(new->vcpu,
                                               new->vcpu->cpu_hard_affinity) )
                 continue;
 
@@ -418,11 +397,11 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
 
             /*
              * Let's not be too harsh! If there aren't idlers suitable
-             * for new in its node-affinity mask, make sure we check its
-             * vcpu-affinity as well, before taking final decisions.
+             * for new in its soft affinity mask, make sure we check its
+             * hard affinity as well, before taking final decisions.
              */
             if ( new_idlers_empty
-                 && balance_step == CSCHED_BALANCE_NODE_AFFINITY )
+                 && balance_step == CSCHED_BALANCE_SOFT_AFFINITY )
                 continue;
 
             /*
@@ -649,23 +628,23 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
         /*
          * We want to pick up a pcpu among the ones that are online and
          * can accommodate vc, which is basically what we computed above
-         * and stored in cpus. As far as vcpu-affinity is concerned,
+         * and stored in cpus. As far as hard affinity is concerned,
          * there always will be at least one of these pcpus, hence cpus
          * is never empty and the calls to cpumask_cycle() and
          * cpumask_test_cpu() below are ok.
          *
-         * On the other hand, when considering node-affinity too, it
+         * On the other hand, when considering soft affinity too, it
          * is possible for the mask to become empty (for instance, if the
          * domain has been put in a cpupool that does not contain any of the
-         * nodes in its node-affinity), which would result in the ASSERT()-s
+         * pcpus in its soft affinity), which would result in the ASSERT()-s
          * inside cpumask_*() operations triggering (in debug builds).
          *
-         * Therefore, in this case, we filter the node-affinity mask against
-         * cpus and, if the result is empty, we just skip the node-affinity
+         * Therefore, in this case, we filter the soft affinity mask against
+         * cpus and, if the result is empty, we just skip the soft affinity
          * balancing step all together.
          */
-        if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY
-             && !__vcpu_has_node_affinity(vc, &cpus) )
+        if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY
+             && !__vcpu_has_soft_affinity(vc, &cpus) )
             continue;
 
         /* Pick an online CPU from the proper affinity mask */
@@ -1122,13 +1101,6 @@ csched_alloc_domdata(const struct scheduler *ops, struct domain *dom)
     if ( sdom == NULL )
         return NULL;
 
-    if ( !alloc_cpumask_var(&sdom->node_affinity_cpumask) )
-    {
-        xfree(sdom);
-        return NULL;
-    }
-    cpumask_setall(sdom->node_affinity_cpumask);
-
     /* Initialize credit and weight */
     INIT_LIST_HEAD(&sdom->active_vcpu);
     INIT_LIST_HEAD(&sdom->active_sdom_elem);
@@ -1158,9 +1130,6 @@ csched_dom_init(const struct scheduler *ops, struct domain *dom)
 static void
 csched_free_domdata(const struct scheduler *ops, void *data)
 {
-    struct csched_dom *sdom = data;
-
-    free_cpumask_var(sdom->node_affinity_cpumask);
     xfree(data);
 }
 
@@ -1486,19 +1455,19 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
             BUG_ON( is_idle_vcpu(vc) );
 
             /*
-             * If the vcpu has no useful node-affinity, skip this vcpu.
-             * In fact, what we want is to check if we have any node-affine
-             * work to steal, before starting to look at vcpu-affine work.
+             * If the vcpu has no useful soft affinity, skip this vcpu.
+             * In fact, what we want is to check if we have any "soft-affine
+             * work" to steal, before starting to look at "hard-affine work".
              *
              * Notice that, if not even one vCPU on this runq has a useful
-             * node-affinity, we could have avoid considering this runq for
-             * a node balancing step in the first place. This, for instance,
+             * soft affinity, we could have avoided considering this runq for
+             * a soft balancing step in the first place. This, for instance,
              * can be implemented by taking note of on what runq there are
-             * vCPUs with useful node-affinities in some sort of bitmap
+             * vCPUs with useful soft affinities in some sort of bitmap
              * or counter.
              */
-            if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY
-                 && !__vcpu_has_node_affinity(vc, vc->cpu_hard_affinity) )
+            if ( balance_step == CSCHED_BALANCE_SOFT_AFFINITY
+                 && !__vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity) )
                 continue;
 
             csched_balance_cpumask(vc, balance_step, csched_balance_mask);
@@ -1546,17 +1515,17 @@ csched_load_balance(struct csched_private *prv, int cpu,
         SCHED_STAT_CRANK(load_balance_other);
 
     /*
-     * Let's look around for work to steal, taking both vcpu-affinity
-     * and node-affinity into account. More specifically, we check all
+     * Let's look around for work to steal, taking both hard affinity
+     * and soft affinity into account. More specifically, we check all
      * the non-idle CPUs' runq, looking for:
-     *  1. any node-affine work to steal first,
-     *  2. if not finding anything, any vcpu-affine work to steal.
+     *  1. any "soft-affine work" to steal first,
+     *  2. if not finding anything, any "hard-affine work" to steal.
      */
     for_each_csched_balance_step( bstep )
     {
         /*
          * We peek at the non-idling CPUs in a node-wise fashion. In fact,
-         * it is more likely that we find some node-affine work on our same
+         * it is more likely that we find some affine work on our same
          * node, not to mention that migrating vcpus within the same node
         * could well be expected to be cheaper than across-nodes (memory
          * stays local, there might be some node-wide cache[s], etc.).
@@ -1982,8 +1951,6 @@ const struct scheduler sched_credit_def = {
     .adjust         = csched_dom_cntl,
     .adjust_global  = csched_sys_cntl,
 
-    .set_node_affinity  = csched_set_node_affinity,
-
     .pick_cpu       = csched_cpu_pick,
     .do_schedule    = csched_schedule,
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 4c633da..6499954 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -198,6 +198,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     else
         cpumask_setall(v->cpu_hard_affinity);
 
+    cpumask_setall(v->cpu_soft_affinity);
+
     /* Initialise the per-vcpu timers. */
     init_timer(&v->periodic_timer, vcpu_periodic_timer_fn,
                v, v->processor);
@@ -286,6 +288,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         migrate_timer(&v->poll_timer, new_p);
 
         cpumask_setall(v->cpu_hard_affinity);
+        cpumask_setall(v->cpu_soft_affinity);
 
         lock = vcpu_schedule_lock_irq(v);
         v->processor = new_p;
@@ -644,11 +647,6 @@ int cpu_disable_scheduler(unsigned int cpu)
     return ret;
 }
 
-void sched_set_node_affinity(struct domain *d, nodemask_t *mask)
-{
-    SCHED_OP(DOM2OP(d), set_node_affinity, d, mask);
-}
-
 int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity)
 {
     cpumask_t online_affinity;
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index d95e254..4164dff 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -158,8 +158,6 @@ struct scheduler {
                                     struct xen_domctl_scheduler_op *);
     int          (*adjust_global)  (const struct scheduler *,
                                     struct xen_sysctl_scheduler_op *);
-    void         (*set_node_affinity) (const struct scheduler *,
-                                       struct domain *, nodemask_t *);
     void         (*dump_settings)  (const struct scheduler *);
     void         (*dump_cpu_state) (const struct scheduler *, int);
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 6f91abd..445b659 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -225,6 +225,9 @@ struct vcpu
     /* Used to restore affinity across S3. */
     cpumask_var_t    cpu_hard_affinity_saved;
 
+    /* Bitmask of CPUs on which this VCPU prefers to run. */
+    cpumask_var_t    cpu_soft_affinity;
+
     /* Bitmask of CPUs which are holding onto this VCPU's state. */
     cpumask_var_t    vcpu_dirty_cpumask;
 
@@ -627,7 +630,6 @@ void sched_destroy_domain(struct domain *d);
 int sched_move_domain(struct domain *d, struct cpupool *c);
 long sched_adjust(struct domain *, struct xen_domctl_scheduler_op *);
 long sched_adjust_global(struct xen_sysctl_scheduler_op *);
-void sched_set_node_affinity(struct domain *, nodemask_t *);
 int  sched_id(void);
 void sched_tick_suspend(void);
 void sched_tick_resume(void);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [v7 PATCH 03/10] xen: derive NUMA node affinity from hard and soft CPU affinity
  2014-06-10  0:44 [v7 PATCH 00/10] Implement vcpu soft affinity for credit1 Dario Faggioli
  2014-06-10  0:44 ` [v7 PATCH 01/10] xen: sched: rename v->cpu_affinity into v->cpu_hard_affinity Dario Faggioli
  2014-06-10  0:44 ` [v7 PATCH 02/10] xen: sched: introduce soft-affinity and use it instead d->node-affinity Dario Faggioli
@ 2014-06-10  0:44 ` Dario Faggioli
  2014-06-10 14:53   ` George Dunlap
  2014-06-10  0:44 ` [v7 PATCH 04/10] xen/libxc: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity Dario Faggioli
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2014-06-10  0:44 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, JBeulich, Ian.Jackson

If a domain's NUMA node-affinity (which is what controls
memory allocations) is provided by the user/toolstack, it
is simply left untouched. However, if the user does not say
anything, leaving it all to Xen, let's compute it in the
following way:

 1. cpupool's cpus & hard-affinity & soft-affinity
 2. if (1) is empty: cpupool's cpus & hard-affinity

This guarantees memory to be allocated from the narrowest
possible set of NUMA nodes, and makes it relatively easy to
set up NUMA-aware scheduling on top of soft affinity.

Note that such 'narrowest set' is guaranteed to be non-empty.
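The two-step narrowing above can be sketched as a standalone model. This is an illustration, not the patch's actual code: masks are modeled here as plain uint64_t bitmasks standing in for Xen's cpumask_t, and the helper name is made up for the example.

```c
#include <stdint.h>

/*
 * Standalone model of the node-affinity narrowing described above.
 * pool = cpupool's cpus, hard/soft = unions of the vcpus' affinities.
 * Bit i set means pcpu i is in the mask.
 */
static uint64_t effective_node_affinity_mask(uint64_t pool, uint64_t hard,
                                             uint64_t soft)
{
    /* Step 1: cpupool's cpus & hard-affinity & soft-affinity. */
    uint64_t narrow = pool & hard & soft;

    /* Step 2: if that is empty, fall back to cpupool's cpus & hard. */
    if ( narrow == 0 )
        narrow = pool & hard;

    /* Non-empty by construction, since pool & hard is never empty. */
    return narrow;
}
```

For instance, with pool = 0x0f, hard = 0x3c and a soft affinity of 0xc0 that lies entirely outside the pool, step 1 yields the empty mask and the result falls back to pool & hard = 0x0c.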

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes from v6:
 * fixed a bug when a domain was being created inside a
   cpupool;
 * coding style.

Changes from v3:
 * avoid pointless calls to cpumask_clear(), as requested
   during review;
 * ASSERT() the non-emptiness of cpupool & hard affinity, as
   suggested during review.

Changes from v2:
 * the loop computing the mask is now only executed when
   it really is useful, as suggested during review;
 * the loop, and all the cpumask handling is optimized,
   in a way similar to what was suggested during review.
---
 xen/common/domain.c   |   61 +++++++++++++++++++++++++++++++------------------
 xen/common/schedule.c |    4 ++-
 2 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index e20d3bf..c3a576e 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -409,17 +409,17 @@ struct domain *domain_create(
 
 void domain_update_node_affinity(struct domain *d)
 {
-    cpumask_var_t cpumask;
-    cpumask_var_t online_affinity;
+    cpumask_var_t dom_cpumask, dom_cpumask_soft;
+    cpumask_t *dom_affinity;
     const cpumask_t *online;
     struct vcpu *v;
-    unsigned int node;
+    unsigned int cpu;
 
-    if ( !zalloc_cpumask_var(&cpumask) )
+    if ( !zalloc_cpumask_var(&dom_cpumask) )
         return;
-    if ( !alloc_cpumask_var(&online_affinity) )
+    if ( !zalloc_cpumask_var(&dom_cpumask_soft) )
     {
-        free_cpumask_var(cpumask);
+        free_cpumask_var(dom_cpumask);
         return;
     }
 
@@ -427,31 +427,48 @@ void domain_update_node_affinity(struct domain *d)
 
     spin_lock(&d->node_affinity_lock);
 
-    for_each_vcpu ( d, v )
-    {
-        cpumask_and(online_affinity, v->cpu_hard_affinity, online);
-        cpumask_or(cpumask, cpumask, online_affinity);
-    }
-
     /*
-     * If d->auto_node_affinity is true, the domain's node-affinity mask
-     * (d->node_affinity) is automaically computed from all the domain's
-     * vcpus' vcpu-affinity masks (the union of which we have just built
-     * above in cpumask). OTOH, if d->auto_node_affinity is false, we
-     * must leave the node-affinity of the domain alone.
+     * If d->auto_node_affinity is true, let's compute the domain's
+     * node-affinity and update d->node_affinity accordingly. If false,
+     * just leave d->node_affinity alone.
      */
     if ( d->auto_node_affinity )
     {
+        /*
+         * We want the narrowest possible set of pcpus (to get the narrowest
+         * possible set of nodes). What we need is the cpumask of where the
+         * domain can run (the union of the hard affinity of all its vcpus),
+         * and the full mask of where it would prefer to run (the union of
+         * the soft affinity of all its various vcpus). Let's build them.
+         */
+        for_each_vcpu ( d, v )
+        {
+            cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity);
+            cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
+                       v->cpu_soft_affinity);
+        }
+        /* Filter out non-online cpus */
+        cpumask_and(dom_cpumask, dom_cpumask, online);
+        ASSERT(!cpumask_empty(dom_cpumask));
+        /* And compute the intersection between hard, online and soft */
+        cpumask_and(dom_cpumask_soft, dom_cpumask_soft, dom_cpumask);
+
+        /*
+         * If not empty, the intersection of hard, soft and online is the
+         * narrowest set we want. If empty, we fall back to hard&online.
+         */
+        dom_affinity = cpumask_empty(dom_cpumask_soft) ?
+                           dom_cpumask : dom_cpumask_soft;
+
         nodes_clear(d->node_affinity);
-        for_each_online_node ( node )
-            if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
-                node_set(node, d->node_affinity);
+        for_each_cpu ( cpu, dom_affinity )
+            node_set(cpu_to_node(cpu), d->node_affinity);
     }
 
     spin_unlock(&d->node_affinity_lock);
 
-    free_cpumask_var(online_affinity);
-    free_cpumask_var(cpumask);
+    free_cpumask_var(dom_cpumask_soft);
+    free_cpumask_var(dom_cpumask);
 }
 
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 6499954..5abefa1 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -309,7 +309,9 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         SCHED_OP(old_ops, free_vdata, vcpudata);
     }
 
-    domain_update_node_affinity(d);
+    /* Do we have vcpus already? If not, no need to update node-affinity */
+    if ( d->vcpu )
+        domain_update_node_affinity(d);
 
     domain_unpause(d);


* [v7 PATCH 04/10] xen/libxc: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity
  2014-06-10  0:44 [v7 PATCH 00/10] Implement vcpu soft affinity for credit1 Dario Faggioli
                   ` (2 preceding siblings ...)
  2014-06-10  0:44 ` [v7 PATCH 03/10] xen: derive NUMA node affinity from hard and soft CPU affinity Dario Faggioli
@ 2014-06-10  0:44 ` Dario Faggioli
  2014-06-10  0:45 ` [v7 PATCH 05/10] libxc/libxl: bump library SONAMEs Dario Faggioli
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Dario Faggioli @ 2014-06-10  0:44 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, JBeulich, Ian.Jackson

by adding a flag for the caller to specify which one he cares about.

At the same time, enable the caller to get back the "effective affinity"
of the vCPU. That is the intersection between cpupool's cpus, the (new)
hard affinity and, for soft affinity, the (new) soft affinity. In fact,
despite what has been successfully set with the DOMCTL_setvcpuaffinity
hypercall, the Xen scheduler will never run a vCPU outside of its hard
affinity or of its domain's cpupool.

This happens by adding another cpumap to the interface and making both
the cpumaps IN/OUT parameters for DOMCTL_setvcpuaffinity (they're, of
course, out-only for DOMCTL_getvcpuaffinity).
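A minimal standalone sketch of the flag validation this interface implies. The flag values mirror the patch's vcpuaffinity_params_invalid() helper further down; guest handles are modeled here as plain pointers, so this is a model of the check, not the hypervisor code itself.

```c
#include <stddef.h>

/* Flag values as in the patch's public domctl header. */
#define XEN_VCPUAFFINITY_HARD (1u << 0)
#define XEN_VCPUAFFINITY_SOFT (1u << 1)

/*
 * A request is invalid when no affinity kind is selected, or when a
 * selected kind comes without a cpumap to read from / write to.
 */
static int affinity_params_invalid(unsigned int flags,
                                   const void *cpumap_hard,
                                   const void *cpumap_soft)
{
    return flags == 0 ||
           ((flags & XEN_VCPUAFFINITY_HARD) && cpumap_hard == NULL) ||
           ((flags & XEN_VCPUAFFINITY_SOFT) && cpumap_soft == NULL);
}
```

So a caller may pass hard only, soft only, or both, but must supply a map for each kind it selects.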

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
Changes from v5:
 * fix range checking the hypercall input cpumap before copying it;
 * fix coding style and factor some common code, as requested during
   review;
 * add a note on the interface in Xen public headers too, as
   requested during review.

Changes since v4
 * make both the cpumaps IN/OUT and use them for reporting back
   both effective hard and soft affinity, as requested during
   review;
 * fix the arguments' names, the comments and the annotations in
   the public header accordingly, as requested during review.

Changes since v3:
 * no longer discarding possible errors. Also, rollback setting
   hard affinity if setting soft affinity fails afterwards, so
   that the caller really sees no changes when the call fails,
   as requested during review;
 * fixed -EFAULT --> -ENOMEM in case of a failed memory allocation,
   as requested during review;
 * removed non necessary use of pointer to pointer, as requested
   during review.

Changes from v2:
 * in DOMCTL_[sg]etvcpuaffinity, flag is really a flag now,
   i.e., we accept request for setting and getting: (1) only
   hard affinity; (2) only soft affinity; (3) both; as
   suggested during review.
---
 tools/libxc/xc_domain.c     |   12 +++--
 xen/arch/x86/traps.c        |    4 +-
 xen/common/domctl.c         |  107 +++++++++++++++++++++++++++++++++++++++----
 xen/common/schedule.c       |   35 ++++++++++----
 xen/common/wait.c           |    6 +-
 xen/include/public/domctl.h |   29 +++++++++++-
 xen/include/xen/sched.h     |    3 +
 7 files changed, 162 insertions(+), 34 deletions(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 37ed141..861b471 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -225,13 +225,14 @@ int xc_vcpu_setaffinity(xc_interface *xch,
 
     domctl.cmd = XEN_DOMCTL_setvcpuaffinity;
     domctl.domain = (domid_t)domid;
-    domctl.u.vcpuaffinity.vcpu    = vcpu;
+    domctl.u.vcpuaffinity.vcpu = vcpu;
+    domctl.u.vcpuaffinity.flags = XEN_VCPUAFFINITY_HARD;
 
     memcpy(local, cpumap, cpusize);
 
-    set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local);
+    set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap_hard.bitmap, local);
 
-    domctl.u.vcpuaffinity.cpumap.nr_bits = cpusize * 8;
+    domctl.u.vcpuaffinity.cpumap_hard.nr_bits = cpusize * 8;
 
     ret = do_domctl(xch, &domctl);
 
@@ -269,9 +270,10 @@ int xc_vcpu_getaffinity(xc_interface *xch,
     domctl.cmd = XEN_DOMCTL_getvcpuaffinity;
     domctl.domain = (domid_t)domid;
     domctl.u.vcpuaffinity.vcpu = vcpu;
+    domctl.u.vcpuaffinity.flags = XEN_VCPUAFFINITY_HARD;
 
-    set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap.bitmap, local);
-    domctl.u.vcpuaffinity.cpumap.nr_bits = cpusize * 8;
+    set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap_hard.bitmap, local);
+    domctl.u.vcpuaffinity.cpumap_hard.nr_bits = cpusize * 8;
 
     ret = do_domctl(xch, &domctl);
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 3883f68..c3eec8e 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -3183,7 +3183,7 @@ static void nmi_mce_softirq(void)
          * Make sure to wakeup the vcpu on the
          * specified processor.
          */
-        vcpu_set_affinity(st->vcpu, cpumask_of(st->processor));
+        vcpu_set_hard_affinity(st->vcpu, cpumask_of(st->processor));
 
         /* Affinity is restored in the iret hypercall. */
     }
@@ -3212,7 +3212,7 @@ void async_exception_cleanup(struct vcpu *curr)
     if ( !cpumask_empty(curr->cpu_hard_affinity_tmp) &&
          !cpumask_equal(curr->cpu_hard_affinity_tmp, curr->cpu_hard_affinity) )
     {
-        vcpu_set_affinity(curr, curr->cpu_hard_affinity_tmp);
+        vcpu_set_hard_affinity(curr, curr->cpu_hard_affinity_tmp);
         cpumask_clear(curr->cpu_hard_affinity_tmp);
     }
 
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index b5c5c6c..000993f 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -287,6 +287,16 @@ void domctl_lock_release(void)
     spin_unlock(&current->domain->hypercall_deadlock_mutex);
 }
 
+static inline
+int vcpuaffinity_params_invalid(const xen_domctl_vcpuaffinity_t *vcpuaff)
+{
+    return vcpuaff->flags == 0 ||
+           ((vcpuaff->flags & XEN_VCPUAFFINITY_HARD) &&
+            guest_handle_is_null(vcpuaff->cpumap_hard.bitmap)) ||
+           ((vcpuaff->flags & XEN_VCPUAFFINITY_SOFT) &&
+            guest_handle_is_null(vcpuaff->cpumap_soft.bitmap));
+}
+
 long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 {
     long ret = 0;
@@ -601,31 +611,108 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     case XEN_DOMCTL_getvcpuaffinity:
     {
         struct vcpu *v;
+        xen_domctl_vcpuaffinity_t *vcpuaff = &op->u.vcpuaffinity;
 
         ret = -EINVAL;
-        if ( op->u.vcpuaffinity.vcpu >= d->max_vcpus )
+        if ( vcpuaff->vcpu >= d->max_vcpus )
             break;
 
         ret = -ESRCH;
-        if ( (v = d->vcpu[op->u.vcpuaffinity.vcpu]) == NULL )
+        if ( (v = d->vcpu[vcpuaff->vcpu]) == NULL )
+            break;
+
+        ret = -EINVAL;
+        if ( vcpuaffinity_params_invalid(vcpuaff) )
             break;
 
         if ( op->cmd == XEN_DOMCTL_setvcpuaffinity )
         {
-            cpumask_var_t new_affinity;
+            cpumask_var_t new_affinity, old_affinity;
+            cpumask_t *online = cpupool_online_cpumask(v->domain->cpupool);
+
+            /*
+             * We want to be able to restore hard affinity if we are setting
+             * both and changing soft affinity (which happens later, when
+             * hard affinity has already been successfully changed) fails.
+             */
+            if ( !alloc_cpumask_var(&old_affinity) )
+            {
+                ret = -ENOMEM;
+                break;
+            }
+            cpumask_copy(old_affinity, v->cpu_hard_affinity);
+
+            if ( !alloc_cpumask_var(&new_affinity) )
+            {
+                free_cpumask_var(old_affinity);
+                ret = -ENOMEM;
+                break;
+            }
 
-            ret = xenctl_bitmap_to_cpumask(
-                &new_affinity, &op->u.vcpuaffinity.cpumap);
-            if ( !ret )
+            /*
+             * We both set a new affinity and report back to the caller what
+             * the scheduler will be effectively using.
+             */
+            if ( vcpuaff->flags & XEN_VCPUAFFINITY_HARD )
+            {
+                ret = xenctl_bitmap_to_bitmap(cpumask_bits(new_affinity),
+                                              &vcpuaff->cpumap_hard,
+                                              nr_cpu_ids);
+                if ( !ret )
+                    ret = vcpu_set_hard_affinity(v, new_affinity);
+                if ( ret )
+                    goto setvcpuaffinity_out;
+
+                /*
+                 * For hard affinity, what we return is the intersection of
+                 * cpupool's online mask and the new hard affinity.
+                 */
+                cpumask_and(new_affinity, online, v->cpu_hard_affinity);
+                ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_hard,
+                                               new_affinity);
+            }
+            if ( vcpuaff->flags & XEN_VCPUAFFINITY_SOFT )
             {
-                ret = vcpu_set_affinity(v, new_affinity);
-                free_cpumask_var(new_affinity);
+                ret = xenctl_bitmap_to_bitmap(cpumask_bits(new_affinity),
+                                              &vcpuaff->cpumap_soft,
+                                              nr_cpu_ids);
+                if ( !ret )
+                    ret = vcpu_set_soft_affinity(v, new_affinity);
+                if ( ret )
+                {
+                    /*
+                     * Since we are returning an error, the caller expects
+                     * that nothing has changed, so we roll back the changes
+                     * to hard affinity (if any).
+                     */
+                    if ( vcpuaff->flags & XEN_VCPUAFFINITY_HARD )
+                        vcpu_set_hard_affinity(v, old_affinity);
+                    goto setvcpuaffinity_out;
+                }
+
+                /*
+                 * For soft affinity, we return the intersection between the
+                 * new soft affinity, the cpupool's online map and the (new)
+                 * hard affinity.
+                 */
+                cpumask_and(new_affinity, new_affinity, online);
+                cpumask_and(new_affinity, new_affinity, v->cpu_hard_affinity);
+                ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_soft,
+                                               new_affinity);
             }
+
+ setvcpuaffinity_out:
+            free_cpumask_var(new_affinity);
+            free_cpumask_var(old_affinity);
         }
         else
         {
-            ret = cpumask_to_xenctl_bitmap(
-                &op->u.vcpuaffinity.cpumap, v->cpu_hard_affinity);
+            if ( vcpuaff->flags & XEN_VCPUAFFINITY_HARD )
+                ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_hard,
+                                               v->cpu_hard_affinity);
+            if ( vcpuaff->flags & XEN_VCPUAFFINITY_SOFT )
+                ret = cpumask_to_xenctl_bitmap(&vcpuaff->cpumap_soft,
+                                               v->cpu_soft_affinity);
         }
     }
     break;
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5abefa1..6a726af 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -649,22 +649,14 @@ int cpu_disable_scheduler(unsigned int cpu)
     return ret;
 }
 
-int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity)
+static int vcpu_set_affinity(
+    struct vcpu *v, const cpumask_t *affinity, cpumask_t *which)
 {
-    cpumask_t online_affinity;
-    cpumask_t *online;
     spinlock_t *lock;
 
-    if ( v->domain->is_pinned )
-        return -EINVAL;
-    online = VCPU2ONLINE(v);
-    cpumask_and(&online_affinity, affinity, online);
-    if ( cpumask_empty(&online_affinity) )
-        return -EINVAL;
-
     lock = vcpu_schedule_lock_irq(v);
 
-    cpumask_copy(v->cpu_hard_affinity, affinity);
+    cpumask_copy(which, affinity);
 
     /* Always ask the scheduler to re-evaluate placement
      * when changing the affinity */
@@ -683,6 +675,27 @@ int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity)
     return 0;
 }
 
+int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity)
+{
+    cpumask_t online_affinity;
+    cpumask_t *online;
+
+    if ( v->domain->is_pinned )
+        return -EINVAL;
+
+    online = VCPU2ONLINE(v);
+    cpumask_and(&online_affinity, affinity, online);
+    if ( cpumask_empty(&online_affinity) )
+        return -EINVAL;
+
+    return vcpu_set_affinity(v, affinity, v->cpu_hard_affinity);
+}
+
+int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity)
+{
+    return vcpu_set_affinity(v, affinity, v->cpu_soft_affinity);
+}
+
 /* Block the currently-executing domain until a pertinent event occurs. */
 void vcpu_block(void)
 {
diff --git a/xen/common/wait.c b/xen/common/wait.c
index 3f6ff41..1f6b597 100644
--- a/xen/common/wait.c
+++ b/xen/common/wait.c
@@ -135,7 +135,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
     /* Save current VCPU affinity; force wakeup on *this* CPU only. */
     wqv->wakeup_cpu = smp_processor_id();
     cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
-    if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
+    if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
     {
         gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
         domain_crash_synchronous();
@@ -166,7 +166,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
 static void __finish_wait(struct waitqueue_vcpu *wqv)
 {
     wqv->esp = NULL;
-    (void)vcpu_set_affinity(current, &wqv->saved_affinity);
+    (void)vcpu_set_hard_affinity(current, &wqv->saved_affinity);
 }
 
 void check_wakeup_from_wait(void)
@@ -184,7 +184,7 @@ void check_wakeup_from_wait(void)
         /* Re-set VCPU affinity and re-enter the scheduler. */
         struct vcpu *curr = current;
         cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
-        if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
+        if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
         {
             gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
             domain_crash_synchronous();
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 385b053..365446f 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -300,8 +300,33 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_nodeaffinity_t);
 /* XEN_DOMCTL_setvcpuaffinity */
 /* XEN_DOMCTL_getvcpuaffinity */
 struct xen_domctl_vcpuaffinity {
-    uint32_t  vcpu;              /* IN */
-    struct xenctl_bitmap cpumap; /* IN/OUT */
+    /* IN variables. */
+    uint32_t  vcpu;
+ /* Set/get the hard affinity for vcpu */
+#define _XEN_VCPUAFFINITY_HARD  0
+#define XEN_VCPUAFFINITY_HARD   (1U<<_XEN_VCPUAFFINITY_HARD)
+ /* Set/get the soft affinity for vcpu */
+#define _XEN_VCPUAFFINITY_SOFT  1
+#define XEN_VCPUAFFINITY_SOFT   (1U<<_XEN_VCPUAFFINITY_SOFT)
+    uint32_t flags;
+    /*
+     * IN/OUT variables.
+     *
+     * Both are IN/OUT for XEN_DOMCTL_setvcpuaffinity, in which case they
+     * contain effective hard or/and soft affinity. That is, upon successful
+     * return, cpumap_soft, contains the intersection of the soft affinity,
+     * hard affinity and the cpupool's online CPUs for the domain (if
+     * XEN_VCPUAFFINITY_SOFT was set in flags). cpumap_hard contains the
+     * intersection between hard affinity and the cpupool's online CPUs (if
+     * XEN_VCPUAFFINITY_HARD was set in flags).
+     *
+     * Both are OUT-only for XEN_DOMCTL_getvcpuaffinity, in which case they
+     * contain the plain hard and/or soft affinity masks that were set during
+     * previous successful calls to XEN_DOMCTL_setvcpuaffinity (or the
+     * default values), without intersecting or altering them in any way.
+     */
+    struct xenctl_bitmap cpumap_hard;
+    struct xenctl_bitmap cpumap_soft;
 };
 typedef struct xen_domctl_vcpuaffinity xen_domctl_vcpuaffinity_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpuaffinity_t);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 445b659..f920e1a 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -781,7 +781,8 @@ void scheduler_free(struct scheduler *sched);
 int schedule_cpu_switch(unsigned int cpu, struct cpupool *c);
 void vcpu_force_reschedule(struct vcpu *v);
 int cpu_disable_scheduler(unsigned int cpu);
-int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity);
+int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity);
+int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity);
 void restore_vcpu_affinity(struct domain *d);
 
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [v7 PATCH 05/10] libxc/libxl: bump library SONAMEs
  2014-06-10  0:44 [v7 PATCH 00/10] Implement vcpu soft affinity for credit1 Dario Faggioli
                   ` (3 preceding siblings ...)
  2014-06-10  0:44 ` [v7 PATCH 04/10] xen/libxc: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity Dario Faggioli
@ 2014-06-10  0:45 ` Dario Faggioli
  2014-06-10 13:46   ` Ian Campbell
  2014-06-10  0:45 ` [v7 PATCH 06/10] libxc: get and set soft and hard affinity Dario Faggioli
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2014-06-10  0:45 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, JBeulich, Ian.Jackson

The following two patches break both the libxc and the libxl ABI
and API, so we had better bump the MAJORs.

Of course, for libxl, proper measures are taken (in the
relevant patch) in order to guarantee API stability.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
 tools/libxc/Makefile |    2 +-
 tools/libxl/Makefile |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index a74b19e..e7cb613 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -1,7 +1,7 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
 
-MAJOR    = 4.4
+MAJOR    = 4.5
 MINOR    = 0
 
 CTRL_SRCS-y       :=
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 4cfa275..1bf9358 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -5,7 +5,7 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
 
-MAJOR = 4.4
+MAJOR = 4.5
 MINOR = 0
 
 XLUMAJOR = 4.3


* [v7 PATCH 06/10] libxc: get and set soft and hard affinity
  2014-06-10  0:44 [v7 PATCH 00/10] Implement vcpu soft affinity for credit1 Dario Faggioli
                   ` (4 preceding siblings ...)
  2014-06-10  0:45 ` [v7 PATCH 05/10] libxc/libxl: bump library SONAMEs Dario Faggioli
@ 2014-06-10  0:45 ` Dario Faggioli
  2014-06-10  0:45 ` [v7 PATCH 07/10] libxl: get and set soft affinity Dario Faggioli
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Dario Faggioli @ 2014-06-10  0:45 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, JBeulich, Ian.Jackson

by using the new flags and cpumap arguments introduced in the
parameters of the DOMCTL_{get,set}_vcpuaffinity hypercalls.

Now, both xc_vcpu_setaffinity() and xc_vcpu_getaffinity() have
a new flag parameter, to specify whether the user wants to
set/get hard affinity, soft affinity or both. They also have
two cpumap parameters instead of only one. This way, it is
possible to set/get both hard and soft affinity at the same
time (and, in case of set, each one to its own value).

In xc_vcpu_setaffinity(), the cpumaps are IN/OUT parameters,
as it is for the corresponding arguments of the
DOMCTL_set_vcpuaffinity hypercall. What Xen puts there is the
hard and soft effective affinity, that is what Xen will actually
use for scheduling.

In-tree callers are also fixed to cope with the new interface.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
Changes from v4:
 * update toward the new hypercall interface;
 * migrate to hypercall BOUNCEs instead of BUFFERs, as
   suggested during (v3) review;

Changes from v2:
 * better cleanup logic in _vcpu_setaffinity() (regarding
   xc_hypercall_buffer_{alloc,free}()), as suggested during
   review;
 * make it more evident that DOMCTL_setvcpuaffinity has an out
   parameter, by calling it ecpumap_out, and improving the comment
   wrt that;
 * change the interface and have xc_vcpu_[sg]etaffinity() so
   that they take the new parameters (flags and ecpumap_out) and
   fix the in tree callers.
---
 tools/libxc/xc_domain.c             |   68 +++++++++++++++++++++++------------
 tools/libxc/xenctrl.h               |   55 +++++++++++++++++++++++++++-
 tools/libxl/libxl.c                 |    6 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c |    8 +++-
 tools/python/xen/lowlevel/xc/xc.c   |    6 ++-
 5 files changed, 111 insertions(+), 32 deletions(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 861b471..20ed127 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -202,10 +202,15 @@ int xc_domain_node_getaffinity(xc_interface *xch,
 int xc_vcpu_setaffinity(xc_interface *xch,
                         uint32_t domid,
                         int vcpu,
-                        xc_cpumap_t cpumap)
+                        xc_cpumap_t cpumap_hard_inout,
+                        xc_cpumap_t cpumap_soft_inout,
+                        uint32_t flags)
 {
     DECLARE_DOMCTL;
-    DECLARE_HYPERCALL_BUFFER(uint8_t, local);
+    DECLARE_HYPERCALL_BOUNCE(cpumap_hard_inout, 0,
+                             XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
+    DECLARE_HYPERCALL_BOUNCE(cpumap_soft_inout, 0,
+                             XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
     int ret = -1;
     int cpusize;
 
@@ -213,32 +218,37 @@ int xc_vcpu_setaffinity(xc_interface *xch,
     if (cpusize <= 0)
     {
         PERROR("Could not get number of cpus");
-        goto out;
+        return -1;
     }
 
-    local = xc_hypercall_buffer_alloc(xch, local, cpusize);
-    if ( local == NULL )
+    HYPERCALL_BOUNCE_SET_SIZE(cpumap_hard_inout, cpusize);
+    HYPERCALL_BOUNCE_SET_SIZE(cpumap_soft_inout, cpusize);
+
+    if ( xc_hypercall_bounce_pre(xch, cpumap_hard_inout) ||
+         xc_hypercall_bounce_pre(xch, cpumap_soft_inout) )
     {
-        PERROR("Could not allocate memory for setvcpuaffinity domctl hypercall");
+        PERROR("Could not allocate hcall buffers for DOMCTL_setvcpuaffinity");
         goto out;
     }
 
     domctl.cmd = XEN_DOMCTL_setvcpuaffinity;
     domctl.domain = (domid_t)domid;
     domctl.u.vcpuaffinity.vcpu = vcpu;
-    domctl.u.vcpuaffinity.flags = XEN_VCPUAFFINITY_HARD;
-
-    memcpy(local, cpumap, cpusize);
-
-    set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap_hard.bitmap, local);
+    domctl.u.vcpuaffinity.flags = flags;
 
+    set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap_hard.bitmap,
+                         cpumap_hard_inout);
     domctl.u.vcpuaffinity.cpumap_hard.nr_bits = cpusize * 8;
+    set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap_soft.bitmap,
+                         cpumap_soft_inout);
+    domctl.u.vcpuaffinity.cpumap_soft.nr_bits = cpusize * 8;
 
     ret = do_domctl(xch, &domctl);
 
-    xc_hypercall_buffer_free(xch, local);
-
  out:
+    xc_hypercall_bounce_post(xch, cpumap_hard_inout);
+    xc_hypercall_bounce_post(xch, cpumap_soft_inout);
+
     return ret;
 }
 
@@ -246,10 +256,13 @@ int xc_vcpu_setaffinity(xc_interface *xch,
 int xc_vcpu_getaffinity(xc_interface *xch,
                         uint32_t domid,
                         int vcpu,
-                        xc_cpumap_t cpumap)
+                        xc_cpumap_t cpumap_hard,
+                        xc_cpumap_t cpumap_soft,
+                        uint32_t flags)
 {
     DECLARE_DOMCTL;
-    DECLARE_HYPERCALL_BUFFER(uint8_t, local);
+    DECLARE_HYPERCALL_BOUNCE(cpumap_hard, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_HYPERCALL_BOUNCE(cpumap_soft, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
     int ret = -1;
     int cpusize;
 
@@ -257,30 +270,37 @@ int xc_vcpu_getaffinity(xc_interface *xch,
     if (cpusize <= 0)
     {
         PERROR("Could not get number of cpus");
-        goto out;
+        return -1;
     }
 
-    local = xc_hypercall_buffer_alloc(xch, local, cpusize);
-    if (local == NULL)
+    HYPERCALL_BOUNCE_SET_SIZE(cpumap_hard, cpusize);
+    HYPERCALL_BOUNCE_SET_SIZE(cpumap_soft, cpusize);
+
+    if ( xc_hypercall_bounce_pre(xch, cpumap_hard) ||
+         xc_hypercall_bounce_pre(xch, cpumap_soft) )
     {
-        PERROR("Could not allocate memory for getvcpuaffinity domctl hypercall");
+        PERROR("Could not allocate hcall buffers for DOMCTL_getvcpuaffinity");
         goto out;
     }
 
     domctl.cmd = XEN_DOMCTL_getvcpuaffinity;
     domctl.domain = (domid_t)domid;
     domctl.u.vcpuaffinity.vcpu = vcpu;
-    domctl.u.vcpuaffinity.flags = XEN_VCPUAFFINITY_HARD;
+    domctl.u.vcpuaffinity.flags = flags;
 
-    set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap_hard.bitmap, local);
+    set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap_hard.bitmap,
+                         cpumap_hard);
     domctl.u.vcpuaffinity.cpumap_hard.nr_bits = cpusize * 8;
+    set_xen_guest_handle(domctl.u.vcpuaffinity.cpumap_soft.bitmap,
+                         cpumap_soft);
+    domctl.u.vcpuaffinity.cpumap_soft.nr_bits = cpusize * 8;
 
     ret = do_domctl(xch, &domctl);
 
-    memcpy(cpumap, local, cpusize);
+ out:
+    xc_hypercall_bounce_post(xch, cpumap_hard);
+    xc_hypercall_bounce_post(xch, cpumap_soft);
 
-    xc_hypercall_buffer_free(xch, local);
-out:
     return ret;
 }
 
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 400f0df..fd2388e 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -582,14 +582,65 @@ int xc_domain_node_getaffinity(xc_interface *xch,
                                uint32_t domind,
                                xc_nodemap_t nodemap);
 
+/**
+ * This function specifies the CPU affinity for a vcpu.
+ *
+ * There are two kinds of affinity. Soft affinity is on what CPUs a vcpu
+ * prefers to run. Hard affinity is on what CPUs a vcpu is allowed to run.
+ * If flags contains XEN_VCPUAFFINITY_SOFT, the soft affinity is set to
+ * what cpumap_soft_inout contains. If flags contains XEN_VCPUAFFINITY_HARD,
+ * the hard affinity is set to what cpumap_hard_inout contains. Both flags
+ * can be set at the same time, in which case both soft and hard affinity are
+ * set to what the respective parameter contains.
+ *
+ * The function also returns the effective hard or/and soft affinity, still
+ * via the cpumap_soft_inout and cpumap_hard_inout parameters. Effective
+ * affinity is, in case of soft affinity, the intersection of soft affinity,
+ * hard affinity and the cpupool's online CPUs for the domain, and is returned
+ * in cpumap_soft_inout, if XEN_VCPUAFFINITY_SOFT is set in flags. In case of
+ * hard affinity, it is the intersection between hard affinity and the
+ * cpupool's online CPUs, and is returned in cpumap_hard_inout, if
+ * XEN_VCPUAFFINITY_HARD is set in flags. If both flags are set, both soft
+ * and hard affinity are returned in the respective parameter.
+ *
+ * We report the effective affinity back because it is what the Xen
+ * scheduler will actually use, thus allowing the caller to check whether
+ * it matches, or is at least good enough for, their purposes.
+ *
+ * @param xch a handle to an open hypervisor interface.
+ * @param domid the id of the domain to which the vcpu belongs
+ * @param vcpu the vcpu id within the domain
+ * @param cpumap_hard_inout specifies(/returns) the (effective) hard affinity
+ * @param cpumap_soft_inout specifies(/returns) the (effective) soft affinity
+ * @param flags what we want to set
+ */
 int xc_vcpu_setaffinity(xc_interface *xch,
                         uint32_t domid,
                         int vcpu,
-                        xc_cpumap_t cpumap);
+                        xc_cpumap_t cpumap_hard_inout,
+                        xc_cpumap_t cpumap_soft_inout,
+                        uint32_t flags);
+
+/**
+ * This function retrieves hard and soft CPU affinity of a vcpu,
+ * depending on what flags are set.
+ *
+ * Soft affinity is returned in cpumap_soft if XEN_VCPUAFFINITY_SOFT is set.
+ * Hard affinity is returned in cpumap_hard if XEN_VCPUAFFINITY_HARD is set.
+ *
+ * @param xch a handle to an open hypervisor interface.
+ * @param domid the id of the domain to which the vcpu belongs
+ * @param vcpu the vcpu id within the domain
+ * @param cpumap_hard is where hard affinity is returned
+ * @param cpumap_soft is where soft affinity is returned
+ * @param flags what we want to get
+ */
 int xc_vcpu_getaffinity(xc_interface *xch,
                         uint32_t domid,
                         int vcpu,
-                        xc_cpumap_t cpumap);
+                        xc_cpumap_t cpumap_hard,
+                        xc_cpumap_t cpumap_soft,
+                        uint32_t flags);
 
 
 /**
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 900b8d4..ec79645 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -4582,7 +4582,8 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid,
             goto err;
         }
         if (xc_vcpu_getaffinity(ctx->xch, domid, *nr_vcpus_out,
-                                ptr->cpumap.map) == -1) {
+                                ptr->cpumap.map, NULL,
+                                XEN_VCPUAFFINITY_HARD) == -1) {
             LOGE(ERROR, "getting vcpu affinity");
             goto err;
         }
@@ -4606,7 +4607,8 @@ err:
 int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
                            libxl_bitmap *cpumap)
 {
-    if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map)) {
+    if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map, NULL,
+                            XEN_VCPUAFFINITY_HARD)) {
         LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity");
         return ERROR_FAIL;
     }
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index ff29b47..f0810eb 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -438,7 +438,9 @@ CAMLprim value stub_xc_vcpu_setaffinity(value xch, value domid,
 			c_cpumap[i/8] |= 1 << (i&7);
 	}
 	retval = xc_vcpu_setaffinity(_H(xch), _D(domid),
-	                             Int_val(vcpu), c_cpumap);
+				     Int_val(vcpu),
+				     c_cpumap, NULL,
+				     XEN_VCPUAFFINITY_HARD);
 	free(c_cpumap);
 
 	if (retval < 0)
@@ -460,7 +462,9 @@ CAMLprim value stub_xc_vcpu_getaffinity(value xch, value domid,
 		failwith_xc(_H(xch));
 
 	retval = xc_vcpu_getaffinity(_H(xch), _D(domid),
-	                             Int_val(vcpu), c_cpumap);
+				     Int_val(vcpu),
+				     c_cpumap, NULL,
+				     XEN_VCPUAFFINITY_HARD);
 	if (retval < 0) {
 		free(c_cpumap);
 		failwith_xc(_H(xch));
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index cb34446..54e8799 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -256,7 +256,8 @@ static PyObject *pyxc_vcpu_setaffinity(XcObject *self,
         }
     }
   
-    if ( xc_vcpu_setaffinity(self->xc_handle, dom, vcpu, cpumap) != 0 )
+    if ( xc_vcpu_setaffinity(self->xc_handle, dom, vcpu, cpumap,
+                             NULL, XEN_VCPUAFFINITY_HARD) != 0 )
     {
         free(cpumap);
         return pyxc_error_to_exception(self->xc_handle);
@@ -403,7 +404,8 @@ static PyObject *pyxc_vcpu_getinfo(XcObject *self,
     if(cpumap == NULL)
         return pyxc_error_to_exception(self->xc_handle);
 
-    rc = xc_vcpu_getaffinity(self->xc_handle, dom, vcpu, cpumap);
+    rc = xc_vcpu_getaffinity(self->xc_handle, dom, vcpu, cpumap,
+                             NULL, XEN_VCPUAFFINITY_HARD);
     if ( rc < 0 )
     {
         free(cpumap);


* [v7 PATCH 07/10] libxl: get and set soft affinity
  2014-06-10  0:44 [v7 PATCH 00/10] Implement vcpu soft affinity for credit1 Dario Faggioli
                   ` (5 preceding siblings ...)
  2014-06-10  0:45 ` [v7 PATCH 06/10] libxc: get and set soft and hard affinity Dario Faggioli
@ 2014-06-10  0:45 ` Dario Faggioli
  2014-06-10 14:02   ` Ian Campbell
  2014-06-10 15:39   ` George Dunlap
  2014-06-10  0:45 ` [v7 PATCH 08/10] xl: enable getting and setting soft Dario Faggioli
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 25+ messages in thread
From: Dario Faggioli @ 2014-06-10  0:45 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, JBeulich, Ian.Jackson

Make space for two new cpumap-s, one in vcpu_info (for getting
soft affinity) and build_info (for setting it) and amend the
API for setting vCPU affinity.

libxl_set_vcpuaffinity() now takes two cpumaps, one for hard
and one for soft affinity (LIBXL_API_VERSION is exploited to
retain source level backword compatibility). Either of the
two cpumap can be NULL, in which case, only the affinity
corresponding to the non-NULL cpumap will be affected.

Getting soft affinity happens indirectly, via `xl vcpu-list'
(as it is already for hard affinity).

This commit also introduces some logic to check whether the
affinity which will be used by Xen to schedule the vCPU(s)
does actually match with the cpumaps provided. In fact, we
want to allow every possible combination of hard and soft
affinity to be set, but we warn the user upon particularly
weird situations (e.g., hard and soft being disjoint sets
of pCPUs).

This change also updates the error handling for calls to
libxl_set_vcpuaffinity() in xl, as the return value can now
be any libxl error code, not just -1.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
---
Changes from v6:
 * fix a typo and update LIBXL_API_VERSION appropriately,
   as requested during review;
 * libxl_bitmap_equal() rewritten in such a way that it now
   automatically deals with bitmaps of different sizes. Bits
   not present in one of the maps are just considered to be
   0, as suggested during review;
 * there is no need for libxl_bitmap_bit_valid() any longer,
   so killed it;
 * inside libxl, retrieve and set both hard and soft affinity
   atomically, in just one call (to xc_vcpu_getaffinity() and
   xc_vcpu_setaffinity(), respectively), as requested during
   review.
 * update the changelog about changing the error handling for
   the call to libxl_set_vcpuaffinity() in xl.

Changes from v4:
 * get rid of inline stubs inside the LIBXL_API_VERSION_XXX
   block and just use define, as suggested during review
 * adapt to the new xc interface;
 * avoid leaking cpumap_soft in libxl_list_vcpu on error, as
   requested during review;
 * fix bogus `return 0' in libxl_set_vcpuaffinity, as
   requested during review;
 * clarify the comment for LIBXL_HAVE_SOFT_AFFINITY, as
   suggested during review;
 * renamed libxl_bitmap_valid() to libxl_bitmap_bit_valid(),
   as suggested during review.

Changes from v3:
 * only introduce one LIBXL_HAVE_ symbol for soft affinity,
   as requested during review;
 * use LIBXL_API_VERSION instead of having multiple version
   of the same function, as suggested during review;
 * use libxl_get_nr_cpus() rather than libxl_get_cputopology(),
   as suggested during review;
 * use LOGE() instead of LIBXL__LOG_ERRNO(), as requested
   during review;
 * kill the flags and use just one _set_vcpuaffinity()
   function with two cpumaps, allowing either of them to
   be NULL, as suggested during review;
 * avoid overflowing the bitmaps in libxl_bitmap_equal(),
   as suggested during review.

Changes from v2:
 * interface completely redesigned, as discussed during
   review.
---
 tools/libxl/libxl.c         |   99 ++++++++++++++++++++++++++++++++++++++-----
 tools/libxl/libxl.h         |   26 ++++++++++-
 tools/libxl/libxl_create.c  |    6 +++
 tools/libxl/libxl_dom.c     |    3 +
 tools/libxl/libxl_types.idl |    4 +-
 tools/libxl/libxl_utils.h   |   25 +++++++++++
 tools/libxl/xl_cmdimpl.c    |    6 +--
 7 files changed, 150 insertions(+), 19 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index ec79645..2cb7174 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -4577,13 +4577,17 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid,
         libxl_bitmap_init(&ptr->cpumap);
         if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap, 0))
             goto err;
+        libxl_bitmap_init(&ptr->cpumap_soft);
+        if (libxl_cpu_bitmap_alloc(ctx, &ptr->cpumap_soft, 0))
+            goto err;
         if (xc_vcpu_getinfo(ctx->xch, domid, *nr_vcpus_out, &vcpuinfo) == -1) {
             LOGE(ERROR, "getting vcpu info");
             goto err;
         }
+
         if (xc_vcpu_getaffinity(ctx->xch, domid, *nr_vcpus_out,
-                                ptr->cpumap.map, NULL,
-                                XEN_VCPUAFFINITY_HARD) == -1) {
+                                ptr->cpumap.map, ptr->cpumap_soft.map,
+                                XEN_VCPUAFFINITY_SOFT|XEN_VCPUAFFINITY_HARD) == -1) {
             LOGE(ERROR, "getting vcpu affinity");
             goto err;
         }
@@ -4599,34 +4603,105 @@ libxl_vcpuinfo *libxl_list_vcpu(libxl_ctx *ctx, uint32_t domid,
 
 err:
     libxl_bitmap_dispose(&ptr->cpumap);
+    libxl_bitmap_dispose(&ptr->cpumap_soft);
     free(ret);
     GC_FREE;
     return NULL;
 }
 
 int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
-                           libxl_bitmap *cpumap)
+                           const libxl_bitmap *cpumap_hard,
+                           const libxl_bitmap *cpumap_soft)
 {
-    if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid, cpumap->map, NULL,
-                            XEN_VCPUAFFINITY_HARD)) {
-        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting vcpu affinity");
-        return ERROR_FAIL;
+    GC_INIT(ctx);
+    libxl_bitmap hard, soft;
+    int rc, flags = 0;
+
+    libxl_bitmap_init(&hard);
+    libxl_bitmap_init(&soft);
+
+    if (!cpumap_hard && !cpumap_soft) {
+        rc = ERROR_INVAL;
+        goto out;
     }
-    return 0;
+
+    /*
+     * Xen wants writable hard and/or soft cpumaps, to put back in them
+     * the effective hard and/or soft affinity that will be used.
+     */
+    if (cpumap_hard) {
+        rc = libxl_cpu_bitmap_alloc(ctx, &hard, 0);
+        if (rc)
+            goto out;
+
+        libxl_bitmap_copy(ctx, &hard, cpumap_hard);
+        flags = XEN_VCPUAFFINITY_HARD;
+    }
+    if (cpumap_soft) {
+        rc = libxl_cpu_bitmap_alloc(ctx, &soft, 0);
+        if (rc)
+            goto out;
+
+        libxl_bitmap_copy(ctx, &soft, cpumap_soft);
+        flags |= XEN_VCPUAFFINITY_SOFT;
+    }
+
+    if (xc_vcpu_setaffinity(ctx->xch, domid, vcpuid,
+                            cpumap_hard ? hard.map : NULL,
+                            cpumap_soft ? soft.map : NULL,
+                            flags)) {
+        LOGE(ERROR, "setting vcpu affinity");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    /*
+     * Let's check the results. Hard affinity will never be empty, but it
+     * is possible that Xen will use something different from what we
+     * asked for, for various reasons. If that's the case, report it.
+     */
+    if (cpumap_hard &&
+        !libxl_bitmap_equal(cpumap_hard, &hard, 0))
+        LOG(DEBUG, "New hard affinity for vcpu %d has unreachable cpus",
+            vcpuid);
+    /*
+     * Soft affinity can both differ from what was asked and end up empty.
+     * Check for (and report) both.
+     */
+    if (cpumap_soft) {
+        if (!libxl_bitmap_equal(cpumap_soft, &soft, 0))
+            LOG(DEBUG, "New soft affinity for vcpu %d has unreachable cpus",
+                vcpuid);
+        if (libxl_bitmap_is_empty(&soft))
+            LOG(WARN, "all cpus in soft affinity of vcpu %d are unreachable."
+                " Only hard affinity will be considered for scheduling",
+                vcpuid);
+    }
+
+    rc = 0;
+ out:
+    libxl_bitmap_dispose(&hard);
+    libxl_bitmap_dispose(&soft);
+    GC_FREE;
+    return rc;
 }
 
 int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid,
-                               unsigned int max_vcpus, libxl_bitmap *cpumap)
+                               unsigned int max_vcpus,
+                               const libxl_bitmap *cpumap_hard,
+                               const libxl_bitmap *cpumap_soft)
 {
+    GC_INIT(ctx);
     int i, rc = 0;
 
     for (i = 0; i < max_vcpus; i++) {
-        if (libxl_set_vcpuaffinity(ctx, domid, i, cpumap)) {
-            LIBXL__LOG(ctx, LIBXL__LOG_WARNING,
-                       "failed to set affinity for %d", i);
+        if (libxl_set_vcpuaffinity(ctx, domid, i, cpumap_hard, cpumap_soft)) {
+            LOG(WARN, "failed to set affinity for %d", i);
             rc = ERROR_FAIL;
         }
     }
+
+    GC_FREE;
     return rc;
 }
 
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 80947c3..e549db8 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -82,6 +82,15 @@
 #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1
 
 /*
+ * LIBXL_HAVE_SOFT_AFFINITY indicates that a 'cpumap_soft'
+ * field (of libxl_bitmap type) is present in:
+ *  - libxl_vcpuinfo, containing the soft affinity of a vcpu;
+ *  - libxl_domain_build_info, for setting the soft affinity of
+ *    all vcpus while creating the domain.
+ */
+#define LIBXL_HAVE_SOFT_AFFINITY 1
+
+/*
  * LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE indicates that the
  * libxl_vendor_device field is present in the hvm sections of
  * libxl_domain_build_info. This field tells libxl which
@@ -1097,9 +1106,22 @@ int libxl_userdata_retrieve(libxl_ctx *ctx, uint32_t domid,
 
 int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo);
 int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
-                           libxl_bitmap *cpumap);
+                           const libxl_bitmap *cpumap_hard,
+                           const libxl_bitmap *cpumap_soft);
 int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid,
-                               unsigned int max_vcpus, libxl_bitmap *cpumap);
+                               unsigned int max_vcpus,
+                               const libxl_bitmap *cpumap_hard,
+                               const libxl_bitmap *cpumap_soft);
+
+#if defined (LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x040500
+
+#define libxl_set_vcpuaffinity(ctx, domid, vcpuid, map) \
+    libxl_set_vcpuaffinity((ctx), (domid), (vcpuid), (map), NULL)
+#define libxl_set_vcpuaffinity_all(ctx, domid, max_vcpus, map) \
+    libxl_set_vcpuaffinity_all((ctx), (domid), (max_vcpus), (map), NULL)
+
+#endif
+
 int libxl_domain_set_nodeaffinity(libxl_ctx *ctx, uint32_t domid,
                                   libxl_bitmap *nodemap);
 int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index d015cf4..49a01a7 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -193,6 +193,12 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
         libxl_bitmap_set_any(&b_info->cpumap);
     }
 
+    if (!b_info->cpumap_soft.size) {
+        if (libxl_cpu_bitmap_alloc(CTX, &b_info->cpumap_soft, 0))
+            return ERROR_FAIL;
+        libxl_bitmap_set_any(&b_info->cpumap_soft);
+    }
+
     libxl_defbool_setdefault(&b_info->numa_placement, true);
 
     if (!b_info->nodemap.size) {
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 661999c..91af7f8 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -261,7 +261,8 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
             return rc;
     }
     libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap);
-    libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap);
+    libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap,
+                               &info->cpumap_soft);
 
     if (xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb +
         LIBXL_MAXMEM_CONSTANT) < 0) {
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 52f1aa9..15382cc 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -300,6 +300,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
     ("cpumap",          libxl_bitmap),
+    ("cpumap_soft",     libxl_bitmap),
     ("nodemap",         libxl_bitmap),
     ("numa_placement",  libxl_defbool),
     ("tsc_mode",        libxl_tsc_mode),
@@ -516,7 +517,8 @@ libxl_vcpuinfo = Struct("vcpuinfo", [
     ("blocked", bool),
     ("running", bool),
     ("vcpu_time", uint64), # total vcpu time ran (ns)
-    ("cpumap", libxl_bitmap), # current cpu's affinities
+    ("cpumap", libxl_bitmap), # current hard cpu affinity
+    ("cpumap_soft", libxl_bitmap), # current soft cpu affinity
     ], dir=DIR_OUT)
 
 libxl_physinfo = Struct("physinfo", [
diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h
index e37fb89..014f1f6 100644
--- a/tools/libxl/libxl_utils.h
+++ b/tools/libxl/libxl_utils.h
@@ -98,6 +98,31 @@ static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit)
 #define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \
                                              if (libxl_bitmap_test(&(m), v))
 
+/*
+ * Compares two bitmaps bit by bit, up to nr_bits or, if nr_bits is 0, up
+ * to the size of the largest bitmap. If the sizes do not match, bits past
+ * the end of the shorter bitmap are considered to be 0, which matches the
+ * semantics and implementation of libxl_bitmap_test().
+ *
+ * So, basically, [0,1,0] and [0,1] are considered equal, while [0,1,1] and
+ * [0,1] are different.
+ */
+static inline int libxl_bitmap_equal(const libxl_bitmap *ba,
+                                     const libxl_bitmap *bb,
+                                     int nr_bits)
+{
+    int i;
+
+    if (nr_bits == 0)
+        nr_bits = ba->size > bb->size ? ba->size * 8 : bb->size * 8;
+
+    for (i = 0; i < nr_bits; i++) {
+        if (libxl_bitmap_test(ba, i) != libxl_bitmap_test(bb, i))
+            return 0;
+    }
+    return 1;
+}
+
 int libxl_cpu_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *cpumap, int max_cpus);
 int libxl_node_bitmap_alloc(libxl_ctx *ctx, libxl_bitmap *nodemap,
                             int max_nodes);
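
The padding semantics of libxl_bitmap_equal() above can be illustrated
with a standalone sketch; the struct and helper names below are
stand-ins for illustration only, not the real libxl types:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for libxl_bitmap: a byte array plus its size in bytes. */
struct bitmap {
    const uint8_t *map;
    int size;
};

/* Out-of-range bits read as 0, mirroring libxl_bitmap_test(). */
static int bit_test(const struct bitmap *b, int bit)
{
    if (bit >= b->size * 8)
        return 0;
    return (b->map[bit / 8] >> (bit % 8)) & 1;
}

/* Same semantics as the patch's libxl_bitmap_equal() with nr_bits == 0:
 * compare up to the larger map, treating missing bits as 0. */
static int bitmap_equal(const struct bitmap *a, const struct bitmap *b)
{
    int nr = (a->size > b->size ? a->size : b->size) * 8;
    int i;

    for (i = 0; i < nr; i++)
        if (bit_test(a, i) != bit_test(b, i))
            return 0;
    return 1;
}
```

With this, a one-byte map and a two-byte map whose extra byte is all
zeroes compare equal, while any set bit past the shorter map's end makes
them differ.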
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 5195914..fa5ab8e 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2269,7 +2269,7 @@ start:
             } else {
                 libxl_bitmap_set_any(&vcpu_cpumap);
             }
-            if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap)) {
+            if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap, NULL)) {
                 fprintf(stderr, "setting affinity failed on vcpu `%d'.\n", i);
                 libxl_bitmap_dispose(&vcpu_cpumap);
                 free(vcpu_to_pcpu);
@@ -4700,7 +4700,7 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu)
     }
 
     if (vcpuid != -1) {
-        if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap) == -1) {
+        if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) {
             fprintf(stderr, "Could not set affinity for vcpu `%u'.\n", vcpuid);
             goto out;
         }
@@ -4712,7 +4712,7 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu)
         }
         for (i = 0; i < nb_vcpu; i++) {
             if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid,
-                                       &cpumap) == -1) {
+                                       &cpumap, NULL)) {
                 fprintf(stderr, "libxl_set_vcpuaffinity failed"
                                 " on vcpu `%u'.\n", vcpuinfo[i].vcpuid);
             }

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [v7 PATCH 08/10] xl: enable getting and setting soft
  2014-06-10  0:44 [v7 PATCH 00/10] Implement vcpu soft affinity for credit1 Dario Faggioli
                   ` (6 preceding siblings ...)
  2014-06-10  0:45 ` [v7 PATCH 07/10] libxl: get and set soft affinity Dario Faggioli
@ 2014-06-10  0:45 ` Dario Faggioli
  2014-06-10 14:10   ` Ian Campbell
  2014-06-10  0:45 ` [v7 PATCH 09/10] xl: enable for specifying soft-affinity in the config file Dario Faggioli
  2014-06-10  0:45 ` [v7 PATCH 10/10] libxl: automatic NUMA placement affects soft affinity Dario Faggioli
  9 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2014-06-10  0:45 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, JBeulich, Ian.Jackson

Getting happens via `xl vcpu-list', which now looks like this:

 # xl vcpu-list -s
 Name       ID VCPU CPU State Time(s) Affinity (Hard / Soft)
 Domain-0   0     0  11  -b-     5.4  8-15 / all
 Domain-0   0     1  11  -b-     1.0  8-15 / all
 Domain-0   0    14  13  -b-     1.4  8-15 / all
 Domain-0   0    15   8  -b-     1.6  8-15 / all
 vm-test    3     0   4  -b-     2.5  0-12 / 0-7
 vm-test    3     1   0  -b-     3.2  0-12 / 0-7

Setting happens by passing two pCPU masks to the `xl vcpu-pin'
command: the first is the hard affinity, the second the soft
affinity. If only one mask is specified, only hard affinity is
affected. To change soft affinity alone, '-' can be used as the
hard affinity mask parameter, which leaves hard affinity untouched.

xl manual page is updated accordingly.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes from v6:
 * removed a remnant from a previous iteration, when `vcpu-pin'
   was taking a parameter;
 * tried to simplify parsing inside `xl vcpu-pin', as suggested
   during review. Could not take the exact approach described in the
   emails, but still did my best to make this clearer and easier
   to read;
 * fixed a few typos and leftovers.

Changes from v5:
 * change command line interface for 'vcpu-pin', as suggested during
   review.

Changes from v4:
 * fix and rephrased the manual entry, as suggested during review;
 * more refactoring to remove some leftover special casing, as
   suggested during review.

Changes from v3:
 * fix typos in doc, rephrased the help message and changed
   the title of the column for hard/soft affinity, as suggested
   during review.

Changes from v2:
 * this patch folds what in v2 were patches 13 and 14;
 * `xl vcpu-pin' always shows both hard and soft affinity,
   without the need of passing '-s'.
---
 docs/man/xl.pod.1         |   32 +++++++++++----
 tools/libxl/xl_cmdimpl.c  |   97 +++++++++++++++++++++++++++++++--------------
 tools/libxl/xl_cmdtable.c |    2 -
 3 files changed, 91 insertions(+), 40 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 30bd4bf..9d1c2a5 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -651,16 +651,32 @@ after B<vcpu-set>, go to B<SEE ALSO> section for information.
 Lists VCPU information for a specific domain.  If no domain is
 specified, VCPU information for all domains will be provided.
 
-=item B<vcpu-pin> I<domain-id> I<vcpu> I<cpus>
+=item B<vcpu-pin> I<domain-id> I<vcpu> I<cpus hard> I<cpus soft>
 
-Pins the VCPU to only run on the specific CPUs.  The keyword
-B<all> can be used to apply the I<cpus> list to all VCPUs in the
-domain.
+Set hard and soft affinity for a I<vcpu> of I<domain-id>. Normally VCPUs
+can float between available CPUs whenever Xen deems a different run state
+is appropriate.
 
-Normally VCPUs can float between available CPUs whenever Xen deems a
-different run state is appropriate.  Pinning can be used to restrict
-this, by ensuring certain VCPUs can only run on certain physical
-CPUs.
+Hard affinity can be used to restrict this, by ensuring certain VCPUs
+can only run on certain physical CPUs. Soft affinity specifies a I<preferred>
+set of CPUs. Soft affinity needs special support in the scheduler, which is
+only provided in credit1.
+
+The keyword B<all> can be used to apply the hard and soft affinity masks to
+all the VCPUs in the domain. The symbol '-' can be used to leave either
+hard or soft affinity alone.
+
+For example:
+
+ xl vcpu-pin 0 3 - 6-9
+
+will set soft affinity for vCPU 3 of domain 0 to pCPUs 6,7,8 and 9,
+leaving its hard affinity untouched. On the other hand:
+
+ xl vcpu-pin 0 3 3,4 6-9
+
+will set both hard and soft affinity, the former to pCPUs 3 and 4, the
+latter to pCPUs 6,7,8, and 9.
 
 =item B<vm-list>
 
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index fa5ab8e..97a1b8a 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -4584,8 +4584,10 @@ static void print_vcpuinfo(uint32_t tdomid,
     }
     /*      TIM */
     printf("%9.1f  ", ((float)vcpuinfo->vcpu_time / 1e9));
-    /* CPU AFFINITY */
+    /* CPU HARD AND SOFT AFFINITY */
     print_bitmap(vcpuinfo->cpumap.map, nr_cpus, stdout);
+    printf(" / ");
+    print_bitmap(vcpuinfo->cpumap_soft.map, nr_cpus, stdout);
     printf("\n");
 }
 
@@ -4620,7 +4622,8 @@ static void vcpulist(int argc, char **argv)
     }
 
     printf("%-32s %5s %5s %5s %5s %9s %s\n",
-           "Name", "ID", "VCPU", "CPU", "State", "Time(s)", "CPU Affinity");
+           "Name", "ID", "VCPU", "CPU", "State", "Time(s)",
+           "Affinity (Hard / Soft)");
     if (!argc) {
         if (!(dominfo = libxl_list_domain(ctx, &nb_domain))) {
             fprintf(stderr, "libxl_list_domain failed.\n");
@@ -4653,17 +4656,27 @@ int main_vcpulist(int argc, char **argv)
     return 0;
 }
 
-static int vcpupin(uint32_t domid, const char *vcpu, char *cpu)
+int main_vcpupin(int argc, char **argv)
 {
     libxl_vcpuinfo *vcpuinfo;
-    libxl_bitmap cpumap;
-
-    uint32_t vcpuid;
+    libxl_bitmap cpumap, cpumap_soft;
+    uint32_t vcpuid, domid;
+    const char *vcpu;
     char *endptr;
-    int i, nb_cpu, nb_vcpu, rc = -1;
+    int opt, nb_cpu, nb_vcpu, rc = -1;
+    libxl_bitmap *soft = &cpumap_soft, *hard = &cpumap;
 
     libxl_bitmap_init(&cpumap);
+    libxl_bitmap_init(&cpumap_soft);
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "vcpu-pin", 3) {
+        /* No options */
+    }
+
+    domid = find_domain(argv[optind]);
+    vcpu = argv[optind+1];
 
+    /* Figure out which vCPU we are dealing with */
     vcpuid = strtoul(vcpu, &endptr, 10);
     if (vcpu == endptr) {
         if (strcmp(vcpu, "all")) {
@@ -4673,11 +4686,40 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu)
         vcpuid = -1;
     }
 
-    if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0))
+    if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0) ||
+        libxl_cpu_bitmap_alloc(ctx, &cpumap_soft, 0))
         goto out;
 
-    if (vcpupin_parse(cpu, &cpumap))
+    /*
+     * Syntax is: xl vcpu-pin <domid> <vcpu> <hard> <soft>
+     * We want to handle all the following cases ('-' means
+     * "leave it alone"):
+     *  xl vcpu-pin 0 3 3,4
+     *  xl vcpu-pin 0 3 3,4 -
+     *  xl vcpu-pin 0 3 - 6-9
+     *  xl vcpu-pin 0 3 3,4 6-9
+     */
+
+#define hard_aff_str (argv[optind+2])
+#define soft_aff_str (argv[optind+3])
+    /*
+     * Hard affinity is always present. However, if it's "-", all we need
+     * is passing a NULL pointer to the libxl_set_vcpuaffinity() call below.
+     */
+    if (!strcmp(hard_aff_str, "-"))
+        hard = NULL;
+    else if (vcpupin_parse(hard_aff_str, &cpumap))
         goto out;
+    /*
+     * Soft affinity is handled similarly. Only difference: we also want
+     * to pass NULL to libxl_set_vcpuaffinity() if it is not specified.
+     */
+    if (argc <= optind+3 || !strcmp(soft_aff_str, "-"))
+        soft = NULL;
+    else if (vcpupin_parse(soft_aff_str, &cpumap_soft))
+        goto out;
+#undef hard_aff_str
+#undef soft_aff_str
 
     if (dryrun_only) {
         nb_cpu = libxl_get_online_cpus(ctx);
@@ -4687,7 +4729,14 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu)
         }
 
         fprintf(stdout, "cpumap: ");
-        print_bitmap(cpumap.map, nb_cpu, stdout);
+        if (hard)
+            print_bitmap(cpumap.map, nb_cpu, stdout);
+        else
+            fprintf(stdout, "-");
+        if (soft) {
+            fprintf(stdout, " ");
+            print_bitmap(cpumap_soft.map, nb_cpu, stdout);
+        }
         fprintf(stdout, "\n");
 
         if (ferror(stdout) || fflush(stdout)) {
@@ -4700,43 +4749,29 @@ static int vcpupin(uint32_t domid, const char *vcpu, char *cpu)
     }
 
     if (vcpuid != -1) {
-        if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, &cpumap, NULL)) {
-            fprintf(stderr, "Could not set affinity for vcpu `%u'.\n", vcpuid);
+        if (libxl_set_vcpuaffinity(ctx, domid, vcpuid, hard, soft)) {
+            fprintf(stderr, "Could not set affinity for vcpu `%u'.\n",
+                    vcpuid);
             goto out;
         }
     }
     else {
-        if (!(vcpuinfo = libxl_list_vcpu(ctx, domid, &nb_vcpu, &i))) {
+        if (!(vcpuinfo = libxl_list_vcpu(ctx, domid, &nb_vcpu, &nb_cpu))) {
             fprintf(stderr, "libxl_list_vcpu failed.\n");
             goto out;
         }
-        for (i = 0; i < nb_vcpu; i++) {
-            if (libxl_set_vcpuaffinity(ctx, domid, vcpuinfo[i].vcpuid,
-                                       &cpumap, NULL)) {
-                fprintf(stderr, "libxl_set_vcpuaffinity failed"
-                                " on vcpu `%u'.\n", vcpuinfo[i].vcpuid);
-            }
-        }
+        if (libxl_set_vcpuaffinity_all(ctx, domid, nb_vcpu, hard, soft))
+            fprintf(stderr, "Could not set affinity.\n");
         libxl_vcpuinfo_list_free(vcpuinfo, nb_vcpu);
     }
 
     rc = 0;
  out:
+    libxl_bitmap_dispose(&cpumap_soft);
     libxl_bitmap_dispose(&cpumap);
     return rc;
 }
 
-int main_vcpupin(int argc, char **argv)
-{
-    int opt;
-
-    SWITCH_FOREACH_OPT(opt, "", NULL, "vcpu-pin", 3) {
-        /* No options */
-    }
-
-    return vcpupin(find_domain(argv[optind]), argv[optind+1] , argv[optind+2]);
-}
-
 static void vcpuset(uint32_t domid, const char* nr_vcpus, int check_host)
 {
     char *endptr;
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 4279b9f..7b7fa92 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -218,7 +218,7 @@ struct cmd_spec cmd_table[] = {
     { "vcpu-pin",
       &main_vcpupin, 1, 1,
       "Set which CPUs a VCPU can use",
-      "<Domain> <VCPU|all> <CPUs|all>",
+      "<Domain> <VCPU|all> <Hard affinity|-|all> <Soft affinity|-|all>",
     },
     { "vcpu-set",
       &main_vcpuset, 0, 1,


* [v7 PATCH 09/10] xl: enable for specifying soft-affinity in the config file
  2014-06-10  0:44 [v7 PATCH 00/10] Implement vcpu soft affinity for credit1 Dario Faggioli
                   ` (7 preceding siblings ...)
  2014-06-10  0:45 ` [v7 PATCH 08/10] xl: enable getting and setting soft Dario Faggioli
@ 2014-06-10  0:45 ` Dario Faggioli
  2014-06-10 14:38   ` Ian Campbell
  2014-06-10  0:45 ` [v7 PATCH 10/10] libxl: automatic NUMA placement affects soft affinity Dario Faggioli
  9 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2014-06-10  0:45 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, JBeulich, Ian.Jackson

in a similar way to how hard-affinity is specified (i.e.,
exactly how plain vcpu-affinity was being specified before
this change).
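
For illustration, a domain config fragment using the new option might
look like this (the values here are made up):

```
vcpus     = 4
# hard affinity: vcpus may only run on pcpus 0-7
cpus      = "0-7"
# soft affinity: vcpus prefer pcpus 0-3
cpus_soft = "0-3"
```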

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes from v6:
 * update and improve the changelog.

Changes from v4:
 * fix typos and rephrase docs, as suggested during review;
 * more refactoring, i.e., more addressing factor of potential
   common code, as requested during review.

Changes from v3:
 * fix typos and language issues in docs and comments, as
   suggested during review;
 * common code to soft and hard affinity parsing factored
   together, as requested during review.

Changes from v2:
 * use the new libxl API. Although the implementation changed
   only a little bit, I removed IanJ's Acked-by, although I am
   here saying that he did provide it, as requested.
---
 docs/man/xl.cfg.pod.5    |   23 ++++-
 tools/libxl/xl_cmdimpl.c |  216 ++++++++++++++++++++++++++++++----------------
 2 files changed, 161 insertions(+), 78 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a94d037..08a61da 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -152,19 +152,36 @@ run on cpu #3 of the host.
 =back
 
 If this option is not specified, no vcpu to cpu pinning is established,
-and the vcpus of the guest can run on all the cpus of the host.
+and the vcpus of the guest can run on all the cpus of the host. If this
+option is specified, the intersection of the vcpu pinning mask, provided
+here, and the soft affinity mask, provided via B<cpus\_soft=> (if any),
+is utilized to compute the domain node-affinity, for driving memory
+allocations.
 
 If we are on a NUMA machine (i.e., if the host has more than one NUMA
 node) and this option is not specified, libxl automatically tries to
 place the guest on the least possible number of nodes. That, however,
 will not affect vcpu pinning, so the guest will still be able to run on
-all the cpus, it will just prefer the ones from the node it has been
-placed on. A heuristic approach is used for choosing the best node (or
+all the cpus. A heuristic approach is used for choosing the best node (or
 set of nodes), with the goals of maximizing performance for the guest
 and, at the same time, achieving efficient utilization of host cpus
 and memory. See F<docs/misc/xl-numa-placement.markdown> for more
 details.
 
+=item B<cpus_soft="CPU-LIST">
+
+Exactly as B<cpus=>, but specifies soft affinity, rather than pinning
+(hard affinity). When using the credit scheduler, this means the cpus
+on which the vcpus of the domain prefer to run.
+
+A C<CPU-LIST> is specified exactly as above, for B<cpus=>.
+
+If this option is not specified, the vcpus of the guest will not have
+any preference regarding on what cpu to run. If this option is specified,
+the intersection of the soft affinity mask, provided here, and the vcpu
+pinning, provided via B<cpus=> (if any), is utilized to compute the
+domain node-affinity, for driving memory allocations.
+
 =back
 
 =head3 CPU Scheduling
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 97a1b8a..166bd97 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -88,8 +88,9 @@ xlchild children[child_max];
 static const char *common_domname;
 static int fd_lock = -1;
 
-/* Stash for specific vcpu to pcpu mappping */
+/* Stash for specific vcpu to pcpu hard and soft mapping */
 static int *vcpu_to_pcpu;
+static int *vcpu_to_pcpu_soft;
 
 static const char savefileheader_magic[32]=
     "Xen saved domain, xl format\n \0 \r";
@@ -725,6 +726,92 @@ static void parse_top_level_sdl_options(XLU_Config *config,
     xlu_cfg_replace_string (config, "xauthority", &sdl->xauthority, 0);
 }
 
+static int *parse_config_cpumap_list(XLU_ConfigList *cpus,
+                                     libxl_bitmap *cpumap,
+                                     int max_vcpus)
+{
+    int i, n_cpus = 0;
+    int *to_pcpu;
+    const char *buf;
+
+    if (libxl_cpu_bitmap_alloc(ctx, cpumap, 0)) {
+        fprintf(stderr, "Unable to allocate cpumap\n");
+        exit(1);
+    }
+
+    /* Prepare the array for single vcpu to pcpu mappings */
+    to_pcpu = xmalloc(sizeof(int) * max_vcpus);
+    memset(to_pcpu, -1, sizeof(int) * max_vcpus);
+
+    /*
+     * Idea here is to let libxl think all the domain's vcpus
+     * have cpu affinity with all the pcpus on the list. Doing
+     * that ensures memory is allocated on the proper NUMA nodes.
+     * It is then up to us, here in xl, to match each single vcpu
+     * to its pcpu (and that's why we need to stash such info in
+     * the to_pcpu array now) after the domain has been created.
+     * This way, we avoid having to pass to libxl some big array
+     * hosting the single mappings.
+     */
+    libxl_bitmap_set_none(cpumap);
+    while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) {
+        i = atoi(buf);
+        if (!libxl_bitmap_cpu_valid(cpumap, i)) {
+            fprintf(stderr, "cpu %d illegal\n", i);
+            exit(1);
+        }
+        libxl_bitmap_set(cpumap, i);
+        if (n_cpus < max_vcpus)
+            to_pcpu[n_cpus] = i;
+        n_cpus++;
+    }
+
+    return to_pcpu;
+}
+
+static void parse_config_cpumap_string(const char *buf, libxl_bitmap *cpumap)
+{
+        char *buf2 = strdup(buf);
+
+        if (libxl_cpu_bitmap_alloc(ctx, cpumap, 0)) {
+            fprintf(stderr, "Unable to allocate cpumap\n");
+            exit(1);
+        }
+
+        libxl_bitmap_set_none(cpumap);
+        if (vcpupin_parse(buf2, cpumap))
+            exit(1);
+        free(buf2);
+}
+
+static void parse_cpu_affinity(XLU_Config *config, const char *what,
+                               libxl_domain_build_info *b_info)
+{
+    XLU_ConfigList *cpus;
+    const char *buf;
+    libxl_bitmap *map;
+    int **array;
+
+    if (!strcmp(what, "cpus")) {
+        map = &b_info->cpumap;
+        array = &vcpu_to_pcpu;
+    } else if (!strcmp(what, "cpus_soft")) {
+        map = &b_info->cpumap_soft;
+        array = &vcpu_to_pcpu_soft;
+    } else
+        return;
+
+    if (!xlu_cfg_get_list (config, what, &cpus, 0, 1))
+        *array = parse_config_cpumap_list(cpus, map, b_info->max_vcpus);
+    else if (!xlu_cfg_get_string (config, what, &buf, 0))
+        parse_config_cpumap_string(buf, map);
+    else
+        return;
+
+    /* We have a hard and/or soft affinity: disable automatic placement */
+    libxl_defbool_set(&b_info->numa_placement, false);
+}
+
 static void parse_config_data(const char *config_source,
                               const char *config_data,
                               int config_len,
@@ -735,7 +822,8 @@ static void parse_config_data(const char *config_source,
     const char *buf;
     long l;
     XLU_Config *config;
-    XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms;
+    XLU_ConfigList *vbds, *nics, *pcis;
+    XLU_ConfigList *cvfbs, *cpuids, *vtpms;
     XLU_ConfigList *ioports, *irqs, *iomem;
     int num_ioports, num_irqs, num_iomem;
     int pci_power_mgmt = 0;
@@ -858,60 +946,9 @@ static void parse_config_data(const char *config_source,
     if (!xlu_cfg_get_long (config, "maxvcpus", &l, 0))
         b_info->max_vcpus = l;
 
-    if (!xlu_cfg_get_list (config, "cpus", &cpus, 0, 1)) {
-        int n_cpus = 0;
 
-        if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap, 0)) {
-            fprintf(stderr, "Unable to allocate cpumap\n");
-            exit(1);
-        }
-
-        /* Prepare the array for single vcpu to pcpu mappings */
-        vcpu_to_pcpu = xmalloc(sizeof(int) * b_info->max_vcpus);
-        memset(vcpu_to_pcpu, -1, sizeof(int) * b_info->max_vcpus);
-
-        /*
-         * Idea here is to let libxl think all the domain's vcpus
-         * have cpu affinity with all the pcpus on the list.
-         * It is then us, here in xl, that matches each single vcpu
-         * to its pcpu (and that's why we need to stash such info in
-         * the vcpu_to_pcpu array now) after the domain has been created.
-         * Doing it like this saves the burden of passing to libxl
-         * some big array hosting the single mappings. Also, using
-         * the cpumap derived from the list ensures memory is being
-         * allocated on the proper nodes anyway.
-         */
-        libxl_bitmap_set_none(&b_info->cpumap);
-        while ((buf = xlu_cfg_get_listitem(cpus, n_cpus)) != NULL) {
-            i = atoi(buf);
-            if (!libxl_bitmap_cpu_valid(&b_info->cpumap, i)) {
-                fprintf(stderr, "cpu %d illegal\n", i);
-                exit(1);
-            }
-            libxl_bitmap_set(&b_info->cpumap, i);
-            if (n_cpus < b_info->max_vcpus)
-                vcpu_to_pcpu[n_cpus] = i;
-            n_cpus++;
-        }
-
-        /* We have a cpumap, disable automatic placement */
-        libxl_defbool_set(&b_info->numa_placement, false);
-    }
-    else if (!xlu_cfg_get_string (config, "cpus", &buf, 0)) {
-        char *buf2 = strdup(buf);
-
-        if (libxl_cpu_bitmap_alloc(ctx, &b_info->cpumap, 0)) {
-            fprintf(stderr, "Unable to allocate cpumap\n");
-            exit(1);
-        }
-
-        libxl_bitmap_set_none(&b_info->cpumap);
-        if (vcpupin_parse(buf2, &b_info->cpumap))
-            exit(1);
-        free(buf2);
-
-        libxl_defbool_set(&b_info->numa_placement, false);
-    }
+    parse_cpu_affinity(config, "cpus", b_info);
+    parse_cpu_affinity(config, "cpus_soft", b_info);
 
     if (!xlu_cfg_get_long (config, "memory", &l, 0)) {
         b_info->max_memkb = l * 1024;
@@ -2035,6 +2072,40 @@ static void evdisable_disk_ejects(libxl_evgen_disk_eject **diskws,
     }
 }
 
+static inline int set_vcpu_to_pcpu_affinity(uint32_t domid, int *to_pcpu,
+                                            int max_vcpus, int soft)
+{
+    libxl_bitmap vcpu_cpumap;
+    libxl_bitmap *softmap = NULL, *hardmap = NULL;
+    int i, ret = 0;
+
+    ret = libxl_cpu_bitmap_alloc(ctx, &vcpu_cpumap, 0);
+    if (ret)
+        return -1;
+
+    if (soft)
+        softmap = &vcpu_cpumap;
+    else
+        hardmap = &vcpu_cpumap;
+
+    for (i = 0; i < max_vcpus; i++) {
+        if (to_pcpu[i] != -1) {
+            libxl_bitmap_set_none(&vcpu_cpumap);
+            libxl_bitmap_set(&vcpu_cpumap, to_pcpu[i]);
+        } else {
+            libxl_bitmap_set_any(&vcpu_cpumap);
+        }
+        if (libxl_set_vcpuaffinity(ctx, domid, i, hardmap, softmap)) {
+            fprintf(stderr, "setting affinity failed on vcpu `%d'.\n", i);
+            ret = -1;
+            break;
+        }
+    }
+    libxl_bitmap_dispose(&vcpu_cpumap);
+
+    return ret;
+}
+
 static uint32_t create_domain(struct domain_create *dom_info)
 {
     uint32_t domid = INVALID_DOMID;
@@ -2254,31 +2325,26 @@ start:
     if ( ret )
         goto error_out;
 
-    /* If single vcpu to pcpu mapping was requested, honour it */
+    /* If single vcpu pinning or soft affinity was requested, honour it */
     if (vcpu_to_pcpu) {
-        libxl_bitmap vcpu_cpumap;
+        ret = set_vcpu_to_pcpu_affinity(domid, vcpu_to_pcpu,
+                                        d_config.b_info.max_vcpus, 0);
+        free(vcpu_to_pcpu);
 
-        ret = libxl_cpu_bitmap_alloc(ctx, &vcpu_cpumap, 0);
         if (ret)
             goto error_out;
-        for (i = 0; i < d_config.b_info.max_vcpus; i++) {
 
-            if (vcpu_to_pcpu[i] != -1) {
-                libxl_bitmap_set_none(&vcpu_cpumap);
-                libxl_bitmap_set(&vcpu_cpumap, vcpu_to_pcpu[i]);
-            } else {
-                libxl_bitmap_set_any(&vcpu_cpumap);
-            }
-            if (libxl_set_vcpuaffinity(ctx, domid, i, &vcpu_cpumap, NULL)) {
-                fprintf(stderr, "setting affinity failed on vcpu `%d'.\n", i);
-                libxl_bitmap_dispose(&vcpu_cpumap);
-                free(vcpu_to_pcpu);
-                ret = ERROR_FAIL;
-                goto error_out;
-            }
-        }
-        libxl_bitmap_dispose(&vcpu_cpumap);
-        free(vcpu_to_pcpu); vcpu_to_pcpu = NULL;
+        vcpu_to_pcpu = NULL;
+    }
+    if (vcpu_to_pcpu_soft) {
+        ret = set_vcpu_to_pcpu_affinity(domid, vcpu_to_pcpu_soft,
+                                        d_config.b_info.max_vcpus, 1);
+        free(vcpu_to_pcpu_soft);
+
+        if (ret)
+            goto error_out;
+
+        vcpu_to_pcpu_soft = NULL;
     }
 
     ret = libxl_userdata_store(ctx, domid, "xl",

^ permalink raw reply related	[flat|nested] 25+ messages in thread
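The per-vcpu mapping logic inside the set_vcpu_to_pcpu_affinity() helper above (a singleton bitmap when a vcpu is pinned, a full bitmap otherwise) can be sketched in plain C; the 64-bit masks and the function name below are illustrative stand-ins, not the libxl bitmap API:

```c
#include <stdint.h>

/*
 * Illustrative stand-in for the per-vcpu bitmap setup in
 * set_vcpu_to_pcpu_affinity(): a 64-bit mask plays the role of a
 * libxl_bitmap. A mapping of -1 means "any pcpu" (full mask);
 * otherwise the vcpu is pinned to exactly one pcpu (singleton mask).
 */
static uint64_t vcpu_affinity_mask(int to_pcpu, int nr_pcpus)
{
    uint64_t full = (nr_pcpus >= 64) ? ~0ULL : ((1ULL << nr_pcpus) - 1);

    if (to_pcpu == -1)
        return full;          /* no pinning requested: any pcpu */
    return 1ULL << to_pcpu;   /* pin to exactly one pcpu */
}
```

Depending on the `soft` flag, the resulting map would then be passed as either the hard or the soft cpumap argument of libxl_set_vcpuaffinity(), as the helper does.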

* [v7 PATCH 10/10] libxl: automatic NUMA placement affects soft affinity
  2014-06-10  0:44 [v7 PATCH 00/10] Implement vcpu soft affinity for credit1 Dario Faggioli
                   ` (8 preceding siblings ...)
  2014-06-10  0:45 ` [v7 PATCH 09/10] xl: enable for specifying soft-affinity in the config file Dario Faggioli
@ 2014-06-10  0:45 ` Dario Faggioli
  9 siblings, 0 replies; 25+ messages in thread
From: Dario Faggioli @ 2014-06-10  0:45 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, JBeulich, Ian.Jackson

vCPU soft affinity and NUMA-aware scheduling do not have
to be related. However, soft affinity is how NUMA-aware
scheduling is actually implemented, and therefore, by default,
the results of automatic NUMA placement (at VM creation time)
are also used to set the soft affinity of all the vCPUs of
the domain.

Of course, this only happens if automatic NUMA placement is
enabled and actually takes place (for instance, if the user
does not specify any hard or soft affinity in the xl config
file).

This also handles the converse case, i.e., automatic
placement is not triggered if the config file specifies
either a hard (the check for which was already there) or a
soft (the check for which is introduced by this commit) affinity.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
Changes from v3:
 * rephrase comments and docs, as suggested during review.
---
 docs/man/xl.cfg.pod.5                |   21 +++++++++++----------
 docs/misc/xl-numa-placement.markdown |   14 ++++++++++++--
 tools/libxl/libxl_dom.c              |   20 ++++++++++++++++++--
 3 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 08a61da..a97a2b1 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -158,16 +158,6 @@ here, and the soft affinity mask, provided via B<cpus\_soft=> (if any),
 is utilized to compute the domain node-affinity, for driving memory
 allocations.
 
-If we are on a NUMA machine (i.e., if the host has more than one NUMA
-node) and this option is not specified, libxl automatically tries to
-place the guest on the least possible number of nodes. That, however,
-will not affect vcpu pinning, so the guest will still be able to run on
-all the cpus. A heuristic approach is used for choosing the best node (or
-set of nodes), with the goals of maximizing performance for the guest
-and, at the same time, achieving efficient utilization of host cpus
-and memory. See F<docs/misc/xl-numa-placement.markdown> for more
-details.
-
 =item B<cpus_soft="CPU-LIST">
 
 Exactly as B<cpus=>, but specifies soft affinity, rather than pinning
@@ -182,6 +172,17 @@ the intersection of the soft affinity mask, provided here, and the vcpu
 pinning, provided via B<cpus=> (if any), is utilized to compute the
 domain node-affinity, for driving memory allocations.
 
+If this option is not specified (and B<cpus=> is not specified either),
+libxl automatically tries to place the guest on the least possible
+number of nodes. A heuristic approach is used for choosing the best
+node (or set of nodes), with the goal of maximizing performance for
+the guest and, at the same time, achieving efficient utilization of
+host cpus and memory. In that case, the soft affinity of all the vcpus
+of the domain will be set to the pcpus belonging to the NUMA nodes
+chosen during placement.
+
+For more details, see F<docs/misc/xl-numa-placement.markdown>.
+
 =back
 
 =head3 CPU Scheduling
diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
index 9d64eae..f863492 100644
--- a/docs/misc/xl-numa-placement.markdown
+++ b/docs/misc/xl-numa-placement.markdown
@@ -126,10 +126,20 @@ or Xen won't be able to guarantee the locality for their memory accesses.
 That, of course, also mean the vCPUs of the domain will only be able to
 execute on those same pCPUs.
 
+It is also possible to have a "cpus\_soft=" option in the xl config file,
+to specify the soft affinity for all the vCPUs of the domain. This affects
+the NUMA placement in the following way:
+
+ * if only "cpus\_soft=" is present, the VM's node-affinity will be equal
+   to the nodes to which the pCPUs in the soft affinity mask belong;
+ * if both "cpus\_soft=" and "cpus=" are present, the VM's node-affinity
+   will be equal to the nodes to which the pCPUs present both in hard and
+   soft affinity belong.
+
 ### Placing the guest automatically ###
 
-If no "cpus=" option is specified in the config file, libxl tries
-to figure out on its own on which node(s) the domain could fit best.
+If neither "cpus=" nor "cpus\_soft=" are present in the config file, libxl
+tries to figure out on its own on which node(s) the domain could fit best.
 If it finds one (some), the domain's node affinity get set to there,
 and both memory allocations and NUMA aware scheduling (for the credit
 scheduler and starting from Xen 4.3) will comply with it. Starting from
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 91af7f8..3511756 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -248,17 +248,33 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
      * some weird error manifests) the subsequent call to
      * libxl_domain_set_nodeaffinity() will do the actual placement,
      * whatever that turns out to be.
+     *
+     * As far as scheduling is concerned, we achieve NUMA-aware scheduling
+     * by having the results of placement affect the soft affinity of all
+     * the vcpus of the domain. Of course, we want that iff placement is
+     * enabled and actually happens, so we only change info->cpumap_soft to
+     * reflect the placement result if that is the case.
      */
     if (libxl_defbool_val(info->numa_placement)) {
-        if (!libxl_bitmap_is_full(&info->cpumap)) {
+        /* We require both hard and soft affinity not to be set */
+        if (!libxl_bitmap_is_full(&info->cpumap) ||
+            !libxl_bitmap_is_full(&info->cpumap_soft)) {
             LOG(ERROR, "Can run NUMA placement only if no vcpu "
-                       "affinity is specified");
+                       "(hard or soft) affinity is specified");
             return ERROR_INVAL;
         }
 
         rc = numa_place_domain(gc, domid, info);
         if (rc)
             return rc;
+
+        /*
+         * We change the soft affinity in domain_build_info here, of course
+         * after converting the result of placement from nodes to cpus. The
+         * following call to libxl_set_vcpuaffinity_all_soft() will do the
+         * actual updating of the domain's vcpus' soft affinity.
+         */
+        libxl_nodemap_to_cpumap(ctx, &info->nodemap, &info->cpumap_soft);
     }
     libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap);
     libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap,

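The nodes-to-cpus step the comment above relies on (libxl_nodemap_to_cpumap()) boils down to taking the union of the pcpus of the nodes chosen by placement. A hypothetical sketch, with 64-bit masks standing in for node- and cpu-maps and an assumed per-node cpu table (none of these names are the real libxl types):

```c
#include <stdint.h>

/*
 * Hypothetical sketch: the soft-affinity cpumap becomes the union of
 * the pcpus belonging to the NUMA nodes selected by placement.
 * cpus_of_node[n] is an assumed table giving the pcpus of node n.
 */
static uint64_t nodemap_to_cpumap(uint64_t nodemap,
                                  const uint64_t *cpus_of_node,
                                  int nr_nodes)
{
    uint64_t cpumap = 0;

    for (int node = 0; node < nr_nodes; node++)
        if (nodemap & (1ULL << node))
            cpumap |= cpus_of_node[node];
    return cpumap;
}
```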

* Re: [v7 PATCH 02/10] xen: sched: introduce soft-affinity and use it instead d->node-affinity
  2014-06-10  0:44 ` [v7 PATCH 02/10] xen: sched: introduce soft-affinity and use it instead d->node-affinity Dario Faggioli
@ 2014-06-10 11:26   ` George Dunlap
  0 siblings, 0 replies; 25+ messages in thread
From: George Dunlap @ 2014-06-10 11:26 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, JBeulich, Ian.Jackson

On 06/10/2014 01:44 AM, Dario Faggioli wrote:
> Before this change, each vcpu had its own vcpu-affinity
> (in v->cpu_affinity), representing the set of pcpus where
> the vcpu is allowed to run. Since when NUMA-aware scheduling
> was introduced the (credit1 only, for now) scheduler also
> tries as much as it can to run all the vcpus of a domain
> on one of the nodes that constitutes the domain's
> node-affinity.
>
> The idea here is making the mechanism more general by:
>    * allowing for this 'preference' for some pcpus/nodes to be
>      expressed on a per-vcpu basis, instead of for the domain
>      as a whole. That is to say, each vcpu should have its own
>      set of preferred pcpus/nodes, rather than it being the
>      very same for all the vcpus of the domain;
>    * generalizing the idea of 'preferred pcpus' beyond NUMA
>      awareness and support. That is to say, independently of
>      whether it is (mostly) useful on NUMA systems, it should
>      be possible to specify, for each vcpu, a set of pcpus where
>      it prefers to run (in addition, and possibly unrelated to,
>      the set of pcpus where it is allowed to run).
>
> We will be calling this set of *preferred* pcpus the vcpu's
> soft affinity, and this change introduces it, and starts using it
> for scheduling, replacing the indirect use of the domain's NUMA
> node-affinity. This is more general, as soft affinity does not
> have to be related to NUMA. Nevertheless, it allows to achieve the
> same results of NUMA-aware scheduling, just by making soft affinity
> equal to the domain's node affinity, for all the vCPUs (e.g.,
> from the toolstack).
>
> This also means renaming most of the NUMA-aware scheduling related
> functions, in credit1, to something more generic, hinting toward
> the concept of soft affinity rather than directly to NUMA awareness.
>
> As a side effect, this simplifies the code quite a bit. In fact,
> prior to this change, we needed to cache the translation of
> d->node_affinity (which is a nodemask_t) to a cpumask_t, since that
> is what scheduling decisions require (we used to keep it in
> node_affinity_cpumask). This, and all the complicated logic
> required to keep it updated, is not necessary any longer.
>
> The high level description of NUMA placement and scheduling in
> docs/misc/xl-numa-placement.markdown is being updated too, to match
> the new architecture.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

Probably should have taken this off; but in any case:

Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>


* Re: [v7 PATCH 05/10] libxc/libxl: bump library SONAMEs
  2014-06-10  0:45 ` [v7 PATCH 05/10] libxc/libxl: bump library SONAMEs Dario Faggioli
@ 2014-06-10 13:46   ` Ian Campbell
  0 siblings, 0 replies; 25+ messages in thread
From: Ian Campbell @ 2014-06-10 13:46 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: keir, Andrew.Cooper3, George.Dunlap, xen-devel, JBeulich, Ian.Jackson

On Tue, 2014-06-10 at 02:45 +0200, Dario Faggioli wrote:
> The following two patches break both libxc and libxl ABI and
> API, so we better bump the MAJORs.
> 
> Of course, for libxl, proper measures are taken (in the
> relevant patch) in order to guarantee API stability.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Git tells me that these haven't been bumped since RELEASE-4.4.0 so:
Acked-by: Ian Campbell <ian.campbell@citrix.com>

Ian.


* Re: [v7 PATCH 07/10] libxl: get and set soft affinity
  2014-06-10  0:45 ` [v7 PATCH 07/10] libxl: get and set soft affinity Dario Faggioli
@ 2014-06-10 14:02   ` Ian Campbell
  2014-06-11  7:13     ` Dario Faggioli
  2014-06-10 15:39   ` George Dunlap
  1 sibling, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2014-06-10 14:02 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: keir, Andrew.Cooper3, George.Dunlap, xen-devel, JBeulich, Ian.Jackson

On Tue, 2014-06-10 at 02:45 +0200, Dario Faggioli wrote:
> Make space for two new cpumap-s, one in vcpu_info (for getting
> soft affinity) and build_info (for setting it) and amend the
> API for setting vCPU affinity.
> 
> libxl_set_vcpuaffinity() now takes two cpumaps, one for hard
> and one for soft affinity (LIBXL_API_VERSION is exploited to
> retain source-level backward compatibility). Either of the
> two cpumaps can be NULL, in which case only the affinity
> corresponding to the non-NULL cpumap will be affected.
> 
> Getting soft affinity happens indirectly, via `xl vcpu-list'
> (as it is already for hard affinity).
> 
> This commit also introduces some logic to check whether the
> affinity which will be used by Xen to schedule the vCPU(s)
> does actually match with the cpumaps provided. In fact, we
> want to allow every possible combination of hard and soft
> affinity to be set, but we warn the user upon particularly
> weird situations (e.g., hard and soft being disjoint sets
> of pCPUs).
> 
> This very change also updates the error handling for calls
> to libxl_set_vcpuaffinity() in xl, as that can now be any
> libxl error code, not just -1.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

This relies indirectly on a patch of Wei's to add 0x040500 to the list
of valid versions in libxl.h, but other than that.

Acked-by: Ian Campbell <ian.campbell@citrix.com>

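The "weird situations" check described in the commit message above can be sketched as follows; 64-bit masks stand in for cpumaps and the function name is made up for illustration (the real code only warns, it does not reject the combination):

```c
#include <stdint.h>

/*
 * Illustrative sketch: any hard/soft combination is accepted, but if
 * the two sets are disjoint the soft affinity cannot influence
 * scheduling at all, which is the kind of situation worth warning
 * the user about.
 */
static int soft_affinity_is_effective(uint64_t hard, uint64_t soft)
{
    return (hard & soft) != 0;
}
```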

* Re: [v7 PATCH 08/10] xl: enable getting and setting soft
  2014-06-10  0:45 ` [v7 PATCH 08/10] xl: enable getting and setting soft Dario Faggioli
@ 2014-06-10 14:10   ` Ian Campbell
  2014-06-10 15:10     ` Dario Faggioli
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2014-06-10 14:10 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: keir, Andrew.Cooper3, George.Dunlap, xen-devel, JBeulich, Ian.Jackson

On Tue, 2014-06-10 at 02:45 +0200, Dario Faggioli wrote:

$subject is truncated?

> +#define hard_aff_str (argv[optind+2])
> +#define soft_aff_str (argv[optind+3])

const char *foo = argv[...] please.

> +    /*
> +     * Hard affinity is always present. However, if it's "-", all we need
> +     * is passing a NULL pointer to the libxl_set_vcpuaffinity() call below.
> +     */
> +    if (!strcmp(hard_aff_str, "-"))
> +        hard = NULL;
> +    else if (vcpupin_parse(hard_aff_str, &cpumap))

Perhaps consider consistently using hard instead of &cpumap sometimes
(likewise below for soft). That seems like it might be clearer (the
cpumap* are really just stack vars in lieu of a malloc).

Ian.


* Re: [v7 PATCH 09/10] xl: enable for specifying soft-affinity in the config file
  2014-06-10  0:45 ` [v7 PATCH 09/10] xl: enable for specifying soft-affinity in the config file Dario Faggioli
@ 2014-06-10 14:38   ` Ian Campbell
  2014-06-10 15:36     ` Dario Faggioli
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2014-06-10 14:38 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: keir, Andrew.Cooper3, George.Dunlap, xen-devel, JBeulich, Ian.Jackson

On Tue, 2014-06-10 at 02:45 +0200, Dario Faggioli wrote:
> in a similar way to how hard-affinity is specified (i.e.,
> exactly how plain vcpu-affinity was being specified before
> this change).

It seems that the bulk of this is just code motion, is that right?

> +    if (!strcmp(what, "cpus")) {

Elsewhere you use an "int soft", which was the correct choice (if not a
bool_t).

If that is changed: Acked-by: Ian Campbell <ian.campbell@citrix.com>

Ian.


* Re: [v7 PATCH 03/10] xen: derive NUMA node affinity from hard and soft CPU affinity
  2014-06-10  0:44 ` [v7 PATCH 03/10] xen: derive NUMA node affinity from hard and soft CPU affinity Dario Faggioli
@ 2014-06-10 14:53   ` George Dunlap
  2014-06-10 15:20     ` Dario Faggioli
  0 siblings, 1 reply; 25+ messages in thread
From: George Dunlap @ 2014-06-10 14:53 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, JBeulich, Ian.Jackson

On 06/10/2014 01:44 AM, Dario Faggioli wrote:
> if a domain's NUMA node-affinity (which is what controls
> memory allocations) is provided by the user/toolstack, it
> just is not touched. However, if the user does not say
> anything, leaving it all to Xen, let's compute it in the
> following way:
>
>   1. cpupool's cpus & hard-affinity & soft-affinity
>   2. if (1) is empty: cpupool's cpus & hard-affinity
>
> This guarantees memory to be allocated from the narrowest
> possible set of NUMA nodes, and makes it relatively easy to
> set up NUMA-aware scheduling on top of soft affinity.
>
> Note that such 'narrowest set' is guaranteed to be non-empty.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>
> ---
> Changes from v6:
>   * fixed a bug when a domain was being created inside a
>     cpupool;

This definitely should have erased the Reviewed-by, as it implies I 
reviewed the bug fix.

>   * coding style.
>
> Changes from v3:
>   * avoid pointless calls to cpumask_clear(), as requested
>     during review;
>   * ASSERT() non emptyness of cpupool & hard affinity, as
>     suggested during review.
>
> Changes from v2:
>   * the loop computing the mask is now only executed when
>     it really is useful, as suggested during review;
>   * the loop, and all the cpumask handling is optimized,
>     in a way similar to what was suggested during review.
> ---
>   xen/common/domain.c   |   61 +++++++++++++++++++++++++++++++------------------
>   xen/common/schedule.c |    4 ++-
>   2 files changed, 42 insertions(+), 23 deletions(-)
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index e20d3bf..c3a576e 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -409,17 +409,17 @@ struct domain *domain_create(
>   
>   void domain_update_node_affinity(struct domain *d)
>   {
> -    cpumask_var_t cpumask;
> -    cpumask_var_t online_affinity;
> +    cpumask_var_t dom_cpumask, dom_cpumask_soft;
> +    cpumask_t *dom_affinity;

Also, just curious, did you rename these variables since the last series?

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

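The two-step rule quoted in the commit message above ((1) cpupool's cpus & hard & soft; (2) if (1) is empty, cpupool's cpus & hard) can be sketched in plain C, with 64-bit masks standing in for cpumasks (names are illustrative, not the hypervisor's):

```c
#include <stdint.h>

/*
 * Illustrative sketch of the node-affinity computation: prefer the
 * intersection of cpupool, hard and soft affinity; if that is empty,
 * fall back to cpupool & hard, which is guaranteed non-empty since a
 * vcpu must always have somewhere it is allowed to run.
 */
static uint64_t node_affinity_cpus(uint64_t pool, uint64_t hard,
                                   uint64_t soft)
{
    uint64_t mask = pool & hard & soft;

    return mask ? mask : (pool & hard);
}
```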

* Re: [v7 PATCH 08/10] xl: enable getting and setting soft
  2014-06-10 14:10   ` Ian Campbell
@ 2014-06-10 15:10     ` Dario Faggioli
  0 siblings, 0 replies; 25+ messages in thread
From: Dario Faggioli @ 2014-06-10 15:10 UTC (permalink / raw)
  To: Ian Campbell
  Cc: keir, Andrew.Cooper3, George.Dunlap, xen-devel, JBeulich, Ian.Jackson



On mar, 2014-06-10 at 15:10 +0100, Ian Campbell wrote:
> On Tue, 2014-06-10 at 02:45 +0200, Dario Faggioli wrote:
> 
> $subject is truncated?
> 
I think it is. Sorry, I will make sure this does not happen in the next
round.

> > +#define hard_aff_str (argv[optind+2])
> > +#define soft_aff_str (argv[optind+3])
> 
> const char *foo = argv[...] please.
> 
Ok.

> > +    /*
> > +     * Hard affinity is always present. However, if it's "-", all we need
> > +     * is passing a NULL pointer to the libxl_set_vcpuaffinity() call below.
> > +     */
> > +    if (!strcmp(hard_aff_str, "-"))
> > +        hard = NULL;
> > +    else if (vcpupin_parse(hard_aff_str, &cpumap))
> 
> Perhaps consider consistently using hard instead of &cpumap sometimes
> (likewise below for soft). That seems like it might be clearer (the
> cpumap* are really just stack vars in lieu of a malloc).
> 
ISWYM. Yes, I think I can use hard and soft for these two calls. While
at it, I'm thinking about renaming cpumap to cpumap_hard (i.e., ending
up with *hard-->cpumap_hard, *soft-->cpumap_soft)... Shout if you don't
want this to happen in v8. :-)

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [v7 PATCH 03/10] xen: derive NUMA node affinity from hard and soft CPU affinity
  2014-06-10 14:53   ` George Dunlap
@ 2014-06-10 15:20     ` Dario Faggioli
  2014-06-10 15:33       ` George Dunlap
  0 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2014-06-10 15:20 UTC (permalink / raw)
  To: George Dunlap
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, xen-devel,
	JBeulich, Ian.Jackson



On mar, 2014-06-10 at 15:53 +0100, George Dunlap wrote:
> On 06/10/2014 01:44 AM, Dario Faggioli wrote:
> > if a domain's NUMA node-affinity (which is what controls
> > memory allocations) is provided by the user/toolstack, it
> > just is not touched. However, if the user does not say
> > anything, leaving it all to Xen, let's compute it in the
> > following way:
> >
> >   1. cpupool's cpus & hard-affinity & soft-affinity
> >   2. if (1) is empty: cpupool's cpus & hard-affinity
> >
> > This guarantees memory to be allocated from the narrowest
> > possible set of NUMA nodes, and makes it relatively easy to
> > set up NUMA-aware scheduling on top of soft affinity.
> >
> > Note that such 'narrowest set' is guaranteed to be non-empty.
> >
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> > Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
> > Acked-by: Jan Beulich <jbeulich@suse.com>
> > ---
> > Changes from v6:
> >   * fixed a bug when a domain was being created inside a
> >     cpupool;
> 
> This definitely should have erased the Reviewed-by, as it implies I 
> reviewed the bug fix.
> 
Right! Sorry for that. I actually wanted to do it, but I just forgot to
before pressing enter on `stg email'! :-(

> Also, just curious, did you rename these variables since the last series?
> 
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
> 
Thanks and sorry again. So, for v8, should I kill the Reviewed-by and
replace it with the Acked-by?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




* Re: [v7 PATCH 03/10] xen: derive NUMA node affinity from hard and soft CPU affinity
  2014-06-10 15:20     ` Dario Faggioli
@ 2014-06-10 15:33       ` George Dunlap
  0 siblings, 0 replies; 25+ messages in thread
From: George Dunlap @ 2014-06-10 15:33 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, xen-devel,
	JBeulich, Ian.Jackson

On 06/10/2014 04:20 PM, Dario Faggioli wrote:
> On mar, 2014-06-10 at 15:53 +0100, George Dunlap wrote:
>> On 06/10/2014 01:44 AM, Dario Faggioli wrote:
>>> if a domain's NUMA node-affinity (which is what controls
>>> memory allocations) is provided by the user/toolstack, it
>>> just is not touched. However, if the user does not say
>>> anything, leaving it all to Xen, let's compute it in the
>>> following way:
>>>
>>>    1. cpupool's cpus & hard-affinity & soft-affinity
>>>    2. if (1) is empty: cpupool's cpus & hard-affinity
>>>
>>> This guarantees memory to be allocated from the narrowest
>>> possible set of NUMA nodes, and makes it relatively easy to
>>> set up NUMA-aware scheduling on top of soft affinity.
>>>
>>> Note that such 'narrowest set' is guaranteed to be non-empty.
>>>
>>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>>> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
>>> Acked-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> Changes from v6:
>>>    * fixed a bug when a domain was being created inside a
>>>      cpupool;
>> This definitely should have erased the Reviewed-by, as it implies I
>> reviewed the bug fix.
>>
> Right! Sorry for that. I actually wanted to do it, but I just forgot to
> before pressing enter on `stg email'! :-(
>
>> Also, just curious, did you rename these variables since the last series?
>>
>> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
>>
> Thanks and sorry again. So, for v8, should I kill the Reviewed-by and
> replace it with the Acked-by?

Yes, I think so -- basically I haven't had time to do a thorough review 
of the cpupool stuff, but at a first glance it looks good.  I can do so 
if needed.

  -George


* Re: [v7 PATCH 09/10] xl: enable for specifying soft-affinity in the config file
  2014-06-10 14:38   ` Ian Campbell
@ 2014-06-10 15:36     ` Dario Faggioli
  2014-06-10 15:46       ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Dario Faggioli @ 2014-06-10 15:36 UTC (permalink / raw)
  To: Ian Campbell
  Cc: keir, Andrew.Cooper3, George.Dunlap, xen-devel, JBeulich, Ian.Jackson



On mar, 2014-06-10 at 15:38 +0100, Ian Campbell wrote:
> On Tue, 2014-06-10 at 02:45 +0200, Dario Faggioli wrote:
> > in a similar way to how hard-affinity is specified (i.e.,
> > exactly how plain vcpu-affinity was being specified before
> > this change).
> 
> It seems that the bulk of this is just code motion, is that right?
> 
I'd call it more refactoring than motion, as what I'm doing is actually
adding a config switch "cpus_soft=", but I'm generalizing the code so
that it can be used to deal with both the new and the already existing
one ("cpus=")... That's why I'm not advertising it as code motion. The
refactoring was requested during v2 and v3 reviews.

Also, this patch is the one that will clash the most with Wei's series.
Actually, most of what is being refactored will be either killed or
moved to libxl. (I'm just mentioning this, as we've already agreed with
Wei that we will cooperate on taking care of conflicts properly, based,
of course, on which series goes in first).

> > +    if (!strcmp(what, "cpus")) {
> 
> Elsewhere you use an "int soft", which was the correct choice (if not a
> bool_t).
> 
> If that is changed: Acked-by: Ian Campbell <ian.campbell@citrix.com>
> 
Ok, I'll go for it.

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




* Re: [v7 PATCH 07/10] libxl: get and set soft affinity
  2014-06-10  0:45 ` [v7 PATCH 07/10] libxl: get and set soft affinity Dario Faggioli
  2014-06-10 14:02   ` Ian Campbell
@ 2014-06-10 15:39   ` George Dunlap
  2014-06-10 15:44     ` Ian Campbell
  1 sibling, 1 reply; 25+ messages in thread
From: George Dunlap @ 2014-06-10 15:39 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: keir, Ian.Campbell, Andrew.Cooper3, George.Dunlap, JBeulich, Ian.Jackson

On 06/10/2014 01:45 AM, Dario Faggioli wrote:
> Make space for two new cpumap-s, one in vcpu_info (for getting
> soft affinity) and build_info (for setting it) and amend the
> API for setting vCPU affinity.
>
> libxl_set_vcpuaffinity() now takes two cpumaps, one for hard
> and one for soft affinity (LIBXL_API_VERSION is exploited to
> retain source level backword compatibility). Either of the
> two cpumap can be NULL, in which case, only the affinity
> corresponding to the non-NULL cpumap will be affected.
>
> Getting soft affinity happens indirectly, via `xl vcpu-list'
> (as it is already for hard affinity).
>
> This commit also introduces some logic to check whether the
> affinity which will be used by Xen to schedule the vCPU(s)
> does actually match with the cpumaps provided. In fact, we
> want to allow every possible combination of hard and soft
> affinity to be set, but we warn the user upon particularly
> weird situations (e.g., hard and soft being disjoint sets
> of pCPUs).
>
> This very change also update the error handling for calls
> to libxl_set_vcpuaffinity() in xl, as that can now be any
> libxl error code, not just only -1.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

>
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 80947c3..e549db8 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -82,6 +82,15 @@
>   #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1
>   
>   /*
> + * LIBXL_HAVE_SOFT_AFFINITY indicates that a 'cpumap_soft'
> + * field (of libxl_bitmap type) is present in:
> + *  - libxl_vcpuinfo, containing the soft affinity of a vcpu;
> + *  - libxl_domain_build_info, for setting the soft affinity of
> + *    all vcpus while creating the domain.
> + */
> +#define LIBXL_HAVE_SOFT_AFFINITY 1

Did we say we were going to move these to where the actual change was 
happening (say, down in the libxl_set_vcpuaffinity definitions)?  I'm OK 
either way.

> +
> +/*
>    * LIBXL_HAVE_BUILDINFO_HVM_VENDOR_DEVICE indicates that the
>    * libxl_vendor_device field is present in the hvm sections of
>    * libxl_domain_build_info. This field tells libxl which
> @@ -1097,9 +1106,22 @@ int libxl_userdata_retrieve(libxl_ctx *ctx, uint32_t domid,
>   
>   int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo);
>   int libxl_set_vcpuaffinity(libxl_ctx *ctx, uint32_t domid, uint32_t vcpuid,
> -                           libxl_bitmap *cpumap);
> +                           const libxl_bitmap *cpumap_hard,
> +                           const libxl_bitmap *cpumap_soft);
>   int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid,
> -                               unsigned int max_vcpus, libxl_bitmap *cpumap);
> +                               unsigned int max_vcpus,
> +                               const libxl_bitmap *cpumap_hard,
> +                               const libxl_bitmap *cpumap_soft);
> +
> +#if defined (LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x040500
> +
> +#define libxl_set_vcpuaffinity(ctx, domid, vcpuid, map) \
> +    libxl_set_vcpuaffinity((ctx), (domid), (vcpuid), (map), NULL)
> +#define libxl_set_vcpuaffinity_all(ctx, domid, max_vcpus, map) \
> +    libxl_set_vcpuaffinity_all((ctx), (domid), (max_vcpus), (map), NULL)
> +
> +#endif
> +
>   int libxl_domain_set_nodeaffinity(libxl_ctx *ctx, uint32_t domid,
>                                     libxl_bitmap *nodemap);
>   int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid,
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index d015cf4..49a01a7 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -193,6 +193,12 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
>           libxl_bitmap_set_any(&b_info->cpumap);
>       }
>   
> +    if (!b_info->cpumap_soft.size) {
> +        if (libxl_cpu_bitmap_alloc(CTX, &b_info->cpumap_soft, 0))
> +            return ERROR_FAIL;
> +        libxl_bitmap_set_any(&b_info->cpumap_soft);
> +    }
> +
>       libxl_defbool_setdefault(&b_info->numa_placement, true);
>   
>       if (!b_info->nodemap.size) {
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 661999c..91af7f8 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -261,7 +261,8 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
>               return rc;
>       }
>       libxl_domain_set_nodeaffinity(ctx, domid, &info->nodemap);
> -    libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap);
> +    libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap,
> +                               &info->cpumap_soft);
>   
>       if (xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb +
>           LIBXL_MAXMEM_CONSTANT) < 0) {
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 52f1aa9..15382cc 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -300,6 +300,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>       ("max_vcpus",       integer),
>       ("avail_vcpus",     libxl_bitmap),
>       ("cpumap",          libxl_bitmap),
> +    ("cpumap_soft",     libxl_bitmap),
>       ("nodemap",         libxl_bitmap),
>       ("numa_placement",  libxl_defbool),
>       ("tsc_mode",        libxl_tsc_mode),
> @@ -516,7 +517,8 @@ libxl_vcpuinfo = Struct("vcpuinfo", [
>       ("blocked", bool),
>       ("running", bool),
>       ("vcpu_time", uint64), # total vcpu time ran (ns)
> -    ("cpumap", libxl_bitmap), # current cpu's affinities
> +    ("cpumap", libxl_bitmap), # current hard cpu affinity
> +    ("cpumap_soft", libxl_bitmap), # current soft cpu affinity
>       ], dir=DIR_OUT)
>   
>   libxl_physinfo = Struct("physinfo", [
> diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h
> index e37fb89..014f1f6 100644
> --- a/tools/libxl/libxl_utils.h
> +++ b/tools/libxl/libxl_utils.h
> @@ -98,6 +98,31 @@ static inline int libxl_bitmap_cpu_valid(libxl_bitmap *bitmap, int bit)
>   #define libxl_for_each_set_bit(v, m) for (v = 0; v < (m).size * 8; v++) \
>                                                if (libxl_bitmap_test(&(m), v))
>   
> +/*
> + * Compares two bitmaps bit by bit, up to nr_bits or, if nr_bits is 0, up
> + * to the size of the lagest bitmap. If sizes does not match, bits past the

*largest

Other than that:

Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>


* Re: [v7 PATCH 07/10] libxl: get and set soft affinity
  2014-06-10 15:39   ` George Dunlap
@ 2014-06-10 15:44     ` Ian Campbell
  0 siblings, 0 replies; 25+ messages in thread
From: Ian Campbell @ 2014-06-10 15:44 UTC (permalink / raw)
  To: George Dunlap
  Cc: keir, Andrew.Cooper3, Dario Faggioli, George.Dunlap, xen-devel,
	JBeulich, Ian.Jackson

On Tue, 2014-06-10 at 16:39 +0100, George Dunlap wrote:

> > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> > index 80947c3..e549db8 100644
> > --- a/tools/libxl/libxl.h
> > +++ b/tools/libxl/libxl.h
> > @@ -82,6 +82,15 @@
> >   #define LIBXL_HAVE_DOMAIN_NODEAFFINITY 1
> >   
> >   /*
> > + * LIBXL_HAVE_SOFT_AFFINITY indicates that a 'cpumap_soft'
> > + * field (of libxl_bitmap type) is present in:
> > + *  - libxl_vcpuinfo, containing the soft affinity of a vcpu;
> > + *  - libxl_domain_build_info, for setting the soft affinity of
> > + *    all vcpus while creating the domain.
> > + */
> > +#define LIBXL_HAVE_SOFT_AFFINITY 1
> 
> Did we say we were going to move these to where the actual change was 
> happening (say, down in the libxl_set_vcpuaffinity definitions)?  I'm OK 
> either way.

I'm not sure we reached consensus on that. I'd like to do this. Although
in this specific case it isn't clear since it affects two or more
places...

In any case I think it is fine for new patches to follow the existing
pattern until we decide.


* Re: [v7 PATCH 09/10] xl: enable for specifying soft-affinity in the config file
  2014-06-10 15:36     ` Dario Faggioli
@ 2014-06-10 15:46       ` Ian Campbell
  0 siblings, 0 replies; 25+ messages in thread
From: Ian Campbell @ 2014-06-10 15:46 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: keir, Andrew.Cooper3, George.Dunlap, xen-devel, JBeulich, Ian.Jackson

On Tue, 2014-06-10 at 17:36 +0200, Dario Faggioli wrote:
> On mar, 2014-06-10 at 15:38 +0100, Ian Campbell wrote:
> > On Tue, 2014-06-10 at 02:45 +0200, Dario Faggioli wrote:
> > > in a similar way to how hard-affinity is specified (i.e.,
> > > exactly how plain vcpu-affinity was being specified before
> > > this change).
> > 
> > It seems that the bulk of this is just code motion, is that right?
> > 
> I'd call it more refactoring than motion, as what I'm doing is actually
> adding a config switch "cpus_soft=", but I'm generalizing the code so
> that it can handle both the new switch and the already existing one
> ("cpus=")... That's why I'm not advertising it as code motion. The
> refactoring was requested during the v2 and v3 reviews.

The effect is that a bunch of code has been moved (or refactored) and
had new functionality added at the same time, which makes review harder.
Anyway, it's done now, but next time please try to separate out the
movement and related fixups from the actual functional changes.

> 
> Also, this patch is the one that will clash the most with Wei's series.
> Actually, most of what is being refactored will be either killed or
> moved to libxl. (I'm just mentioning this, as we've already agreed with
> Wei that we will cooperate on taking care of conflicts properly, depending,
> of course, on which series goes in first).
> 
> > > +    if (!strcmp(what, "cpus")) {
> > 
> > Elsewhere you use an "int soft", which was the correct choice (if not a
> > bool_t).
> > 
> > If that is changed: Acked-by: Ian Campbell <ian.campbell@citrix.com>
> > 
> Ok, I'll go for it.
> 
> Thanks and Regards,
> Dario
> 


* Re: [v7 PATCH 07/10] libxl: get and set soft affinity
  2014-06-10 14:02   ` Ian Campbell
@ 2014-06-11  7:13     ` Dario Faggioli
  0 siblings, 0 replies; 25+ messages in thread
From: Dario Faggioli @ 2014-06-11  7:13 UTC (permalink / raw)
  To: Ian Campbell
  Cc: keir, Andrew.Cooper3, George.Dunlap, xen-devel, JBeulich, Ian.Jackson



On mar, 2014-06-10 at 15:02 +0100, Ian Campbell wrote:
> On Tue, 2014-06-10 at 02:45 +0200, Dario Faggioli wrote:
> > Make space for two new cpumap-s, one in vcpu_info (for getting
> > soft affinity) and build_info (for setting it) and amend the
> > API for setting vCPU affinity.
> > 
> > libxl_set_vcpuaffinity() now takes two cpumaps, one for hard
> > and one for soft affinity (LIBXL_API_VERSION is exploited to
> > retain source level backword compatibility). Either of the
> > two cpumap can be NULL, in which case, only the affinity
> > corresponding to the non-NULL cpumap will be affected.
> > 

> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> > Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
> 
> This relies indirectly on a patch of Wei's to add 0x040500 to the list
> of valid versions in libxl.h, but other than that.
> 
Right! Sorry I missed it. :-P

BTW, since Wei's series will probably take a bit more time, I'll add
that myself in v8 of this patch.

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




end of thread, other threads:[~2014-06-11  7:13 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-10  0:44 [v7 PATCH 00/10] Implement vcpu soft affinity for credit1 Dario Faggioli
2014-06-10  0:44 ` [v7 PATCH 01/10] xen: sched: rename v->cpu_affinity into v->cpu_hard_affinity Dario Faggioli
2014-06-10  0:44 ` [v7 PATCH 02/10] xen: sched: introduce soft-affinity and use it instead d->node-affinity Dario Faggioli
2014-06-10 11:26   ` George Dunlap
2014-06-10  0:44 ` [v7 PATCH 03/10] xen: derive NUMA node affinity from hard and soft CPU affinity Dario Faggioli
2014-06-10 14:53   ` George Dunlap
2014-06-10 15:20     ` Dario Faggioli
2014-06-10 15:33       ` George Dunlap
2014-06-10  0:44 ` [v7 PATCH 04/10] xen/libxc: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity Dario Faggioli
2014-06-10  0:45 ` [v7 PATCH 05/10] libxc/libxl: bump library SONAMEs Dario Faggioli
2014-06-10 13:46   ` Ian Campbell
2014-06-10  0:45 ` [v7 PATCH 06/10] libxc: get and set soft and hard affinity Dario Faggioli
2014-06-10  0:45 ` [v7 PATCH 07/10] libxl: get and set soft affinity Dario Faggioli
2014-06-10 14:02   ` Ian Campbell
2014-06-11  7:13     ` Dario Faggioli
2014-06-10 15:39   ` George Dunlap
2014-06-10 15:44     ` Ian Campbell
2014-06-10  0:45 ` [v7 PATCH 08/10] xl: enable getting and setting soft Dario Faggioli
2014-06-10 14:10   ` Ian Campbell
2014-06-10 15:10     ` Dario Faggioli
2014-06-10  0:45 ` [v7 PATCH 09/10] xl: enable for specifying soft-affinity in the config file Dario Faggioli
2014-06-10 14:38   ` Ian Campbell
2014-06-10 15:36     ` Dario Faggioli
2014-06-10 15:46       ` Ian Campbell
2014-06-10  0:45 ` [v7 PATCH 10/10] libxl: automatic NUMA placement affects soft affinity Dario Faggioli
