* [PATCH 0/7] Improving dumping of scheduler related info
@ 2015-03-16 17:04 Dario Faggioli
  2015-03-16 17:04 ` [PATCH 1/7] xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains Dario Faggioli
                   ` (6 more replies)
  0 siblings, 7 replies; 42+ messages in thread
From: Dario Faggioli @ 2015-03-16 17:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Juergen Gross, Keir Fraser, Meng Xu, Jan Beulich

Hi,

Some pseudo-random bits for improving the dumping (via debug keys) of
scheduling related information.

In some more detail, the series:
 - fixes two bugs in the RTDS scheduler (patches 1 and 2),
 - improves how the whole process of dumping scheduling info is serialized,
   by moving all locking code into the specific schedulers, in patch 3; this
   is the most important patch of the series, I would say,
 - prints better formatted and/or more useful scheduling related
   information, in patches 4, 5, 6 and 7.

This is available as a git branch here:

 git://xenbits.xen.org/people/dariof/xen.git  rel/sched/dump-v1
 http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/rel/sched/dump-v1

Regards,
Dario

---
Dario Faggioli (7):
      xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains
      xen: sched_rt: implement the .free_pdata hook
      xen: rework locking for dump of scheduler info (debug-key r)
      xen: print online pCPUs and free pCPUs when dumping scheduler info
      xen: make dumping vcpu info look better
      xen: sched_credit2: more info when dumping
      xen: sched_rt: print useful affinity info when dumping


 xen/common/cpupool.c       |   13 ++++++++
 xen/common/keyhandler.c    |    6 ++--
 xen/common/sched_credit.c  |   42 +++++++++++++++++++++++--
 xen/common/sched_credit2.c |   53 ++++++++++++++++++++++++++-----
 xen/common/sched_rt.c      |   75 +++++++++++++++++++++++++++++++++++---------
 xen/common/schedule.c      |    5 +--
 6 files changed, 162 insertions(+), 32 deletions(-)


* [PATCH 1/7] xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains
  2015-03-16 17:04 [PATCH 0/7] Improving dumping of scheduler related info Dario Faggioli
@ 2015-03-16 17:04 ` Dario Faggioli
  2015-03-16 18:18   ` George Dunlap
                     ` (2 more replies)
  2015-03-16 17:04 ` [PATCH 2/7] xen: sched_rt: implement the .free_pdata hook Dario Faggioli
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 42+ messages in thread
From: Dario Faggioli @ 2015-03-16 17:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Keir Fraser, Meng Xu, Jan Beulich

being serviced by the RTDS scheduler, as that is a
legit situation to be in: think, for instance, of a
newly created RTDS cpupool, with no domains migrated
to it yet.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Meng Xu <xumengpanda@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
---
 xen/common/sched_rt.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index df4adac..055a50f 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -264,18 +264,17 @@ rt_dump(const struct scheduler *ops)
     struct list_head *iter_sdom, *iter_svc, *runq, *depletedq, *iter;
     struct rt_private *prv = rt_priv(ops);
     struct rt_vcpu *svc;
-    cpumask_t *online;
     struct rt_dom *sdom;
     unsigned long flags;
 
-    ASSERT(!list_empty(&prv->sdom));
+    spin_lock_irqsave(&prv->lock, flags);
+
+    if (list_empty(&prv->sdom))
+        goto out;
 
-    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
-    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
     runq = rt_runq(ops);
     depletedq = rt_depletedq(ops);
 
-    spin_lock_irqsave(&prv->lock, flags);
     printk("Global RunQueue info:\n");
     list_for_each( iter, runq )
     {
@@ -303,6 +302,7 @@ rt_dump(const struct scheduler *ops)
         }
     }
 
+out:
     spin_unlock_irqrestore(&prv->lock, flags);
 }
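
For illustration, the shape of the fix (take the scheduler lock first, then
bail out early when there is nothing to dump, rather than ASSERT()ing) can be
condensed into a small standalone C sketch; pthread mutexes stand in for
Xen's spinlock_t and spin_lock_irqsave(), and the queue is a made-up item
list, so this is not the actual Xen code:

    #include <pthread.h>
    #include <stdio.h>

    struct item { int id; struct item *next; };

    /* Stand-ins for the RTDS global runqueue and for prv->lock. */
    static struct item *runq;
    static pthread_mutex_t prv_lock = PTHREAD_MUTEX_INITIALIZER;

    static void dump_runq(void)
    {
        struct item *i;

        pthread_mutex_lock(&prv_lock);

        /* An empty queue is a legitimate state: print nothing and get out. */
        if ( runq == NULL )
            goto out;

        printf("Global RunQueue info:\n");
        for ( i = runq; i != NULL; i = i->next )
            printf("  item %d\n", i->id);

     out:
        pthread_mutex_unlock(&prv_lock);
    }

    int main(void)
    {
        dump_runq();    /* nothing queued yet: no output, and no crash */
        return 0;
    }

Built with `gcc -pthread', an empty queue simply produces no output, which
mirrors what the patch does for a cpupool with no domains in it.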


* [PATCH 2/7] xen: sched_rt: implement the .free_pdata hook
  2015-03-16 17:04 [PATCH 0/7] Improving dumping of scheduler related info Dario Faggioli
  2015-03-16 17:04 ` [PATCH 1/7] xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains Dario Faggioli
@ 2015-03-16 17:04 ` Dario Faggioli
  2015-03-16 17:10   ` Dario Faggioli
                     ` (2 more replies)
  2015-03-16 17:05 ` [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r) Dario Faggioli
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 42+ messages in thread
From: Dario Faggioli @ 2015-03-16 17:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Keir Fraser, Meng Xu, Jan Beulich

which is called by cpu_schedule_down(), and is necessary
for resetting the spinlock pointers in schedule_data from
the RTDS global runqueue lock, back to the default _lock
fields in the struct.

Not doing so causes Xen to explode, e.g., when removing
pCPUs from an RTDS cpupool and assigning them to another
one.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Meng Xu <xumengpanda@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
---
 xen/common/sched_rt.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 055a50f..3e37e62 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -444,6 +444,23 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
     return (void *)1;
 }
 
+static void
+rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
+{
+    struct rt_private *prv = rt_priv(ops);
+    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+    unsigned long flags;
+
+    spin_lock_irqsave(&prv->lock, flags);
+
+    /* Move spinlock back to the default lock */
+    ASSERT(sd->schedule_lock == &prv->lock);
+    ASSERT(!spin_is_locked(&sd->_lock));
+    sd->schedule_lock = &sd->_lock;
+
+    spin_unlock_irqrestore(&prv->lock, flags);
+}
+
 static void *
 rt_alloc_domdata(const struct scheduler *ops, struct domain *dom)
 {
@@ -1090,6 +1107,7 @@ const struct scheduler sched_rtds_def = {
     .init           = rt_init,
     .deinit         = rt_deinit,
     .alloc_pdata    = rt_alloc_pdata,
+    .free_pdata     = rt_free_pdata,
     .alloc_domdata  = rt_alloc_domdata,
     .free_domdata   = rt_free_domdata,
     .init_domain    = rt_dom_init,
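
For illustration, the pointer switching that the new hook has to undo can be
sketched with simplified, made-up types (pthread mutexes instead of Xen's
spinlocks, a cut-down struct sched_data); this is only an illustration of the
idea, not the real Xen code:

    #include <assert.h>
    #include <pthread.h>
    #include <stdio.h>

    #define NR_CPUS 4

    /* Cut-down stand-in for Xen's per-CPU schedule_data. */
    struct sched_data {
        pthread_mutex_t _lock;          /* default per-CPU lock            */
        pthread_mutex_t *schedule_lock; /* lock actually used for this CPU */
    };

    static struct sched_data sd[NR_CPUS];
    static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for prv->lock */

    /* "alloc_pdata": point the CPU at the scheduler's global runqueue lock. */
    static void alloc_pdata(int cpu)
    {
        pthread_mutex_lock(&global_lock);
        sd[cpu].schedule_lock = &global_lock;
        pthread_mutex_unlock(&global_lock);
    }

    /* "free_pdata": undo the redirection; without this, the next scheduler
     * put in charge of the CPU would keep using a lock it does not own. */
    static void free_pdata(int cpu)
    {
        pthread_mutex_lock(&global_lock);
        assert(sd[cpu].schedule_lock == &global_lock);
        sd[cpu].schedule_lock = &sd[cpu]._lock;
        pthread_mutex_unlock(&global_lock);
    }

    int main(void)
    {
        for ( int cpu = 0; cpu < NR_CPUS; cpu++ )
        {
            pthread_mutex_init(&sd[cpu]._lock, NULL);
            sd[cpu].schedule_lock = &sd[cpu]._lock;
            alloc_pdata(cpu);
        }

        free_pdata(2);    /* e.g., CPU 2 being removed from the cpupool */
        printf("CPU2 back on its own lock: %s\n",
               sd[2].schedule_lock == &sd[2]._lock ? "yes" : "no");
        return 0;
    }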


* [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r)
  2015-03-16 17:04 [PATCH 0/7] Improving dumping of scheduler related info Dario Faggioli
  2015-03-16 17:04 ` [PATCH 1/7] xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains Dario Faggioli
  2015-03-16 17:04 ` [PATCH 2/7] xen: sched_rt: implement the .free_pdata hook Dario Faggioli
@ 2015-03-16 17:05 ` Dario Faggioli
  2015-03-16 18:41   ` George Dunlap
                     ` (2 more replies)
  2015-03-16 17:05 ` [PATCH 4/7] xen: print online pCPUs and free pCPUs when dumping scheduler info Dario Faggioli
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 42+ messages in thread
From: Dario Faggioli @ 2015-03-16 17:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Keir Fraser, Meng Xu, Jan Beulich

so that it is taken care of by the various schedulers, rather
than happening in schedule.c. In fact, it is the schedulers
that know best which locks are necessary for the specific
dumping operations.

While there, fix a few style issues (indentation, trailing
whitespace, parentheses and blank line after var declarations).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Meng Xu <xumengpanda@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
---
 xen/common/sched_credit.c  |   42 ++++++++++++++++++++++++++++++++++++++++--
 xen/common/sched_credit2.c |   40 ++++++++++++++++++++++++++++++++--------
 xen/common/sched_rt.c      |    7 +++++--
 xen/common/schedule.c      |    5 ++---
 4 files changed, 79 insertions(+), 15 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index bec67ff..953ecb0 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -26,6 +26,23 @@
 
 
 /*
+ * Locking:
+ * - Scheduler-lock (a.k.a. runqueue lock):
+ *  + is per-runqueue, and there is one runqueue per-cpu;
+ *  + serializes all runqueue manipulation operations;
+ * - Private data lock (a.k.a. private scheduler lock):
+ *  + serializes accesses to the scheduler global state (weight,
+ *    credit, balance_credit, etc);
+ *  + serializes updates to the domains' scheduling parameters.
+ *
+ * Ordering is "private lock always comes first":
+ *  + if we need both locks, we must acquire the private
+ *    scheduler lock for first;
+ *  + if we already own a runqueue lock, we must never acquire
+ *    the private scheduler lock.
+ */
+
+/*
  * Basic constants
  */
 #define CSCHED_DEFAULT_WEIGHT       256
@@ -1750,11 +1767,24 @@ static void
 csched_dump_pcpu(const struct scheduler *ops, int cpu)
 {
     struct list_head *runq, *iter;
+    struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_pcpu *spc;
     struct csched_vcpu *svc;
+    spinlock_t *lock = lock;
+    unsigned long flags;
     int loop;
 #define cpustr keyhandler_scratch
 
+    /*
+     * We need both locks:
+     * - csched_dump_vcpu() wants to access domains' scheduling
+     *   parameters, which are protected by the private scheduler lock;
+     * - we scan through the runqueue, so we need the proper runqueue
+     *   lock (the one of the runqueue of this cpu).
+     */
+    spin_lock_irqsave(&prv->lock, flags);
+    lock = pcpu_schedule_lock(cpu);
+
     spc = CSCHED_PCPU(cpu);
     runq = &spc->runq;
 
@@ -1781,6 +1811,9 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
             csched_dump_vcpu(svc);
         }
     }
+
+    pcpu_schedule_unlock(lock, cpu);
+    spin_unlock_irqrestore(&prv->lock, flags);
 #undef cpustr
 }
 
@@ -1792,7 +1825,7 @@ csched_dump(const struct scheduler *ops)
     int loop;
     unsigned long flags;
 
-    spin_lock_irqsave(&(prv->lock), flags);
+    spin_lock_irqsave(&prv->lock, flags);
 
 #define idlers_buf keyhandler_scratch
 
@@ -1835,15 +1868,20 @@ csched_dump(const struct scheduler *ops)
         list_for_each( iter_svc, &sdom->active_vcpu )
         {
             struct csched_vcpu *svc;
+            spinlock_t *lock;
+
             svc = list_entry(iter_svc, struct csched_vcpu, active_vcpu_elem);
+            lock = vcpu_schedule_lock(svc->vcpu);
 
             printk("\t%3d: ", ++loop);
             csched_dump_vcpu(svc);
+
+            vcpu_schedule_unlock(lock, svc->vcpu);
         }
     }
 #undef idlers_buf
 
-    spin_unlock_irqrestore(&(prv->lock), flags);
+    spin_unlock_irqrestore(&prv->lock, flags);
 }
 
 static int
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index f0e2c82..564f890 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -51,8 +51,6 @@
  * credit2 wiki page:
  *  http://wiki.xen.org/wiki/Credit2_Scheduler_Development
  * TODO:
- * + Immediate bug-fixes
- *  - Do per-runqueue, grab proper lock for dump debugkey
  * + Multiple sockets
  *  - Detect cpu layout and make runqueue map, one per L2 (make_runq_map())
  *  - Simple load balancer / runqueue assignment
@@ -1800,12 +1798,24 @@ csched2_dump_vcpu(struct csched2_vcpu *svc)
 static void
 csched2_dump_pcpu(const struct scheduler *ops, int cpu)
 {
+    struct csched2_private *prv = CSCHED2_PRIV(ops);
     struct list_head *runq, *iter;
     struct csched2_vcpu *svc;
+    unsigned long flags;
+    spinlock_t *lock;
     int loop;
     char cpustr[100];
 
-    /* FIXME: Do locking properly for access to runqueue structures */
+    /*
+     * We need both locks:
+     * - csched2_dump_vcpu() wants to access domains' scheduling
+     *   parameters, which are protected by the private scheduler lock;
+     * - we scan through the runqueue, so we need the proper runqueue
+     *   lock (the one of the runqueue this cpu is associated to).
+     */
+    spin_lock_irqsave(&prv->lock, flags);
+    lock = per_cpu(schedule_data, cpu).schedule_lock;
+    spin_lock(lock);
 
     runq = &RQD(ops, cpu)->runq;
 
@@ -1832,6 +1842,9 @@ csched2_dump_pcpu(const struct scheduler *ops, int cpu)
             csched2_dump_vcpu(svc);
         }
     }
+
+    spin_unlock(lock);
+    spin_unlock_irqrestore(&prv->lock, flags);
 }
 
 static void
@@ -1839,8 +1852,13 @@ csched2_dump(const struct scheduler *ops)
 {
     struct list_head *iter_sdom, *iter_svc;
     struct csched2_private *prv = CSCHED2_PRIV(ops);
+    unsigned long flags;
     int i, loop;
 
+    /* We need the private lock as we access global scheduler data
+     * and (below) the list of active domains. */
+    spin_lock_irqsave(&prv->lock, flags);
+
     printk("Active queues: %d\n"
            "\tdefault-weight     = %d\n",
            cpumask_weight(&prv->active_queues),
@@ -1863,7 +1881,6 @@ csched2_dump(const struct scheduler *ops)
                fraction);
 
     }
-    /* FIXME: Locking! */
 
     printk("Domain info:\n");
     loop = 0;
@@ -1872,20 +1889,27 @@ csched2_dump(const struct scheduler *ops)
         struct csched2_dom *sdom;
         sdom = list_entry(iter_sdom, struct csched2_dom, sdom_elem);
 
-       printk("\tDomain: %d w %d v %d\n\t", 
-              sdom->dom->domain_id, 
-              sdom->weight, 
-              sdom->nr_vcpus);
+        printk("\tDomain: %d w %d v %d\n\t",
+               sdom->dom->domain_id,
+               sdom->weight,
+               sdom->nr_vcpus);
 
         list_for_each( iter_svc, &sdom->vcpu )
         {
             struct csched2_vcpu *svc;
+            spinlock_t *lock;
+
             svc = list_entry(iter_svc, struct csched2_vcpu, sdom_elem);
+            lock = vcpu_schedule_lock(svc->vcpu);
 
             printk("\t%3d: ", ++loop);
             csched2_dump_vcpu(svc);
+
+            vcpu_schedule_unlock(lock, svc->vcpu);
         }
     }
+
+    spin_unlock_irqrestore(&prv->lock, flags);
 }
 
 static void activate_runqueue(struct csched2_private *prv, int rqi)
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 3e37e62..167a484 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -253,9 +253,12 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
 static void
 rt_dump_pcpu(const struct scheduler *ops, int cpu)
 {
-    struct rt_vcpu *svc = rt_vcpu(curr_on_cpu(cpu));
+    struct rt_private *prv = rt_priv(ops);
+    unsigned long flags;
 
-    rt_dump_vcpu(ops, svc);
+    spin_lock_irqsave(&prv->lock, flags);
+    rt_dump_vcpu(ops, rt_vcpu(curr_on_cpu(cpu)));
+    spin_unlock_irqrestore(&prv->lock, flags);
 }
 
 static void
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index ef79847..30eec0f 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1515,6 +1515,8 @@ void schedule_dump(struct cpupool *c)
     struct scheduler *sched;
     cpumask_t        *cpus;
 
+    /* Locking, if necessary, must be handled withing each scheduler */
+
     sched = (c == NULL) ? &ops : c->sched;
     cpus = cpupool_scheduler_cpumask(c);
     printk("Scheduler: %s (%s)\n", sched->name, sched->opt_name);
@@ -1522,11 +1524,8 @@ void schedule_dump(struct cpupool *c)
 
     for_each_cpu (i, cpus)
     {
-        spinlock_t *lock = pcpu_schedule_lock(i);
-
         printk("CPU[%02d] ", i);
         SCHED_OP(sched, dump_cpu_state, i);
-        pcpu_schedule_unlock(lock, i);
     }
 }
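
The lock ordering rule spelled out in the new credit1 comment (private
scheduler lock first, then the runqueue lock) is what the dump paths above
now follow. A minimal standalone sketch of such a dump path, with pthread
mutexes and made-up state standing in for the Xen structures (so, not the
Xen code itself), could look like this:

    #include <pthread.h>
    #include <stdio.h>

    /* Stand-ins for the private scheduler lock and a per-CPU runqueue lock. */
    static pthread_mutex_t priv_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t runq_lock = PTHREAD_MUTEX_INITIALIZER;

    static int default_weight = 256;  /* "global scheduler state" (priv_lock) */
    static int runq_len;              /* "runqueue state"         (runq_lock) */

    /*
     * Dumping needs both pieces of state, so it takes both locks, and always
     * in the documented order: private lock first, runqueue lock second.
     * Nesting them the other way round anywhere else would invite deadlock.
     */
    static void dump_cpu_state(void)
    {
        pthread_mutex_lock(&priv_lock);
        pthread_mutex_lock(&runq_lock);

        printf("default-weight = %d, runq_len = %d\n", default_weight, runq_len);

        pthread_mutex_unlock(&runq_lock);
        pthread_mutex_unlock(&priv_lock);
    }

    int main(void)
    {
        dump_cpu_state();
        return 0;
    }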


* [PATCH 4/7] xen: print online pCPUs and free pCPUs when dumping scheduler info
  2015-03-16 17:04 [PATCH 0/7] Improving dumping of scheduler related info Dario Faggioli
                   ` (2 preceding siblings ...)
  2015-03-16 17:05 ` [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r) Dario Faggioli
@ 2015-03-16 17:05 ` Dario Faggioli
  2015-03-16 18:37   ` Juergen Gross
                     ` (2 more replies)
  2015-03-16 17:05 ` [PATCH 5/7] xen: make dumping vcpu info look better Dario Faggioli
                   ` (2 subsequent siblings)
  6 siblings, 3 replies; 42+ messages in thread
From: Dario Faggioli @ 2015-03-16 17:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: Juergen Gross, George Dunlap, Keir Fraser, Jan Beulich

e.g., with  `xl debug-key r'. This change adds something
like the following lines to the dump output:

  (XEN) Num online Cpus: 16, cpu_online_map: 0-15
  (XEN) Num free Cpus: 8, cpupool_free_cpus: 8-15

Meaning that there are 16 pCPUs online (namely pCPUs
0-15), and that 8 of them (more specifically, pCPUs 8-15)
are not part of any cpupool.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Juergen Gross <JGross@suse.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
---
 xen/common/cpupool.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index cd6aab9..3bf0174 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -17,6 +17,7 @@
 #include <xen/percpu.h>
 #include <xen/sched.h>
 #include <xen/sched-if.h>
+#include <xen/keyhandler.h>
 #include <xen/cpu.h>
 
 #define for_each_cpupool(ptr)    \
@@ -658,6 +659,12 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op)
     return ret;
 }
 
+static void _print_cpumap(const char *str, const cpumask_t *map)
+{
+    cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), map);
+    printk("%s: %s\n", str, keyhandler_scratch);
+}
+
 void dump_runq(unsigned char key)
 {
     unsigned long    flags;
@@ -671,12 +678,18 @@ void dump_runq(unsigned char key)
             sched_smt_power_savings? "enabled":"disabled");
     printk("NOW=0x%08X%08X\n",  (u32)(now>>32), (u32)now);
 
+    printk("Num online Cpus: %d, ", cpumask_weight(&cpu_online_map));
+    _print_cpumap("cpu_online_map", &cpu_online_map);
+    printk("Num free Cpus: %d, ", cpumask_weight(&cpupool_free_cpus));
+    _print_cpumap("cpupool_free_cpus", &cpupool_free_cpus);
+
     printk("Idle cpupool:\n");
     schedule_dump(NULL);
 
     for_each_cpupool(c)
     {
         printk("Cpupool %d:\n", (*c)->cpupool_id);
+        _print_cpumap("Cpus", (*c)->cpu_valid);
         schedule_dump(*c);
     }
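
A rough, standalone approximation of what the two new lines end up looking
like, with plain unsigned long bitmasks playing the role of cpumask_t and a
hand-rolled range formatter in place of cpulist_scnprintf(); it is only meant
to illustrate the output format, not the hypervisor code:

    #include <stdio.h>

    /* Roughly what cpulist_scnprintf() does: render a bitmask as CPU ranges,
     * e.g. 0x0000ffff -> "0-15" and 0x0000ff00 -> "8-15". */
    static void print_cpumap(const char *str, unsigned long mask, int nr_cpus)
    {
        char buf[128];
        int pos = 0, cpu = 0;

        buf[0] = '\0';
        while ( cpu < nr_cpus )
        {
            if ( mask & (1UL << cpu) )
            {
                int start = cpu;

                while ( cpu + 1 < nr_cpus && (mask & (1UL << (cpu + 1))) )
                    cpu++;
                pos += snprintf(buf + pos, sizeof(buf) - pos,
                                pos ? ",%d" : "%d", start);
                if ( cpu > start )
                    pos += snprintf(buf + pos, sizeof(buf) - pos, "-%d", cpu);
            }
            cpu++;
        }
        printf("%s: %s\n", str, buf);
    }

    int main(void)
    {
        unsigned long online_cpus = 0x0000ffffUL;  /* pCPUs 0-15 online     */
        unsigned long free_cpus   = 0x0000ff00UL;  /* pCPUs 8-15 in no pool */

        printf("Num online Cpus: %d, ", __builtin_popcountl(online_cpus));
        print_cpumap("cpu_online_map", online_cpus, 64);
        printf("Num free Cpus: %d, ", __builtin_popcountl(free_cpus));
        print_cpumap("cpupool_free_cpus", free_cpus, 64);
        return 0;
    }

This prints the two example lines from the changelog (minus the (XEN)
prefix); __builtin_popcountl() is a GCC/clang builtin used here in place of
cpumask_weight().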


* [PATCH 5/7] xen: make dumping vcpu info look better
  2015-03-16 17:04 [PATCH 0/7] Improving dumping of scheduler related info Dario Faggioli
                   ` (3 preceding siblings ...)
  2015-03-16 17:05 ` [PATCH 4/7] xen: print online pCPUs and free pCPUs when dumping scheduler info Dario Faggioli
@ 2015-03-16 17:05 ` Dario Faggioli
  2015-03-16 18:48   ` George Dunlap
  2015-03-16 20:06   ` Meng Xu
  2015-03-16 17:05 ` [PATCH 6/7] xen: sched_credit2: more info when dumping Dario Faggioli
  2015-03-16 17:05 ` [PATCH 7/7] xen: sched_rt: print useful affinity " Dario Faggioli
  6 siblings, 2 replies; 42+ messages in thread
From: Dario Faggioli @ 2015-03-16 17:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Keir Fraser, Meng Xu, Jan Beulich

and more consistent. In fact, before this change, it looks
like this:

 (XEN) VCPU information and callbacks for domain 0:
 (XEN)     VCPU0: CPU4 [has=F] poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={4} cpu_affinity={0-15}
 (XEN) cpu_soft_affinity={0-15}
 (XEN)     pause_count=0 pause_flags=1
 (XEN)     No periodic timer

After, it looks like this:

 (XEN) VCPU information and callbacks for domain 0:
 (XEN)     VCPU0: CPU4 [has=F] poll=0 upcall_pend=00 upcall_mask=00 dirty_cpus={4}
 (XEN)     cpu_hard_affinity={0-15} cpu_soft_affinity={0-15}
 (XEN)     pause_count=0 pause_flags=1
 (XEN)     No periodic timer

So, consistently do _not_ put a space between fields and '=',
and consistently do _not_ use ',' as a separator. Also, put the
info about affinity on the same line, properly indented.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Meng Xu <xumengpanda@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
---
 xen/common/keyhandler.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index a917726..5d21e48 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -294,15 +294,15 @@ static void dump_domains(unsigned char key)
                 process_pending_softirqs();
 
             printk("    VCPU%d: CPU%d [has=%c] poll=%d "
-                   "upcall_pend = %02x, upcall_mask = %02x ",
+                   "upcall_pend=%02x upcall_mask=%02x ",
                    v->vcpu_id, v->processor,
                    v->is_running ? 'T':'F', v->poll_evtchn,
                    vcpu_info(v, evtchn_upcall_pending),
                    !vcpu_event_delivery_is_enabled(v));
             cpuset_print(tmpstr, sizeof(tmpstr), v->vcpu_dirty_cpumask);
-            printk("dirty_cpus=%s ", tmpstr);
+            printk("dirty_cpus=%s\n", tmpstr);
             cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_hard_affinity);
-            printk("cpu_affinity=%s\n", tmpstr);
+            printk("    cpu_hard_affinity=%s ", tmpstr);
             cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_soft_affinity);
             printk("cpu_soft_affinity=%s\n", tmpstr);
             printk("    pause_count=%d pause_flags=%lx\n",


* [PATCH 6/7] xen: sched_credit2: more info when dumping
  2015-03-16 17:04 [PATCH 0/7] Improving dumping of scheduler related info Dario Faggioli
                   ` (4 preceding siblings ...)
  2015-03-16 17:05 ` [PATCH 5/7] xen: make dumping vcpu info look better Dario Faggioli
@ 2015-03-16 17:05 ` Dario Faggioli
  2015-03-16 18:50   ` George Dunlap
  2015-03-16 17:05 ` [PATCH 7/7] xen: sched_rt: print useful affinity " Dario Faggioli
  6 siblings, 1 reply; 42+ messages in thread
From: Dario Faggioli @ 2015-03-16 17:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Keir Fraser, Jan Beulich

more specifically, for each runqueue, print what pCPUs
belong to it, which ones are idle and which ones have
been tickled.

While there, also convert the whole file to use
keyhandler_scratch for printing cpumask-s.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
---
 xen/common/sched_credit2.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 564f890..df29438 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -25,6 +25,7 @@
 #include <xen/errno.h>
 #include <xen/trace.h>
 #include <xen/cpu.h>
+#include <xen/keyhandler.h>
 
 #define d2printk(x...)
 //#define d2printk printk
@@ -1804,7 +1805,7 @@ csched2_dump_pcpu(const struct scheduler *ops, int cpu)
     unsigned long flags;
     spinlock_t *lock;
     int loop;
-    char cpustr[100];
+#define cpustr keyhandler_scratch
 
     /*
      * We need both locks:
@@ -1845,6 +1846,7 @@ csched2_dump_pcpu(const struct scheduler *ops, int cpu)
 
     spin_unlock(lock);
     spin_unlock_irqrestore(&prv->lock, flags);
+#undef cpustr
 }
 
 static void
@@ -1854,6 +1856,7 @@ csched2_dump(const struct scheduler *ops)
     struct csched2_private *prv = CSCHED2_PRIV(ops);
     unsigned long flags;
     int i, loop;
+#define cpustr keyhandler_scratch
 
     /* We need the private lock as we access global scheduler data
      * and (below) the list of active domains. */
@@ -1869,17 +1872,24 @@ csched2_dump(const struct scheduler *ops)
         
         fraction = prv->rqd[i].avgload * 100 / (1ULL<<prv->load_window_shift);
 
+        cpulist_scnprintf(cpustr, sizeof(cpustr), &prv->rqd[i].active);
         printk("Runqueue %d:\n"
                "\tncpus              = %u\n"
+               "\tcpus               = %s\n"
                "\tmax_weight         = %d\n"
                "\tinstload           = %d\n"
                "\taveload            = %3"PRI_stime"\n",
                i,
                cpumask_weight(&prv->rqd[i].active),
+               cpustr,
                prv->rqd[i].max_weight,
                prv->rqd[i].load,
                fraction);
 
+        cpumask_scnprintf(cpustr, sizeof(cpustr), &prv->rqd[i].idle);
+        printk("\tidlers: %s\n", cpustr);
+        cpumask_scnprintf(cpustr, sizeof(cpustr), &prv->rqd[i].tickled);
+        printk("\ttickled: %s\n", cpustr);
     }
 
     printk("Domain info:\n");
@@ -1910,6 +1920,7 @@ csched2_dump(const struct scheduler *ops)
     }
 
     spin_unlock_irqrestore(&prv->lock, flags);
+#undef cpustr
 }
 
 static void activate_runqueue(struct csched2_private *prv, int rqi)
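
The keyhandler_scratch conversion leans on the fact that the scratch buffer
is an array, so sizeof(cpustr) still gives its full size after the
'#define cpustr keyhandler_scratch' substitution. A tiny standalone sketch
of the same trick, with a made-up hex formatter in place of
cpumask_scnprintf() (so, again, not the Xen code):

    #include <stdio.h>

    /* Stand-in for Xen's keyhandler_scratch: one shared buffer, safe to reuse
     * for each mask because the dump runs under the scheduler's private lock. */
    static char keyhandler_scratch[1024];

    static void format_mask(char *buf, size_t size, unsigned long mask)
    {
        snprintf(buf, size, "%016lx", mask);   /* hex, not a real CPU list */
    }

    static void dump_runqueue(unsigned long active, unsigned long idle,
                              unsigned long tickled)
    {
    #define cpustr keyhandler_scratch
        format_mask(cpustr, sizeof(cpustr), active);  /* sizeof() is still 1024 */
        printf("\tcpus               = %s\n", cpustr);
        format_mask(cpustr, sizeof(cpustr), idle);
        printf("\tidlers: %s\n", cpustr);
        format_mask(cpustr, sizeof(cpustr), tickled);
        printf("\ttickled: %s\n", cpustr);
    #undef cpustr
    }

    int main(void)
    {
        dump_runqueue(0xffUL, 0xf0UL, 0x01UL);
        return 0;
    }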


* [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
  2015-03-16 17:04 [PATCH 0/7] Improving dumping of scheduler related info Dario Faggioli
                   ` (5 preceding siblings ...)
  2015-03-16 17:05 ` [PATCH 6/7] xen: sched_credit2: more info when dumping Dario Faggioli
@ 2015-03-16 17:05 ` Dario Faggioli
  2015-03-16 19:05   ` George Dunlap
  2015-03-16 20:30   ` Meng Xu
  6 siblings, 2 replies; 42+ messages in thread
From: Dario Faggioli @ 2015-03-16 17:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Keir Fraser, Meng Xu, Jan Beulich

In fact, printing the cpupool's CPU online mask
for each vCPU is just redundant, as that is the
same for all the vCPUs of all the domains in the
same cpupool, while hard affinity is already part
of the output of dumping domains info.

Instead, print the intersection between hard
affinity and online CPUs, which is --in case of this
scheduler-- the effective affinity always used for
the vCPUs.

This change also takes the chance to add a scratch
cpumask, to avoid having to create one more
cpumask_var_t on the stack of the dumping routine.
Such a scratch area can be used to kill most of the
cpumask_var_t local variables in other functions
in the file, but that is *NOT* done in this change.

Finally, convert the file to use keyhandler scratch,
instead of open coded string buffers.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Meng Xu <xumengpanda@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
---
 xen/common/sched_rt.c |   40 ++++++++++++++++++++++++++++++++--------
 1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 167a484..7aa3ac8 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -124,6 +124,12 @@
 #define TRC_RTDS_BUDGET_REPLENISH TRC_SCHED_CLASS_EVT(RTDS, 4)
 #define TRC_RTDS_SCHED_TASKLET    TRC_SCHED_CLASS_EVT(RTDS, 5)
 
+ /*
+  * Useful to avoid too many cpumask_var_t on the stack.
+  */
+static cpumask_t **_cpumask_scratch;
+#define cpumask_scratch _cpumask_scratch[smp_processor_id()]
+
 /*
  * Systme-wide private data, include global RunQueue/DepletedQ
  * Global lock is referenced by schedule_data.schedule_lock from all
@@ -218,7 +224,6 @@ __q_elem(struct list_head *elem)
 static void
 rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
 {
-    char cpustr[1024];
     cpumask_t *cpupool_mask;
 
     ASSERT(svc != NULL);
@@ -229,10 +234,22 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
         return;
     }
 
-    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
+    cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool);
+    /*
+     * We can't just use 'cpumask_scratch' because the dumping can
+     * happen from a pCPU outside of this scheduler's cpupool, and
+     * hence it's not right to use the pCPU's scratch mask (which
+     * may even not exist!). On the other hand, it is safe to use
+     * svc->vcpu->processor's own scratch space, since we own the
+     * runqueue lock.
+     */
+    cpumask_and(_cpumask_scratch[svc->vcpu->processor], cpupool_mask,
+                svc->vcpu->cpu_hard_affinity);
+    cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch),
+                      _cpumask_scratch[svc->vcpu->processor]);
     printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
            " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
-           " \t\t onQ=%d runnable=%d cpu_hard_affinity=%s ",
+           " \t\t onQ=%d runnable=%d flags=%x effective hard_affinity=%s\n",
             svc->vcpu->domain->domain_id,
             svc->vcpu->vcpu_id,
             svc->vcpu->processor,
@@ -243,11 +260,8 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
             svc->last_start,
             __vcpu_on_q(svc),
             vcpu_runnable(svc->vcpu),
-            cpustr);
-    memset(cpustr, 0, sizeof(cpustr));
-    cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool);
-    cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_mask);
-    printk("cpupool=%s\n", cpustr);
+            svc->flags,
+            keyhandler_scratch);
 }
 
 static void
@@ -409,6 +423,10 @@ rt_init(struct scheduler *ops)
     if ( prv == NULL )
         return -ENOMEM;
 
+    _cpumask_scratch = xmalloc_array(cpumask_var_t, nr_cpu_ids);
+    if ( _cpumask_scratch == NULL )
+        return -ENOMEM;
+
     spin_lock_init(&prv->lock);
     INIT_LIST_HEAD(&prv->sdom);
     INIT_LIST_HEAD(&prv->runq);
@@ -426,6 +444,7 @@ rt_deinit(const struct scheduler *ops)
 {
     struct rt_private *prv = rt_priv(ops);
 
+    xfree(_cpumask_scratch);
     xfree(prv);
 }
 
@@ -443,6 +462,9 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
     per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
     spin_unlock_irqrestore(&prv->lock, flags);
 
+    if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
+        return NULL;
+
     /* 1 indicates alloc. succeed in schedule.c */
     return (void *)1;
 }
@@ -462,6 +484,8 @@ rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
     sd->schedule_lock = &sd->_lock;
 
     spin_unlock_irqrestore(&prv->lock, flags);
+
+    free_cpumask_var(_cpumask_scratch[cpu]);
 }
 
 static void *
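
The combination the patch prints (hard affinity AND the cpupool's online
CPUs) and the per-pCPU scratch area it introduces can be illustrated with a
small standalone sketch; unsigned long bitmasks replace cpumask_t, calloc()
replaces xmalloc_array(), and the names are made up, so this is just the
idea, not the RTDS code:

    #include <stdio.h>
    #include <stdlib.h>

    #define NR_CPUS 16

    /* Stand-in for the per-pCPU scratch masks: one slot per CPU, so the dump
     * path never needs to put a cpumask on the stack. */
    static unsigned long *scratch_mask;

    static void dump_effective_affinity(int vcpu_cpu, unsigned long hard_affinity,
                                        unsigned long cpupool_online)
    {
        /* Use the scratch slot of the pCPU the vCPU runs on; in the real code
         * this is safe because the dumper holds that runqueue's lock. */
        scratch_mask[vcpu_cpu] = hard_affinity & cpupool_online;
        printf("effective hard_affinity=%#lx\n", scratch_mask[vcpu_cpu]);
    }

    int main(void)
    {
        scratch_mask = calloc(NR_CPUS, sizeof(*scratch_mask));
        if ( scratch_mask == NULL )
            return 1;

        /* hard affinity 0-15, cpupool online CPUs 8-15 -> effective 8-15 */
        dump_effective_affinity(9, 0xffffUL, 0xff00UL);

        free(scratch_mask);
        return 0;
    }

The real patch renders the intersection as a CPU list via cpulist_scnprintf()
into keyhandler_scratch; the hex output here just keeps the sketch short.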


* Re: [PATCH 2/7] xen: sched_rt: implement the .free_pdata hook
  2015-03-16 17:04 ` [PATCH 2/7] xen: sched_rt: implement the .free_pdata hook Dario Faggioli
@ 2015-03-16 17:10   ` Dario Faggioli
  2015-03-16 18:23     ` Meng Xu
  2015-03-16 18:17   ` Meng Xu
  2015-03-16 18:21   ` George Dunlap
  2 siblings, 1 reply; 42+ messages in thread
From: Dario Faggioli @ 2015-03-16 17:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir (Xen.org), xumengpanda, George Dunlap, JBeulich


On Mon, 2015-03-16 at 18:04 +0100, Dario Faggioli wrote:
> which is called by cpu_schedule_down(), and is necessary
> for resetting the spinlock pointers in schedule_data from
> the RTDS global runqueue lock, back to the default _lock
> fields in the struct.
> 
> Not doing so causes Xen to explode, e.g., when removing
> pCPUs from an RTDS cpupool and assigning them to another
> one.
> 
Hey Meng,

We discussed this issue before, here:
http://lists.xen.org/archives/html/xen-devel/2014-09/msg02765.html

But, as far as I can tell, the code we ended up merging still had the
issue (i.e., it did not have the hook implemented)!

I was working a bit on/off (for family reasons) in that period, so I
can't really tell how that happened. :-/
Not a big deal... this code is marked as experimental for a reason, but
it probably makes sense to double check whether anything else similar
happened. Can you do that?

Thanks and Regards,
Dario



* Re: [PATCH 2/7] xen: sched_rt: implement the .free_pdata hook
  2015-03-16 17:04 ` [PATCH 2/7] xen: sched_rt: implement the .free_pdata hook Dario Faggioli
  2015-03-16 17:10   ` Dario Faggioli
@ 2015-03-16 18:17   ` Meng Xu
  2015-03-16 18:21   ` George Dunlap
  2 siblings, 0 replies; 42+ messages in thread
From: Meng Xu @ 2015-03-16 18:17 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, Keir Fraser, Jan Beulich, Xen-devel

2015-03-16 13:04 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> which is called by cpu_schedule_down(), and is necessary
> for resetting the spinlock pointers in schedule_data from
> the RTDS global runqueue lock, back to the default _lock
> fields in the struct.
>
> Not doing so causes Xen to explode, e.g., when removing
> pCPUs from an RTDS cpupool and assigning them to another
> one.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <xumengpanda@gmail.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> ---
>  xen/common/sched_rt.c |   18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 055a50f..3e37e62 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -444,6 +444,23 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
>      return (void *)1;
>  }
>
> +static void
> +rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> +{
> +    struct rt_private *prv = rt_priv(ops);
> +    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
> +    unsigned long flags;
> +
> +    spin_lock_irqsave(&prv->lock, flags);
> +
> +    /* Move spinlock back to the default lock */
> +    ASSERT(sd->schedule_lock == &prv->lock);
> +    ASSERT(!spin_is_locked(&sd->_lock));
> +    sd->schedule_lock = &sd->_lock;
> +
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +}
> +
>  static void *
>  rt_alloc_domdata(const struct scheduler *ops, struct domain *dom)
>  {
> @@ -1090,6 +1107,7 @@ const struct scheduler sched_rtds_def = {
>      .init           = rt_init,
>      .deinit         = rt_deinit,
>      .alloc_pdata    = rt_alloc_pdata,
> +    .free_pdata     = rt_free_pdata,
>      .alloc_domdata  = rt_alloc_domdata,
>      .free_domdata   = rt_free_domdata,
>      .init_domain    = rt_dom_init,
>

Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>

Thank you very much!

Best,

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* Re: [PATCH 1/7] xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains
  2015-03-16 17:04 ` [PATCH 1/7] xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains Dario Faggioli
@ 2015-03-16 18:18   ` George Dunlap
  2015-03-16 20:31   ` Meng Xu
  2015-03-17 10:41   ` Jan Beulich
  2 siblings, 0 replies; 42+ messages in thread
From: George Dunlap @ 2015-03-16 18:18 UTC (permalink / raw)
  To: Dario Faggioli, Xen-devel; +Cc: Keir Fraser, Meng Xu, Jan Beulich

On 03/16/2015 05:04 PM, Dario Faggioli wrote:
> being serviced by the RTDS scheduler, as that is a
> legit situation to be in: think, for instance, of a
> newly created RTDS cpupool, with no domains migrated
> to it yet.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <xumengpanda@gmail.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>

You're not going to mention fixing a race condition while you're at it? :-)

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

> ---
>  xen/common/sched_rt.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index df4adac..055a50f 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -264,18 +264,17 @@ rt_dump(const struct scheduler *ops)
>      struct list_head *iter_sdom, *iter_svc, *runq, *depletedq, *iter;
>      struct rt_private *prv = rt_priv(ops);
>      struct rt_vcpu *svc;
> -    cpumask_t *online;
>      struct rt_dom *sdom;
>      unsigned long flags;
>  
> -    ASSERT(!list_empty(&prv->sdom));
> +    spin_lock_irqsave(&prv->lock, flags);
> +
> +    if (list_empty(&prv->sdom))
> +        goto out;
>  
> -    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
> -    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
>      runq = rt_runq(ops);
>      depletedq = rt_depletedq(ops);
>  
> -    spin_lock_irqsave(&prv->lock, flags);
>      printk("Global RunQueue info:\n");
>      list_for_each( iter, runq )
>      {
> @@ -303,6 +302,7 @@ rt_dump(const struct scheduler *ops)
>          }
>      }
>  
> +out:
>      spin_unlock_irqrestore(&prv->lock, flags);
>  }
>  
> 


* Re: [PATCH 2/7] xen: sched_rt: implement the .free_pdata hook
  2015-03-16 17:04 ` [PATCH 2/7] xen: sched_rt: implement the .free_pdata hook Dario Faggioli
  2015-03-16 17:10   ` Dario Faggioli
  2015-03-16 18:17   ` Meng Xu
@ 2015-03-16 18:21   ` George Dunlap
  2 siblings, 0 replies; 42+ messages in thread
From: George Dunlap @ 2015-03-16 18:21 UTC (permalink / raw)
  To: Dario Faggioli, Xen-devel; +Cc: Keir Fraser, Meng Xu, Jan Beulich

On 03/16/2015 05:04 PM, Dario Faggioli wrote:
> which is called by cpu_schedule_down(), and is necessary
> for resetting the spinlock pointers in schedule_data from
> the RTDS global runqueue lock, back to the default _lock
> fields in the struct.
> 
> Not doing so causes Xen to explode, e.g., when removing
> pCPUs from an RTDS cpupool and assigning them to another
> one.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <xumengpanda@gmail.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>

Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

> ---
>  xen/common/sched_rt.c |   18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 055a50f..3e37e62 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -444,6 +444,23 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
>      return (void *)1;
>  }
>  
> +static void
> +rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> +{
> +    struct rt_private *prv = rt_priv(ops);
> +    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
> +    unsigned long flags;
> +
> +    spin_lock_irqsave(&prv->lock, flags);
> +
> +    /* Move spinlock back to the default lock */
> +    ASSERT(sd->schedule_lock == &prv->lock);
> +    ASSERT(!spin_is_locked(&sd->_lock));
> +    sd->schedule_lock = &sd->_lock;
> +
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +}
> +
>  static void *
>  rt_alloc_domdata(const struct scheduler *ops, struct domain *dom)
>  {
> @@ -1090,6 +1107,7 @@ const struct scheduler sched_rtds_def = {
>      .init           = rt_init,
>      .deinit         = rt_deinit,
>      .alloc_pdata    = rt_alloc_pdata,
> +    .free_pdata     = rt_free_pdata,
>      .alloc_domdata  = rt_alloc_domdata,
>      .free_domdata   = rt_free_domdata,
>      .init_domain    = rt_dom_init,
> 


* Re: [PATCH 2/7] xen: sched_rt: implement the .free_pdata hook
  2015-03-16 17:10   ` Dario Faggioli
@ 2015-03-16 18:23     ` Meng Xu
  2015-03-17 13:00       ` Dario Faggioli
  0 siblings, 1 reply; 42+ messages in thread
From: Meng Xu @ 2015-03-16 18:23 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Keir (Xen.org), George Dunlap, JBeulich, xen-devel

Hi Dario,

2015-03-16 13:10 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> On Mon, 2015-03-16 at 18:04 +0100, Dario Faggioli wrote:
>> which is called by cpu_schedule_down(), and is necessary
>> for resetting the spinlock pointers in schedule_data from
>> the RTDS global runqueue lock, back to the default _lock
>> fields in the struct.

I noticed this issue the day before yesterday (last weekend) as well
and fixed it in my local repository:
https://github.com/PennPanda/xenproject/blob/rtds/nooh/v3/xen/common/sched_rt.c
from line 522 to 537.

Thank you very much for pointing it out! :-)

>>
>> Not doing so causes Xen to explode, e.g., when removing
>> pCPUs from an RTDS cpupool and assigning them to another
>> one.
>>
> Hey Meng,
>
> We discussed about this issue before, here:
> http://lists.xen.org/archives/html/xen-devel/2014-09/msg02765.html
>
> But, as far as it looks, the code we ended up merging still had the
> issue (i.e., it did not have the hook implemented)!
>
> I was working a bit on/off (for family reasons) in that period, so I
> can't really tell how that happened. :-/
> Not a big deal... this code is marked as experimental for a reason, but
> it probably make sense to double check that anything else similar
> happened, can you do that?


Sure! I will review that problem, do the double check and let you know
the result!

>
> Thanks and Regards,

Thank you very much for doing this!

Best,

Meng


* Re: [PATCH 4/7] xen: print online pCPUs and free pCPUs when dumping scheduler info
  2015-03-16 17:05 ` [PATCH 4/7] xen: print online pCPUs and free pCPUs when dumping scheduler info Dario Faggioli
@ 2015-03-16 18:37   ` Juergen Gross
  2015-03-16 18:46   ` George Dunlap
  2015-03-17 10:59   ` Jan Beulich
  2 siblings, 0 replies; 42+ messages in thread
From: Juergen Gross @ 2015-03-16 18:37 UTC (permalink / raw)
  To: Dario Faggioli, Xen-devel; +Cc: George Dunlap, Keir Fraser, Jan Beulich

On 03/16/2015 06:05 PM, Dario Faggioli wrote:
> e.g., with  `xl debug-key r'. This change adds something
> like the following lines to the dump output:
>
>    (XEN) Num online Cpus: 16, cpu_online_map: 0-15
>    (XEN) Num free Cpus: 8, cpupool_free_cpus: 8-15
>
> Meaning there are 16 pCPUs online, and they are pCPUs
> 0-15, and that 8 of them, more specifically 8-15, are
> not part of any cpupool.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Juergen Gross <JGross@suse.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>

Acked-by: Juergen Gross <jgross@suse.com>

> ---
>   xen/common/cpupool.c |   13 +++++++++++++
>   1 file changed, 13 insertions(+)
>
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index cd6aab9..3bf0174 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -17,6 +17,7 @@
>   #include <xen/percpu.h>
>   #include <xen/sched.h>
>   #include <xen/sched-if.h>
> +#include <xen/keyhandler.h>
>   #include <xen/cpu.h>
>
>   #define for_each_cpupool(ptr)    \
> @@ -658,6 +659,12 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op)
>       return ret;
>   }
>
> +static void _print_cpumap(const char *str, const cpumask_t *map)
> +{
> +    cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), map);
> +    printk("%s: %s\n", str, keyhandler_scratch);
> +}
> +
>   void dump_runq(unsigned char key)
>   {
>       unsigned long    flags;
> @@ -671,12 +678,18 @@ void dump_runq(unsigned char key)
>               sched_smt_power_savings? "enabled":"disabled");
>       printk("NOW=0x%08X%08X\n",  (u32)(now>>32), (u32)now);
>
> +    printk("Num online Cpus: %d, ", cpumask_weight(&cpu_online_map));
> +    _print_cpumap("cpu_online_map", &cpu_online_map);
> +    printk("Num free Cpus: %d, ", cpumask_weight(&cpupool_free_cpus));
> +    _print_cpumap("cpupool_free_cpus", &cpupool_free_cpus);
> +
>       printk("Idle cpupool:\n");
>       schedule_dump(NULL);
>
>       for_each_cpupool(c)
>       {
>           printk("Cpupool %d:\n", (*c)->cpupool_id);
> +        _print_cpumap("Cpus", (*c)->cpu_valid);
>           schedule_dump(*c);
>       }
>
>
>


* Re: [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r)
  2015-03-16 17:05 ` [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r) Dario Faggioli
@ 2015-03-16 18:41   ` George Dunlap
  2015-03-16 20:04   ` Meng Xu
  2015-03-17 10:54   ` Jan Beulich
  2 siblings, 0 replies; 42+ messages in thread
From: George Dunlap @ 2015-03-16 18:41 UTC (permalink / raw)
  To: Dario Faggioli, Xen-devel; +Cc: Keir Fraser, Meng Xu, Jan Beulich

On 03/16/2015 05:05 PM, Dario Faggioli wrote:
> such as it is taken care of by the various schedulers, rather
> than happening in schedule.c. In fact, it is the schedulers
> that know better which locks are necessary for the specific
> dumping operations.
> 
> While there, fix a few style issues (indentation, trailing
> whitespace, parentheses and blank line after var declarations)
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <xumengpanda@gmail.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>

Thanks!

Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>


> ---
>  xen/common/sched_credit.c  |   42 ++++++++++++++++++++++++++++++++++++++++--
>  xen/common/sched_credit2.c |   40 ++++++++++++++++++++++++++++++++--------
>  xen/common/sched_rt.c      |    7 +++++--
>  xen/common/schedule.c      |    5 ++---
>  4 files changed, 79 insertions(+), 15 deletions(-)
> 
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index bec67ff..953ecb0 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -26,6 +26,23 @@
>  
>  
>  /*
> + * Locking:
> + * - Scheduler-lock (a.k.a. runqueue lock):
> + *  + is per-runqueue, and there is one runqueue per-cpu;
> + *  + serializes all runqueue manipulation operations;
> + * - Private data lock (a.k.a. private scheduler lock):
> + *  + serializes accesses to the scheduler global state (weight,
> + *    credit, balance_credit, etc);
> + *  + serializes updates to the domains' scheduling parameters.
> + *
> + * Ordering is "private lock always comes first":
> + *  + if we need both locks, we must acquire the private
> + *    scheduler lock for first;
> + *  + if we already own a runqueue lock, we must never acquire
> + *    the private scheduler lock.
> + */
> +
> +/*
>   * Basic constants
>   */
>  #define CSCHED_DEFAULT_WEIGHT       256
> @@ -1750,11 +1767,24 @@ static void
>  csched_dump_pcpu(const struct scheduler *ops, int cpu)
>  {
>      struct list_head *runq, *iter;
> +    struct csched_private *prv = CSCHED_PRIV(ops);
>      struct csched_pcpu *spc;
>      struct csched_vcpu *svc;
> +    spinlock_t *lock = lock;
> +    unsigned long flags;
>      int loop;
>  #define cpustr keyhandler_scratch
>  
> +    /*
> +     * We need both locks:
> +     * - csched_dump_vcpu() wants to access domains' scheduling
> +     *   parameters, which are protected by the private scheduler lock;
> +     * - we scan through the runqueue, so we need the proper runqueue
> +     *   lock (the one of the runqueue of this cpu).
> +     */
> +    spin_lock_irqsave(&prv->lock, flags);
> +    lock = pcpu_schedule_lock(cpu);
> +
>      spc = CSCHED_PCPU(cpu);
>      runq = &spc->runq;
>  
> @@ -1781,6 +1811,9 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
>              csched_dump_vcpu(svc);
>          }
>      }
> +
> +    pcpu_schedule_unlock(lock, cpu);
> +    spin_unlock_irqrestore(&prv->lock, flags);
>  #undef cpustr
>  }
>  
> @@ -1792,7 +1825,7 @@ csched_dump(const struct scheduler *ops)
>      int loop;
>      unsigned long flags;
>  
> -    spin_lock_irqsave(&(prv->lock), flags);
> +    spin_lock_irqsave(&prv->lock, flags);
>  
>  #define idlers_buf keyhandler_scratch
>  
> @@ -1835,15 +1868,20 @@ csched_dump(const struct scheduler *ops)
>          list_for_each( iter_svc, &sdom->active_vcpu )
>          {
>              struct csched_vcpu *svc;
> +            spinlock_t *lock;
> +
>              svc = list_entry(iter_svc, struct csched_vcpu, active_vcpu_elem);
> +            lock = vcpu_schedule_lock(svc->vcpu);
>  
>              printk("\t%3d: ", ++loop);
>              csched_dump_vcpu(svc);
> +
> +            vcpu_schedule_unlock(lock, svc->vcpu);
>          }
>      }
>  #undef idlers_buf
>  
> -    spin_unlock_irqrestore(&(prv->lock), flags);
> +    spin_unlock_irqrestore(&prv->lock, flags);
>  }
>  
>  static int
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index f0e2c82..564f890 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -51,8 +51,6 @@
>   * credit2 wiki page:
>   *  http://wiki.xen.org/wiki/Credit2_Scheduler_Development
>   * TODO:
> - * + Immediate bug-fixes
> - *  - Do per-runqueue, grab proper lock for dump debugkey
>   * + Multiple sockets
>   *  - Detect cpu layout and make runqueue map, one per L2 (make_runq_map())
>   *  - Simple load balancer / runqueue assignment
> @@ -1800,12 +1798,24 @@ csched2_dump_vcpu(struct csched2_vcpu *svc)
>  static void
>  csched2_dump_pcpu(const struct scheduler *ops, int cpu)
>  {
> +    struct csched2_private *prv = CSCHED2_PRIV(ops);
>      struct list_head *runq, *iter;
>      struct csched2_vcpu *svc;
> +    unsigned long flags;
> +    spinlock_t *lock;
>      int loop;
>      char cpustr[100];
>  
> -    /* FIXME: Do locking properly for access to runqueue structures */
> +    /*
> +     * We need both locks:
> +     * - csched2_dump_vcpu() wants to access domains' scheduling
> +     *   parameters, which are protected by the private scheduler lock;
> +     * - we scan through the runqueue, so we need the proper runqueue
> +     *   lock (the one of the runqueue this cpu is associated to).
> +     */
> +    spin_lock_irqsave(&prv->lock, flags);
> +    lock = per_cpu(schedule_data, cpu).schedule_lock;
> +    spin_lock(lock);
>  
>      runq = &RQD(ops, cpu)->runq;
>  
> @@ -1832,6 +1842,9 @@ csched2_dump_pcpu(const struct scheduler *ops, int cpu)
>              csched2_dump_vcpu(svc);
>          }
>      }
> +
> +    spin_unlock(lock);
> +    spin_unlock_irqrestore(&prv->lock, flags);
>  }
>  
>  static void
> @@ -1839,8 +1852,13 @@ csched2_dump(const struct scheduler *ops)
>  {
>      struct list_head *iter_sdom, *iter_svc;
>      struct csched2_private *prv = CSCHED2_PRIV(ops);
> +    unsigned long flags;
>      int i, loop;
>  
> +    /* We need the private lock as we access global scheduler data
> +     * and (below) the list of active domains. */
> +    spin_lock_irqsave(&prv->lock, flags);
> +
>      printk("Active queues: %d\n"
>             "\tdefault-weight     = %d\n",
>             cpumask_weight(&prv->active_queues),
> @@ -1863,7 +1881,6 @@ csched2_dump(const struct scheduler *ops)
>                 fraction);
>  
>      }
> -    /* FIXME: Locking! */
>  
>      printk("Domain info:\n");
>      loop = 0;
> @@ -1872,20 +1889,27 @@ csched2_dump(const struct scheduler *ops)
>          struct csched2_dom *sdom;
>          sdom = list_entry(iter_sdom, struct csched2_dom, sdom_elem);
>  
> -       printk("\tDomain: %d w %d v %d\n\t", 
> -              sdom->dom->domain_id, 
> -              sdom->weight, 
> -              sdom->nr_vcpus);
> +        printk("\tDomain: %d w %d v %d\n\t",
> +               sdom->dom->domain_id,
> +               sdom->weight,
> +               sdom->nr_vcpus);
>  
>          list_for_each( iter_svc, &sdom->vcpu )
>          {
>              struct csched2_vcpu *svc;
> +            spinlock_t *lock;
> +
>              svc = list_entry(iter_svc, struct csched2_vcpu, sdom_elem);
> +            lock = vcpu_schedule_lock(svc->vcpu);
>  
>              printk("\t%3d: ", ++loop);
>              csched2_dump_vcpu(svc);
> +
> +            vcpu_schedule_unlock(lock, svc->vcpu);
>          }
>      }
> +
> +    spin_unlock_irqrestore(&prv->lock, flags);
>  }
>  
>  static void activate_runqueue(struct csched2_private *prv, int rqi)
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 3e37e62..167a484 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -253,9 +253,12 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
>  static void
>  rt_dump_pcpu(const struct scheduler *ops, int cpu)
>  {
> -    struct rt_vcpu *svc = rt_vcpu(curr_on_cpu(cpu));
> +    struct rt_private *prv = rt_priv(ops);
> +    unsigned long flags;
>  
> -    rt_dump_vcpu(ops, svc);
> +    spin_lock_irqsave(&prv->lock, flags);
> +    rt_dump_vcpu(ops, rt_vcpu(curr_on_cpu(cpu)));
> +    spin_unlock_irqrestore(&prv->lock, flags);
>  }
>  
>  static void
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index ef79847..30eec0f 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1515,6 +1515,8 @@ void schedule_dump(struct cpupool *c)
>      struct scheduler *sched;
>      cpumask_t        *cpus;
>  
> +    /* Locking, if necessary, must be handled withing each scheduler */
> +
>      sched = (c == NULL) ? &ops : c->sched;
>      cpus = cpupool_scheduler_cpumask(c);
>      printk("Scheduler: %s (%s)\n", sched->name, sched->opt_name);
> @@ -1522,11 +1524,8 @@ void schedule_dump(struct cpupool *c)
>  
>      for_each_cpu (i, cpus)
>      {
> -        spinlock_t *lock = pcpu_schedule_lock(i);
> -
>          printk("CPU[%02d] ", i);
>          SCHED_OP(sched, dump_cpu_state, i);
> -        pcpu_schedule_unlock(lock, i);
>      }
>  }
>  
> 


* Re: [PATCH 4/7] xen: print online pCPUs and free pCPUs when dumping scheduler info
  2015-03-16 17:05 ` [PATCH 4/7] xen: print online pCPUs and free pCPUs when dumping scheduler info Dario Faggioli
  2015-03-16 18:37   ` Juergen Gross
@ 2015-03-16 18:46   ` George Dunlap
  2015-03-17 10:59   ` Jan Beulich
  2 siblings, 0 replies; 42+ messages in thread
From: George Dunlap @ 2015-03-16 18:46 UTC (permalink / raw)
  To: Dario Faggioli, Xen-devel; +Cc: Juergen Gross, Keir Fraser, Jan Beulich

On 03/16/2015 05:05 PM, Dario Faggioli wrote:
> e.g., with  `xl debug-key r'. This change adds something
> like the following lines to the dump output:
> 
>   (XEN) Num online Cpus: 16, cpu_online_map: 0-15
>   (XEN) Num free Cpus: 8, cpupool_free_cpus: 8-15
> 
> Meaning there are 16 pCPUs online, and they are pCPUs
> 0-15, and that 8 of them, more specifically 8-15, are
> not part of any cpupool.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Juergen Gross <JGross@suse.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> ---
>  xen/common/cpupool.c |   13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index cd6aab9..3bf0174 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -17,6 +17,7 @@
>  #include <xen/percpu.h>
>  #include <xen/sched.h>
>  #include <xen/sched-if.h>
> +#include <xen/keyhandler.h>
>  #include <xen/cpu.h>
>  
>  #define for_each_cpupool(ptr)    \
> @@ -658,6 +659,12 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op)
>      return ret;
>  }
>  
> +static void _print_cpumap(const char *str, const cpumask_t *map)

Is there a particular reason to put "_" in front of a static function?

This has Juergen's Ack now, so that's just a suggestion / comment. :-)

 -George


* Re: [PATCH 5/7] xen: make dumping vcpu info look better
  2015-03-16 17:05 ` [PATCH 5/7] xen: make dumping vcpu info look better Dario Faggioli
@ 2015-03-16 18:48   ` George Dunlap
  2015-03-16 20:06   ` Meng Xu
  1 sibling, 0 replies; 42+ messages in thread
From: George Dunlap @ 2015-03-16 18:48 UTC (permalink / raw)
  To: Dario Faggioli, Xen-devel; +Cc: Keir Fraser, Meng Xu, Jan Beulich

On 03/16/2015 05:05 PM, Dario Faggioli wrote:
> and more consistent. In fact, before this changes, it looks
> like this:
> 
>  (XEN) VCPU information and callbacks for domain 0:
>  (XEN)     VCPU0: CPU4 [has=F] poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={4} cpu_affinity={0-15}
>  (XEN) cpu_soft_affinity={0-15}
>  (XEN)     pause_count=0 pause_flags=1
>  (XEN)     No periodic timer
> 
> After, it looks like this:
> 
>  (XEN) VCPU information and callbacks for domain 0:
>  (XEN)     VCPU0: CPU4 [has=F] poll=0 upcall_pend=00 upcall_mask=00 dirty_cpus={4}
>  (XEN)     cpu_hard_affinity={0-15} cpu_soft_affinity={0-15}
>  (XEN)     pause_count=0 pause_flags=1
>  (XEN)     No periodic timer
> 
> So, consistently do _not_ put a space between fields and '=',
> and consistently do _not_ use ',' as separator. Also, put the
> info about affinity on the same line, properly indented.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <xumengpanda@gmail.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>

Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

> ---
>  xen/common/keyhandler.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
> index a917726..5d21e48 100644
> --- a/xen/common/keyhandler.c
> +++ b/xen/common/keyhandler.c
> @@ -294,15 +294,15 @@ static void dump_domains(unsigned char key)
>                  process_pending_softirqs();
>  
>              printk("    VCPU%d: CPU%d [has=%c] poll=%d "
> -                   "upcall_pend = %02x, upcall_mask = %02x ",
> +                   "upcall_pend=%02x upcall_mask=%02x ",
>                     v->vcpu_id, v->processor,
>                     v->is_running ? 'T':'F', v->poll_evtchn,
>                     vcpu_info(v, evtchn_upcall_pending),
>                     !vcpu_event_delivery_is_enabled(v));
>              cpuset_print(tmpstr, sizeof(tmpstr), v->vcpu_dirty_cpumask);
> -            printk("dirty_cpus=%s ", tmpstr);
> +            printk("dirty_cpus=%s\n", tmpstr);
>              cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_hard_affinity);
> -            printk("cpu_affinity=%s\n", tmpstr);
> +            printk("    cpu_hard_affinity=%s ", tmpstr);
>              cpuset_print(tmpstr, sizeof(tmpstr), v->cpu_soft_affinity);
>              printk("cpu_soft_affinity=%s\n", tmpstr);
>              printk("    pause_count=%d pause_flags=%lx\n",
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 6/7] xen: sched_credit2: more info when dumping
  2015-03-16 17:05 ` [PATCH 6/7] xen: sched_credit2: more info when dumping Dario Faggioli
@ 2015-03-16 18:50   ` George Dunlap
  0 siblings, 0 replies; 42+ messages in thread
From: George Dunlap @ 2015-03-16 18:50 UTC (permalink / raw)
  To: Dario Faggioli, Xen-devel; +Cc: Keir Fraser, Jan Beulich

On 03/16/2015 05:05 PM, Dario Faggioli wrote:
> more specifically, for each runqueue, print what pCPUs
> belong to it, which ones are idle and which ones have
> been tickled.
> 
> While there, also convert the whole file to use
> keyhandler_scratch for printing cpumask-s.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>

Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

> ---
>  xen/common/sched_credit2.c |   13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 564f890..df29438 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -25,6 +25,7 @@
>  #include <xen/errno.h>
>  #include <xen/trace.h>
>  #include <xen/cpu.h>
> +#include <xen/keyhandler.h>
>  
>  #define d2printk(x...)
>  //#define d2printk printk
> @@ -1804,7 +1805,7 @@ csched2_dump_pcpu(const struct scheduler *ops, int cpu)
>      unsigned long flags;
>      spinlock_t *lock;
>      int loop;
> -    char cpustr[100];
> +#define cpustr keyhandler_scratch
>  
>      /*
>       * We need both locks:
> @@ -1845,6 +1846,7 @@ csched2_dump_pcpu(const struct scheduler *ops, int cpu)
>  
>      spin_unlock(lock);
>      spin_unlock_irqrestore(&prv->lock, flags);
> +#undef cpustr
>  }
>  
>  static void
> @@ -1854,6 +1856,7 @@ csched2_dump(const struct scheduler *ops)
>      struct csched2_private *prv = CSCHED2_PRIV(ops);
>      unsigned long flags;
>      int i, loop;
> +#define cpustr keyhandler_scratch
>  
>      /* We need the private lock as we access global scheduler data
>       * and (below) the list of active domains. */
> @@ -1869,17 +1872,24 @@ csched2_dump(const struct scheduler *ops)
>          
>          fraction = prv->rqd[i].avgload * 100 / (1ULL<<prv->load_window_shift);
>  
> +        cpulist_scnprintf(cpustr, sizeof(cpustr), &prv->rqd[i].active);
>          printk("Runqueue %d:\n"
>                 "\tncpus              = %u\n"
> +               "\tcpus               = %s\n"
>                 "\tmax_weight         = %d\n"
>                 "\tinstload           = %d\n"
>                 "\taveload            = %3"PRI_stime"\n",
>                 i,
>                 cpumask_weight(&prv->rqd[i].active),
> +               cpustr,
>                 prv->rqd[i].max_weight,
>                 prv->rqd[i].load,
>                 fraction);
>  
> +        cpumask_scnprintf(cpustr, sizeof(cpustr), &prv->rqd[i].idle);
> +        printk("\tidlers: %s\n", cpustr);
> +        cpumask_scnprintf(cpustr, sizeof(cpustr), &prv->rqd[i].tickled);
> +        printk("\ttickled: %s\n", cpustr);
>      }
>  
>      printk("Domain info:\n");
> @@ -1910,6 +1920,7 @@ csched2_dump(const struct scheduler *ops)
>      }
>  
>      spin_unlock_irqrestore(&prv->lock, flags);
> +#undef cpustr
>  }
>  
>  static void activate_runqueue(struct csched2_private *prv, int rqi)
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
  2015-03-16 17:05 ` [PATCH 7/7] xen: sched_rt: print useful affinity " Dario Faggioli
@ 2015-03-16 19:05   ` George Dunlap
  2015-03-17 13:51     ` Dario Faggioli
  2015-03-16 20:30   ` Meng Xu
  1 sibling, 1 reply; 42+ messages in thread
From: George Dunlap @ 2015-03-16 19:05 UTC (permalink / raw)
  To: Dario Faggioli, Xen-devel; +Cc: Keir Fraser, Meng Xu, Jan Beulich

On 03/16/2015 05:05 PM, Dario Faggioli wrote:
> In fact, printing the cpupool's CPU online mask
> for each vCPU is just redundant, as that is the
> same for all the vCPUs of all the domains in the
> same cpupool, while hard affinity is already part
> of the output of dumping domains info.
> 
> Instead, print the intersection between hard
> affinity and online CPUs, which is --in case of this
> scheduler-- the effective affinity always used for
> the vCPUs.
> 
> This change also takes the chance to add a scratch
> cpumask, to avoid having to create one more
> cpumask_var_t on the stack of the dumping routine.
> Such scratch area can be used to kill most of the
> cpumask_var_t local variables in other functions
> in the file, but that is *NOT* done in this change.
> 
> Finally, convert the file to use keyhandler scratch,
> instead of open coded string buffers.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <xumengpanda@gmail.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> ---
>  xen/common/sched_rt.c |   40 ++++++++++++++++++++++++++++++++--------
>  1 file changed, 32 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 167a484..7aa3ac8 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -124,6 +124,12 @@
>  #define TRC_RTDS_BUDGET_REPLENISH TRC_SCHED_CLASS_EVT(RTDS, 4)
>  #define TRC_RTDS_SCHED_TASKLET    TRC_SCHED_CLASS_EVT(RTDS, 5)
>  
> + /*
> +  * Useful to avoid too many cpumask_var_t on the stack.
> +  */
> +static cpumask_t **_cpumask_scratch;
> +#define cpumask_scratch _cpumask_scratch[smp_processor_id()]
> +
>  /*
>   * System-wide private data, include global RunQueue/DepletedQ
>   * Global lock is referenced by schedule_data.schedule_lock from all
> @@ -218,7 +224,6 @@ __q_elem(struct list_head *elem)
>  static void
>  rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
>  {
> -    char cpustr[1024];
>      cpumask_t *cpupool_mask;
>  
>      ASSERT(svc != NULL);
> @@ -229,10 +234,22 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
>          return;
>      }
>  
> -    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
> +    cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool);
> +    /*
> +     * We can't just use 'cpumask_scratch' because the dumping can
> +     * happen from a pCPU outside of this scheduler's cpupool, and
> +     * hence it's not right to use the pCPU's scratch mask (which
> +     * may even not exist!). On the other hand, it is safe to use
> +     * svc->vcpu->processor's own scratch space, since we own the
> +     * runqueue lock.

Since we *hold* the lock.

> +     */
> +    cpumask_and(_cpumask_scratch[svc->vcpu->processor], cpupool_mask,
> +                svc->vcpu->cpu_hard_affinity);
> +    cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch),
> +                      _cpumask_scratch[svc->vcpu->processor]);

Just a suggestion, would it be worth making a local variable to avoid
typing this long thing twice?  Then you could also put the comment about
using the svc->vcpu->processor's scratch space above the place where you
set the local variable, while avoiding breaking up the logic of the
cpumask operations.
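
Something like this, say (an untested sketch, just to illustrate; it is
the same code as in the patch, only with a local variable added):

    cpumask_t *mask = _cpumask_scratch[svc->vcpu->processor];

    /*
     * Safe to use svc->vcpu->processor's scratch space here, since we
     * hold the runqueue lock.
     */
    cpumask_and(mask, cpupool_mask, svc->vcpu->cpu_hard_affinity);
    cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), mask);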

Other than that, looks good!

 -George

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r)
  2015-03-16 17:05 ` [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r) Dario Faggioli
  2015-03-16 18:41   ` George Dunlap
@ 2015-03-16 20:04   ` Meng Xu
  2015-03-17 10:54   ` Jan Beulich
  2 siblings, 0 replies; 42+ messages in thread
From: Meng Xu @ 2015-03-16 20:04 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, Keir Fraser, Jan Beulich, Xen-devel

2015-03-16 13:05 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> such that it is taken care of by the various schedulers, rather
> than happening in schedule.c. In fact, it is the schedulers
> that know better which locks are necessary for the specific
> dumping operations.
>
> While there, fix a few style issues (indentation, trailing
> whitespace, parentheses and blank line after var declarations)
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <xumengpanda@gmail.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> ---
>  xen/common/sched_credit.c  |   42 ++++++++++++++++++++++++++++++++++++++++--
>  xen/common/sched_credit2.c |   40 ++++++++++++++++++++++++++++++++--------
>  xen/common/sched_rt.c      |    7 +++++--
>  xen/common/schedule.c      |    5 ++---
>  4 files changed, 79 insertions(+), 15 deletions(-)
>

Thank you very much!

Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5/7] xen: make dumping vcpu info look better
  2015-03-16 17:05 ` [PATCH 5/7] xen: make dumping vcpu info look better Dario Faggioli
  2015-03-16 18:48   ` George Dunlap
@ 2015-03-16 20:06   ` Meng Xu
  1 sibling, 0 replies; 42+ messages in thread
From: Meng Xu @ 2015-03-16 20:06 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, Keir Fraser, Jan Beulich, Xen-devel

2015-03-16 13:05 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> and more consistent. In fact, before this change, it looks
> like this:
>
>  (XEN) VCPU information and callbacks for domain 0:
>  (XEN)     VCPU0: CPU4 [has=F] poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={4} cpu_affinity={0-15}
>  (XEN) cpu_soft_affinity={0-15}
>  (XEN)     pause_count=0 pause_flags=1
>  (XEN)     No periodic timer
>
> After, it looks like this:
>
>  (XEN) VCPU information and callbacks for domain 0:
>  (XEN)     VCPU0: CPU4 [has=F] poll=0 upcall_pend=00 upcall_mask=00 dirty_cpus={4}
>  (XEN)     cpu_hard_affinity={0-15} cpu_soft_affinity={0-15}
>  (XEN)     pause_count=0 pause_flags=1
>  (XEN)     No periodic timer
>
> So, consistently do _not_ put a space between fields and '=',
> and consistently do _not_ use ',' as separator. Also, put the
> info about affinity on the same line, properly indented.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <xumengpanda@gmail.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>

Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>

Thanks,

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
  2015-03-16 17:05 ` [PATCH 7/7] xen: sched_rt: print useful affinity " Dario Faggioli
  2015-03-16 19:05   ` George Dunlap
@ 2015-03-16 20:30   ` Meng Xu
  2015-03-17 11:10     ` George Dunlap
  2015-03-17 14:12     ` Dario Faggioli
  1 sibling, 2 replies; 42+ messages in thread
From: Meng Xu @ 2015-03-16 20:30 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, Keir Fraser, Jan Beulich, Xen-devel

Hi Dario,

2015-03-16 13:05 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> In fact, printing the cpupool's CPU online mask
> for each vCPU is just redundant, as that is the
> same for all the vCPUs of all the domains in the
> same cpupool, while hard affinity is already part
> of the output of dumping domains info.
>
> Instead, print the intersection between hard
> affinity and online CPUs, which is --in case of this
> scheduler-- the effective affinity always used for
> the vCPUs.

Agree.

>
> This change also takes the chance to add a scratch
> cpumask, to avoid having to create one more
> cpumask_var_t on the stack of the dumping routine.

Actually, I have a question about the strength of this design. When we
have a machine with many cpus, we will end up with allocating a
cpumask for each cpu. Is this better than having a cpumask_var_t on
the stack of the dumping routine, since the dumping routine is not in
the hot path?

> Such scratch area can be used to kill most of the
> cpumask_var_t local variables in other functions
> in the file, but that is *NOT* done in this change.
>
> Finally, convert the file to use keyhandler scratch,
> instead of open coded string buffers.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <xumengpanda@gmail.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> ---
>  xen/common/sched_rt.c |   40 ++++++++++++++++++++++++++++++++--------
>  1 file changed, 32 insertions(+), 8 deletions(-)
>
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 167a484..7aa3ac8 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -124,6 +124,12 @@
>  #define TRC_RTDS_BUDGET_REPLENISH TRC_SCHED_CLASS_EVT(RTDS, 4)
>  #define TRC_RTDS_SCHED_TASKLET    TRC_SCHED_CLASS_EVT(RTDS, 5)
>
> + /*
> +  * Useful to avoid too many cpumask_var_t on the stack.
> +  */
> +static cpumask_t **_cpumask_scratch;
> +#define cpumask_scratch _cpumask_scratch[smp_processor_id()]
> +
>  /*
>   * System-wide private data, include global RunQueue/DepletedQ
>   * Global lock is referenced by schedule_data.schedule_lock from all
> @@ -218,7 +224,6 @@ __q_elem(struct list_head *elem)
>  static void
>  rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
>  {
> -    char cpustr[1024];
>      cpumask_t *cpupool_mask;
>
>      ASSERT(svc != NULL);
> @@ -229,10 +234,22 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
>          return;
>      }
>
> -    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
> +    cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool);
> +    /*
> +     * We can't just use 'cpumask_scratch' because the dumping can
> +     * happen from a pCPU outside of this scheduler's cpupool, and
> +     * hence it's not right to use the pCPU's scratch mask (which
> +     * may even not exist!). On the other hand, it is safe to use
> +     * svc->vcpu->processor's own scratch space, since we own the
> +     * runqueue lock.
> +     */
> +    cpumask_and(_cpumask_scratch[svc->vcpu->processor], cpupool_mask,
> +                svc->vcpu->cpu_hard_affinity);
> +    cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch),
> +                      _cpumask_scratch[svc->vcpu->processor]);
>      printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
>             " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
> -           " \t\t onQ=%d runnable=%d cpu_hard_affinity=%s ",
> +           " \t\t onQ=%d runnable=%d flags=%x effective hard_affinity=%s\n",
>              svc->vcpu->domain->domain_id,
>              svc->vcpu->vcpu_id,
>              svc->vcpu->processor,
> @@ -243,11 +260,8 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
>              svc->last_start,
>              __vcpu_on_q(svc),
>              vcpu_runnable(svc->vcpu),
> -            cpustr);
> -    memset(cpustr, 0, sizeof(cpustr));
> -    cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool);
> -    cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_mask);
> -    printk("cpupool=%s\n", cpustr);
> +            svc->flags,
> +            keyhandler_scratch);
>  }
>
>  static void
> @@ -409,6 +423,10 @@ rt_init(struct scheduler *ops)
>      if ( prv == NULL )
>          return -ENOMEM;
>
> +    _cpumask_scratch = xmalloc_array(cpumask_var_t, nr_cpu_ids);

Is it better to use xzalloc_array?

> +    if ( _cpumask_scratch == NULL )
> +        return -ENOMEM;
> +
>      spin_lock_init(&prv->lock);
>      INIT_LIST_HEAD(&prv->sdom);
>      INIT_LIST_HEAD(&prv->runq);
> @@ -426,6 +444,7 @@ rt_deinit(const struct scheduler *ops)
>  {
>      struct rt_private *prv = rt_priv(ops);
>
> +    xfree(_cpumask_scratch);
>      xfree(prv);
>  }
>
> @@ -443,6 +462,9 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
>      per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
>      spin_unlock_irqrestore(&prv->lock, flags);
>
> +    if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )

Is it better to use zalloc_cpumask_var() here?


> +        return NULL;
> +
>      /* 1 indicates alloc. succeed in schedule.c */
>      return (void *)1;
>  }
> @@ -462,6 +484,8 @@ rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>      sd->schedule_lock = &sd->_lock;
>
>      spin_unlock_irqrestore(&prv->lock, flags);
> +
> +    free_cpumask_var(_cpumask_scratch[cpu]);
>  }
>
>  static void *
>

Others look good to me!

Thank you very much! :-)

Best,

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/7] xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains
  2015-03-16 17:04 ` [PATCH 1/7] xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains Dario Faggioli
  2015-03-16 18:18   ` George Dunlap
@ 2015-03-16 20:31   ` Meng Xu
  2015-03-17 10:41   ` Jan Beulich
  2 siblings, 0 replies; 42+ messages in thread
From: Meng Xu @ 2015-03-16 20:31 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, Keir Fraser, Jan Beulich, Xen-devel

2015-03-16 13:04 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> being serviced by the RTDS scheduler, as that is a
> legit situation to be in: think, for instance, of a
> newly created RTDS cpupool, with no domains migrated
> to it yet.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <xumengpanda@gmail.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> ---

Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>

Thanks!

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/7] xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains
  2015-03-16 17:04 ` [PATCH 1/7] xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains Dario Faggioli
  2015-03-16 18:18   ` George Dunlap
  2015-03-16 20:31   ` Meng Xu
@ 2015-03-17 10:41   ` Jan Beulich
  2015-03-17 11:00     ` Dario Faggioli
  2 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2015-03-17 10:41 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, Keir Fraser, Meng Xu, Xen-devel

>>> On 16.03.15 at 18:04, <dario.faggioli@citrix.com> wrote:
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -264,18 +264,17 @@ rt_dump(const struct scheduler *ops)
>      struct list_head *iter_sdom, *iter_svc, *runq, *depletedq, *iter;
>      struct rt_private *prv = rt_priv(ops);
>      struct rt_vcpu *svc;
> -    cpumask_t *online;
>      struct rt_dom *sdom;
>      unsigned long flags;
>  
> -    ASSERT(!list_empty(&prv->sdom));
> +    spin_lock_irqsave(&prv->lock, flags);
> +
> +    if (list_empty(&prv->sdom))

Coding style.

> +        goto out;
>  
> -    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
> -    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);

Unrelated change (together with the variable deletion above). I'm
fine for this to stay here, but it should at least be mentioned in the
description so that future archeologists don't wonder about the
connection to the actual issue fixed here.

> @@ -303,6 +302,7 @@ rt_dump(const struct scheduler *ops)
>          }
>      }
>  
> +out:

Labels should be indented by at least one space.

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r)
  2015-03-16 17:05 ` [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r) Dario Faggioli
  2015-03-16 18:41   ` George Dunlap
  2015-03-16 20:04   ` Meng Xu
@ 2015-03-17 10:54   ` Jan Beulich
  2015-03-17 11:05     ` George Dunlap
  2015-03-17 11:14     ` Dario Faggioli
  2 siblings, 2 replies; 42+ messages in thread
From: Jan Beulich @ 2015-03-17 10:54 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, Keir Fraser, Meng Xu, Xen-devel

>>> On 16.03.15 at 18:05, <dario.faggioli@citrix.com> wrote:
> such that it is taken care of by the various schedulers, rather
> than happening in schedule.c. In fact, it is the schedulers
> that know better which locks are necessary for the specific
> dumping operations.
> 
> While there, fix a few style issues (indentation, trailing
> whitespace, parentheses and blank line after var declarations)
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <xumengpanda@gmail.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> ---
>  xen/common/sched_credit.c  |   42 ++++++++++++++++++++++++++++++++++++++++--
>  xen/common/sched_credit2.c |   40 ++++++++++++++++++++++++++++++++--------
>  xen/common/sched_rt.c      |    7 +++++--
>  xen/common/schedule.c      |    5 ++---
>  4 files changed, 79 insertions(+), 15 deletions(-)

Is it really correct that sched_sedf.c doesn't need any change here?

> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -26,6 +26,23 @@
>  
>  
>  /*
> + * Locking:
> + * - Scheduler-lock (a.k.a. runqueue lock):
> + *  + is per-runqueue, and there is one runqueue per-cpu;
> + *  + serializes all runqueue manipulation operations;
> + * - Private data lock (a.k.a. private scheduler lock):
> + *  + serializes accesses to the scheduler global state (weight,
> + *    credit, balance_credit, etc);
> + *  + serializes updates to the domains' scheduling parameters.
> + *
> + * Ordering is "private lock always comes first":
> + *  + if we need both locks, we must acquire the private
> + *    scheduler lock for first;
> + *  + if we already own a runqueue lock, we must never acquire
> + *    the private scheduler lock.
> + */

And this is Credit1 specific?

Regardless of that, even if that's just reflecting current state, isn't
acquiring a private lock around a generic lock backwards?

Finally, as said in different contexts earlier, I think unconditionally
acquiring locks in dumping routines isn't the best practice. At least
in non-debug builds I think these should be try-locks only, skipping
the dumping when a lock is busy.

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/7] xen: print online pCPUs and free pCPUs when dumping scheduler info
  2015-03-16 17:05 ` [PATCH 4/7] xen: print online pCPUs and free pCPUs when dumping scheduler info Dario Faggioli
  2015-03-16 18:37   ` Juergen Gross
  2015-03-16 18:46   ` George Dunlap
@ 2015-03-17 10:59   ` Jan Beulich
  2 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2015-03-17 10:59 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, Juergen Gross, Keir Fraser, Xen-devel

>>> On 16.03.15 at 18:05, <dario.faggioli@citrix.com> wrote:
> e.g., with  `xl debug-key r'. This change adds something
> like the following lines to the dump output:
> 
>   (XEN) Num online Cpus: 16, cpu_online_map: 0-15
>   (XEN) Num free Cpus: 8, cpupool_free_cpus: 8-15

I dislike such redundant information: Just printing the maps seems
entirely sufficient to me (and without including the names of internal
variables). And printing the free ones when there are none should
probably be skipped too.

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/7] xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains
  2015-03-17 10:41   ` Jan Beulich
@ 2015-03-17 11:00     ` Dario Faggioli
  0 siblings, 0 replies; 42+ messages in thread
From: Dario Faggioli @ 2015-03-17 11:00 UTC (permalink / raw)
  To: JBeulich; +Cc: Keir (Xen.org), xumengpanda, George Dunlap, xen-devel

On Tue, 2015-03-17 at 10:41 +0000, Jan Beulich wrote:
> >>> On 16.03.15 at 18:04, <dario.faggioli@citrix.com> wrote:
> > --- a/xen/common/sched_rt.c
> > +++ b/xen/common/sched_rt.c
> > @@ -264,18 +264,17 @@ rt_dump(const struct scheduler *ops)
> >      struct list_head *iter_sdom, *iter_svc, *runq, *depletedq, *iter;
> >      struct rt_private *prv = rt_priv(ops);
> >      struct rt_vcpu *svc;
> > -    cpumask_t *online;
> >      struct rt_dom *sdom;
> >      unsigned long flags;
> >  
> > -    ASSERT(!list_empty(&prv->sdom));
> > +    spin_lock_irqsave(&prv->lock, flags);
> > +
> > +    if (list_empty(&prv->sdom))
> 
> Coding style.
> 
Gah! Sure.... sorry!

> > +        goto out;
> >  
> > -    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
> > -    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
> 
> Unrelated change (together with the variable deletion above). I'm
> fine for this to stay here, but it should at least be mentioned in the
> description so that future archeologists don't wonder about the
> connection to the actual issue fixed here.
> 
Ok, I'll update the changelog, also mentioning the fact that I'm moving
the spinlock acquisition up, as (I think) George is also suggesting.

> > @@ -303,6 +302,7 @@ rt_dump(const struct scheduler *ops)
> >          }
> >      }
> >  
> > +out:
> 
> Labels should be indented by at least one space.
> 
I checked all the sched_* files, and this is rather inconsistent. What
I'll do is indent this one appropriately, which will make things even
more inconsistent (in sched_rt.c there is only one other 'out:' label, and
it's not indented), and then send an independent cleanup patch
afterwards... I hope that is fine.

Thanks and Regards,
Dario

> Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r)
  2015-03-17 10:54   ` Jan Beulich
@ 2015-03-17 11:05     ` George Dunlap
  2015-03-17 11:18       ` Dario Faggioli
  2015-03-17 11:25       ` Jan Beulich
  2015-03-17 11:14     ` Dario Faggioli
  1 sibling, 2 replies; 42+ messages in thread
From: George Dunlap @ 2015-03-17 11:05 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli; +Cc: Keir Fraser, Meng Xu, Xen-devel

On 03/17/2015 10:54 AM, Jan Beulich wrote:
>>>> On 16.03.15 at 18:05, <dario.faggioli@citrix.com> wrote:
>> such that it is taken care of by the various schedulers, rather
>> than happening in schedule.c. In fact, it is the schedulers
>> that know better which locks are necessary for the specific
>> dumping operations.
>>
>> While there, fix a few style issues (indentation, trailing
>> whitespace, parentheses and blank line after var declarations)
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>> Cc: Meng Xu <xumengpanda@gmail.com>
>> Cc: Jan Beulich <JBeulich@suse.com>
>> Cc: Keir Fraser <keir@xen.org>
>> ---
>>  xen/common/sched_credit.c  |   42 ++++++++++++++++++++++++++++++++++++++++--
>>  xen/common/sched_credit2.c |   40 ++++++++++++++++++++++++++++++++--------
>>  xen/common/sched_rt.c      |    7 +++++--
>>  xen/common/schedule.c      |    5 ++---
>>  4 files changed, 79 insertions(+), 15 deletions(-)
> 
> Is it really correct that sched_sedf.c doesn't need any change here?
> 
>> --- a/xen/common/sched_credit.c
>> +++ b/xen/common/sched_credit.c
>> @@ -26,6 +26,23 @@
>>  
>>  
>>  /*
>> + * Locking:
>> + * - Scheduler-lock (a.k.a. runqueue lock):
>> + *  + is per-runqueue, and there is one runqueue per-cpu;
>> + *  + serializes all runqueue manipulation operations;
>> + * - Private data lock (a.k.a. private scheduler lock):
>> + *  + serializes accesses to the scheduler global state (weight,
>> + *    credit, balance_credit, etc);
>> + *  + serializes updates to the domains' scheduling parameters.
>> + *
>> + * Ordering is "private lock always comes first":
>> + *  + if we need both locks, we must acquire the private
>> + *    scheduler lock for first;
>> + *  + if we already own a runqueue lock, we must never acquire
>> + *    the private scheduler lock.
>> + */
> 
> And this is Credit1 specific?

Credit1 and credit2 have slightly different lock and data layouts, and
thus a slightly different locking discipline.  This looks like it was
copied from the credit2 description and then modified for credit1.

> 
> Regardless of that, even if that's just reflecting current state, isn't
> acquiring a private lock around a generic lock backwards?

As the description says, the private lock tends to be global, whereas
the scheduler lock tends to be per-cpu.  There are lots of operations
where you need to iterate over cpus (or the vcpus running on them).
(For instance, the dumping routines that Dario modifies in this patch.)
 Grabbing the private lock once and then grabbing the scheduler locks
sequentially is obviously the right thing to do here.

> Finally, as said in different contexts earlier, I think unconditionally
> acquiring locks in dumping routines isn't the best practice. At least
> in non-debug builds I think these should be try-locks only, skipping
> the dumping when a lock is busy.

You mean so that we don't block the console if there turns out to be a
deadlock?

That makes some sense; but on a busy system that would mean a
non-negligible chance that any given keystroke would be missing
information about some cpu or other, which would be pretty frustrating
for someone trying to figure out the state of their system.

Would it make sense to have a version of spin_trylock for use in this
kind of situation that waits & retries a reasonable number of times
before giving up?
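
Something like the below, perhaps (a hypothetical helper; name and retry
budget made up, untested):

    static bool_t dump_trylock(spinlock_t *lock)
    {
        unsigned int tries;

        for ( tries = 0; tries < 1000; tries++ )
        {
            if ( spin_trylock(lock) )
                return 1;
            cpu_relax();
        }

        return 0;   /* caller skips this part of the dump, and says so */
    }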

 -George

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
  2015-03-16 20:30   ` Meng Xu
@ 2015-03-17 11:10     ` George Dunlap
  2015-03-17 11:28       ` Jan Beulich
  2015-03-18  1:05       ` Meng Xu
  2015-03-17 14:12     ` Dario Faggioli
  1 sibling, 2 replies; 42+ messages in thread
From: George Dunlap @ 2015-03-17 11:10 UTC (permalink / raw)
  To: Meng Xu, Dario Faggioli; +Cc: Keir Fraser, Jan Beulich, Xen-devel

On 03/16/2015 08:30 PM, Meng Xu wrote:
> Hi Dario,
> 
> 2015-03-16 13:05 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
>> In fact, printing the cpupool's CPU online mask
>> for each vCPU is just redundant, as that is the
>> same for all the vCPUs of all the domains in the
>> same cpupool, while hard affinity is already part
>> of the output of dumping domains info.
>>
>> Instead, print the intersection between hard
>> affinity and online CPUs, which is --in case of this
>> scheduler-- the effective affinity always used for
>> the vCPUs.
> 
> Agree.
> 
>>
>> This change also takes the chance to add a scratch
>> cpumask, to avoid having to create one more
>> cpumask_var_t on the stack of the dumping routine.
> 
> Actually, I have a question about the strength of this design. When we
> have a machine with many cpus, we will end up with allocating a
> cpumask for each cpu. Is this better than having a cpumask_var_t on
> the stack of the dumping routine, since the dumping routine is not in
> the hot path?

The reason for taking this off the stack is that the hypervisor stack is
a fairly limited resource -- IIRC it's only 8k (for each cpu).  If the
call stack gets too deep, the hypervisor will triple-fault.  Keeping
really large variables like cpumasks off the stack is key to making sure
we don't get close to that.

Systems with a lot of cpus can also be assumed I think to have a lot of
memory, so having one per cpu shouldn't really be that bad.
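
(For a sense of scale: in a hypothetical 4096-CPU build a cpumask_t is
4096 bits, i.e. 512 bytes, so just a few of them already eat a good chunk
of an 8k stack; whereas one mask per pCPU on, say, a 256-pCPU host is only
256 * 512 bytes = 128KiB of heap.)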

 -George

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r)
  2015-03-17 10:54   ` Jan Beulich
  2015-03-17 11:05     ` George Dunlap
@ 2015-03-17 11:14     ` Dario Faggioli
  2015-03-17 11:31       ` Jan Beulich
  1 sibling, 1 reply; 42+ messages in thread
From: Dario Faggioli @ 2015-03-17 11:14 UTC (permalink / raw)
  To: JBeulich; +Cc: Keir (Xen.org), xumengpanda, George Dunlap, xen-devel

On Tue, 2015-03-17 at 10:54 +0000, Jan Beulich wrote:
> >>> On 16.03.15 at 18:05, <dario.faggioli@citrix.com> wrote:
> > such that it is taken care of by the various schedulers, rather
> > than happening in schedule.c. In fact, it is the schedulers
> > that know better which locks are necessary for the specific
> > dumping operations.
> > 
> > While there, fix a few style issues (indentation, trailing
> > whitespace, parentheses and blank line after var declarations)
> > 
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> > Cc: George Dunlap <george.dunlap@eu.citrix.com>
> > Cc: Meng Xu <xumengpanda@gmail.com>
> > Cc: Jan Beulich <JBeulich@suse.com>
> > Cc: Keir Fraser <keir@xen.org>
> > ---
> >  xen/common/sched_credit.c  |   42 ++++++++++++++++++++++++++++++++++++++++--
> >  xen/common/sched_credit2.c |   40 ++++++++++++++++++++++++++++++++--------
> >  xen/common/sched_rt.c      |    7 +++++--
> >  xen/common/schedule.c      |    5 ++---
> >  4 files changed, 79 insertions(+), 15 deletions(-)
> 
> Is it really correct that sched_sedf.c doesn't need any change here?
> 
Good point.

I feel like mentioning though that SEDF is broken in many ways, and I
(and I suspect George is in a similar situation) really don't have the
bandwidth to fix it. Therefore, I really think we should start a
discussion on how to deprecate/get rid of it.

That being said, this is a simple enough case that I can certainly have
a quick look.

> > --- a/xen/common/sched_credit.c
> > +++ b/xen/common/sched_credit.c
> > @@ -26,6 +26,23 @@
> >  
> >  
> >  /*
> > + * Locking:
> > + * - Scheduler-lock (a.k.a. runqueue lock):
> > + *  + is per-runqueue, and there is one runqueue per-cpu;
> > + *  + serializes all runqueue manipulation operations;
> > + * - Private data lock (a.k.a. private scheduler lock):
> > + *  + serializes accesses to the scheduler global state (weight,
> > + *    credit, balance_credit, etc);
> > + *  + serializes updates to the domains' scheduling parameters.
> > + *
> > + * Ordering is "private lock always comes first":
> > + *  + if we need both locks, we must acquire the private
> > + *    scheduler lock for first;
> > + *  + if we already own a runqueue lock, we must never acquire
> > + *    the private scheduler lock.
> > + */
> 
> And this is Credit1 specific?
> 
It is similar (although with some differences) in Credit2, but there is
already a comment there, similar to this, stating it. RTDS, as of now,
only uses _one_ lock, so there is not much to say about ordering. :-D

> Regardless of that, even if that's just reflecting current state, isn't
> acquiring a private lock around a generic lock backwards?
> 
It sounds like so, I agree, when describing it in English. But then,
looking at the code, things make sense, IMO.

> Finally, as said in different contexts earlier, I think unconditionally
> acquiring locks in dumping routines isn't the best practice. At least
> in non-debug builds I think these should be try-locks only, skipping
> the dumping when a lock is busy.
> 
That makes sense. It also looks like something that, if not completely
orthogonal, can be done in a follow-up patch wrt this series. I.e., we first
move the locking logic where it belongs, and then we rework it toward
using _trylock(). Does this make sense?

Regards,
Dario

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r)
  2015-03-17 11:05     ` George Dunlap
@ 2015-03-17 11:18       ` Dario Faggioli
  2015-03-17 11:25       ` Jan Beulich
  1 sibling, 0 replies; 42+ messages in thread
From: Dario Faggioli @ 2015-03-17 11:18 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir (Xen.org), xumengpanda, JBeulich, xen-devel

On Tue, 2015-03-17 at 11:05 +0000, George Dunlap wrote:
> On 03/17/2015 10:54 AM, Jan Beulich wrote:

> > Finally, as said in different contexts earlier, I think unconditionally
> > acquiring locks in dumping routines isn't the best practice. At least
> > in non-debug builds I think these should be try-locks only, skipping
> > the dumping when a lock is busy.
> 
> You mean so that we don't block the console if there turns out to be a
> deadlock?
> 
> That makes some sense; but on a busy system that would mean a
> non-negligible chance that any give keystroke would be missing
> information about some cpu or other, which would be pretty frustrating
> for someone trying to figure out the state of their system.
> 
Mmm... that's a very good point, actually, which I did not think about.

> Would it make sense to have a version of spin_trylock for use in this
> kind of situation that waits & retries a reasonable number of times
> before giving up?
> 
Probably. After all, if one comes to the point of sending a debug-key,
he most likely wants to see the system status at (almost) any cost...

Regards,
Dario

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r)
  2015-03-17 11:05     ` George Dunlap
  2015-03-17 11:18       ` Dario Faggioli
@ 2015-03-17 11:25       ` Jan Beulich
  2015-03-17 11:32         ` George Dunlap
  1 sibling, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2015-03-17 11:25 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir Fraser, Dario Faggioli, Meng Xu, Xen-devel

>>> On 17.03.15 at 12:05, <george.dunlap@eu.citrix.com> wrote:
> On 03/17/2015 10:54 AM, Jan Beulich wrote:
>> Finally, as said in different contexts earlier, I think unconditionally
>> acquiring locks in dumping routines isn't the best practice. At least
>> in non-debug builds I think these should be try-locks only, skipping
>> the dumping when a lock is busy.
> 
> You mean so that we don't block the console if there turns out to be a
> deadlock?

For example. And also to not unduly get in the way of an otherwise
extremely busy system.

> That makes some sense; but on a busy system that would mean a
> non-negligible chance that any given keystroke would be missing
> information about some cpu or other, which would be pretty frustrating
> for someone trying to figure out the state of their system.

Right - ideally there would be an indication of the skipped state
left in the log.

> Would it make sense to have a version of spin_trylock for use in this
> kind of situation that waits & retries a reasonable number of times
> before giving up?

Yes, that might be a possible compromise. I could also imagine
another debug key allowing one to alter the behavior, i.e. for when
one absolutely wants the information and doesn't care about
the state of the system.

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
  2015-03-17 11:10     ` George Dunlap
@ 2015-03-17 11:28       ` Jan Beulich
  2015-03-18  1:05       ` Meng Xu
  1 sibling, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2015-03-17 11:28 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap, Meng Xu; +Cc: Keir Fraser, Xen-devel

>>> On 17.03.15 at 12:10, <george.dunlap@eu.citrix.com> wrote:
> On 03/16/2015 08:30 PM, Meng Xu wrote:
>> 2015-03-16 13:05 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
>>> This change also takes the chance to add a scratch
>>> cpumask, to avoid having to create one more
>>> cpumask_var_t on the stack of the dumping routine.
>> 
>> Actually, I have a question about the strength of this design. When we
>> have a machine with many cpus, we will end up with allocating a
>> cpumask for each cpu. Is this better than having a cpumask_var_t on
>> the stack of the dumping routine, since the dumping routine is not in
>> the hot path?
> 
> The reason for taking this off the stack is that the hypervisor stack is
> a fairly limited resource -- IIRC it's only 8k (for each cpu).  If the
> call stack gets too deep, the hypervisor will triple-fault.  Keeping
> really large variables like cpumasks off the stack is key to making sure
> we don't get close to that.

Actually here you talk about cpumask_t-s on the stack.
cpumask_var_t-s aren't a problem stack size wise, but are an
issue due to the need to dynamically allocate them (and the
potential failure thereof) when the hypervisor was built for a
large enough number of CPUs.
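
I.e., just to sketch the distinction:

    cpumask_t stack_mask;     /* always NR_CPUS bits, on the stack */
    cpumask_var_t heap_mask;  /* may be only a pointer for big NR_CPUS... */

    if ( !alloc_cpumask_var(&heap_mask) )   /* ...so allocation can fail */
        return -ENOMEM;
    /* ... */
    free_cpumask_var(heap_mask);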

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r)
  2015-03-17 11:14     ` Dario Faggioli
@ 2015-03-17 11:31       ` Jan Beulich
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2015-03-17 11:31 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Keir (Xen.org), xumengpanda, George Dunlap, xen-devel

>>> On 17.03.15 at 12:14, <dario.faggioli@citrix.com> wrote:
> On Tue, 2015-03-17 at 10:54 +0000, Jan Beulich wrote:
>> Finally, as said in different contexts earlier, I think unconditionally
>> acquiring locks in dumping routines isn't the best practice. At least
>> in non-debug builds I think these should be try-locks only, skipping
>> the dumping when a lock is busy.
>> 
> That makes sense. It also looks like something that, if not completely
> orthogonal, can be done in a follow-up patch wrt this series. I.e., we first
> move the locking logic where it belongs, and then we rework it toward
> using _trylock(). Does this make sense?

Sure, the order doesn't matter all that much. And the keyhandler
issue is certainly affecting code elsewhere too, so getting this
improved would presumably not be part of a scheduler specific
patch series.

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r)
  2015-03-17 11:25       ` Jan Beulich
@ 2015-03-17 11:32         ` George Dunlap
  2015-03-17 11:43           ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: George Dunlap @ 2015-03-17 11:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, Dario Faggioli, Meng Xu, Xen-devel

On 03/17/2015 11:25 AM, Jan Beulich wrote:
>>>> On 17.03.15 at 12:05, <george.dunlap@eu.citrix.com> wrote:
>> On 03/17/2015 10:54 AM, Jan Beulich wrote:
>>> Finally, as said in different contexts earlier, I think unconditionally
>>> acquiring locks in dumping routines isn't the best practice. At least
>>> in non-debug builds I think these should be try-locks only, skipping
>>> the dumping when a lock is busy.
>>
>> You mean so that we don't block the console if there turns out to be a
>> deadlock?
> 
> For example. And also to not unduly get in the way of an otherwise
> extremely busy system.

I don't understand this last argument.  If you're using the debug keys,
you want to know about the state of the system.  I would much rather my
system ran 25% slower for the 5 seconds the debug key was dumping
information, and have a complete snapshot of the system, than for it to
only run 10% slower and to have half the information missing.  The
upshot of missing information would likely be that I have to press the
debug key 3-4 times in a row, meaning I'd be running 10% slower for 20
seconds rather than 25% slower for 5 seconds.

And in any case, the effect of being able to *successfully* grab the
private lock is going to have a much larger impact on the system.

All in all, I don't think the performance of the debug keys should be a
major concern.  The only thing I'd be worried about is making the system
as diagnosable as possible if things have already gone pear-shaped
(e.g., if there's a deadlock).

> Yes, that might be a possible compromise. I could also imagine
> another debug key allowing one to alter the behavior, i.e. for when
> one absolutely wants the information and doesn't care about
> the state of the system.

Possible, but it seems like a lot of complication for what it buys you.

 -George

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r)
  2015-03-17 11:32         ` George Dunlap
@ 2015-03-17 11:43           ` Jan Beulich
  2015-03-17 12:01             ` George Dunlap
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2015-03-17 11:43 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir Fraser, Dario Faggioli, Meng Xu, Xen-devel

>>> On 17.03.15 at 12:32, <george.dunlap@eu.citrix.com> wrote:
> On 03/17/2015 11:25 AM, Jan Beulich wrote:
>>>>> On 17.03.15 at 12:05, <george.dunlap@eu.citrix.com> wrote:
>>> On 03/17/2015 10:54 AM, Jan Beulich wrote:
>>>> Finally, as said in different contexts earlier, I think unconditionally
>>>> acquiring locks in dumping routines isn't the best practice. At least
>>>> in non-debug builds I think these should be try-locks only, skipping
>>>> the dumping when a lock is busy.
>>>
>>> You mean so that we don't block the console if there turns out to be a
>>> deadlock?
>> 
>> For example. And also to not unduly get in the way of an otherwise
>> extremely busy system.
> 
> I don't understand this last argument.  If you're using the debug keys,
> you want to know about the state of the system.  I would much rather my
> system ran 25% slower for the 5 seconds the debug key was dumping
> information, and have a complete snapshot of the system, than for it to
> only run 10% slower and to have half the information missing.  The
> upshot of missing information would likely be that I have to press the
> debug key 3-4 times in a row, meaning I'd be running 10% slower for 20
> seconds rather than 25% slower for 5 seconds.

Yes, I understand this, and in many cases this is the perspective to
take. Yet I've been in the situation where suggesting the use of
debug keys to learn something about a (partially) live locked system
would have had the risk of causing further corruption to it, and
hence a more careful state dumping approach would have been
desirable.

> All in all, I don't think the performance of the debug keys should be a
> major concern.  The only thing I'd be worried about is making the system
> as diagnosable as possible if things have already gone pear-shaped
> (e.g., if there's a deadlock).

It's not their performance that's of concern, but the effect they
may have on the performance (or even correctness - see how
many process_pending_softirqs() calls we had to sprinkle around
over the years) of other code.

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r)
  2015-03-17 11:43           ` Jan Beulich
@ 2015-03-17 12:01             ` George Dunlap
  0 siblings, 0 replies; 42+ messages in thread
From: George Dunlap @ 2015-03-17 12:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, Dario Faggioli, Meng Xu, Xen-devel

On 03/17/2015 11:43 AM, Jan Beulich wrote:
>>>> On 17.03.15 at 12:32, <george.dunlap@eu.citrix.com> wrote:
>> On 03/17/2015 11:25 AM, Jan Beulich wrote:
>>>>>> On 17.03.15 at 12:05, <george.dunlap@eu.citrix.com> wrote:
>>>> On 03/17/2015 10:54 AM, Jan Beulich wrote:
>>>>> Finally, as said in different contexts earlier, I think unconditionally
>>>>> acquiring locks in dumping routines isn't the best practice. At least
>>>>> in non-debug builds I think these should be try-locks only, skipping
>>>>> the dumping when a lock is busy.
>>>>
>>>> You mean so that we don't block the console if there turns out to be a
>>>> deadlock?
>>>
>>> For example. And also to not unduly get in the way of an otherwise
>>> extremely busy system.
>>
>> I don't understand this last argument.  If you're using the debug keys,
>> you want to know about the state of the system.  I would much rather my
>> system ran 25% slower for the 5 seconds the debug key was dumping
>> information, and have a complete snapshot of the system, than for it to
>> only run 10% slower and to have half the information missing.  The
>> upshot of missing information would likely be that I have to press the
>> debug key 3-4 times in a row, meaning I'd be running 10% slower for 20
>> seconds rather than 25% slower for 5 seconds.
> 
> Yes, I understand this, and in many cases this is the perspective to
> take. Yet I've been in the situation where suggesting the use of
> debug keys to learn something about a (partially) live locked system
> would have had the risk of causing further corruption to it, and
> hence a more careful state dumping approach would have been
> desirable.
> 
>> All in all, I don't think the performance of the debug keys should be a
>> major concern.  The only thing I'd be worried about is making the system
>> as diagnosable as possible if things have already gone pear-shaped
>> (e.g., if there's a deadlock).
> 
> It's not their performance that's of concern, but the effect they
> may have on the performance (or even correctness - see how
> many process_pending_softirqs() calls we had to sprinkle around
> over the years) of other code.

So it sounds like maybe we're actually on the same page, but are using
words slightly differently.  :-)  It sounds like we agree that the
ability to tread carefully on a system which may be having trouble, in
order not to make it worse, is important.  For instance, not wedging the
serial console behind a deadlocked lock, and not further corrupting a
system that had gotten itself wedged in livelock.  Those are things I
would classify under "correctness" and/or "diagnosis".

When I say "performance is not a concern", I mean "it does not concern
me that someone's web page loads 25% slower for the five seconds it
takes to dump the information".  If delaying other parts of the system
causes the system to get wedged or crash, that's obviously a problem.

 -George

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/7] xen: sched_rt: implement the .free_pdata hook
  2015-03-16 18:23     ` Meng Xu
@ 2015-03-17 13:00       ` Dario Faggioli
  0 siblings, 0 replies; 42+ messages in thread
From: Dario Faggioli @ 2015-03-17 13:00 UTC (permalink / raw)
  To: xumengpanda; +Cc: Keir (Xen.org), George Dunlap, JBeulich, xen-devel

On Mon, 2015-03-16 at 14:23 -0400, Meng Xu wrote:
> 2015-03-16 13:10 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:

> > We discussed about this issue before, here:
> > http://lists.xen.org/archives/html/xen-devel/2014-09/msg02765.html
> >
> > But, as far as it looks, the code we ended up merging still had the
> > issue (i.e., it did not have the hook implemented)!
> >
> > I was working a bit on/off (for family reasons) in that period, so I
> > can't really tell how that happened. :-/
> > Not a big deal... this code is marked as experimental for a reason, but
> > it probably makes sense to double check whether anything else similar
> > happened; can you do that?
>
> Sure! I will review that problem, do the double check and let you know
> the result!
> 
I've re-tested that scenario myself, and verified it's working before
sending this series. :-)

What I was suggesting was for you to double check that everything that
you have in your local dev repo(s)/branch(es) (either from that time or
more recent ones) is actually upstream.

In fact, looking at the referenced thread, it looks like this issue
shouldn't have been present in the code we have checked-in, and yet,
there it was.

Regards,
Dario

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
  2015-03-16 19:05   ` George Dunlap
@ 2015-03-17 13:51     ` Dario Faggioli
  0 siblings, 0 replies; 42+ messages in thread
From: Dario Faggioli @ 2015-03-17 13:51 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir (Xen.org), xumengpanda, JBeulich, xen-devel

On Mon, 2015-03-16 at 19:05 +0000, George Dunlap wrote:
> On 03/16/2015 05:05 PM, Dario Faggioli wrote:

> > @@ -218,7 +224,6 @@ __q_elem(struct list_head *elem)
> >  static void
> >  rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
> >  {
> > -    char cpustr[1024];
> >      cpumask_t *cpupool_mask;
> >  
> >      ASSERT(svc != NULL);
> > @@ -229,10 +234,22 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
> >          return;
> >      }
> >  
> > -    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
> > +    cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool);
> > +    /*
> > +     * We can't just use 'cpumask_scratch' because the dumping can
> > +     * happen from a pCPU outside of this scheduler's cpupool, and
> > +     * hence it's not right to use the pCPU's scratch mask (which
> > +     * may even not exist!). On the other hand, it is safe to use
> > +     * svc->vcpu->processor's own scratch space, since we own the
> > +     * runqueue lock.
> 
> Since we *hold* the lock.
> 
Right, thanks.

> > +     */
> > +    cpumask_and(_cpumask_scratch[svc->vcpu->processor], cpupool_mask,
> > +                svc->vcpu->cpu_hard_affinity);
> > +    cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch),
> > +                      _cpumask_scratch[svc->vcpu->processor]);
> 
> Just a suggestion, would it be worth making a local variable to avoid
> typing this long thing twice?  
>
It probably would.

> Then you could also put the comment about
> using the svc->vcpu->processor's scratch space above the place where you
> set the local variable, while avoiding breaking up the logic of the
> cpumask operations.
> 
I like this, will do.
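
For reference, a minimal sketch of what the refactoring agreed on above
could look like (the 'mask' local is the hypothetical variable being
suggested; the other names come from the quoted hunk, and the comment
moves up next to the assignment, as per the second suggestion):

    /*
     * Dumping may happen from a pCPU outside this scheduler's cpupool,
     * whose scratch mask may not even exist. svc->vcpu->processor's own
     * scratch space is safe to use instead, since we hold the runqueue
     * lock.
     */
    cpumask_t *mask = _cpumask_scratch[svc->vcpu->processor];

    cpumask_and(mask, cpupool_mask, svc->vcpu->cpu_hard_affinity);
    cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), mask);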

Regards,
Dario

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
  2015-03-16 20:30   ` Meng Xu
  2015-03-17 11:10     ` George Dunlap
@ 2015-03-17 14:12     ` Dario Faggioli
  2015-03-18  1:07       ` Meng Xu
  1 sibling, 1 reply; 42+ messages in thread
From: Dario Faggioli @ 2015-03-17 14:12 UTC (permalink / raw)
  To: xumengpanda; +Cc: Keir (Xen.org), George Dunlap, JBeulich, xen-devel


On Mon, 2015-03-16 at 16:30 -0400, Meng Xu wrote:
> Hi Dario,
> 
Hey,

> 2015-03-16 13:05 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:

> >
> > This change also takes the chance to add a scratch
> > cpumask, to avoid having to create one more
> > cpumask_var_t on the stack of the dumping routine.
> 
> Actually, I have a question about the strength of this design. When we
> have a machine with many cpus, we will end up allocating a
> cpumask for each cpu. 
>
Just FTR, what we will end up allocating is:
 - an array of *pointers* to cpumasks with as many elements as the 
   number of pCPUs,
 - a cpumask *only* for the pCPUs subject to an instance of the RTDS
   scheduler.

So, for instance, if you have 64 pCPUs, but are using the RTDS scheduler
only in a cpupool with 2 pCPUs, you'll have an array of 64 pointers to
cpumask_t, but only 2 actual cpumasks.
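
(To put rough numbers on that example, assuming a 64-bit build where
cpumask_var_t is the pointer-based variant: the array itself is
64 * 8 = 512 bytes of pointers, while each allocated mask is NR_CPUS/8
bytes, e.g. 32 bytes for NR_CPUS=256, so the 2-pCPU pool only adds
2 * 32 = 64 bytes of actual cpumasks.)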

> Is this better than having a cpumask_var_t on
> the stack of the dumping routine, since the dumping routine is not in
> the hot path?
> 
George and Jan replied to this already, I think. Allow me to add just a
few words:
> > Such a scratch area can be used to kill most of the
> > cpumask_var_t local variables in other functions
> > in the file, but that is *NOT* done in this change.
> >
This is the point, actually! As said here, this is not only for the sake
of the dumping routine. In fact, ideally, someone will, in the near
future, go through the whole file and kill most of the cpumask_t
local variables, and most of the cpumask dynamic allocations, in favour
of using this scratch area.

> > @@ -409,6 +423,10 @@ rt_init(struct scheduler *ops)
> >      if ( prv == NULL )
> >          return -ENOMEM;
> >
> > +    _cpumask_scratch = xmalloc_array(cpumask_var_t, nr_cpu_ids);
> 
> Is it better to use xzalloc_array?
> 
Why? IMO, not really. I'm only free()-ing (in rt_free_pdata()) the
elements of the array that have been previously successfully allocated
(in rt_alloc_pdata()), so I don't think there is any special requirement
for all the elements to be NULL right away.

> > +    if ( _cpumask_scratch == NULL )
> > +        return -ENOMEM;
> > +
> >      spin_lock_init(&prv->lock);
> >      INIT_LIST_HEAD(&prv->sdom);
> >      INIT_LIST_HEAD(&prv->runq);
> > @@ -426,6 +444,7 @@ rt_deinit(const struct scheduler *ops)
> >  {
> >      struct rt_private *prv = rt_priv(ops);
> >
> > +    xfree(_cpumask_scratch);
> >      xfree(prv);
> >  }
> >
> > @@ -443,6 +462,9 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
> >      per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
> >      spin_unlock_irqrestore(&prv->lock, flags);
> >
> > +    if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
> 
> Is it better to use zalloc_cpumask_var() here?
> 
Nope. It's a scratch area, after all, so one really should not assume it
to be in a specific state (e.g., no bits set as you're suggesting) when
using it.
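
Putting the two quoted hunks together, the intended lifecycle looks
roughly like the sketch below (just an illustration, with unrelated
lines elided, not the literal patch):

    static void *
    rt_alloc_pdata(const struct scheduler *ops, int cpu)
    {
        /* (schedule_lock setup elided) */

        /* The pCPU gets a scratch mask only if this allocation succeeds. */
        if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
            return NULL;

        return (void *)1;
    }

    static void
    rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
    {
        /* Only ever reached for pCPUs whose slot was allocated above. */
        free_cpumask_var(_cpumask_scratch[cpu]);
    }

Slots of _cpumask_scratch that were never allocated are never touched,
and an allocated mask is always written (e.g. by the cpumask_and() in
the dumping path) before it is read, which is why neither
xzalloc_array() nor zalloc_cpumask_var() would buy anything here.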

Thanks and Regards,
Dario

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
  2015-03-17 11:10     ` George Dunlap
  2015-03-17 11:28       ` Jan Beulich
@ 2015-03-18  1:05       ` Meng Xu
  1 sibling, 0 replies; 42+ messages in thread
From: Meng Xu @ 2015-03-18  1:05 UTC (permalink / raw)
  To: George Dunlap; +Cc: Dario Faggioli, Keir Fraser, Jan Beulich, Xen-devel

>>>
>>> This change also takes the chance to add a scratch
>>> cpumask, to avoid having to create one more
>>> cpumask_var_t on the stack of the dumping routine.
>>
>> Actually, I have a question about the strength of this design. When we
>> have a machine with many cpus, we will end up allocating a
>> cpumask for each cpu. Is this better than having a cpumask_var_t on
>> the stack of the dumping routine, since the dumping routine is not in
>> the hot path?
>
> The reason for taking this off the stack is that the hypervisor stack is
> a fairly limited resource -- IIRC it's only 8k (for each cpu).  If the
> call stack gets too deep, the hypervisor will triple-fault.  Keeping
> really large variables like cpumasks off the stack is key to making sure
> we don't get close to that.

I see. I didn't realize the hypervisor stack was that limited in size.
That makes sense.
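
(For a sense of scale, taking the 8k figure above: a cpumask_t is
NR_CPUS/8 bytes, so a build sized for 4096 CPUs makes each mask 512
bytes, and a single one on the stack already eats over 6% of a per-cpu
stack; a few of those down a deep call chain add up quickly.)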


Thank you very much for clarification! :-)

Best,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
  2015-03-17 14:12     ` Dario Faggioli
@ 2015-03-18  1:07       ` Meng Xu
  0 siblings, 0 replies; 42+ messages in thread
From: Meng Xu @ 2015-03-18  1:07 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Keir (Xen.org), George Dunlap, JBeulich, xen-devel

Hi Dario,

2015-03-17 10:12 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
> On Mon, 2015-03-16 at 16:30 -0400, Meng Xu wrote:
>> Hi Dario,
>>
> Hey,
>
>> 2015-03-16 13:05 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>:
>
>> >
>> > This change also takes the chance to add a scratch
>> > cpumask, to avoid having to create one more
>> > cpumask_var_t on the stack of the dumping routine.
>>
>> Actually, I have a question about the strength of this design. When we
>> have a machine with many cpus, we will end up allocating a
>> cpumask for each cpu.
>>
> Just FTR, what we will end up allocating is:
>  - an array of *pointers* to cpumasks with as many elements as the
>    number of pCPUs,
>  - a cpumask *only* for the pCPUs subject to an instance of the RTDS
>    scheduler.
>
> So, for instance, if you have 64 pCPUs, but are using the RTDS scheduler
> only in a cpupool with 2 pCPUs, you'll have an array of 64 pointers to
> cpumask_t, but only 2 actual cpumasks.
>
>> Is this better than having a cpumask_var_t on
>> the stack of the dumping routine, since the dumping routine is not in
>> the hot path?
>>
> George and Jan replied to this already, I think. Allow me to add just a
> few words:
>> > Such a scratch area can be used to kill most of the
>> > cpumask_var_t local variables in other functions
>> > in the file, but that is *NOT* done in this change.
>> >
> This is the point, actually! As said here, this is not only for the sake
> of the dumping routine. In fact, ideally, someone will, in the near
> future, go through the whole file and kill most of the cpumask_t
> local variables, and most of the cpumask dynamic allocations, in favour
> of using this scratch area.
>
>> > @@ -409,6 +423,10 @@ rt_init(struct scheduler *ops)
>> >      if ( prv == NULL )
>> >          return -ENOMEM;
>> >
>> > +    _cpumask_scratch = xmalloc_array(cpumask_var_t, nr_cpu_ids);
>>
>> Is it better to use xzalloc_array?
>>
> Why? IMO, not really. I'm only free()-ing (in rt_free_pdata()) the
> elements of the array that have been previously successfully allocated
> (in rt_alloc_pdata()), so I don't think there is any special requirement
> for all the elements to be NULL right away.

OK. I see.

>
>> > +    if ( _cpumask_scratch == NULL )
>> > +        return -ENOMEM;
>> > +
>> >      spin_lock_init(&prv->lock);
>> >      INIT_LIST_HEAD(&prv->sdom);
>> >      INIT_LIST_HEAD(&prv->runq);
>> > @@ -426,6 +444,7 @@ rt_deinit(const struct scheduler *ops)
>> >  {
>> >      struct rt_private *prv = rt_priv(ops);
>> >
>> > +    xfree(_cpumask_scratch);
>> >      xfree(prv);
>> >  }
>> >
>> > @@ -443,6 +462,9 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
>> >      per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
>> >      spin_unlock_irqrestore(&prv->lock, flags);
>> >
>> > +    if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
>>
>> Is it better to use zalloc_cpumask_var() here?
>>
> Nope. It's a scratch area, after all, so one really should not assume it
> to be in a specific state (e.g., no bits set as you're suggesting) when
> using it.

I see the point. Now I got it. :-)

>
> Thanks and Regards,

Thank you very much for clarification! :-)

Best,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2015-03-18  1:07 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-16 17:04 [PATCH 0/7] Improving dumping of scheduler related info Dario Faggioli
2015-03-16 17:04 ` [PATCH 1/7] xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains Dario Faggioli
2015-03-16 18:18   ` George Dunlap
2015-03-16 20:31   ` Meng Xu
2015-03-17 10:41   ` Jan Beulich
2015-03-17 11:00     ` Dario Faggioli
2015-03-16 17:04 ` [PATCH 2/7] xen: sched_rt: implement the .free_pdata hook Dario Faggioli
2015-03-16 17:10   ` Dario Faggioli
2015-03-16 18:23     ` Meng Xu
2015-03-17 13:00       ` Dario Faggioli
2015-03-16 18:17   ` Meng Xu
2015-03-16 18:21   ` George Dunlap
2015-03-16 17:05 ` [PATCH 3/7] xen: rework locking for dump of scheduler info (debug-key r) Dario Faggioli
2015-03-16 18:41   ` George Dunlap
2015-03-16 20:04   ` Meng Xu
2015-03-17 10:54   ` Jan Beulich
2015-03-17 11:05     ` George Dunlap
2015-03-17 11:18       ` Dario Faggioli
2015-03-17 11:25       ` Jan Beulich
2015-03-17 11:32         ` George Dunlap
2015-03-17 11:43           ` Jan Beulich
2015-03-17 12:01             ` George Dunlap
2015-03-17 11:14     ` Dario Faggioli
2015-03-17 11:31       ` Jan Beulich
2015-03-16 17:05 ` [PATCH 4/7] xen: print online pCPUs and free pCPUs when dumping scheduler info Dario Faggioli
2015-03-16 18:37   ` Juergen Gross
2015-03-16 18:46   ` George Dunlap
2015-03-17 10:59   ` Jan Beulich
2015-03-16 17:05 ` [PATCH 5/7] xen: make dumping vcpu info look better Dario Faggioli
2015-03-16 18:48   ` George Dunlap
2015-03-16 20:06   ` Meng Xu
2015-03-16 17:05 ` [PATCH 6/7] xen: sched_credit2: more info when dumping Dario Faggioli
2015-03-16 18:50   ` George Dunlap
2015-03-16 17:05 ` [PATCH 7/7] xen: sched_rt: print useful affinity " Dario Faggioli
2015-03-16 19:05   ` George Dunlap
2015-03-17 13:51     ` Dario Faggioli
2015-03-16 20:30   ` Meng Xu
2015-03-17 11:10     ` George Dunlap
2015-03-17 11:28       ` Jan Beulich
2015-03-18  1:05       ` Meng Xu
2015-03-17 14:12     ` Dario Faggioli
2015-03-18  1:07       ` Meng Xu
