* [RFC PATCH v1 0/6] Fair scheduling
@ 2020-06-12  0:22 Volodymyr Babchuk
  2020-06-12  0:22 ` [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks Volodymyr Babchuk
                   ` (5 more replies)
  0 siblings, 6 replies; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12  0:22 UTC
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Dario Faggioli, Jan Beulich,
	Volodymyr Babchuk, Roger Pau Monné

There have been a number of discussions about fair scheduling,
including the latest one at [1].

In a nutshell, schedulers don't know when a pCPU is doing guest-related
work and when it is busy with something else, like running tasklets.
All time spent between two calls to schedule() is charged to the active
vCPU, which can be unfair in some cases.

Andrii Anisov tried to overcome this by counting time spent in
different "modes". But his approach was "entry.S-centric": he tried to
guess the correct trap reason as early as possible. This was
ARM-specific and quite intrusive. On the other hand, it theoretically
provided more precise results.

As a result of the discussion with Dario at [1], we came to the
conclusion that, as a first approximation, we should not charge guests
for time spent in do_IRQ() and in do_softirq().

This patch series does exactly that. It collects time spent on IRQ
handling and on internal hypervisor tasks separately. This separation
is not actually needed for making scheduling decisions, but it can
provide more information to system administrators.
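
To illustrate the idea in pseudo-code (a sketch only, not part of the
series; all names are illustrative):

    /* What a scheduler effectively charges to a scheduling unit: */
    charged_time = NOW() - start_time - irq_time - hyp_time;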

This is a minimal implementation, so it supports only x86 and the
credit2 scheduler. I chose x86 over ARM because more people can try
these patches. The series also provides tracing and xentop support, to
ease experimenting.

I'm open to all suggestions, especially for naming things :)

These patches can also be pulled from my GitHub account at [2]

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-06/msg00092.html
[2] https://github.com/lorc/xen/tree/fair_sched_rfc_v1

Volodymyr Babchuk (6):
  sched: track time spent in IRQ handler
  sched: track time spent in hypervisor tasks
  sched, credit2: improve scheduler fairness
  xentop: collect IRQ and HYP time statistics.
  tools: xentop: show time spent in IRQ and HYP states.
  trace: add fair scheduling trace events

 tools/xenstat/libxenstat/src/xenstat.c      |  12 +++
 tools/xenstat/libxenstat/src/xenstat.h      |   6 ++
 tools/xenstat/libxenstat/src/xenstat_priv.h |   2 +
 tools/xenstat/xentop/xentop.c               |  54 ++++++++--
 tools/xentrace/xenalyze.c                   |  37 +++++++
 xen/arch/arm/irq.c                          |   2 +
 xen/arch/x86/irq.c                          |   2 +
 xen/common/sched/core.c                     | 110 ++++++++++++++++++++
 xen/common/sched/credit2.c                  |   2 +-
 xen/common/sched/private.h                  |  10 ++
 xen/common/softirq.c                        |   2 +
 xen/common/sysctl.c                         |   1 +
 xen/include/public/sysctl.h                 |   4 +-
 xen/include/public/trace.h                  |   5 +
 xen/include/xen/sched.h                     |  29 ++++++
 15 files changed, 265 insertions(+), 13 deletions(-)

-- 
2.27.0



* [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks
  2020-06-12  0:22 [RFC PATCH v1 0/6] Fair scheduling Volodymyr Babchuk
@ 2020-06-12  0:22 ` Volodymyr Babchuk
  2020-06-12  4:43   ` Jürgen Groß
  2020-06-16 10:10   ` Jan Beulich
  2020-06-12  0:22 ` [RFC PATCH v1 1/6] sched: track time spent in IRQ handler Volodymyr Babchuk
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12  0:22 UTC
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Dario Faggioli, Jan Beulich,
	Volodymyr Babchuk

In most cases, hypervisor code performs guest-related jobs. Tasks like
hypercall handling or MMIO access emulation are done on behalf of the
calling vCPU, so it is okay to charge the time spent in the hypervisor
to the current vCPU.

But there are also tasks that do not originate from guests. This
includes things like TLB flushing or running tasklets. We don't want to
charge time spent on these tasks to the total scheduling unit run time,
so we need to track the time spent on such housekeeping tasks
separately.

Those hypervisor tasks are run in the do_softirq() function, so we'll
install our hooks there.
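
The intended contract is that every begin call is paired with an end
call on the same vCPU, e.g. (a sketch; do_housekeeping() is just a
placeholder):

    vcpu_begin_hyp_task(current);
    do_housekeeping();          /* tasklets, TLB flushes, ... */
    vcpu_end_hyp_task(current);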

TODO: This change is not tested on ARM, and we'll probably get a
failing assertion there. This is because on ARM the code returns from
schedule() and has a chance to reach the end of do_softirq().

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 xen/common/sched/core.c | 32 ++++++++++++++++++++++++++++++++
 xen/common/softirq.c    |  2 ++
 xen/include/xen/sched.h | 16 +++++++++++++++-
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 8f642ada05..d597811fef 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -945,6 +945,37 @@ void vcpu_end_irq_handler(void)
     atomic_add(delta, &current->sched_unit->irq_time);
 }
 
+void vcpu_begin_hyp_task(struct vcpu *v)
+{
+    if ( is_idle_vcpu(v) )
+        return;
+
+    ASSERT(!v->in_hyp_task);
+
+    v->hyp_entry_time = NOW();
+#ifndef NDEBUG
+    v->in_hyp_task = true;
+#endif
+}
+
+void vcpu_end_hyp_task(struct vcpu *v)
+{
+    int delta;
+
+    if ( is_idle_vcpu(v) )
+        return;
+
+    ASSERT(v->in_hyp_task);
+
+    /* We assume that hypervisor task time will not overflow int */
+    delta = NOW() - v->hyp_entry_time;
+    atomic_add(delta, &v->sched_unit->hyp_time);
+
+#ifndef NDEBUG
+    v->in_hyp_task = false;
+#endif
+}
+
 /*
  * Do the actual movement of an unit from old to new CPU. Locks for *both*
  * CPUs needs to have been taken already when calling this!
@@ -2615,6 +2646,7 @@ static void schedule(void)
 
     SCHED_STAT_CRANK(sched_run);
 
+    vcpu_end_hyp_task(current);
     rcu_read_lock(&sched_res_rculock);
 
     lock = pcpu_schedule_lock_irq(cpu);
diff --git a/xen/common/softirq.c b/xen/common/softirq.c
index 063e93cbe3..03a29384d1 100644
--- a/xen/common/softirq.c
+++ b/xen/common/softirq.c
@@ -71,7 +71,9 @@ void process_pending_softirqs(void)
 void do_softirq(void)
 {
     ASSERT_NOT_IN_ATOMIC();
+    vcpu_begin_hyp_task(current);
     __do_softirq(0);
+    vcpu_end_hyp_task(current);
 }
 
 void open_softirq(int nr, softirq_handler handler)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ceed53364b..51dc7c4551 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -239,7 +239,12 @@ struct vcpu
 
     /* Fair scheduling state */
     uint64_t         irq_entry_time;
+    uint64_t         hyp_entry_time;
     unsigned int     irq_nesting;
+#ifndef NDEBUG
+    bool             in_hyp_task;
+#endif
+
     /* Tasklet for continue_hypercall_on_cpu(). */
     struct tasklet   continue_hypercall_tasklet;
 
@@ -279,8 +284,9 @@ struct sched_unit {
     /* Vcpu state summary. */
     unsigned int           runstate_cnt[4];
 
-    /* Fair scheduling correction value */
+    /* Fair scheduling correction values */
     atomic_t               irq_time;
+    atomic_t               hyp_time;
 
     /* Bitmask of CPUs on which this VCPU may run. */
     cpumask_var_t          cpu_hard_affinity;
@@ -703,6 +709,14 @@ void vcpu_sleep_sync(struct vcpu *v);
 void vcpu_begin_irq_handler(void);
 void vcpu_end_irq_handler(void);
 
+/*
+ * Report to the scheduler when we are doing housekeeping tasks on
+ * the current vcpu. This is called from do_softirq(), but can be
+ * called from anywhere else as well.
+ */
+void vcpu_begin_hyp_task(struct vcpu *v);
+void vcpu_end_hyp_task(struct vcpu *v);
+
 /*
  * Force synchronisation of given VCPU's state. If it is currently descheduled,
  * this call will ensure that all its state is committed to memory and that
-- 
2.27.0



* [RFC PATCH v1 1/6] sched: track time spent in IRQ handler
  2020-06-12  0:22 [RFC PATCH v1 0/6] Fair scheduling Volodymyr Babchuk
  2020-06-12  0:22 ` [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks Volodymyr Babchuk
@ 2020-06-12  0:22 ` Volodymyr Babchuk
  2020-06-12  4:36   ` Jürgen Groß
  2020-06-16 10:06   ` Jan Beulich
  2020-06-12  0:22 ` [RFC PATCH v1 3/6] sched, credit2: improve scheduler fairness Volodymyr Babchuk
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12  0:22 UTC
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Dario Faggioli, Jan Beulich,
	Volodymyr Babchuk, Roger Pau Monné

Add code that tracks the time spent in the IRQ handler, so later we can
make adjustments to the scheduling unit run time.

This and the following changes are meant to provide fair scheduling.
The problem is that any running vCPU can be interrupted to handle an
IRQ which is bound to some other vCPU. Thus, the current vCPU can be
charged for time it didn't actually use.
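
As IRQs can nest, only the outermost begin/end pair accounts time; a
sketch of the intended behaviour:

    vcpu_begin_irq_handler();     /* nesting 0->1: record entry time */
        vcpu_begin_irq_handler(); /* nesting 1->2: no-op for timing  */
        vcpu_end_irq_handler();   /* nesting 2->1: no-op for timing  */
    vcpu_end_irq_handler();       /* nesting 1->0: accumulate delta  */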

TODO: move vcpu_{begin|end}_irq_handler() calls to entry.S for even
fairer time tracking.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 xen/arch/arm/irq.c      |  2 ++
 xen/arch/x86/irq.c      |  2 ++
 xen/common/sched/core.c | 29 +++++++++++++++++++++++++++++
 xen/include/xen/sched.h | 13 +++++++++++++
 4 files changed, 46 insertions(+)

diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 3877657a52..51b517c0cd 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -201,6 +201,7 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq)
     struct irq_desc *desc = irq_to_desc(irq);
     struct irqaction *action;
 
+    vcpu_begin_irq_handler();
     perfc_incr(irqs);
 
     ASSERT(irq >= 16); /* SGIs do not come down this path */
@@ -267,6 +268,7 @@ out:
 out_no_end:
     spin_unlock(&desc->lock);
     irq_exit();
+    vcpu_end_irq_handler();
 }
 
 void release_irq(unsigned int irq, const void *dev_id)
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index a69937c840..3ef4221b64 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1895,6 +1895,7 @@ void do_IRQ(struct cpu_user_regs *regs)
     int               irq = this_cpu(vector_irq)[vector];
     struct cpu_user_regs *old_regs = set_irq_regs(regs);
 
+    vcpu_begin_irq_handler();
     perfc_incr(irqs);
     this_cpu(irq_count)++;
     irq_enter();
@@ -2024,6 +2025,7 @@ void do_IRQ(struct cpu_user_regs *regs)
  out_no_unlock:
     irq_exit();
     set_irq_regs(old_regs);
+    vcpu_end_irq_handler();
 }
 
 static inline bool is_free_pirq(const struct domain *d,
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index cb49a8bc02..8f642ada05 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -916,6 +916,35 @@ void vcpu_unblock(struct vcpu *v)
     vcpu_wake(v);
 }
 
+void vcpu_begin_irq_handler(void)
+{
+    if (is_idle_vcpu(current))
+        return;
+
+    /* XXX: Looks like ASSERT_INTERRUPTS_DISABLED() is available only for x86 */
+    if ( current->irq_nesting++ )
+        return;
+
+    current->irq_entry_time = NOW();
+}
+
+void vcpu_end_irq_handler(void)
+{
+    int delta;
+
+    if (is_idle_vcpu(current))
+        return;
+
+    ASSERT(current->irq_nesting);
+
+    if ( --current->irq_nesting )
+        return;
+
+    /* We assume that irq handling time will not overflow int */
+    delta = NOW() - current->irq_entry_time;
+    atomic_add(delta, &current->sched_unit->irq_time);
+}
+
 /*
  * Do the actual movement of an unit from old to new CPU. Locks for *both*
  * CPUs needs to have been taken already when calling this!
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ac53519d7f..ceed53364b 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -237,6 +237,9 @@ struct vcpu
     evtchn_port_t    virq_to_evtchn[NR_VIRQS];
     spinlock_t       virq_lock;
 
+    /* Fair scheduling state */
+    uint64_t         irq_entry_time;
+    unsigned int     irq_nesting;
     /* Tasklet for continue_hypercall_on_cpu(). */
     struct tasklet   continue_hypercall_tasklet;
 
@@ -276,6 +279,9 @@ struct sched_unit {
     /* Vcpu state summary. */
     unsigned int           runstate_cnt[4];
 
+    /* Fair scheduling correction value */
+    atomic_t               irq_time;
+
     /* Bitmask of CPUs on which this VCPU may run. */
     cpumask_var_t          cpu_hard_affinity;
     /* Used to save affinity during temporary pinning. */
@@ -690,6 +696,13 @@ long vcpu_yield(void);
 void vcpu_sleep_nosync(struct vcpu *v);
 void vcpu_sleep_sync(struct vcpu *v);
 
+/*
+ * Report IRQ handling time to the scheduler. As IRQs can be nested,
+ * the next two functions are reentrant.
+ */
+void vcpu_begin_irq_handler(void);
+void vcpu_end_irq_handler(void);
+
 /*
  * Force synchronisation of given VCPU's state. If it is currently descheduled,
  * this call will ensure that all its state is committed to memory and that
-- 
2.27.0



* [RFC PATCH v1 3/6] sched, credit2: improve scheduler fairness
  2020-06-12  0:22 [RFC PATCH v1 0/6] Fair scheduling Volodymyr Babchuk
  2020-06-12  0:22 ` [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks Volodymyr Babchuk
  2020-06-12  0:22 ` [RFC PATCH v1 1/6] sched: track time spent in IRQ handler Volodymyr Babchuk
@ 2020-06-12  0:22 ` Volodymyr Babchuk
  2020-06-12  4:51   ` Jürgen Groß
  2020-06-12  0:22 ` [RFC PATCH v1 5/6] tools: xentop: show time spent in IRQ and HYP states Volodymyr Babchuk
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12  0:22 UTC
  To: xen-devel; +Cc: Volodymyr Babchuk, George Dunlap, Dario Faggioli

Now we can make corrections to the scheduling unit run time, based on
the data gathered in the previous patches. This includes time spent in
IRQ handlers and time spent on hypervisor housekeeping tasks. Those
time spans need to be deducted from the total run time.

This patch adds the sched_get_time_correction() function, which returns
the time correction value. This value should be subtracted by a
scheduler implementation from the total vCPU/sched_unit run time.
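
In a scheduler this boils down to something like (a sketch; see the
credit2 hunk below for the actual change):

    delta = now - svc->start_time - sched_get_time_correction(svc->unit);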

TODO: Make the corresponding changes to all other schedulers.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 xen/common/sched/core.c    | 22 ++++++++++++++++++++++
 xen/common/sched/credit2.c |  2 +-
 xen/common/sched/private.h | 10 ++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index d597811fef..a7294ff5c3 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -974,6 +974,28 @@ void vcpu_end_hyp_task(struct vcpu *v)
 #ifndef NDEBUG
     v->in_hyp_task = false;
 #endif
 }
+
+s_time_t sched_get_time_correction(struct sched_unit *u)
+{
+    unsigned long flags;
+    int irq, hyp;
+
+    while ( true )
+    {
+        irq = atomic_read(&u->irq_time);
+        if ( likely( irq == atomic_cmpxchg(&u->irq_time, irq, 0)) )
+            break;
+    }
+
+    while ( true )
+    {
+        hyp = atomic_read(&u->hyp_time);
+        if ( likely( hyp == atomic_cmpxchg(&u->hyp_time, hyp, 0)) )
+            break;
+    }
+
+    return irq + hyp;
+}
 
 /*
diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index 34f05c3e2a..7a0aca078b 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -1722,7 +1722,7 @@ void burn_credits(struct csched2_runqueue_data *rqd,
         return;
     }
 
-    delta = now - svc->start_time;
+    delta = now - svc->start_time - sched_get_time_correction(svc->unit);
 
     if ( unlikely(delta <= 0) )
     {
diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
index b9a5b4c01c..3f4859ce23 100644
--- a/xen/common/sched/private.h
+++ b/xen/common/sched/private.h
@@ -604,4 +604,14 @@ void cpupool_put(struct cpupool *pool);
 int cpupool_add_domain(struct domain *d, int poolid);
 void cpupool_rm_domain(struct domain *d);
 
+/*
+ * Get the amount of time spent doing non-guest-related work on the
+ * current scheduling unit. This includes time spent in soft IRQs
+ * and in hardware interrupt handlers.
+ *
+ * Calling this function resets the counters, so it is supposed to
+ * be called when the scheduler calculates the time used by the
+ * scheduling unit.
+ */
+s_time_t sched_get_time_correction(struct sched_unit *u);
 #endif /* __XEN_SCHED_IF_H__ */
-- 
2.27.0



* [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-12  0:22 [RFC PATCH v1 0/6] Fair scheduling Volodymyr Babchuk
                   ` (4 preceding siblings ...)
  2020-06-12  0:22 ` [RFC PATCH v1 6/6] trace: add fair scheduling trace events Volodymyr Babchuk
@ 2020-06-12  0:22 ` Volodymyr Babchuk
  2020-06-12  4:57   ` Jürgen Groß
  5 siblings, 1 reply; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12  0:22 UTC
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Dario Faggioli, Jan Beulich,
	Volodymyr Babchuk

As the scheduler code now collects the time spent in IRQ handlers and
in do_softirq(), we can present those values to userspace tools like
xentop, so a system administrator can see how the system behaves.

We update the counters only in the sched_get_time_correction() function
to minimize the number of spinlocks taken. As atomic_t is only 32 bits
wide (2^31 ns is about 2.1 seconds), it is not enough to store time
with nanosecond precision, so we need to use 64-bit variables and
protect them with a spinlock.
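
For reference, a userspace consumer would read the new fields roughly
like this (a sketch using libxenctrl, assuming an already-open handle
xch; error handling omitted):

    xc_physinfo_t info = {};

    if ( xc_physinfo(xch, &info) == 0 )
        printf("IRQ %"PRIu64" ns, HYP %"PRIu64" ns\n",
               info.irq_time, info.hyp_time);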

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 xen/common/sched/core.c     | 19 +++++++++++++++++++
 xen/common/sysctl.c         |  1 +
 xen/include/public/sysctl.h |  4 +++-
 xen/include/xen/sched.h     |  2 ++
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index a7294ff5c3..ee6b1d9161 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -95,6 +95,10 @@ static struct scheduler __read_mostly ops;
 
 static bool scheduler_active;
 
+static DEFINE_SPINLOCK(sched_stat_lock);
+s_time_t sched_stat_irq_time;
+s_time_t sched_stat_hyp_time;
+
 static void sched_set_affinity(
     struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft);
 
@@ -995,7 +999,22 @@ s_time_t sched_get_time_correction(struct sched_unit *u)
             break;
     }
 
+    spin_lock_irqsave(&sched_stat_lock, flags);
+    sched_stat_irq_time += irq;
+    sched_stat_hyp_time += hyp;
+    spin_unlock_irqrestore(&sched_stat_lock, flags);
+
     return irq + hyp;
 }
 
+void sched_get_time_stats(uint64_t *irq_time, uint64_t *hyp_time)
+{
+    unsigned long flags;
+
+    spin_lock_irqsave(&sched_stat_lock, flags);
+    *irq_time = sched_stat_irq_time;
+    *hyp_time = sched_stat_hyp_time;
+    spin_unlock_irqrestore(&sched_stat_lock, flags);
+}
+
 /*
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 1c6a817476..00683bc93f 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -270,6 +270,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         pi->scrub_pages = 0;
         pi->cpu_khz = cpu_khz;
         pi->max_mfn = get_upper_mfn_bound();
+        sched_get_time_stats(&pi->irq_time, &pi->hyp_time);
         arch_do_physinfo(pi);
         if ( iommu_enabled )
         {
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 3a08c512e8..f320144d40 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -35,7 +35,7 @@
 #include "domctl.h"
 #include "physdev.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x00000013
+#define XEN_SYSCTL_INTERFACE_VERSION 0x00000014
 
 /*
  * Read console content from Xen buffer ring.
@@ -118,6 +118,8 @@ struct xen_sysctl_physinfo {
     uint64_aligned_t scrub_pages;
     uint64_aligned_t outstanding_pages;
     uint64_aligned_t max_mfn; /* Largest possible MFN on this host */
+    uint64_aligned_t irq_time;
+    uint64_aligned_t hyp_time;
     uint32_t hw_cap[8];
 };
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 51dc7c4551..869d4efbd6 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -717,6 +717,8 @@ void vcpu_end_irq_handler(void);
 void vcpu_begin_hyp_task(struct vcpu *v);
 void vcpu_end_hyp_task(struct vcpu *v);
 
+void sched_get_time_stats(uint64_t *irq_time, uint64_t *hyp_time);
+
 /*
  * Force synchronisation of given VCPU's state. If it is currently descheduled,
  * this call will ensure that all its state is committed to memory and that
-- 
2.27.0



* [RFC PATCH v1 6/6] trace: add fair scheduling trace events
  2020-06-12  0:22 [RFC PATCH v1 0/6] Fair scheduling Volodymyr Babchuk
                   ` (3 preceding siblings ...)
  2020-06-12  0:22 ` [RFC PATCH v1 5/6] tools: xentop: show time spent in IRQ and HYP states Volodymyr Babchuk
@ 2020-06-12  0:22 ` Volodymyr Babchuk
  2020-06-12  0:22 ` [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics Volodymyr Babchuk
  5 siblings, 0 replies; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12  0:22 UTC
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Dario Faggioli, Jan Beulich,
	Volodymyr Babchuk

We trace each IRQ or HYP mode change, as well as the calculated time
adjustment values. The new records show up in xenalyze's dump-all
output.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 tools/xentrace/xenalyze.c  | 37 +++++++++++++++++++++++++++++++++++++
 xen/common/sched/core.c    |  8 ++++++++
 xen/include/public/trace.h |  5 +++++
 3 files changed, 50 insertions(+)

diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index b7f4e2bea8..bcde830f0e 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -7546,6 +7546,43 @@ void sched_process(struct pcpu_info *p)
                 printf("\n");
             }
             break;
+        case TRC_SCHED_IRQ_ENTRY:
+        case TRC_SCHED_IRQ_LEAVE:
+            if(opt.dump_all)
+            {
+                struct {
+                    unsigned int nesting;
+                } *r = (typeof(r))ri->d;
+
+                printf(" %s sched_irq_%s nesting = %u\n", ri->dump_header,
+                       ri->event == TRC_SCHED_IRQ_ENTRY ? "entry":"leave",
+                       r->nesting);
+            }
+            break;
+        case TRC_SCHED_HYP_ENTRY:
+        case TRC_SCHED_HYP_LEAVE:
+            if(opt.dump_all)
+            {
+                struct {
+                    unsigned int domid, vcpuid;
+                } *r = (typeof(r))ri->d;
+
+                printf(" %s sched_hyp_%s d%uv%u\n", ri->dump_header,
+                       ri->event == TRC_SCHED_HYP_ENTRY ? "entry":"leave",
+                       r->domid, r->vcpuid);
+            }
+            break;
+        case TRC_SCHED_TIME_ADJ:
+            if(opt.dump_all)
+            {
+                struct {
+                    unsigned int irq, hyp;
+                } *r = (typeof(r))ri->d;
+
+                printf(" %s sched time adjust IRQ %uns HYP %uns Total %uns\n", ri->dump_header,
+                       r->irq, r->hyp, r->irq + r->hyp);
+            }
+            break;
         case TRC_SCHED_CTL:
         case TRC_SCHED_S_TIMER_FN:
         case TRC_SCHED_T_TIMER_FN:
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index ee6b1d9161..9e82a6a22b 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -925,6 +925,8 @@ void vcpu_begin_irq_handler(void)
     if (is_idle_vcpu(current))
         return;
 
+    TRACE_1D(TRC_SCHED_IRQ_ENTRY, current->irq_nesting);
+
     /* XXX: Looks like ASSERT_INTERRUPTS_DISABLED() is available only for x86 */
     if ( current->irq_nesting++ )
         return;
@@ -941,6 +943,8 @@ void vcpu_end_irq_handler(void)
 
     ASSERT(current->irq_nesting);
 
+    TRACE_1D(TRC_SCHED_IRQ_LEAVE, current->irq_nesting - 1);
+
     if ( --current->irq_nesting )
         return;
 
@@ -960,6 +964,7 @@ void vcpu_begin_hyp_task(struct vcpu *v)
 #ifndef NDEBUG
     v->in_hyp_task = true;
 #endif
+    TRACE_2D(TRC_SCHED_HYP_ENTRY, v->domain->domain_id, v->vcpu_id);
 }
 
 void vcpu_end_hyp_task(struct vcpu *v)
@@ -978,7 +983,8 @@ void vcpu_end_hyp_task(struct vcpu *v)
 #ifndef NDEBUG
     v->in_hyp_task = false;
 #endif
+    TRACE_2D(TRC_SCHED_HYP_LEAVE, v->domain->domain_id, v->vcpu_id);
 }
 
 s_time_t sched_get_time_correction(struct sched_unit *u)
 {
@@ -1004,6 +1010,8 @@ s_time_t sched_get_time_correction(struct sched_unit *u)
     sched_stat_hyp_time += hyp;
     spin_unlock_irqrestore(&sched_stat_lock, flags);
 
+    TRACE_2D(TRC_SCHED_TIME_ADJ, irq, hyp);
+
     return irq + hyp;
 }
 
diff --git a/xen/include/public/trace.h b/xen/include/public/trace.h
index d5fa4aea8d..6161980095 100644
--- a/xen/include/public/trace.h
+++ b/xen/include/public/trace.h
@@ -117,6 +117,11 @@
 #define TRC_SCHED_SWITCH_INFNEXT (TRC_SCHED_VERBOSE + 15)
 #define TRC_SCHED_SHUTDOWN_CODE  (TRC_SCHED_VERBOSE + 16)
 #define TRC_SCHED_SWITCH_INFCONT (TRC_SCHED_VERBOSE + 17)
+#define TRC_SCHED_IRQ_ENTRY      (TRC_SCHED_VERBOSE + 18)
+#define TRC_SCHED_IRQ_LEAVE      (TRC_SCHED_VERBOSE + 19)
+#define TRC_SCHED_HYP_ENTRY      (TRC_SCHED_VERBOSE + 20)
+#define TRC_SCHED_HYP_LEAVE      (TRC_SCHED_VERBOSE + 21)
+#define TRC_SCHED_TIME_ADJ       (TRC_SCHED_VERBOSE + 22)
 
 #define TRC_DOM0_DOM_ADD         (TRC_DOM0_DOMOPS + 1)
 #define TRC_DOM0_DOM_REM         (TRC_DOM0_DOMOPS + 2)
-- 
2.27.0



* [RFC PATCH v1 5/6] tools: xentop: show time spent in IRQ and HYP states.
  2020-06-12  0:22 [RFC PATCH v1 0/6] Fair scheduling Volodymyr Babchuk
                   ` (2 preceding siblings ...)
  2020-06-12  0:22 ` [RFC PATCH v1 3/6] sched, credit2: improve scheduler fairness Volodymyr Babchuk
@ 2020-06-12  0:22 ` Volodymyr Babchuk
  2020-06-12  0:22 ` [RFC PATCH v1 6/6] trace: add fair scheduling trace events Volodymyr Babchuk
  2020-06-12  0:22 ` [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics Volodymyr Babchuk
  5 siblings, 0 replies; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12  0:22 UTC
  To: xen-devel; +Cc: Ian Jackson, Volodymyr Babchuk, Wei Liu

xentop shows the values in its header like this:

IRQ Time 0.2s    0.0% HYP Time 1.3s    0.1%

The first value is the total time spent in the corresponding mode; the
second value is the instant load percentage, similar to the vCPU load
value. For example, 100 ms spent in IRQ handlers during a 10 s sampling
interval is shown as 1.0%.

"IRQ" corresponds to time spent in IRQ handlers.
"HYP" is the time used by the hypervisor for its own tasks.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 tools/xenstat/libxenstat/src/xenstat.c      | 12 +++++
 tools/xenstat/libxenstat/src/xenstat.h      |  6 +++
 tools/xenstat/libxenstat/src/xenstat_priv.h |  2 +
 tools/xenstat/xentop/xentop.c               | 54 ++++++++++++++++-----
 4 files changed, 63 insertions(+), 11 deletions(-)

diff --git a/tools/xenstat/libxenstat/src/xenstat.c b/tools/xenstat/libxenstat/src/xenstat.c
index 6f93d4e982..30c9d3d2cc 100644
--- a/tools/xenstat/libxenstat/src/xenstat.c
+++ b/tools/xenstat/libxenstat/src/xenstat.c
@@ -162,6 +162,8 @@ xenstat_node *xenstat_get_node(xenstat_handle * handle, unsigned int flags)
 	node->free_mem = ((unsigned long long)physinfo.free_pages)
 	    * handle->page_size;
 
+	node->irq_time = physinfo.irq_time;
+	node->hyp_time = physinfo.hyp_time;
 	node->freeable_mb = 0;
 	/* malloc(0) is not portable, so allocate a single domain.  This will
 	 * be resized below. */
@@ -332,6 +334,16 @@ unsigned long long xenstat_node_cpu_hz(xenstat_node * node)
 	return node->cpu_hz;
 }
 
+unsigned long long xenstat_node_irq_time(xenstat_node * node)
+{
+	return node->irq_time;
+}
+
+unsigned long long xenstat_node_hyp_time(xenstat_node * node)
+{
+	return node->hyp_time;
+}
+
 /* Get the domain ID for this domain */
 unsigned xenstat_domain_id(xenstat_domain * domain)
 {
diff --git a/tools/xenstat/libxenstat/src/xenstat.h b/tools/xenstat/libxenstat/src/xenstat.h
index 76a660f321..8d2e561008 100644
--- a/tools/xenstat/libxenstat/src/xenstat.h
+++ b/tools/xenstat/libxenstat/src/xenstat.h
@@ -80,6 +80,12 @@ unsigned int xenstat_node_num_cpus(xenstat_node * node);
 /* Get information about the CPU speed */
 unsigned long long xenstat_node_cpu_hz(xenstat_node * node);
 
+/* Get information about time spent in IRQ handlers */
+unsigned long long xenstat_node_irq_time(xenstat_node * node);
+
+/* Get information about time spent doing hypervisor work */
+unsigned long long xenstat_node_hyp_time(xenstat_node * node);
+
 /*
  * Domain functions - extract information from a xenstat_domain
  */
diff --git a/tools/xenstat/libxenstat/src/xenstat_priv.h b/tools/xenstat/libxenstat/src/xenstat_priv.h
index 4eb44a8ebb..d259765593 100644
--- a/tools/xenstat/libxenstat/src/xenstat_priv.h
+++ b/tools/xenstat/libxenstat/src/xenstat_priv.h
@@ -48,6 +48,8 @@ struct xenstat_node {
 	unsigned long long tot_mem;
 	unsigned long long free_mem;
 	unsigned int num_domains;
+	unsigned long long irq_time;
+	unsigned long long hyp_time;
 	xenstat_domain *domains;	/* Array of length num_domains */
 	long freeable_mb;
 };
diff --git a/tools/xenstat/xentop/xentop.c b/tools/xenstat/xentop/xentop.c
index ebed070c0f..aaeba81cd9 100644
--- a/tools/xenstat/xentop/xentop.c
+++ b/tools/xenstat/xentop/xentop.c
@@ -496,11 +496,25 @@ static void print_cpu(xenstat_domain *domain)
 	print("%10llu", xenstat_domain_cpu_ns(domain)/1000000000);
 }
 
+/* Helper to calculate CPU load percentage */
+static double calc_time_pct(uint64_t cur_time_ns, uint64_t prev_time_ns)
+{
+	double us_elapsed;
+
+	/* Calculate the time elapsed in microseconds */
+	us_elapsed = ((curtime.tv_sec-oldtime.tv_sec)*1000000.0
+		      +(curtime.tv_usec - oldtime.tv_usec));
+
+	/* In the following, nanoseconds must be multiplied by 1000.0 to
+	 * convert to microseconds, then divided by 100.0 to get a percentage,
+	 * resulting in a multiplication by 10.0 */
+	return ((cur_time_ns - prev_time_ns) / 10.0) / us_elapsed;
+}
+
 /* Computes the CPU percentage used for a specified domain */
 static double get_cpu_pct(xenstat_domain *domain)
 {
 	xenstat_domain *old_domain;
-	double us_elapsed;
 
 	/* Can't calculate CPU percentage without a previous sample. */
 	if(prev_node == NULL)
@@ -510,15 +524,8 @@ static double get_cpu_pct(xenstat_domain *domain)
 	if(old_domain == NULL)
 		return 0.0;
 
-	/* Calculate the time elapsed in microseconds */
-	us_elapsed = ((curtime.tv_sec-oldtime.tv_sec)*1000000.0
-		      +(curtime.tv_usec - oldtime.tv_usec));
-
-	/* In the following, nanoseconds must be multiplied by 1000.0 to
-	 * convert to microseconds, then divided by 100.0 to get a percentage,
-	 * resulting in a multiplication by 10.0 */
-	return ((xenstat_domain_cpu_ns(domain)
-		 -xenstat_domain_cpu_ns(old_domain))/10.0)/us_elapsed;
+	return calc_time_pct(xenstat_domain_cpu_ns(domain),
+						 xenstat_domain_cpu_ns(old_domain));
 }
 
 static int compare_cpu_pct(xenstat_domain *domain1, xenstat_domain *domain2)
@@ -878,6 +885,23 @@ static void print_ssid(xenstat_domain *domain)
 	print("%4u", xenstat_domain_ssid(domain));
 }
 
+/* Computes the Xen time stats as percentages */
+static void get_xen_time_stats(double *irq_pct, double *hyp_pct)
+{
+	/* Can't calculate CPU percentage without a previous sample. */
+	if(prev_node == NULL)
+	{
+		*irq_pct = 0.0;
+		*hyp_pct = 0.0;
+		return;
+	}
+
+	*irq_pct = calc_time_pct(xenstat_node_irq_time(cur_node),
+							 xenstat_node_irq_time(prev_node));
+	*hyp_pct = calc_time_pct(xenstat_node_hyp_time(cur_node),
+							 xenstat_node_hyp_time(prev_node));
+}
+
 /* Resets default_width for fields with potentially large numbers */
 void reset_field_widths(void)
 {
@@ -943,6 +967,7 @@ void do_summary(void)
 	         crash = 0, dying = 0, shutdown = 0;
 	unsigned i, num_domains = 0;
 	unsigned long long used = 0;
+	double irq_pct, hyp_pct;
 	xenstat_domain *domain;
 	time_t curt;
 
@@ -975,9 +1000,16 @@ void do_summary(void)
 	      xenstat_node_tot_mem(cur_node)/1024, used/1024,
 	      xenstat_node_free_mem(cur_node)/1024);
 
-	print("CPUs: %u @ %lluMHz\n",
+	print("CPUs: %u @ %lluMHz  ",
 	      xenstat_node_num_cpus(cur_node),
 	      xenstat_node_cpu_hz(cur_node)/1000000);
+
+	get_xen_time_stats(&irq_pct, &hyp_pct);
+	print("IRQ Time %.1fs %6.1f%% HYP Time %.1fs %6.1f%%\n",
+		  xenstat_node_irq_time(cur_node) / 1000000000.0,
+		  irq_pct,
+		  xenstat_node_hyp_time(cur_node) / 1000000000.0,
+		  hyp_pct);
 }
 
 /* Display the top header for the domain table */
-- 
2.27.0



* Re: [RFC PATCH v1 1/6] sched: track time spent in IRQ handler
  2020-06-12  0:22 ` [RFC PATCH v1 1/6] sched: track time spent in IRQ handler Volodymyr Babchuk
@ 2020-06-12  4:36   ` Jürgen Groß
  2020-06-12 11:26     ` Volodymyr Babchuk
  2020-06-16 10:06   ` Jan Beulich
  1 sibling, 1 reply; 43+ messages in thread
From: Jürgen Groß @ 2020-06-12  4:36 UTC
  To: Volodymyr Babchuk, xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Dario Faggioli, Jan Beulich,
	Roger Pau Monné

On 12.06.20 02:22, Volodymyr Babchuk wrote:
> Add code that saves time spent in IRQ handler, so later we can make
> adjustments to schedule unit run time.
> 
> This and following changes are called upon to provide fair
> scheduling. Problem is that any running vCPU can be interrupted by to
> handle IRQ which is bound to some other vCPU. Thus, current vCPU can
> be charged for a time, it actually didn't used.
> 
> TODO: move vcpu_{begin|end}_irq_handler() calls to entry.S for even
> more fair time tracking.
> 
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
>   xen/arch/arm/irq.c      |  2 ++
>   xen/arch/x86/irq.c      |  2 ++
>   xen/common/sched/core.c | 29 +++++++++++++++++++++++++++++
>   xen/include/xen/sched.h | 13 +++++++++++++
>   4 files changed, 46 insertions(+)
> 
> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> index 3877657a52..51b517c0cd 100644
> --- a/xen/arch/arm/irq.c
> +++ b/xen/arch/arm/irq.c
> @@ -201,6 +201,7 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq)
>       struct irq_desc *desc = irq_to_desc(irq);
>       struct irqaction *action;
>   
> +    vcpu_begin_irq_handler();
>       perfc_incr(irqs);
>   
>       ASSERT(irq >= 16); /* SGIs do not come down this path */
> @@ -267,6 +268,7 @@ out:
>   out_no_end:
>       spin_unlock(&desc->lock);
>       irq_exit();
> +    vcpu_end_irq_handler();
>   }
>   
>   void release_irq(unsigned int irq, const void *dev_id)
> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> index a69937c840..3ef4221b64 100644
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1895,6 +1895,7 @@ void do_IRQ(struct cpu_user_regs *regs)
>       int               irq = this_cpu(vector_irq)[vector];
>       struct cpu_user_regs *old_regs = set_irq_regs(regs);
>   
> +    vcpu_begin_irq_handler();
>       perfc_incr(irqs);
>       this_cpu(irq_count)++;
>       irq_enter();
> @@ -2024,6 +2025,7 @@ void do_IRQ(struct cpu_user_regs *regs)
>    out_no_unlock:
>       irq_exit();
>       set_irq_regs(old_regs);
> +    vcpu_end_irq_handler();
>   }
>   
>   static inline bool is_free_pirq(const struct domain *d,
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index cb49a8bc02..8f642ada05 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -916,6 +916,35 @@ void vcpu_unblock(struct vcpu *v)
>       vcpu_wake(v);
>   }
>   
> +void vcpu_begin_irq_handler(void)
> +{
> +    if (is_idle_vcpu(current))
> +        return;
> +
> +    /* XXX: Looks like ASSERT_INTERRUPTS_DISABLED() is available only for x86 */
> +    if ( current->irq_nesting++ )
> +        return;
> +
> +    current->irq_entry_time = NOW();
> +}
> +
> +void vcpu_end_irq_handler(void)
> +{
> +    int delta;
> +
> +    if (is_idle_vcpu(current))
> +        return;
> +
> +    ASSERT(current->irq_nesting);
> +
> +    if ( --current->irq_nesting )
> +        return;
> +
> +    /* We assume that irq handling time will not overflow int */

This assumption might not hold for long-running VMs.


Juergen



* Re: [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks
  2020-06-12  0:22 ` [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks Volodymyr Babchuk
@ 2020-06-12  4:43   ` Jürgen Groß
  2020-06-12 11:30     ` Volodymyr Babchuk
  2020-06-16 10:10   ` Jan Beulich
  1 sibling, 1 reply; 43+ messages in thread
From: Jürgen Groß @ 2020-06-12  4:43 UTC
  To: Volodymyr Babchuk, xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Dario Faggioli, Jan Beulich

On 12.06.20 02:22, Volodymyr Babchuk wrote:
> In most cases hypervisor code performs guest-related jobs. Tasks like
> hypercall handling or MMIO access emulation are done for calling vCPU
> so it is okay to charge time spent in hypervisor to the current vCPU.
> 
> But, there are also tasks that are not originated from guests. This
> includes things like TLB flushing or running tasklets. We don't want
> to track time spent in this tasks to a total scheduling unit run
> time. So we need to track time spent in such housekeeping tasks
> separately.
> 
> Those hypervisor tasks are run in do_softirq() function, so we'll
> install our hooks there.
> 
> TODO: This change is not tested on ARM, and probably we'll get a
> failing assertion there. This is because ARM code exits from
> schedule() and have chance to get to end of do_softirq().
> 
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
>   xen/common/sched/core.c | 32 ++++++++++++++++++++++++++++++++
>   xen/common/softirq.c    |  2 ++
>   xen/include/xen/sched.h | 16 +++++++++++++++-
>   3 files changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index 8f642ada05..d597811fef 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -945,6 +945,37 @@ void vcpu_end_irq_handler(void)
>       atomic_add(delta, &current->sched_unit->irq_time);
>   }
>   
> +void vcpu_begin_hyp_task(struct vcpu *v)
> +{
> +    if ( is_idle_vcpu(v) )
> +        return;
> +
> +    ASSERT(!v->in_hyp_task);
> +
> +    v->hyp_entry_time = NOW();
> +#ifndef NDEBUG
> +    v->in_hyp_task = true;
> +#endif
> +}
> +
> +void vcpu_end_hyp_task(struct vcpu *v)
> +{
> +    int delta;
> +
> +    if ( is_idle_vcpu(v) )
> +        return;
> +
> +    ASSERT(v->in_hyp_task);
> +
> +    /* We assume that hypervisor task time will not overflow int */

This will definitely happen for long-running VMs. Please use a 64-bit
variable.

> +    delta = NOW() - v->hyp_entry_time;
> +    atomic_add(delta, &v->sched_unit->hyp_time);
> +
> +#ifndef NDEBUG
> +    v->in_hyp_task = false;
> +#endif
> +}
> +
>   /*
>    * Do the actual movement of an unit from old to new CPU. Locks for *both*
>    * CPUs needs to have been taken already when calling this!
> @@ -2615,6 +2646,7 @@ static void schedule(void)
>   
>       SCHED_STAT_CRANK(sched_run);
>   
> +    vcpu_end_hyp_task(current);
>       rcu_read_lock(&sched_res_rculock);
>   
>       lock = pcpu_schedule_lock_irq(cpu);
> diff --git a/xen/common/softirq.c b/xen/common/softirq.c
> index 063e93cbe3..03a29384d1 100644
> --- a/xen/common/softirq.c
> +++ b/xen/common/softirq.c
> @@ -71,7 +71,9 @@ void process_pending_softirqs(void)
>   void do_softirq(void)
>   {
>       ASSERT_NOT_IN_ATOMIC();
> +    vcpu_begin_hyp_task(current);
>       __do_softirq(0);
> +    vcpu_end_hyp_task(current);

This won't work for scheduling. current will either have changed,
or in the x86 case __do_softirq() might just not return. You need to
handle that case explicitly in schedule() (you did that for the
old vcpu, but for the case where schedule() returns you need to
call vcpu_begin_hyp_task(current) there).


Juergen



* Re: [RFC PATCH v1 3/6] sched, credit2: improve scheduler fairness
  2020-06-12  0:22 ` [RFC PATCH v1 3/6] sched, credit2: improve scheduler fairness Volodymyr Babchuk
@ 2020-06-12  4:51   ` Jürgen Groß
  2020-06-12 11:38     ` Volodymyr Babchuk
  0 siblings, 1 reply; 43+ messages in thread
From: Jürgen Groß @ 2020-06-12  4:51 UTC
  To: Volodymyr Babchuk, xen-devel; +Cc: George Dunlap, Dario Faggioli

On 12.06.20 02:22, Volodymyr Babchuk wrote:
> Now we can make corrections for scheduling unit run time, based on
> data gathered in previous patches. This includes time spent in IRQ
> handlers and time spent for hypervisor housekeeping tasks. Those time
> spans needs to be deduced from a total run time.
> 
> This patch adds sched_get_time_correction() function which returns
> time correction value. This value should be subtracted by a scheduler
> implementation from a total vCPU/shced_unit run time.
> 
> TODO: Make the corresponding changes to all other schedulers.
> 
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
>   xen/common/sched/core.c    | 23 +++++++++++++++++++++++
>   xen/common/sched/credit2.c |  2 +-
>   xen/common/sched/private.h | 10 ++++++++++
>   3 files changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index d597811fef..a7294ff5c3 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -974,6 +974,28 @@ void vcpu_end_hyp_task(struct vcpu *v)
>   #ifndef NDEBUG
>       v->in_hyp_task = false;
>   #endif
>   }
> +
> +s_time_t sched_get_time_correction(struct sched_unit *u)
> +{
> +    unsigned long flags;
> +    int irq, hyp;

Using "irq" for a time value is misleading IMO.

> +
> +    while ( true )
> +    {
> +        irq = atomic_read(&u->irq_time);
> +        if ( likely( irq == atomic_cmpxchg(&u->irq_time, irq, 0)) )
> +            break;
> +    }

Just use atomic_xchg()?
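
E.g. (untested):

    irq = atomic_xchg(&u->irq_time, 0);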

> +
> +    while ( true )
> +    {
> +        hyp = atomic_read(&u->hyp_time);
> +        if ( likely( hyp == atomic_cmpxchg(&u->hyp_time, hyp, 0)) )
> +            break;
> +    }
> +
> +    return irq + hyp;

Ah, I didn't look into this patch until now.

You can replace my comments about overflow of an int for patches 1 and 2
with:

   Please modify the comment about not overflowing to hint at the value
   being reset when making scheduling decisions.

And this (of course) needs to be handled in all other schedulers, too.


Juergen



* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-12  0:22 ` [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics Volodymyr Babchuk
@ 2020-06-12  4:57   ` Jürgen Groß
  2020-06-12 11:44     ` Volodymyr Babchuk
  2020-06-12 12:29     ` Julien Grall
  0 siblings, 2 replies; 43+ messages in thread
From: Jürgen Groß @ 2020-06-12  4:57 UTC
  To: Volodymyr Babchuk, xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Dario Faggioli, Jan Beulich

On 12.06.20 02:22, Volodymyr Babchuk wrote:
> As scheduler code now collects time spent in IRQ handlers and in
> do_softirq(), we can present those values to userspace tools like
> xentop, so system administrator can see how system behaves.
> 
> We are updating counters only in sched_get_time_correction() function
> to minimize number of taken spinlocks. As atomic_t is 32 bit wide, it
> is not enough to store time with nanosecond precision. So we need to
> use 64 bit variables and protect them with spinlock.
> 
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
>   xen/common/sched/core.c     | 17 +++++++++++++++++
>   xen/common/sysctl.c         |  1 +
>   xen/include/public/sysctl.h |  4 +++-
>   xen/include/xen/sched.h     |  2 ++
>   4 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index a7294ff5c3..ee6b1d9161 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -95,6 +95,10 @@ static struct scheduler __read_mostly ops;
>   
>   static bool scheduler_active;
>   
> +static DEFINE_SPINLOCK(sched_stat_lock);
> +s_time_t sched_stat_irq_time;
> +s_time_t sched_stat_hyp_time;
> +
>   static void sched_set_affinity(
>       struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft);
>   
> @@ -995,7 +999,22 @@ s_time_t sched_get_time_correction(struct sched_unit *u)
>               break;
>       }
>   
> +    spin_lock_irqsave(&sched_stat_lock, flags);
> +    sched_stat_irq_time += irq;
> +    sched_stat_hyp_time += hyp;
> +    spin_unlock_irqrestore(&sched_stat_lock, flags);

Please don't use a lock. Just use add_sized() instead which will add
atomically.
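
E.g. (untested):

    add_sized(&sched_stat_irq_time, irq);
    add_sized(&sched_stat_hyp_time, hyp);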

> +
>       return irq + hyp;
>   }
>   
> +void sched_get_time_stats(uint64_t *irq_time, uint64_t *hyp_time)
> +{
> +    unsigned long flags;
> +
> +    spin_lock_irqsave(&sched_stat_lock, flags);
> +    *irq_time = sched_stat_irq_time;
> +    *hyp_time = sched_stat_hyp_time;
> +    spin_unlock_irqrestore(&sched_stat_lock, flags);

read_atomic() will do the job without lock.
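
E.g. (untested):

    *irq_time = read_atomic(&sched_stat_irq_time);
    *hyp_time = read_atomic(&sched_stat_hyp_time);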

> +}
>   
>   /*
> diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
> index 1c6a817476..00683bc93f 100644
> --- a/xen/common/sysctl.c
> +++ b/xen/common/sysctl.c
> @@ -270,6 +270,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>           pi->scrub_pages = 0;
>           pi->cpu_khz = cpu_khz;
>           pi->max_mfn = get_upper_mfn_bound();
> +        sched_get_time_stats(&pi->irq_time, &pi->hyp_time);
>           arch_do_physinfo(pi);
>           if ( iommu_enabled )
>           {
> diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
> index 3a08c512e8..f320144d40 100644
> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -35,7 +35,7 @@
>   #include "domctl.h"
>   #include "physdev.h"
>   
> -#define XEN_SYSCTL_INTERFACE_VERSION 0x00000013
> +#define XEN_SYSCTL_INTERFACE_VERSION 0x00000014
>   
>   /*
>    * Read console content from Xen buffer ring.
> @@ -118,6 +118,8 @@ struct xen_sysctl_physinfo {
>       uint64_aligned_t scrub_pages;
>       uint64_aligned_t outstanding_pages;
>       uint64_aligned_t max_mfn; /* Largest possible MFN on this host */
> +    uint64_aligned_t irq_time;
> +    uint64_aligned_t hyp_time;

Would hypfs work, too? This would avoid the need for extending another
hypercall.


Juergen




* Re: [RFC PATCH v1 1/6] sched: track time spent in IRQ handler
  2020-06-12  4:36   ` Jürgen Groß
@ 2020-06-12 11:26     ` Volodymyr Babchuk
  2020-06-12 11:29       ` Julien Grall
  0 siblings, 1 reply; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12 11:26 UTC
  To: jgross, xen-devel
  Cc: sstabellini, julien, wl, andrew.cooper3, ian.jackson,
	george.dunlap, dfaggioli, jbeulich, roger.pau

Hi Jurgen,

thanks for the review

On Fri, 2020-06-12 at 06:36 +0200, Jürgen Groß wrote:

> On 12.06.20 02:22, Volodymyr Babchuk wrote:

[...]

> > +void vcpu_end_irq_handler(void)
> > +{
> > +    int delta;
> > +
> > +    if (is_idle_vcpu(current))
> > +        return;
> > +
> > +    ASSERT(current->irq_nesting);
> > +
> > +    if ( --current->irq_nesting )
> > +        return;
> > +
> > +    /* We assume that irq handling time will not overflow int */
> 
> This assumption might not hold for long running VMs.

Basically, this value holds the time span between calls to schedule().
This variable gets zeroed out every time the scheduler requests the
time adjustment value. So, it should not depend on the total VM run
time.



* Re: [RFC PATCH v1 1/6] sched: track time spent in IRQ handler
  2020-06-12 11:26     ` Volodymyr Babchuk
@ 2020-06-12 11:29       ` Julien Grall
  2020-06-12 11:33         ` Volodymyr Babchuk
  0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2020-06-12 11:29 UTC
  To: Volodymyr Babchuk, jgross, xen-devel
  Cc: sstabellini, wl, andrew.cooper3, ian.jackson, george.dunlap,
	dfaggioli, jbeulich, roger.pau



On 12/06/2020 12:26, Volodymyr Babchuk wrote:
> Hi Jurgen,
> 
> thanks for the review
> 
> On Fri, 2020-06-12 at 06:36 +0200, Jürgen Groß wrote:
> 
>> On 12.06.20 02:22, Volodymyr Babchuk wrote:
> 
> [...]
> 
>>> +void vcpu_end_irq_handler(void)
>>> +{
>>> +    int delta;
>>> +
>>> +    if (is_idle_vcpu(current))
>>> +        return;
>>> +
>>> +    ASSERT(current->irq_nesting);
>>> +
>>> +    if ( --current->irq_nesting )
>>> +        return;
>>> +
>>> +    /* We assume that irq handling time will not overflow int */
>>
>> This assumption might not hold for long running VMs.
> 
> Basically, this value holds the time span between calls to schedule().
> This variable gets zeroed out every time the scheduler requests the
> time adjustment value. So, it should not depend on the total VM run
> time.
This is assuming that the scheduler will be called. With the NULL 
scheduler in place, there is a fair chance this may never be called.

So I think using a 64-bit value is likely safer.

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks
  2020-06-12  4:43   ` Jürgen Groß
@ 2020-06-12 11:30     ` Volodymyr Babchuk
  2020-06-12 11:40       ` Jürgen Groß
  0 siblings, 1 reply; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12 11:30 UTC
  To: jgross, xen-devel
  Cc: sstabellini, julien, wl, andrew.cooper3, ian.jackson,
	george.dunlap, dfaggioli, jbeulich

On Fri, 2020-06-12 at 06:43 +0200, Jürgen Groß wrote:
> On 12.06.20 02:22, Volodymyr Babchuk wrote:
> > +void vcpu_end_hyp_task(struct vcpu *v)
> > +{
> > +    int delta;
> > +
> > +    if ( is_idle_vcpu(v) )
> > +        return;
> > +
> > +    ASSERT(v->in_hyp_task);
> > +
> > +    /* We assume that hypervisor task time will not overflow int */
> 
> This will definitely happen for long running VMs. Please use a 64-bit
> variable.
> 

It is not supposed to hold long time spans, as I described in my reply
to the previous email.

> > +    delta = NOW() - v->hyp_entry_time;
> > +    atomic_add(delta, &v->sched_unit->hyp_time);
> > +
> > +#ifndef NDEBUG
> > +    v->in_hyp_task = false;
> > +#endif
> > +}
> > +
> >   /*
> >    * Do the actual movement of an unit from old to new CPU. Locks for *both*
> >    * CPUs needs to have been taken already when calling this!
> > @@ -2615,6 +2646,7 @@ static void schedule(void)
> >   
> >       SCHED_STAT_CRANK(sched_run);
> >   
> > +    vcpu_end_hyp_task(current);
> >       rcu_read_lock(&sched_res_rculock);
> >   
> >       lock = pcpu_schedule_lock_irq(cpu);
> > diff --git a/xen/common/softirq.c b/xen/common/softirq.c
> > index 063e93cbe3..03a29384d1 100644
> > --- a/xen/common/softirq.c
> > +++ b/xen/common/softirq.c
> > @@ -71,7 +71,9 @@ void process_pending_softirqs(void)
> >   void do_softirq(void)
> >   {
> >       ASSERT_NOT_IN_ATOMIC();
> > +    vcpu_begin_hyp_task(current);
> >       __do_softirq(0);
> > +    vcpu_end_hyp_task(current);
> 
> This won't work for scheduling. current will either have changed,
> or in the x86 case __do_softirq() might just not return. You need to
> handle that case explicitly in schedule() (you did that for the
> old vcpu, but for the case where schedule() returns you need to
> call vcpu_begin_hyp_task(current) there).
> 

Well, this is one of the questions I wanted to discuss. I certainly
need to call vcpu_begin_hyp_task(current) after the context switch. But
what is the right place? If my understanding is right, code on the x86
platform will never reach this point. Or am I wrong there?




* Re: [RFC PATCH v1 1/6] sched: track time spent in IRQ handler
  2020-06-12 11:29       ` Julien Grall
@ 2020-06-12 11:33         ` Volodymyr Babchuk
  2020-06-12 12:21           ` Julien Grall
  0 siblings, 1 reply; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12 11:33 UTC
  To: jgross, julien, xen-devel
  Cc: sstabellini, wl, andrew.cooper3, ian.jackson, george.dunlap,
	dfaggioli, jbeulich, roger.pau


On Fri, 2020-06-12 at 12:29 +0100, Julien Grall wrote:
> 
> On 12/06/2020 12:26, Volodymyr Babchuk wrote:
> > Hi Jurgen,
> > 
> > thanks for the review
> > 
> > On Fri, 2020-06-12 at 06:36 +0200, Jürgen Groß wrote:
> > 
> > > On 12.06.20 02:22, Volodymyr Babchuk wrote:
> > 
> > [...]
> > 
> > > > +void vcpu_end_irq_handler(void)
> > > > +{
> > > > +    int delta;
> > > > +
> > > > +    if (is_idle_vcpu(current))
> > > > +        return;
> > > > +
> > > > +    ASSERT(current->irq_nesting);
> > > > +
> > > > +    if ( --current->irq_nesting )
> > > > +        return;
> > > > +
> > > > +    /* We assume that irq handling time will not overflow int */
> > > 
> > > This assumption might not hold for long running VMs.
> > 
> > Basically, this value holds time span between calls to schedule(). This
> > variable gets zeroed out every time scheduler requests for time
> > adjustment value. So, it should not depend on total VM run time.
> This is assuming that the scheduler will be called. With the NULL 
> scheduler in place, there is a fair chance this may never be called.
> 
> So I think using a 64-bit value is likely safer.

Well, I wanted to use a 64-bit value in the first place. But I got the
impression that atomic_t supports only 32-bit values. At least, that is
what I'm seeing in atomic.h.

Am I wrong?

> Cheers,
> 


* Re: [RFC PATCH v1 3/6] sched, credit2: improve scheduler fairness
  2020-06-12  4:51   ` Jürgen Groß
@ 2020-06-12 11:38     ` Volodymyr Babchuk
  0 siblings, 0 replies; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12 11:38 UTC
  To: jgross, xen-devel; +Cc: george.dunlap, dfaggioli

On Fri, 2020-06-12 at 06:51 +0200, Jürgen Groß wrote:
> On 12.06.20 02:22, Volodymyr Babchuk wrote:
> > Now we can make corrections for scheduling unit run time, based on
> > data gathered in previous patches. This includes time spent in IRQ
> > handlers and time spent for hypervisor housekeeping tasks. Those time
> > spans needs to be deduced from a total run time.
> > 
> > This patch adds sched_get_time_correction() function which returns
> > time correction value. This value should be subtracted by a scheduler
> > implementation from a total vCPU/shced_unit run time.
> > 
> > TODO: Make the corresponding changes to all other schedulers.
> > 
> > Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> > ---
> >   xen/common/sched/core.c    | 23 +++++++++++++++++++++++
> >   xen/common/sched/credit2.c |  2 +-
> >   xen/common/sched/private.h | 10 ++++++++++
> >   3 files changed, 34 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > index d597811fef..a7294ff5c3 100644
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -974,6 +974,28 @@ void vcpu_end_hyp_task(struct vcpu *v)
> >   #ifndef NDEBUG
> >       v->in_hyp_task = false;
> >   #endif
> >   }
> > +
> > +s_time_t sched_get_time_correction(struct sched_unit *u)
> > +{
> > +    unsigned long flags;
> > +    int irq, hyp;
> 
> Using "irq" for a time value is misleading IMO.

Yes, you are right. I'll rename these variables to irq_time and
hyp_time.

> > +
> > +    while ( true )
> > +    {
> > +        irq = atomic_read(&u->irq_time);
> > +        if ( likely( irq == atomic_cmpxchg(&u->irq_time, irq, 0)) )
> > +            break;
> > +    }
> 
> Just use atomic_xchg()?

Thanks. I somehow missed this macro.

> > +
> > +    while ( true )
> > +    {
> > +        hyp = atomic_read(&u->hyp_time);
> > +        if ( likely( hyp == atomic_cmpxchg(&u->hyp_time, hyp, 0)) )
> > +            break;
> > +    }
> > +
> > +    return irq + hyp;
> 
> Ah, I didn't look into this patch until now.
> 
> You can replace my comments about overflow of an int for patches 1 and 2
> with:
> 
>    Please modify the comment about not overflowing to hint at the value
>    being reset when making scheduling decisions.

Will do.

> And this (of course) needs to be handled in all other schedulers, too.
> 

Yes, the plan is to call this function in all schedulers. I skipped
this in the RFC because I wanted to discuss the general approach first.
I'll add support for all the other schedulers in the next version.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks
  2020-06-12 11:30     ` Volodymyr Babchuk
@ 2020-06-12 11:40       ` Jürgen Groß
  2020-09-24 18:08         ` Volodymyr Babchuk
  0 siblings, 1 reply; 43+ messages in thread
From: Jürgen Groß @ 2020-06-12 11:40 UTC (permalink / raw)
  To: Volodymyr Babchuk, xen-devel
  Cc: sstabellini, julien, wl, andrew.cooper3, ian.jackson,
	george.dunlap, dfaggioli, jbeulich

On 12.06.20 13:30, Volodymyr Babchuk wrote:
> On Fri, 2020-06-12 at 06:43 +0200, Jürgen Groß wrote:
>> On 12.06.20 02:22, Volodymyr Babchuk wrote:
>>> +void vcpu_end_hyp_task(struct vcpu *v)
>>> +{
>>> +    int delta;
>>> +
>>> +    if ( is_idle_vcpu(v) )
>>> +        return;
>>> +
>>> +    ASSERT(v->in_hyp_task);
>>> +
>>> +    /* We assume that hypervisor task time will not overflow int */
>>
>> This will definitely happen for long running VMs. Please use a 64-bit
>> variable.
>>
> 
> It is not suposed to hold long time spans, as I described in the reply
> to previous email.
> 
>>> +    delta = NOW() - v->hyp_entry_time;
>>> +    atomic_add(delta, &v->sched_unit->hyp_time);
>>> +
>>> +#ifndef NDEBUG
>>> +    v->in_hyp_task = false;
>>> +#endif
>>> +}
>>> +
>>>    /*
>>>     * Do the actual movement of an unit from old to new CPU. Locks for *both*
>>>     * CPUs needs to have been taken already when calling this!
>>> @@ -2615,6 +2646,7 @@ static void schedule(void)
>>>    
>>>        SCHED_STAT_CRANK(sched_run);
>>>    
>>> +    vcpu_end_hyp_task(current);
>>>        rcu_read_lock(&sched_res_rculock);
>>>    
>>>        lock = pcpu_schedule_lock_irq(cpu);
>>> diff --git a/xen/common/softirq.c b/xen/common/softirq.c
>>> index 063e93cbe3..03a29384d1 100644
>>> --- a/xen/common/softirq.c
>>> +++ b/xen/common/softirq.c
>>> @@ -71,7 +71,9 @@ void process_pending_softirqs(void)
>>>    void do_softirq(void)
>>>    {
>>>        ASSERT_NOT_IN_ATOMIC();
>>> +    vcpu_begin_hyp_task(current);
>>>        __do_softirq(0);
>>> +    vcpu_end_hyp_task(current);
>>
>> This won't work for scheduling. current will either have changed,
>> or in x86 case __do_softirq() might just not return. You need to
>> handle that case explicitly in schedule() (you did that for the
>> old vcpu, but for the case schedule() is returning you need to
>> call vcpu_begin_hyp_task(current) there).
>>
> 
> Well, this is one of questions, I wanted to discuss. I certainly need
> to call vcpu_begin_hyp_task(current) after context switch. But what it
> is the right place? If my understaning is right, code on x86 platform
> will never reach this point. Or I'm wrong there?

No, this is correct.

You can add the call to context_switch() just after set_current() has
been called.
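
Roughly like this (untested sketch; the exact spot relative to the rest
of the context switch code might need adjusting):

    /* in context_switch(): */
    set_current(next);
    vcpu_begin_hyp_task(next);   /* "next" is current from here on */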


Juergen


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-12  4:57   ` Jürgen Groß
@ 2020-06-12 11:44     ` Volodymyr Babchuk
  2020-06-12 12:45       ` Julien Grall
  2020-06-12 12:29     ` Julien Grall
  1 sibling, 1 reply; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12 11:44 UTC (permalink / raw)
  To: jgross, julien, xen-devel
  Cc: sstabellini, wl, andrew.cooper3, ian.jackson, george.dunlap,
	dfaggioli, jbeulich


On Fri, 2020-06-12 at 06:57 +0200, Jürgen Groß wrote:
> On 12.06.20 02:22, Volodymyr Babchuk wrote:
> > As scheduler code now collects time spent in IRQ handlers and in
> > do_softirq(), we can present those values to userspace tools like
> > xentop, so system administrator can see how system behaves.
> > 
> > We are updating counters only in sched_get_time_correction() function
> > to minimize number of taken spinlocks. As atomic_t is 32 bit wide, it
> > is not enough to store time with nanosecond precision. So we need to
> > use 64 bit variables and protect them with spinlock.
> > 
> > Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> > ---
> >   xen/common/sched/core.c     | 17 +++++++++++++++++
> >   xen/common/sysctl.c         |  1 +
> >   xen/include/public/sysctl.h |  4 +++-
> >   xen/include/xen/sched.h     |  2 ++
> >   4 files changed, 23 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > index a7294ff5c3..ee6b1d9161 100644
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -95,6 +95,10 @@ static struct scheduler __read_mostly ops;
> >   
> >   static bool scheduler_active;
> >   
> > +static DEFINE_SPINLOCK(sched_stat_lock);
> > +s_time_t sched_stat_irq_time;
> > +s_time_t sched_stat_hyp_time;
> > +
> >   static void sched_set_affinity(
> >       struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft);
> >   
> > @@ -994,9 +998,22 @@ s_time_t sched_get_time_correction(struct sched_unit *u)
> >               break;
> >       }
> >   
> > +    spin_lock_irqsave(&sched_stat_lock, flags);
> > +    sched_stat_irq_time += irq;
> > +    sched_stat_hyp_time += hyp;
> > +    spin_unlock_irqrestore(&sched_stat_lock, flags);
> 
> Please don't use a lock. Just use add_sized() instead which will add
> atomically.

Looks like Arm does not support atomic 64-bit variables.

Julien, I believe this is an Armv7 limitation? Should Armv8 work with
64-bit atomics?

> > +
> >       return irq + hyp;
> >   }
> >   
> > +void sched_get_time_stats(uint64_t *irq_time, uint64_t *hyp_time)
> > +{
> > +    unsigned long flags;
> > +
> > +    spin_lock_irqsave(&sched_stat_lock, flags);
> > +    *irq_time = sched_stat_irq_time;
> > +    *hyp_time = sched_stat_hyp_time;
> > +    spin_unlock_irqrestore(&sched_stat_lock, flags);
> 
> read_atomic() will do the job without lock.

Yes, I really want to use atomics there. I just need to clarify 64-bit
support on Arm.

> >   }
> >   
> >   /*
> > diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
> > index 1c6a817476..00683bc93f 100644
> > --- a/xen/common/sysctl.c
> > +++ b/xen/common/sysctl.c
> > @@ -270,6 +270,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
> >           pi->scrub_pages = 0;
> >           pi->cpu_khz = cpu_khz;
> >           pi->max_mfn = get_upper_mfn_bound();
> > +        sched_get_time_stats(&pi->irq_time, &pi->hyp_time);
> >           arch_do_physinfo(pi);
> >           if ( iommu_enabled )
> >           {
> > diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
> > index 3a08c512e8..f320144d40 100644
> > --- a/xen/include/public/sysctl.h
> > +++ b/xen/include/public/sysctl.h
> > @@ -35,7 +35,7 @@
> >   #include "domctl.h"
> >   #include "physdev.h"
> >   
> > -#define XEN_SYSCTL_INTERFACE_VERSION 0x00000013
> > +#define XEN_SYSCTL_INTERFACE_VERSION 0x00000014
> >   
> >   /*
> >    * Read console content from Xen buffer ring.
> > @@ -118,6 +118,8 @@ struct xen_sysctl_physinfo {
> >       uint64_aligned_t scrub_pages;
> >       uint64_aligned_t outstanding_pages;
> >       uint64_aligned_t max_mfn; /* Largest possible MFN on this host */
> > +    uint64_aligned_t irq_time;
> > +    uint64_aligned_t hyp_time;
> 
> Would hypfs work, too? This would avoid the need for extending another
> hypercall.

Good point. I'll take a look at this from the toolstack side. I didn't
see any hypfs calls in xentop, but this is a good time to start using
it.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 1/6] sched: track time spent in IRQ handler
  2020-06-12 11:33         ` Volodymyr Babchuk
@ 2020-06-12 12:21           ` Julien Grall
  2020-06-12 20:08             ` Dario Faggioli
  0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2020-06-12 12:21 UTC (permalink / raw)
  To: Volodymyr Babchuk, jgross, xen-devel
  Cc: sstabellini, wl, andrew.cooper3, ian.jackson, george.dunlap,
	dfaggioli, jbeulich, roger.pau



On 12/06/2020 12:33, Volodymyr Babchuk wrote:
> 
> On Fri, 2020-06-12 at 12:29 +0100, Julien Grall wrote:
>>
>> On 12/06/2020 12:26, Volodymyr Babchuk wrote:
>>> Hi Jurgen,
>>>
>>> thanks for the review
>>>
>>> On Fri, 2020-06-12 at 06:36 +0200, Jürgen Groß wrote:
>>>
>>>> On 12.06.20 02:22, Volodymyr Babchuk wrote:
>>>
>>> [...]
>>>
>>>>> +void vcpu_end_irq_handler(void)
>>>>> +{
>>>>> +    int delta;
>>>>> +
>>>>> +    if (is_idle_vcpu(current))
>>>>> +        return;
>>>>> +
>>>>> +    ASSERT(current->irq_nesting);
>>>>> +
>>>>> +    if ( --current->irq_nesting )
>>>>> +        return;
>>>>> +
>>>>> +    /* We assume that irq handling time will not overflow int */
>>>>
>>>> This assumption might not hold for long running VMs.
>>>
>>> Basically, this value holds time span between calls to schedule(). This
>>> variable gets zeroed out every time scheduler requests for time
>>> adjustment value. So, it should not depend on total VM run time.
>> This is assuming that the scheduler will be called. With the NULL
>> scheduler in place, there is a fair chance this may never be called.
>>
>> So I think using a 64-bit value is likely safer.
> 
> Well, I wanted to use 64-bit value in the first place. But I got the
> impression that atomic_t supports only 32-bit values. At least, this is
> what I'm seeing in atomic.h
> 
> Am I wrong?

There is no atomic64_t support in Xen yet. It shouldn't be very 
difficult to add support for it if you require them.
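
To give an idea of the shape, it could mirror the existing 32-bit
atomic_t (a rough sketch only; the real implementation would use
LDXR/STXR on Armv8, or LDREXD/STREXD on Armv7, rather than the compiler
builtins used here for brevity):

    typedef struct { int64_t counter; } atomic64_t;

    static inline int64_t atomic64_read(const atomic64_t *v)
    {
        return __atomic_load_n(&v->counter, __ATOMIC_SEQ_CST);
    }

    static inline void atomic64_add(int64_t i, atomic64_t *v)
    {
        __atomic_fetch_add(&v->counter, i, __ATOMIC_SEQ_CST);
    }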

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-12  4:57   ` Jürgen Groß
  2020-06-12 11:44     ` Volodymyr Babchuk
@ 2020-06-12 12:29     ` Julien Grall
  2020-06-12 12:41       ` Jürgen Groß
  1 sibling, 1 reply; 43+ messages in thread
From: Julien Grall @ 2020-06-12 12:29 UTC (permalink / raw)
  To: Jürgen Groß, Volodymyr Babchuk, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Dario Faggioli, Jan Beulich

Hi Juergen,

On 12/06/2020 05:57, Jürgen Groß wrote:
> On 12.06.20 02:22, Volodymyr Babchuk wrote:
>> As scheduler code now collects time spent in IRQ handlers and in
>> do_softirq(), we can present those values to userspace tools like
>> xentop, so system administrator can see how system behaves.
>>
>> We are updating counters only in sched_get_time_correction() function
>> to minimize number of taken spinlocks. As atomic_t is 32 bit wide, it
>> is not enough to store time with nanosecond precision. So we need to
>> use 64 bit variables and protect them with spinlock.
>>
>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>> ---
>>   xen/common/sched/core.c     | 17 +++++++++++++++++
>>   xen/common/sysctl.c         |  1 +
>>   xen/include/public/sysctl.h |  4 +++-
>>   xen/include/xen/sched.h     |  2 ++
>>   4 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>> index a7294ff5c3..ee6b1d9161 100644
>> --- a/xen/common/sched/core.c
>> +++ b/xen/common/sched/core.c
>> @@ -95,6 +95,10 @@ static struct scheduler __read_mostly ops;
>>   static bool scheduler_active;
>> +static DEFINE_SPINLOCK(sched_stat_lock);
>> +s_time_t sched_stat_irq_time;
>> +s_time_t sched_stat_hyp_time;
>> +
>>   static void sched_set_affinity(
>>       struct sched_unit *unit, const cpumask_t *hard, const cpumask_t 
>> *soft);
>> @@ -994,9 +998,22 @@ s_time_t sched_get_time_correction(struct 
>> sched_unit *u)
>>               break;
>>       }
>> +    spin_lock_irqsave(&sched_stat_lock, flags);
>> +    sched_stat_irq_time += irq;
>> +    sched_stat_hyp_time += hyp;
>> +    spin_unlock_irqrestore(&sched_stat_lock, flags);
> 
> Please don't use a lock. Just use add_sized() instead which will add
> atomically.

add_sized() is definitely not atomic. It will only prevent the compiler
from reading/writing the variable multiple times.

If we expect sched_get_time_correction to be called concurrently then we 
would need to introduce atomic64_t or a spin lock.
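
To make the difference concrete: with add_sized() the read-modify-write
is still not atomic across CPUs, so the classic lost-update
interleaving remains possible (values illustrative):

    /*
     * CPU0: reads  sched_stat_irq_time            (== 100)
     * CPU1: reads  sched_stat_irq_time            (== 100)
     * CPU0: writes sched_stat_irq_time = 100 + 40 (== 140)
     * CPU1: writes sched_stat_irq_time = 100 + 25 (== 125, CPU0's
     *       update is lost)
     */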

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-12 12:29     ` Julien Grall
@ 2020-06-12 12:41       ` Jürgen Groß
  2020-06-12 15:29         ` Dario Faggioli
  0 siblings, 1 reply; 43+ messages in thread
From: Jürgen Groß @ 2020-06-12 12:41 UTC (permalink / raw)
  To: Julien Grall, Volodymyr Babchuk, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Dario Faggioli, Jan Beulich

On 12.06.20 14:29, Julien Grall wrote:
> Hi Juergen,
> 
> On 12/06/2020 05:57, Jürgen Groß wrote:
>> On 12.06.20 02:22, Volodymyr Babchuk wrote:
>>> As scheduler code now collects time spent in IRQ handlers and in
>>> do_softirq(), we can present those values to userspace tools like
>>> xentop, so system administrator can see how system behaves.
>>>
>>> We are updating counters only in sched_get_time_correction() function
>>> to minimize number of taken spinlocks. As atomic_t is 32 bit wide, it
>>> is not enough to store time with nanosecond precision. So we need to
>>> use 64 bit variables and protect them with spinlock.
>>>
>>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>>> ---
>>>   xen/common/sched/core.c     | 17 +++++++++++++++++
>>>   xen/common/sysctl.c         |  1 +
>>>   xen/include/public/sysctl.h |  4 +++-
>>>   xen/include/xen/sched.h     |  2 ++
>>>   4 files changed, 23 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>>> index a7294ff5c3..ee6b1d9161 100644
>>> --- a/xen/common/sched/core.c
>>> +++ b/xen/common/sched/core.c
>>> @@ -95,6 +95,10 @@ static struct scheduler __read_mostly ops;
>>>   static bool scheduler_active;
>>> +static DEFINE_SPINLOCK(sched_stat_lock);
>>> +s_time_t sched_stat_irq_time;
>>> +s_time_t sched_stat_hyp_time;
>>> +
>>>   static void sched_set_affinity(
>>>       struct sched_unit *unit, const cpumask_t *hard, const cpumask_t 
>>> *soft);
>>> @@ -994,9 +998,22 @@ s_time_t sched_get_time_correction(struct 
>>> sched_unit *u)
>>>               break;
>>>       }
>>> +    spin_lock_irqsave(&sched_stat_lock, flags);
>>> +    sched_stat_irq_time += irq;
>>> +    sched_stat_hyp_time += hyp;
>>> +    spin_unlock_irqrestore(&sched_stat_lock, flags);
>>
>> Please don't use a lock. Just use add_sized() instead which will add
>> atomically.
> 
> add_sized() is definitely not atomic. It will only prevent the compiler 
> to read/write multiple time the variable.

Oh, my bad, I let myself be fooled by it being defined in atomic.h.

> 
> If we expect sched_get_time_correction to be called concurrently then we 
> would need to introduce atomic64_t or a spin lock.

Or we could use percpu variables and add the cpu values up when
fetching the values.
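
A rough sketch of what I mean (the counter name is from the patch, the
rest is illustrative and untested):

    static DEFINE_PER_CPU(s_time_t, sched_stat_irq_time);

    /* writer side: always runs on the local cpu, no lock needed */
    this_cpu(sched_stat_irq_time) += irq;

    /* reader side: sum up the per-cpu contributions */
    uint64_t total = 0;
    unsigned int cpu;

    for_each_online_cpu ( cpu )
        total += per_cpu(sched_stat_irq_time, cpu);

The reader might still want to fetch each element via read_atomic() to
avoid torn 64-bit reads on 32-bit architectures.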


Juergen


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-12 11:44     ` Volodymyr Babchuk
@ 2020-06-12 12:45       ` Julien Grall
  2020-06-12 22:16         ` Volodymyr Babchuk
  2020-06-18 20:24         ` Volodymyr Babchuk
  0 siblings, 2 replies; 43+ messages in thread
From: Julien Grall @ 2020-06-12 12:45 UTC (permalink / raw)
  To: Volodymyr Babchuk, jgross, xen-devel
  Cc: sstabellini, wl, andrew.cooper3, ian.jackson, george.dunlap,
	dfaggioli, jbeulich

Hi Volodymyr,

On 12/06/2020 12:44, Volodymyr Babchuk wrote:
> 
> On Fri, 2020-06-12 at 06:57 +0200, Jürgen Groß wrote:
>> On 12.06.20 02:22, Volodymyr Babchuk wrote:
>>> As scheduler code now collects time spent in IRQ handlers and in
>>> do_softirq(), we can present those values to userspace tools like
>>> xentop, so system administrator can see how system behaves.
>>>
>>> We are updating counters only in sched_get_time_correction() function
>>> to minimize number of taken spinlocks. As atomic_t is 32 bit wide, it
>>> is not enough to store time with nanosecond precision. So we need to
>>> use 64 bit variables and protect them with spinlock.
>>>
>>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>>> ---
>>>    xen/common/sched/core.c     | 17 +++++++++++++++++
>>>    xen/common/sysctl.c         |  1 +
>>>    xen/include/public/sysctl.h |  4 +++-
>>>    xen/include/xen/sched.h     |  2 ++
>>>    4 files changed, 23 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>>> index a7294ff5c3..ee6b1d9161 100644
>>> --- a/xen/common/sched/core.c
>>> +++ b/xen/common/sched/core.c
>>> @@ -95,6 +95,10 @@ static struct scheduler __read_mostly ops;
>>>    
>>>    static bool scheduler_active;
>>>    
>>> +static DEFINE_SPINLOCK(sched_stat_lock);
>>> +s_time_t sched_stat_irq_time;
>>> +s_time_t sched_stat_hyp_time;
>>> +
>>>    static void sched_set_affinity(
>>>        struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft);
>>>    
>>> @@ -994,9 +998,22 @@ s_time_t sched_get_time_correction(struct sched_unit *u)
>>>                break;
>>>        }
>>>    
>>> +    spin_lock_irqsave(&sched_stat_lock, flags);
>>> +    sched_stat_irq_time += irq;
>>> +    sched_stat_hyp_time += hyp;
>>> +    spin_unlock_irqrestore(&sched_stat_lock, flags);
>>
>> Please don't use a lock. Just use add_sized() instead which will add
>> atomically.
> 
> > Looks like arm does not support 64 bit variables.
> Julien, I believe, this is armv7 limitation? Should armv8 work with 64-
> bit atomics?

64-bit atomics can work on both Armv7 and Armv8 :). They just haven't
been plumbed yet.

I am happy to write a patch if you need atomic64_t or even a 64-bit 
add_sized().

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-12 12:41       ` Jürgen Groß
@ 2020-06-12 15:29         ` Dario Faggioli
  2020-06-12 22:27           ` Volodymyr Babchuk
  0 siblings, 1 reply; 43+ messages in thread
From: Dario Faggioli @ 2020-06-12 15:29 UTC (permalink / raw)
  To: Jürgen Groß, Julien Grall, Volodymyr Babchuk, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Jan Beulich


On Fri, 2020-06-12 at 14:41 +0200, Jürgen Groß wrote:
> On 12.06.20 14:29, Julien Grall wrote:
> > On 12/06/2020 05:57, Jürgen Groß wrote:
> > > On 12.06.20 02:22, Volodymyr Babchuk wrote:
> > > > 
> > > > @@ -994,9 +998,22 @@ s_time_t sched_get_time_correction(struct 
> > > > sched_unit *u)
> > > >               break;
> > > >       }
> > > > +    spin_lock_irqsave(&sched_stat_lock, flags);
> > > > +    sched_stat_irq_time += irq;
> > > > +    sched_stat_hyp_time += hyp;
> > > > +    spin_unlock_irqrestore(&sched_stat_lock, flags);
> > > 
> > > Please don't use a lock. Just use add_sized() instead which will
> > > add
> > > atomically.
> > 
> > If we expect sched_get_time_correction to be called concurrently
> > then we 
> > would need to introduce atomic64_t or a spin lock.
> 
> Or we could use percpu variables and add the cpu values up when
> fetching the values.
> 
Yes, either percpu or atomic looks much better than locking, to me, for
this.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 1/6] sched: track time spent in IRQ handler
  2020-06-12 12:21           ` Julien Grall
@ 2020-06-12 20:08             ` Dario Faggioli
  2020-06-12 22:25               ` Volodymyr Babchuk
  2020-06-12 22:54               ` Julien Grall
  0 siblings, 2 replies; 43+ messages in thread
From: Dario Faggioli @ 2020-06-12 20:08 UTC (permalink / raw)
  To: Julien Grall, Volodymyr Babchuk, jgross, xen-devel
  Cc: sstabellini, wl, andrew.cooper3, ian.jackson, george.dunlap,
	jbeulich, roger.pau


On Fri, 2020-06-12 at 13:21 +0100, Julien Grall wrote:
> On 12/06/2020 12:33, Volodymyr Babchuk wrote:
> > On Fri, 2020-06-12 at 12:29 +0100, Julien Grall wrote:
> > > > Basically, this value holds time span between calls to
> > > > schedule(). This
> > > > variable gets zeroed out every time scheduler requests for time
> > > > adjustment value. So, it should not depend on total VM run
> > > > time.
> > > This is assuming that the scheduler will be called. With the NULL
> > > scheduler in place, there is a fair chance this may never be
> > > called.
> > > 
>
Yeah, this is a good point. I mean, I wouldn't be sure about "never",
as even there we'd probably have softirqs, tasklets, etc... And I
still have to look at these patches in more detail to figure out
properly whether they'd help with this.

But I'd say that, in general, we should depend on the frequency of
scheduling events as little as possible. Therefore, using 64 bits from
the start would be preferable IMO.

> > > So I think using a 64-bit value is likely safer.
> > 
Yep.

> > Well, I wanted to use 64-bit value in the first place. But I got
> > the
> > impression that atomic_t supports only 32-bit values. At least,
> > this is
> > what I'm seeing in atomic.h
> > 
> > Am I wrong?
> 
> There is no atomic64_t support in Xen yet. It shouldn't be very 
> difficult to add support for it if you require them.
> 
Cool! That would be much appreciated. :-D

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-12 12:45       ` Julien Grall
@ 2020-06-12 22:16         ` Volodymyr Babchuk
  2020-06-18 20:24         ` Volodymyr Babchuk
  1 sibling, 0 replies; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12 22:16 UTC (permalink / raw)
  To: jgross, julien, xen-devel
  Cc: sstabellini, wl, andrew.cooper3, ian.jackson, george.dunlap,
	dfaggioli, jbeulich

Hi Julien,

On Fri, 2020-06-12 at 13:45 +0100, Julien Grall wrote:
> Hi Volodymyr,
> 
> On 12/06/2020 12:44, Volodymyr Babchuk wrote:
> > On Fri, 2020-06-12 at 06:57 +0200, Jürgen Groß wrote:
> > > On 12.06.20 02:22, Volodymyr Babchuk wrote:
> > > > As scheduler code now collects time spent in IRQ handlers and in
> > > > do_softirq(), we can present those values to userspace tools like
> > > > xentop, so system administrator can see how system behaves.
> > > > 
> > > > We are updating counters only in sched_get_time_correction() function
> > > > to minimize number of taken spinlocks. As atomic_t is 32 bit wide, it
> > > > is not enough to store time with nanosecond precision. So we need to
> > > > use 64 bit variables and protect them with spinlock.
> > > > 
> > > > Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> > > > ---
> > > >    xen/common/sched/core.c     | 17 +++++++++++++++++
> > > >    xen/common/sysctl.c         |  1 +
> > > >    xen/include/public/sysctl.h |  4 +++-
> > > >    xen/include/xen/sched.h     |  2 ++
> > > >    4 files changed, 23 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > > > index a7294ff5c3..ee6b1d9161 100644
> > > > --- a/xen/common/sched/core.c
> > > > +++ b/xen/common/sched/core.c
> > > > @@ -95,6 +95,10 @@ static struct scheduler __read_mostly ops;
> > > >    
> > > >    static bool scheduler_active;
> > > >    
> > > > +static DEFINE_SPINLOCK(sched_stat_lock);
> > > > +s_time_t sched_stat_irq_time;
> > > > +s_time_t sched_stat_hyp_time;
> > > > +
> > > >    static void sched_set_affinity(
> > > >        struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft);
> > > >    
> > > > @@ -994,9 +998,22 @@ s_time_t sched_get_time_correction(struct sched_unit *u)
> > > >                break;
> > > >        }
> > > >    
> > > > +    spin_lock_irqsave(&sched_stat_lock, flags);
> > > > +    sched_stat_irq_time += irq;
> > > > +    sched_stat_hyp_time += hyp;
> > > > +    spin_unlock_irqrestore(&sched_stat_lock, flags);
> > > 
> > > Please don't use a lock. Just use add_sized() instead which will add
> > > atomically.
> > 
> > Looks like arm does not support 64 bit variables.
> > Julien, I believe, this is armv7 limitation? Should armv8 work with 64-
> > bit atomics?
> 
> 64-bit atomics can work on both Armv7 and Armv8 :). It just haven't been 
> plumbed yet.

Wow, I didn't know that Armv7 is capable of that.

> I am happy to write a patch if you need atomic64_t or even a 64-bit 
> add_sized().

That would be cool, certainly. But it looks like the x86 code does not
have an atomic64_t implementation either. So there would be lots of
changes just for one use case; I don't know if it is worth it.

Let's finish discussing the other parts of the series. If it turns out
that atomic64_t is absolutely necessary, I'll get back to you.
Thanks for the offer anyway.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 1/6] sched: track time spent in IRQ handler
  2020-06-12 20:08             ` Dario Faggioli
@ 2020-06-12 22:25               ` Volodymyr Babchuk
  2020-06-12 22:54               ` Julien Grall
  1 sibling, 0 replies; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12 22:25 UTC (permalink / raw)
  To: dfaggioli, jgross, julien, xen-devel
  Cc: sstabellini, wl, andrew.cooper3, ian.jackson, george.dunlap,
	jbeulich, roger.pau

Hi Dario,

On Fri, 2020-06-12 at 22:08 +0200, Dario Faggioli wrote:
> On Fri, 2020-06-12 at 13:21 +0100, Julien Grall wrote:
> > On 12/06/2020 12:33, Volodymyr Babchuk wrote:
> > > On Fri, 2020-06-12 at 12:29 +0100, Julien Grall wrote:
> > > > > Basically, this value holds time span between calls to
> > > > > schedule(). This
> > > > > variable gets zeroed out every time scheduler requests for time
> > > > > adjustment value. So, it should not depend on total VM run
> > > > > time.
> > > > This is assuming that the scheduler will be called. With the NULL
> > > > scheduler in place, there is a fair chance this may never be
> > > > called.
> > > > 
> Yeah, this is a good point. I mean, I wouldn't be sure about "never",
> as even there, we'd probably have softirqs, tasklets, etc... And I
> still have to look at these patches in more details to figure out
> properly whether they'd help for this.

Well, I think it is possible to reset the counters when we are
switching to a different scheduler, just for cases like that.

> But I'd say that, in general, we should depend of the frequency of the
> scheduling events as few as possible. Therefore, using 64 bits from the
> start would be preferrable IMO.

I should have done that calculation earlier... It appears that a 32-bit
nanosecond counter can only cover about 4 seconds (2^32 ns ~= 4.3 s, or
roughly half of that if the value is signed). That should be enough for
the normal flow, but I agree with you - 64 bits looks much safer.

> 
> > > > So I think using a 64-bit value is likely safer.
> Yep.
> 
> > > Well, I wanted to use 64-bit value in the first place. But I got
> > > the
> > > impression that atomic_t supports only 32-bit values. At least,
> > > this is
> > > what I'm seeing in atomic.h
> > > 
> > > Am I wrong?
> > 
> > There is no atomic64_t support in Xen yet. It shouldn't be very 
> > difficult to add support for it if you require them.
> > 
> Cool! That would be much appreciated. :-D
> 

Certainly! :)

I believe there will be other users for atomic64_t as well.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-12 15:29         ` Dario Faggioli
@ 2020-06-12 22:27           ` Volodymyr Babchuk
  2020-06-13  6:22             ` Jürgen Groß
  0 siblings, 1 reply; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-12 22:27 UTC (permalink / raw)
  To: dfaggioli, jgross, julien, xen-devel
  Cc: sstabellini, wl, andrew.cooper3, ian.jackson, george.dunlap, jbeulich

On Fri, 2020-06-12 at 17:29 +0200, Dario Faggioli wrote:
> On Fri, 2020-06-12 at 14:41 +0200, Jürgen Groß wrote:
> > On 12.06.20 14:29, Julien Grall wrote:
> > > On 12/06/2020 05:57, Jürgen Groß wrote:
> > > > On 12.06.20 02:22, Volodymyr Babchuk wrote:
> > > > > @@ -994,9 +998,22 @@ s_time_t sched_get_time_correction(struct 
> > > > > sched_unit *u)
> > > > >               break;
> > > > >       }
> > > > > +    spin_lock_irqsave(&sched_stat_lock, flags);
> > > > > +    sched_stat_irq_time += irq;
> > > > > +    sched_stat_hyp_time += hyp;
> > > > > +    spin_unlock_irqrestore(&sched_stat_lock, flags);
> > > > 
> > > > Please don't use a lock. Just use add_sized() instead which will
> > > > add
> > > > atomically.
> > > 
> > > If we expect sched_get_time_correction to be called concurrently
> > > then we 
> > > would need to introduce atomic64_t or a spin lock.
> > 
> > Or we could use percpu variables and add the cpu values up when
> > fetching the values.
> > 
> Yes, either percpu or atomic looks much better than locking, to me, for
> this.

Looks like we are going to have atomic64_t after all. So, I'd prefer
to use atomics there.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 1/6] sched: track time spent in IRQ handler
  2020-06-12 20:08             ` Dario Faggioli
  2020-06-12 22:25               ` Volodymyr Babchuk
@ 2020-06-12 22:54               ` Julien Grall
  1 sibling, 0 replies; 43+ messages in thread
From: Julien Grall @ 2020-06-12 22:54 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: jgross, sstabellini, wl, andrew.cooper3, ian.jackson,
	george.dunlap, jbeulich, xen-devel, Volodymyr Babchuk, roger.pau

On Fri, 12 Jun 2020 at 21:08, Dario Faggioli <dfaggioli@suse.com> wrote:
>
> On Fri, 2020-06-12 at 13:21 +0100, Julien Grall wrote:
> > On 12/06/2020 12:33, Volodymyr Babchuk wrote:
> > > On Fri, 2020-06-12 at 12:29 +0100, Julien Grall wrote:
> > > > > Basically, this value holds time span between calls to
> > > > > schedule(). This
> > > > > variable gets zeroed out every time scheduler requests for time
> > > > > adjustment value. So, it should not depend on total VM run
> > > > > time.
> > > > This is assuming that the scheduler will be called. With the NULL
> > > > scheduler in place, there is a fair chance this may never be
> > > > called.
> > > >
> >
> Yeah, this is a good point. I mean, I wouldn't be sure about "never",
> as even there, we'd probably have softirqs, tasklets, etc... And I
> still have to look at these patches in more details to figure out
> properly whether they'd help for this.

Unlike x86, Xen doesn't prod another pCPU consistently. :) This was
already discussed in multiple threads in the past (see [1], which is
not resolved yet).

So yes, I am pretty confident I can recreate a case where the
scheduling function may never be called on Arm :).

Cheers,

[1] 315740e1-3591-0e11-923a-718e06c36445@arm.com


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-12 22:27           ` Volodymyr Babchuk
@ 2020-06-13  6:22             ` Jürgen Groß
  2020-06-18  2:58               ` Volodymyr Babchuk
  0 siblings, 1 reply; 43+ messages in thread
From: Jürgen Groß @ 2020-06-13  6:22 UTC (permalink / raw)
  To: Volodymyr Babchuk, dfaggioli, julien, xen-devel
  Cc: sstabellini, wl, andrew.cooper3, ian.jackson, george.dunlap, jbeulich

On 13.06.20 00:27, Volodymyr Babchuk wrote:
> On Fri, 2020-06-12 at 17:29 +0200, Dario Faggioli wrote:
>> On Fri, 2020-06-12 at 14:41 +0200, Jürgen Groß wrote:
>>> On 12.06.20 14:29, Julien Grall wrote:
>>>> On 12/06/2020 05:57, Jürgen Groß wrote:
>>>>> On 12.06.20 02:22, Volodymyr Babchuk wrote:
>>>>>> @@ -994,9 +998,22 @@ s_time_t sched_get_time_correction(struct
>>>>>> sched_unit *u)
>>>>>>                break;
>>>>>>        }
>>>>>> +    spin_lock_irqsave(&sched_stat_lock, flags);
>>>>>> +    sched_stat_irq_time += irq;
>>>>>> +    sched_stat_hyp_time += hyp;
>>>>>> +    spin_unlock_irqrestore(&sched_stat_lock, flags);
>>>>>
>>>>> Please don't use a lock. Just use add_sized() instead which will
>>>>> add
>>>>> atomically.
>>>>
>>>> If we expect sched_get_time_correction to be called concurrently
>>>> then we
>>>> would need to introduce atomic64_t or a spin lock.
>>>
>>> Or we could use percpu variables and add the cpu values up when
>>> fetching the values.
>>>
>> Yes, either percpu or atomic looks much better than locking, to me, for
>> this.
> 
> Looks like we going to have atomic64_t after all. So, I'll prefer to to
> use atomics there.

Performance would be better using percpu variables, as those would
avoid the cacheline being moved between cpus a lot.


Juergen


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 1/6] sched: track time spent in IRQ handler
  2020-06-12  0:22 ` [RFC PATCH v1 1/6] sched: track time spent in IRQ handler Volodymyr Babchuk
  2020-06-12  4:36   ` Jürgen Groß
@ 2020-06-16 10:06   ` Jan Beulich
  1 sibling, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2020-06-16 10:06 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Dario Faggioli, xen-devel,
	Roger Pau Monné

On 12.06.2020 02:22, Volodymyr Babchuk wrote:
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1895,6 +1895,7 @@ void do_IRQ(struct cpu_user_regs *regs)
>      int               irq = this_cpu(vector_irq)[vector];
>      struct cpu_user_regs *old_regs = set_irq_regs(regs);
>  
> +    vcpu_begin_irq_handler();
>      perfc_incr(irqs);
>      this_cpu(irq_count)++;
>      irq_enter();
> @@ -2024,6 +2025,7 @@ void do_IRQ(struct cpu_user_regs *regs)
>   out_no_unlock:
>      irq_exit();
>      set_irq_regs(old_regs);
> +    vcpu_end_irq_handler();
>  }

This looks like a fight for who's going to be first/last here. I
think you want to add your calls after irq_enter() and before
irq_exit().

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks
  2020-06-12  0:22 ` [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks Volodymyr Babchuk
  2020-06-12  4:43   ` Jürgen Groß
@ 2020-06-16 10:10   ` Jan Beulich
  2020-06-18  2:50     ` Volodymyr Babchuk
  1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2020-06-16 10:10 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Dario Faggioli, xen-devel

On 12.06.2020 02:22, Volodymyr Babchuk wrote:
> In most cases hypervisor code performs guest-related jobs. Tasks like
> hypercall handling or MMIO access emulation are done for calling vCPU
> so it is okay to charge time spent in hypervisor to the current vCPU.
> 
> But, there are also tasks that are not originated from guests. This
> includes things like TLB flushing or running tasklets. We don't want
> to track time spent in this tasks to a total scheduling unit run
> time. So we need to track time spent in such housekeeping tasks
> separately.
> 
> Those hypervisor tasks are run in do_softirq() function, so we'll
> install our hooks there.

I can see the point and desire, but it feels like you're moving from
one kind of unfairness to another: A softirq may very well be on
behalf of a specific vCPU, in which case not charging current should
lead to charging that specific one (which may still be current then).
Even more than for TLB flushes this may be relevant for the cases
where (on x86) we issue WBINVD on behalf of a guest.

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks
  2020-06-16 10:10   ` Jan Beulich
@ 2020-06-18  2:50     ` Volodymyr Babchuk
  2020-06-18  6:34       ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-18  2:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Dario Faggioli, xen-devel


Hi Jan,

Jan Beulich writes:

> On 12.06.2020 02:22, Volodymyr Babchuk wrote:
>> In most cases hypervisor code performs guest-related jobs. Tasks like
>> hypercall handling or MMIO access emulation are done for calling vCPU
>> so it is okay to charge time spent in hypervisor to the current vCPU.
>> 
>> But, there are also tasks that are not originated from guests. This
>> includes things like TLB flushing or running tasklets. We don't want
>> to track time spent in this tasks to a total scheduling unit run
>> time. So we need to track time spent in such housekeeping tasks
>> separately.
>> 
>> Those hypervisor tasks are run in do_softirq() function, so we'll
>> install our hooks there.
>
> I can see the point and desire, but it feels like you're moving from
> one kind of unfairness to another: A softirq may very well be on
> behalf of a specific vCPU, in which case not charging current should
> lead to charging that specific one (which may still be current then).
> Even more than for TLB flushes this may be relevant for the cases
> where (on x86) we issue WBINVD on behalf of a guest.

I agree with you. Dario and I discussed something similar, but in the
do_IRQ() context: we can determine for which vCPU we are handling the
interrupt and charge that vCPU for the time spent. The same holds for
the cases you described: for some softirqs there is a known
beneficiary, so we can charge it for the time spent.

Dario and I agreed to implement this in the second stage. I'm working
on the next version of the patches and I'll look at this more closely.
There is a possibility that I'll introduce that feature, but I'll need
some help from you or some other x86 expert.

Anyway, are you okay with the general approach? We will work out the
details, but I want to be sure that I'm moving in the right direction.

-- 
Volodymyr Babchuk at EPAM

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-13  6:22             ` Jürgen Groß
@ 2020-06-18  2:58               ` Volodymyr Babchuk
  2020-06-18 15:17                 ` Julien Grall
  0 siblings, 1 reply; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-18  2:58 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: sstabellini, julien, wl, andrew.cooper3, ian.jackson,
	george.dunlap, dfaggioli, jbeulich, xen-devel


Hi Jürgen,

Jürgen Groß writes:

> On 13.06.20 00:27, Volodymyr Babchuk wrote:
>> On Fri, 2020-06-12 at 17:29 +0200, Dario Faggioli wrote:
>>> On Fri, 2020-06-12 at 14:41 +0200, Jürgen Groß wrote:
>>>> On 12.06.20 14:29, Julien Grall wrote:
>>>>> On 12/06/2020 05:57, Jürgen Groß wrote:
>>>>>> On 12.06.20 02:22, Volodymyr Babchuk wrote:
>>>>>>> @@ -994,9 +998,22 @@ s_time_t sched_get_time_correction(struct
>>>>>>> sched_unit *u)
>>>>>>>                break;
>>>>>>>        }
>>>>>>> +    spin_lock_irqsave(&sched_stat_lock, flags);
>>>>>>> +    sched_stat_irq_time += irq;
>>>>>>> +    sched_stat_hyp_time += hyp;
>>>>>>> +    spin_unlock_irqrestore(&sched_stat_lock, flags);
>>>>>>
>>>>>> Please don't use a lock. Just use add_sized() instead which will
>>>>>> add
>>>>>> atomically.
>>>>>
>>>>> If we expect sched_get_time_correction to be called concurrently
>>>>> then we
>>>>> would need to introduce atomic64_t or a spin lock.
>>>>
>>>> Or we could use percpu variables and add the cpu values up when
>>>> fetching the values.
>>>>
>>> Yes, either percpu or atomic looks much better than locking, to me, for
>>> this.
>>
>> Looks like we going to have atomic64_t after all. So, I'll prefer to to
>> use atomics there.
>
> Performance would be better using percpu variables, as those would avoid
> the cacheline moved between cpus a lot.

I see. But don't we need locking in this case? I can see a scenario
where one pCPU updates its own counters while another pCPU is reading
them.

IIRC, Armv8 guarantees that a 64-bit read of aligned data will be
consistent. "Consistent" in the sense that, for example, we would not
see the lower 32 bits of the new value together with the upper 32 bits
of the old value.

I can't say for sure about Armv7 or about x86.

-- 
Volodymyr Babchuk at EPAM

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks
  2020-06-18  2:50     ` Volodymyr Babchuk
@ 2020-06-18  6:34       ` Jan Beulich
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2020-06-18  6:34 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Dario Faggioli, xen-devel

On 18.06.2020 04:50, Volodymyr Babchuk wrote:
> Anyways, are you okay with the general approach? We will work out the
> details, but I want to be sure that I'm moving in the right direction.

I'm certainly okay with the goal; I didn't look closely enough to say
I'm okay with the approach - I trust Dario there.

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-18  2:58               ` Volodymyr Babchuk
@ 2020-06-18 15:17                 ` Julien Grall
  2020-06-18 15:23                   ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2020-06-18 15:17 UTC (permalink / raw)
  To: Volodymyr Babchuk, Jürgen Groß
  Cc: sstabellini, wl, andrew.cooper3, ian.jackson, george.dunlap,
	dfaggioli, jbeulich, xen-devel



On 18/06/2020 03:58, Volodymyr Babchuk wrote:
> 
> Hi Jürgen,
> 
> Jürgen Groß writes:
> 
>> On 13.06.20 00:27, Volodymyr Babchuk wrote:
>>> On Fri, 2020-06-12 at 17:29 +0200, Dario Faggioli wrote:
>>>> On Fri, 2020-06-12 at 14:41 +0200, Jürgen Groß wrote:
>>>>> On 12.06.20 14:29, Julien Grall wrote:
>>>>>> On 12/06/2020 05:57, Jürgen Groß wrote:
>>>>>>> On 12.06.20 02:22, Volodymyr Babchuk wrote:
>>>>>>>> @@ -994,9 +998,22 @@ s_time_t sched_get_time_correction(struct
>>>>>>>> sched_unit *u)
>>>>>>>>                 break;
>>>>>>>>         }
>>>>>>>> +    spin_lock_irqsave(&sched_stat_lock, flags);
>>>>>>>> +    sched_stat_irq_time += irq;
>>>>>>>> +    sched_stat_hyp_time += hyp;
>>>>>>>> +    spin_unlock_irqrestore(&sched_stat_lock, flags);
>>>>>>>
>>>>>>> Please don't use a lock. Just use add_sized() instead which will
>>>>>>> add
>>>>>>> atomically.
>>>>>>
>>>>>> If we expect sched_get_time_correction to be called concurrently
>>>>>> then we
>>>>>> would need to introduce atomic64_t or a spin lock.
>>>>>
>>>>> Or we could use percpu variables and add the cpu values up when
>>>>> fetching the values.
>>>>>
>>>> Yes, either percpu or atomic looks much better than locking, to me, for
>>>> this.
>>>
>>> Looks like we going to have atomic64_t after all. So, I'll prefer to to
>>> use atomics there.
>>
>> Performance would be better using percpu variables, as those would avoid
>> the cacheline moved between cpus a lot.
> 
> I see. But don't we need locking in this case? I can see scenario, when
> one pCPU updates own counters while another pCPU is reading them.
> 
> IIRC, ARMv8 guarantees that 64 bit read of aligned data would be
> consistent. "Consistent" in the sense that, for example, we would not
> see lower 32 bits of the new value and upper 32 bits of the old value.

That's right, although this is only valid so long as you use
{read,write}_atomic().

> 
> I can't say for sure about ARMv7 and about x86.
ARMv7 with LPAE support will guarantee 64-bit atomicity when using 
strd/ldrd as long as the alignment is correct. LPAE is mandatory when 
supporting HYP mode, so you can safely assume this will work.

64-bit on x86 is also guaranteed to be atomic when using write_atomic().
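
So a counter with a single writer (e.g. a per-cpu one) can be published
without any lock, along these lines (illustrative only, the variable
names are made up for the example):

    /* writer - only ever the owning pCPU */
    write_atomic(&stat, read_atomic(&stat) + delta);

    /* reader - any pCPU */
    snapshot = read_atomic(&stat);

Note this is only tear-free per access; it is not an atomic
read-modify-write, hence the single-writer requirement.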

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-18 15:17                 ` Julien Grall
@ 2020-06-18 15:23                   ` Jan Beulich
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2020-06-18 15:23 UTC (permalink / raw)
  To: Julien Grall, Volodymyr Babchuk
  Cc: Jürgen Groß,
	sstabellini, wl, andrew.cooper3, ian.jackson, george.dunlap,
	dfaggioli, xen-devel

On 18.06.2020 17:17, Julien Grall wrote:
> 
> 
> On 18/06/2020 03:58, Volodymyr Babchuk wrote:
>>
>> Hi Jürgen,
>>
>> Jürgen Groß writes:
>>
>>> On 13.06.20 00:27, Volodymyr Babchuk wrote:
>>>> On Fri, 2020-06-12 at 17:29 +0200, Dario Faggioli wrote:
>>>>> On Fri, 2020-06-12 at 14:41 +0200, Jürgen Groß wrote:
>>>>>> On 12.06.20 14:29, Julien Grall wrote:
>>>>>>> On 12/06/2020 05:57, Jürgen Groß wrote:
>>>>>>>> On 12.06.20 02:22, Volodymyr Babchuk wrote:
>>>>>>>>> @@ -994,9 +998,22 @@ s_time_t sched_get_time_correction(struct
>>>>>>>>> sched_unit *u)
>>>>>>>>>                 break;
>>>>>>>>>         }
>>>>>>>>> +    spin_lock_irqsave(&sched_stat_lock, flags);
>>>>>>>>> +    sched_stat_irq_time += irq;
>>>>>>>>> +    sched_stat_hyp_time += hyp;
>>>>>>>>> +    spin_unlock_irqrestore(&sched_stat_lock, flags);
>>>>>>>>
>>>>>>>> Please don't use a lock. Just use add_sized() instead which will
>>>>>>>> add
>>>>>>>> atomically.
>>>>>>>
>>>>>>> If we expect sched_get_time_correction to be called concurrently
>>>>>>> then we
>>>>>>> would need to introduce atomic64_t or a spin lock.
>>>>>>
>>>>>> Or we could use percpu variables and add the cpu values up when
>>>>>> fetching the values.
>>>>>>
>>>>> Yes, either percpu or atomic looks much better than locking, to me, for
>>>>> this.
>>>>
>>>> Looks like we going to have atomic64_t after all. So, I'll prefer to to
>>>> use atomics there.
>>>
>>> Performance would be better using percpu variables, as those would avoid
>>> the cacheline moved between cpus a lot.
>>
>> I see. But don't we need locking in this case? I can see scenario, when
>> one pCPU updates own counters while another pCPU is reading them.
>>
>> IIRC, ARMv8 guarantees that 64 bit read of aligned data would be
>> consistent. "Consistent" in the sense that, for example, we would not
>> see lower 32 bits of the new value and upper 32 bits of the old value.
> 
> That's right. Although this would be valid so long you use {read, 
> write}_atomic().
> 
>>
>> I can't say for sure about ARMv7 and about x86.
> ARMv7 with LPAE support will guarantee 64-bit atomicity when using 
> strd/ldrd as long as the alignment is correct. LPAE is mandatory when 
> supporting HYP mode, so you can safely assume this will work.
> 
> 64-bit on x86 is also guaranteed to be atomic when using write_atomic().

... and when, again, the data is suitably aligned, or at the very least
(for WB RAM) doesn't cross certain boundaries.

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-12 12:45       ` Julien Grall
  2020-06-12 22:16         ` Volodymyr Babchuk
@ 2020-06-18 20:24         ` Volodymyr Babchuk
  2020-06-18 20:34           ` Julien Grall
  1 sibling, 1 reply; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-18 20:24 UTC (permalink / raw)
  To: Julien Grall
  Cc: jgross, sstabellini, wl, andrew.cooper3, ian.jackson,
	george.dunlap, dfaggioli, jbeulich, xen-devel


Hi Julien,

Julien Grall writes:

> Hi Volodymyr,
>
> On 12/06/2020 12:44, Volodymyr Babchuk wrote:
>>
>> On Fri, 2020-06-12 at 06:57 +0200, Jürgen Groß wrote:
>>> On 12.06.20 02:22, Volodymyr Babchuk wrote:
>>>> As scheduler code now collects time spent in IRQ handlers and in
>>>> do_softirq(), we can present those values to userspace tools like
>>>> xentop, so system administrator can see how system behaves.
>>>>
>>>> We are updating counters only in sched_get_time_correction() function
>>>> to minimize number of taken spinlocks. As atomic_t is 32 bit wide, it
>>>> is not enough to store time with nanosecond precision. So we need to
>>>> use 64 bit variables and protect them with spinlock.
>>>>
>>>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>>>> ---
>>>>    xen/common/sched/core.c     | 17 +++++++++++++++++
>>>>    xen/common/sysctl.c         |  1 +
>>>>    xen/include/public/sysctl.h |  4 +++-
>>>>    xen/include/xen/sched.h     |  2 ++
>>>>    4 files changed, 23 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>>>> index a7294ff5c3..ee6b1d9161 100644
>>>> --- a/xen/common/sched/core.c
>>>> +++ b/xen/common/sched/core.c
>>>> @@ -95,6 +95,10 @@ static struct scheduler __read_mostly ops;
>>>>       static bool scheduler_active;
>>>>    +static DEFINE_SPINLOCK(sched_stat_lock);
>>>> +s_time_t sched_stat_irq_time;
>>>> +s_time_t sched_stat_hyp_time;
>>>> +
>>>>    static void sched_set_affinity(
>>>>        struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft);
>>>>    @@ -994,9 +998,22 @@ s_time_t sched_get_time_correction(struct
>>>> sched_unit *u)
>>>>                break;
>>>>        }
>>>>    +    spin_lock_irqsave(&sched_stat_lock, flags);
>>>> +    sched_stat_irq_time += irq;
>>>> +    sched_stat_hyp_time += hyp;
>>>> +    spin_unlock_irqrestore(&sched_stat_lock, flags);
>>>
>>> Please don't use a lock. Just use add_sized() instead which will add
>>> atomically.
>>
>> Looks like arm does not support 64 bit variables.
>> Julien, I believe, this is armv7 limitation? Should armv8 work with 64-
>> bit atomics?
>
> 64-bit atomics can work on both Armv7 and Armv8 :). It just haven't
> been plumbed yet.
>
> I am happy to write a patch if you need atomic64_t or even a 64-bit
> add_sized().

Looks like I'll need this patch. So, if you still have time, it would
be great if you could write it.

-- 
Volodymyr Babchuk at EPAM

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-18 20:24         ` Volodymyr Babchuk
@ 2020-06-18 20:34           ` Julien Grall
  2020-06-18 23:35             ` Volodymyr Babchuk
  0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2020-06-18 20:34 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: jgross, sstabellini, wl, andrew.cooper3, ian.jackson,
	george.dunlap, dfaggioli, jbeulich, xen-devel

On Thu, 18 Jun 2020 at 21:24, Volodymyr Babchuk
<Volodymyr_Babchuk@epam.com> wrote:
>
>
> Hi Julien,
>
> Julien Grall writes:
>
> > Hi Volodymyr,
> >
> > On 12/06/2020 12:44, Volodymyr Babchuk wrote:
> >>
> >> On Fri, 2020-06-12 at 06:57 +0200, Jürgen Groß wrote:
> >>> On 12.06.20 02:22, Volodymyr Babchuk wrote:
> >>>> As scheduler code now collects time spent in IRQ handlers and in
> >>>> do_softirq(), we can present those values to userspace tools like
> >>>> xentop, so system administrator can see how system behaves.
> >>>>
> >>>> We are updating counters only in sched_get_time_correction() function
> >>>> to minimize number of taken spinlocks. As atomic_t is 32 bit wide, it
> >>>> is not enough to store time with nanosecond precision. So we need to
> >>>> use 64 bit variables and protect them with spinlock.
> >>>>
> >>>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> >>>> ---
> >>>>    xen/common/sched/core.c     | 17 +++++++++++++++++
> >>>>    xen/common/sysctl.c         |  1 +
> >>>>    xen/include/public/sysctl.h |  4 +++-
> >>>>    xen/include/xen/sched.h     |  2 ++
> >>>>    4 files changed, 23 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> >>>> index a7294ff5c3..ee6b1d9161 100644
> >>>> --- a/xen/common/sched/core.c
> >>>> +++ b/xen/common/sched/core.c
> >>>> @@ -95,6 +95,10 @@ static struct scheduler __read_mostly ops;
> >>>>       static bool scheduler_active;
> >>>>    +static DEFINE_SPINLOCK(sched_stat_lock);
> >>>> +s_time_t sched_stat_irq_time;
> >>>> +s_time_t sched_stat_hyp_time;
> >>>> +
> >>>>    static void sched_set_affinity(
> >>>>        struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft);
> >>>>    @@ -994,9 +998,22 @@ s_time_t sched_get_time_correction(struct
> >>>> sched_unit *u)
> >>>>                break;
> >>>>        }
> >>>>    +    spin_lock_irqsave(&sched_stat_lock, flags);
> >>>> +    sched_stat_irq_time += irq;
> >>>> +    sched_stat_hyp_time += hyp;
> >>>> +    spin_unlock_irqrestore(&sched_stat_lock, flags);
> >>>
> >>> Please don't use a lock. Just use add_sized() instead which will add
> >>> atomically.
> >>
> >> Looks like arm does not support 64 bit variables.
> >> Julien, I believe, this is armv7 limitation? Should armv8 work with 64-
> >> bit atomics?
> >
> > 64-bit atomics can work on both Armv7 and Armv8 :). It just haven't
> > been plumbed yet.
> >
> > I am happy to write a patch if you need atomic64_t or even a 64-bit
> > add_sized().
>
> Looks like I'll need this patch. So, if you still have time, it will be
> great, if you'll write it.

I offered help for either the atomic64_t or the add_sized(). Can you
confirm which one you need?

Cheers,


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics.
  2020-06-18 20:34           ` Julien Grall
@ 2020-06-18 23:35             ` Volodymyr Babchuk
  0 siblings, 0 replies; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-06-18 23:35 UTC (permalink / raw)
  To: Julien Grall
  Cc: jgross, sstabellini, wl, andrew.cooper3, ian.jackson,
	george.dunlap, dfaggioli, jbeulich, xen-devel


Hi Julien,

Julien Grall writes:

> On Thu, 18 Jun 2020 at 21:24, Volodymyr Babchuk
> <Volodymyr_Babchuk@epam.com> wrote:
>>
>>
>> Hi Julien,
>>
>> Julien Grall writes:
>>
>> > Hi Volodymyr,
>> >
>> > On 12/06/2020 12:44, Volodymyr Babchuk wrote:
>> >>
>> >> On Fri, 2020-06-12 at 06:57 +0200, Jürgen Groß wrote:
>> >>> On 12.06.20 02:22, Volodymyr Babchuk wrote:
>> >>>> As scheduler code now collects time spent in IRQ handlers and in
>> >>>> do_softirq(), we can present those values to userspace tools like
>> >>>> xentop, so system administrator can see how system behaves.
>> >>>>
>> >>>> We are updating counters only in sched_get_time_correction() function
>> >>>> to minimize number of taken spinlocks. As atomic_t is 32 bit wide, it
>> >>>> is not enough to store time with nanosecond precision. So we need to
>> >>>> use 64 bit variables and protect them with spinlock.
>> >>>>
>> >>>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>> >>>> ---
>> >>>>    xen/common/sched/core.c     | 17 +++++++++++++++++
>> >>>>    xen/common/sysctl.c         |  1 +
>> >>>>    xen/include/public/sysctl.h |  4 +++-
>> >>>>    xen/include/xen/sched.h     |  2 ++
>> >>>>    4 files changed, 23 insertions(+), 1 deletion(-)
>> >>>>
>> >>>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>> >>>> index a7294ff5c3..ee6b1d9161 100644
>> >>>> --- a/xen/common/sched/core.c
>> >>>> +++ b/xen/common/sched/core.c
>> >>>> @@ -95,6 +95,10 @@ static struct scheduler __read_mostly ops;
>> >>>>       static bool scheduler_active;
>> >>>>    +static DEFINE_SPINLOCK(sched_stat_lock);
>> >>>> +s_time_t sched_stat_irq_time;
>> >>>> +s_time_t sched_stat_hyp_time;
>> >>>> +
>> >>>>    static void sched_set_affinity(
>> >>>>        struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft);
>> >>>>    @@ -994,9 +998,22 @@ s_time_t sched_get_time_correction(struct
>> >>>> sched_unit *u)
>> >>>>                break;
>> >>>>        }
>> >>>>    +    spin_lock_irqsave(&sched_stat_lock, flags);
>> >>>> +    sched_stat_irq_time += irq;
>> >>>> +    sched_stat_hyp_time += hyp;
>> >>>> +    spin_unlock_irqrestore(&sched_stat_lock, flags);
>> >>>
>> >>> Please don't use a lock. Just use add_sized() instead which will add
>> >>> atomically.
>> >>
>> >> Looks like arm does not support 64 bit variables.
>> >> Julien, I believe, this is armv7 limitation? Should armv8 work with 64-
>> >> bit atomics?
>> >
>> > 64-bit atomics can work on both Armv7 and Armv8 :). It just haven't
>> > been plumbed yet.
>> >
>> > I am happy to write a patch if you need atomic64_t or even a 64-bit
>> > add_sized().
>>
>> Looks like I'll need this patch. So, if you still have time, it will be
>> great, if you'll write it.
>
> I offered help for either the atomic64_t or the add_sized(). Can you
> confirm which one you need?

Yes, sorry. I had atomic64_t in mind.

-- 
Volodymyr Babchuk at EPAM

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks
  2020-06-12 11:40       ` Jürgen Groß
@ 2020-09-24 18:08         ` Volodymyr Babchuk
  2020-09-25 17:22           ` Dario Faggioli
  0 siblings, 1 reply; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-09-24 18:08 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: xen-devel, julien, jbeulich, wl, sstabellini, ian.jackson,
	george.dunlap, andrew.cooper3, dfaggioli


Hello Jürgen,

Jürgen Groß writes:

> On 12.06.20 13:30, Volodymyr Babchuk wrote:
>> On Fri, 2020-06-12 at 06:43 +0200, Jürgen Groß wrote:
>>> On 12.06.20 02:22, Volodymyr Babchuk wrote:

[...]
>>>> +    delta = NOW() - v->hyp_entry_time;
>>>> +    atomic_add(delta, &v->sched_unit->hyp_time);
>>>> +
>>>> +#ifndef NDEBUG
>>>> +    v->in_hyp_task = false;
>>>> +#endif
>>>> +}
>>>> +
>>>>    /*
>>>>     * Do the actual movement of an unit from old to new CPU. Locks for *both*
>>>>     * CPUs needs to have been taken already when calling this!
>>>> @@ -2615,6 +2646,7 @@ static void schedule(void)
>>>>           SCHED_STAT_CRANK(sched_run);
>>>>    +    vcpu_end_hyp_task(current);
>>>>        rcu_read_lock(&sched_res_rculock);
>>>>           lock = pcpu_schedule_lock_irq(cpu);
>>>> diff --git a/xen/common/softirq.c b/xen/common/softirq.c
>>>> index 063e93cbe3..03a29384d1 100644
>>>> --- a/xen/common/softirq.c
>>>> +++ b/xen/common/softirq.c
>>>> @@ -71,7 +71,9 @@ void process_pending_softirqs(void)
>>>>    void do_softirq(void)
>>>>    {
>>>>        ASSERT_NOT_IN_ATOMIC();
>>>> +    vcpu_begin_hyp_task(current);
>>>>        __do_softirq(0);
>>>> +    vcpu_end_hyp_task(current);
>>>
>>> This won't work for scheduling. current will either have changed,
>>> or in the x86 case __do_softirq() might just not return. You need to
>>> handle that case explicitly in schedule() (you did that for the
>>> old vcpu, but for the case where schedule() is returning you need
>>> to call vcpu_begin_hyp_task(current) there).
>>>
>> Well, this is one of the questions I wanted to discuss. I certainly
>> need to call vcpu_begin_hyp_task(current) after a context switch. But
>> where is the right place? If my understanding is right, code on the
>> x86 platform will never reach this point. Or am I wrong there?
>
> No, this is correct.
>
> You can add the call to context_switch() just after set_current() has
> been called.

Looks like I'm missing something there. If I get this right, the code you
mentioned is executed right before leaving the hypervisor.

So, as I see this, functions are called in the following way (on x86):

1. do_softirq() calls vcpu_begin_hyp_task() and then executes
__do_softirq()

2. __do_softirq() does various jobs and eventually calls schedule()

3. schedule() calls vcpu_end_hyp_task() and makes a scheduling decision,
which leads to a call to context_switch()

4. At the end of context_switch() we will exit the hypervisor and enter
the VM. At least, this is how I understand the

       nextd->arch.ctxt_switch->tail(next);

call.

So, no need to call vcpu_begin_hyp_task() in context_switch() for x86.

On ARM, this is a different story. There, I am calling
vcpu_begin_hyp_task() after set_current(), because the ARM code will
eventually return to do_softirq(), where the corresponding
vcpu_end_hyp_task() will be called.

I have put in a bunch of ASSERTs to ensure that vcpu_begin_hyp_task() and
vcpu_end_hyp_task() are not called twice in a row and that
vcpu_end_hyp_task() is only called after vcpu_begin_hyp_task(). Those
asserts are not failing, so I assume I did all this the right way :)
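
For reference, here is roughly what the pair looks like in my tree right
now -- a minimal sketch, modulo exact naming; the in_hyp_task flag and
the ASSERTs are compiled out in release builds:

    void vcpu_begin_hyp_task(struct vcpu *v)
    {
    #ifndef NDEBUG
        ASSERT(!v->in_hyp_task);  /* catches two begins in a row */
        v->in_hyp_task = true;
    #endif
        v->hyp_entry_time = NOW();
    }

    void vcpu_end_hyp_task(struct vcpu *v)
    {
        s_time_t delta;

    #ifndef NDEBUG
        ASSERT(v->in_hyp_task);   /* catches an end without a begin */
        v->in_hyp_task = false;
    #endif
        delta = NOW() - v->hyp_entry_time;
        atomic_add(delta, &v->sched_unit->hyp_time);
    }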

-- 
Volodymyr Babchuk at EPAM

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks
  2020-09-24 18:08         ` Volodymyr Babchuk
@ 2020-09-25 17:22           ` Dario Faggioli
  2020-09-25 20:21             ` Volodymyr Babchuk
  0 siblings, 1 reply; 43+ messages in thread
From: Dario Faggioli @ 2020-09-25 17:22 UTC (permalink / raw)
  To: Volodymyr Babchuk, Jürgen Groß
  Cc: xen-devel, julien, jbeulich, wl, sstabellini, ian.jackson,
	george.dunlap, andrew.cooper3

[-- Attachment #1: Type: text/plain, Size: 1634 bytes --]

On Thu, 2020-09-24 at 18:08 +0000, Volodymyr Babchuk wrote:
> So, as I see this, functions are called in the following way (on
> x86):
> 
> 1. do_softirq() calls vcpu_begin_hyp_task() and then executes
> __do_softirq()
> 
> 2. __do_softirq() does various jobs and eventually calls schedule()
> 
> 3. schedule() calls vcpu_end_hyp_task() and makes a scheduling
> decision, which leads to a call to context_switch()
> 
> 4. At the end of context_switch() we will exit the hypervisor and
> enter the VM. At least, this is how I understand the
> 
>        nextd->arch.ctxt_switch->tail(next);
> 
> call.
> 
> So, no need to call vcpu_begin_hyp_task() in context_switch() for
> x86.
> 
Mmm... This looks correct to me too.

And what about the cases where schedule() does return?

Are these also fine because they're handled within __do_softirq()
(i.e., without actually going back to do_softirq() and hence never
calling end_hyp_task() for a second time)?


> I have put bunch of ASSERTs to ensure that vcpu_begin_hyp_task() or
> vcpu_end_hyp_task() are not called twice and that vcpu_end_hyp_task()
> is
> called after vcpu_begin_hyp_task(). Those asserts are not failing, so
> I
> assume that I did all this in the right way :)
> 
Yeah, good to know. :-)

Are you doing these tests with both core-scheduling disabled and
enabled?

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks
  2020-09-25 17:22           ` Dario Faggioli
@ 2020-09-25 20:21             ` Volodymyr Babchuk
  2020-09-25 21:42               ` Dario Faggioli
  0 siblings, 1 reply; 43+ messages in thread
From: Volodymyr Babchuk @ 2020-09-25 20:21 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Jürgen Groß,
	xen-devel, julien, jbeulich, wl, sstabellini, ian.jackson,
	george.dunlap, andrew.cooper3


Hi Dario,


Dario Faggioli writes:

> On Thu, 2020-09-24 at 18:08 +0000, Volodymyr Babchuk wrote:
>> So, as I see this, functions are called in the following way (on
>> x86):
>> 
>> 1. do_softirq() calls vcpu_begin_hyp_task() and then executes
>> __do_softirq()
>> 
>> 2. __do_softirq() does various jobs and eventually calls schedule()
>> 
>> 3. schedule() calls vcpu_end_hyp_task() and makes a scheduling
>> decision, which leads to a call to context_switch()
>> 
>> 4. At the end of context_switch() we will exit the hypervisor and
>> enter the VM. At least, this is how I understand the
>> 
>>        nextd->arch.ctxt_switch->tail(next);
>> 
>> call.
>> 
>> So, no need to call vcpu_begin_hyp_task() in context_switch() for
>> x86.
>> 
> Mmm... This looks correct to me too.
>
> And what about the cases where schedule() does return?

Can it return on x86? I want to test this case, but how do I force it?
The null scheduler, perhaps?

> Are these also fine because they're handled within __do_softirq()
> (i.e., without actually going back to do_softirq() and hence never
> calling end_hyp_task() for a second time)?

I'm afraid there will be a bug. schedule() calls end_hyp_task(), and
if we eventually return from __do_softirq() to do_softirq(),
end_hyp_task() will be called twice.
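
If we wanted to be defensive about that, one option -- just a sketch,
and it would mean making the in_hyp_task flag unconditional rather than
debug-only -- is to let vcpu_end_hyp_task() account only once per
matching begin:

    void vcpu_end_hyp_task(struct vcpu *v)
    {
        s_time_t delta;

        /* schedule() may have accounted this interval already. */
        if ( !v->in_hyp_task )
            return;
        v->in_hyp_task = false;

        delta = NOW() - v->hyp_entry_time;
        atomic_add(delta, &v->sched_unit->hyp_time);
    }

But I'd rather keep the ASSERTs strict and make sure the call sites are
balanced in the first place.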

>
>> I have put in a bunch of ASSERTs to ensure that vcpu_begin_hyp_task()
>> and vcpu_end_hyp_task() are not called twice in a row and that
>> vcpu_end_hyp_task() is only called after vcpu_begin_hyp_task(). Those
>> asserts are not failing, so I assume I did all this the right way :)
>> 
> Yeah, good to know. :-)
>
> Are you doing these tests with both core-scheduling disabled and
> enabled?

Good question. On x86 I am running Xen in QEMU. With -smp 2 it sees two
CPUs:

(XEN) Brought up 2 CPUs
(XEN) Scheduling granularity: cpu, 1 CPU per sched-resource

You are right, I need to try other variants of scheduling granularity.

Do you by any chance know how to emulate a more complex setup in QEMU?
Also, what is the preferred way to test/debug Xen on x86?

-- 
Volodymyr Babchuk at EPAM

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks
  2020-09-25 20:21             ` Volodymyr Babchuk
@ 2020-09-25 21:42               ` Dario Faggioli
  0 siblings, 0 replies; 43+ messages in thread
From: Dario Faggioli @ 2020-09-25 21:42 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Jürgen Groß,
	xen-devel, julien, jbeulich, wl, sstabellini, ian.jackson,
	george.dunlap, andrew.cooper3

[-- Attachment #1: Type: text/plain, Size: 3764 bytes --]

On Fri, 2020-09-25 at 20:21 +0000, Volodymyr Babchuk wrote:
> Hi Dario,
> 
Hi! :-)

> Dario Faggioli writes:
> > And what about the cases where schedule() does return?
> 
> Can it return on x86? I want to test this case, but how do I force
> it? The null scheduler, perhaps?
> 
> > Are these also fine because they're handled within __do_softirq()
> > (i.e., without actually going back to do_softirq() and hence never
> > calling end_hyp_task() for a second time)?
> 
> I'm afraid there will be a bug. schedule() calls end_hyp_task(), and
> if we eventually return from __do_softirq() to do_softirq(),
> end_hyp_task() will be called twice.
>
Yeah, exactly. That's why I was asking whether you had verified that we
actually never get to this, either because we context switch or because
we stay inside __do_softirq() and never go back to do_softirq().

I was, in fact, referring to all the various cases of handling primary
and secondary scheduling requests when core-scheduling is enabled.

> > > I have put in a bunch of ASSERTs to ensure that
> > > vcpu_begin_hyp_task() and vcpu_end_hyp_task() are not called twice
> > > in a row and that vcpu_end_hyp_task() is only called after
> > > vcpu_begin_hyp_task(). Those asserts are not failing, so I assume
> > > I did all this the right way :)
> > > 
> > Yeah, good to know. :-)
> > 
> > Are you doing these tests with both core-scheduling disabled and
> > enabled?
> 
> Good question. On x86 I am running Xen in QEMU. With -smp 2 it sees
> two CPUs:
> 
> (XEN) Brought up 2 CPUs
> (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
> 
> You are right, I need to try other variants of scheduling
> granularity.
> 
> Do you by any chance know how to emulate a more complex setup in QEMU?
>
Like enabling a virtual topology, on top of which you could test core
(or socket) scheduling? If yes, indeed you can do that in QEMU:

https://www.qemu.org/docs/master/qemu-doc.html

-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies]
     [,sockets=sockets][,maxcpus=maxcpus]

Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
are supported. On the Sparc32 target, Linux limits the number of usable
CPUs to 4. For the PC target, the number of cores per die, the number
of threads per core, the number of dies per package and the total
number of sockets can be specified. Missing values will be computed. If
any of these values is given, the total number of CPUs n can be
omitted. maxcpus specifies the maximum number of hotpluggable CPUs.

Once you have an SMT virtual topology, you can boot Xen inside with a
higher scheduling granularity.

A (rather big!) example would be:

-smp 224,sockets=4,cores=28,threads=2
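
So, concretely, something like this (untested, just to give the idea;
the image name is of course made up):

    qemu-system-x86_64 -smp 8,sockets=1,cores=4,threads=2 -m 4G \
        -drive file=xen-host.img,format=raw

and then add something like sched-gran=core to the Xen command line in
the guest's bootloader config, so that the nested Xen schedules whole
cores rather than individual CPUs.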

You can even define a virtual NUMA topology, if you want.

And you can pin the vCPUs to the physical CPUs of the host, in such a
way that the virtual topology is mapped to the physical one. This is
good for performance but also increases the accuracy of the testing a
little bit.

> Also, what is the preferred way to test/debug Xen on x86?
> 
I test on real hardware, at least most of the time, if this is what
you're asking.

Checking whether the code is "functionally correct" is ok-ish if done in
a VM first. But then, especially for scheduling-related things, where
timing plays a rather significant role, I personally prefer to test on
actual hardware sooner rather than later.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2020-09-25 21:42 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-12  0:22 [RFC PATCH v1 0/6] Fair scheduling Volodymyr Babchuk
2020-06-12  0:22 ` [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks Volodymyr Babchuk
2020-06-12  4:43   ` Jürgen Groß
2020-06-12 11:30     ` Volodymyr Babchuk
2020-06-12 11:40       ` Jürgen Groß
2020-09-24 18:08         ` Volodymyr Babchuk
2020-09-25 17:22           ` Dario Faggioli
2020-09-25 20:21             ` Volodymyr Babchuk
2020-09-25 21:42               ` Dario Faggioli
2020-06-16 10:10   ` Jan Beulich
2020-06-18  2:50     ` Volodymyr Babchuk
2020-06-18  6:34       ` Jan Beulich
2020-06-12  0:22 ` [RFC PATCH v1 1/6] sched: track time spent in IRQ handler Volodymyr Babchuk
2020-06-12  4:36   ` Jürgen Groß
2020-06-12 11:26     ` Volodymyr Babchuk
2020-06-12 11:29       ` Julien Grall
2020-06-12 11:33         ` Volodymyr Babchuk
2020-06-12 12:21           ` Julien Grall
2020-06-12 20:08             ` Dario Faggioli
2020-06-12 22:25               ` Volodymyr Babchuk
2020-06-12 22:54               ` Julien Grall
2020-06-16 10:06   ` Jan Beulich
2020-06-12  0:22 ` [RFC PATCH v1 3/6] sched, credit2: improve scheduler fairness Volodymyr Babchuk
2020-06-12  4:51   ` Jürgen Groß
2020-06-12 11:38     ` Volodymyr Babchuk
2020-06-12  0:22 ` [RFC PATCH v1 5/6] tools: xentop: show time spent in IRQ and HYP states Volodymyr Babchuk
2020-06-12  0:22 ` [RFC PATCH v1 6/6] trace: add fair scheduling trace events Volodymyr Babchuk
2020-06-12  0:22 ` [RFC PATCH v1 4/6] xentop: collect IRQ and HYP time statistics Volodymyr Babchuk
2020-06-12  4:57   ` Jürgen Groß
2020-06-12 11:44     ` Volodymyr Babchuk
2020-06-12 12:45       ` Julien Grall
2020-06-12 22:16         ` Volodymyr Babchuk
2020-06-18 20:24         ` Volodymyr Babchuk
2020-06-18 20:34           ` Julien Grall
2020-06-18 23:35             ` Volodymyr Babchuk
2020-06-12 12:29     ` Julien Grall
2020-06-12 12:41       ` Jürgen Groß
2020-06-12 15:29         ` Dario Faggioli
2020-06-12 22:27           ` Volodymyr Babchuk
2020-06-13  6:22             ` Jürgen Groß
2020-06-18  2:58               ` Volodymyr Babchuk
2020-06-18 15:17                 ` Julien Grall
2020-06-18 15:23                   ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).