* [RFC 0/5] xen/arm: support big.little SoC
@ 2016-09-19  2:08 van.freenix
  2016-09-19  2:08 ` [RFC 1/5] xen/arm: domain_build: setting opt_dom0_max_vcpus according to cpupool0 info van.freenix
                   ` (5 more replies)
  0 siblings, 6 replies; 85+ messages in thread
From: van.freenix @ 2016-09-19  2:08 UTC (permalink / raw)
  To: julien.grall, sstabellini, jbeulich, andrew.cooper3, jgross,
	dario.faggioli
  Cc: Peng Fan, xen-devel

From: Peng Fan <peng.fan@nxp.com>

This patchset adds support for running Xen on big.LITTLE SoCs.
The idea comes from the discussion at
"https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html"

There are some changes to the cpupool code, plus x86 stub functions to avoid
a build break. This RFC is sent out to request comments on whether the
implementation is acceptable. The patchset has been tested on xen-4.8
unstable on an NXP i.MX8.

I use big/little CPUs and cpupools to explain the idea.
A pool that contains big CPUs is called a Big Pool.
A pool that contains little CPUs is called a Little Pool.
If a pool does not contain any physical CPUs, either little or big CPUs can
be added to it, but a cpupool can not contain both little and big CPUs: the
CPUs in a cpupool must all have the same CPU type (MIDR value on ARM), and a
CPU can not be added to a cpupool containing CPUs of a different type.
Little CPUs can not be moved to a Big Pool while there are big CPUs in it,
and vice versa. A domain in a Big Pool can not be migrated to a Little Pool,
and vice versa. When Xen brings up all the CPUs, only CPUs with the same CPU
type (the same MIDR value) as the boot CPU are added to cpupool0.

Consider an SoC with 4 A53s (cpu[0-3]) + 2 A72s (cpu[4-5]), where cpu0 is
the first CPU to boot. When Xen brings up the secondary CPUs, cpu[0-3] are
added to cpupool0 and cpu[4-5] are left outside any cpupool. When Dom0 then
boots, `xl cpupool-list -c` shows cpu[0-3] in Pool-0.

The following commands create a new cpupool and add cpu[4-5] to it:
 #xl cpupool-create name="Pool-A72" sched="credit2"
 #xl cpupool-cpu-add Pool-A72 4
 #xl cpupool-cpu-add Pool-A72 5
 #xl create -d /root/xen/domu-test pool="Pool-A72"
Now `xl cpupool-list -c` shows:
Name            CPU list
Pool-0          0,1,2,3 
Pool-A72        4,5

`xl cpupool-list` shows:
Name               CPUs   Sched     Active   Domain count
Pool-0               4    credit       y          1
Pool-A72             2   credit2       y          1

After `xl cpupool-cpu-remove Pool-A72 4`, a subsequent `xl cpupool-cpu-add
Pool-0 4` does not succeed, because Pool-0 contains A53 CPUs while CPU4 is
an A72.

`xl cpupool-migrate DomU Pool-0` will also fail, because DomU was created
in Pool-A72 with A72 vCPUs, while Pool-0 has A53 physical CPUs.

Patch 1/5:
Use "cpumask_weight(cpupool0->cpu_valid)" to replace "num_online_cpus()",
because num_online_cpus() counts all the online CPUs, while only the CPUs
in cpupool0 (big or little) are wanted here.

Patch 2/5:
Introduce cpupool_arch_info. For ARM SoCs, MIDR info needs to be added to
the cpupool. The info is used in patches [3,4,5]/5.

Patch 3/5:
Check whether it is ok to add a physical CPU to a cpupool. When the cpupool
does not contain any physical CPUs, a CPU may be added regardless of its
CPU type. Also check whether it is ok to move a domain to another cpupool.

Patch 4/5:
Move vpidr from arch_domain to arch_vcpu.
The vpidr in arch_domain is initialized in arch_domain_create; at that
time the domain is still in cpupool0 and has not yet been moved to the
specified cpupool, so vpidr needs to be initialized later. At that late
stage there is no hook to initialize vpidr in arch_domain, so it is moved
to arch_vcpu.

Patch 5/5:
This is to check whether it is ok to move a domain to another cpupool.

Peng Fan (5):
  xen/arm: domain_build: setting opt_dom0_max_vcpus according to
    cpupool0 info
  xen: cpupool: introduce cpupool_arch_info
  xen: cpupool: add arch cpupool hook
  xen/arm: move vpidr from arch_domain to arch_vcpu
  xen/arm: cpupool: implement arch_domain_cpupool_compatible

 xen/arch/arm/Makefile         |  1 +
 xen/arch/arm/cpupool.c        | 60 +++++++++++++++++++++++++++++++++++++++++++
 xen/arch/arm/domain.c         |  9 ++++---
 xen/arch/arm/domain_build.c   |  3 ++-
 xen/arch/arm/traps.c          |  2 +-
 xen/arch/x86/cpu/Makefile     |  1 +
 xen/arch/x86/cpu/cpupool.c    | 30 ++++++++++++++++++++++
 xen/common/cpupool.c          | 30 ++++++++++++++++++++++
 xen/include/asm-arm/cpupool.h | 16 ++++++++++++
 xen/include/asm-arm/domain.h  |  9 ++++---
 xen/include/asm-x86/cpupool.h | 16 ++++++++++++
 xen/include/xen/sched-if.h    |  5 ++++
 12 files changed, 173 insertions(+), 9 deletions(-)
 create mode 100644 xen/arch/arm/cpupool.c
 create mode 100644 xen/arch/x86/cpu/cpupool.c
 create mode 100644 xen/include/asm-arm/cpupool.h
 create mode 100644 xen/include/asm-x86/cpupool.h

-- 
2.6.6



* [RFC 1/5] xen/arm: domain_build: setting opt_dom0_max_vcpus according to cpupool0 info
  2016-09-19  2:08 [RFC 0/5] xen/arm: support big.little SoC van.freenix
@ 2016-09-19  2:08 ` van.freenix
  2016-09-19  2:08 ` [RFC 2/5] xen: cpupool: introduce cpupool_arch_info van.freenix
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 85+ messages in thread
From: van.freenix @ 2016-09-19  2:08 UTC (permalink / raw)
  To: julien.grall, sstabellini, jbeulich, andrew.cooper3, jgross,
	dario.faggioli
  Cc: Peng Fan, xen-devel

From: Peng Fan <peng.fan@nxp.com>

Set opt_dom0_max_vcpus according to the cpu_valid mask of cpupool0.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/domain_build.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 35ab08d..d171c39 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -5,6 +5,7 @@
 #include <xen/mm.h>
 #include <xen/domain_page.h>
 #include <xen/sched.h>
+#include <xen/sched-if.h>
 #include <asm/irq.h>
 #include <asm/regs.h>
 #include <xen/errno.h>
@@ -59,7 +60,7 @@ custom_param("dom0_mem", parse_dom0_mem);
 struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0)
 {
     if ( opt_dom0_max_vcpus == 0 )
-        opt_dom0_max_vcpus = num_online_cpus();
+        opt_dom0_max_vcpus = cpumask_weight(cpupool0->cpu_valid);
     if ( opt_dom0_max_vcpus > MAX_VIRT_CPUS )
         opt_dom0_max_vcpus = MAX_VIRT_CPUS;
 
-- 
2.6.6



* [RFC 2/5] xen: cpupool: introduce cpupool_arch_info
  2016-09-19  2:08 [RFC 0/5] xen/arm: support big.little SoC van.freenix
  2016-09-19  2:08 ` [RFC 1/5] xen/arm: domain_build: setting opt_dom0_max_vcpus according to cpupool0 info van.freenix
@ 2016-09-19  2:08 ` van.freenix
  2016-09-19  2:08 ` [RFC 3/5] xen: cpupool: add arch cpupool hook van.freenix
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 85+ messages in thread
From: van.freenix @ 2016-09-19  2:08 UTC (permalink / raw)
  To: julien.grall, sstabellini, jbeulich, andrew.cooper3, jgross,
	dario.faggioli
  Cc: Peng Fan, xen-devel

From: Peng Fan <peng.fan@nxp.com>

Introduce cpupool_arch_info.
For ARM, add a 'midr' entry to hold the MIDR info of the cpupool.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/include/asm-arm/cpupool.h | 16 ++++++++++++++++
 xen/include/asm-x86/cpupool.h | 16 ++++++++++++++++
 xen/include/xen/sched-if.h    |  2 ++
 3 files changed, 34 insertions(+)
 create mode 100644 xen/include/asm-arm/cpupool.h
 create mode 100644 xen/include/asm-x86/cpupool.h

diff --git a/xen/include/asm-arm/cpupool.h b/xen/include/asm-arm/cpupool.h
new file mode 100644
index 0000000..f450199
--- /dev/null
+++ b/xen/include/asm-arm/cpupool.h
@@ -0,0 +1,16 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+struct cpupool_arch_info
+{
+    uint32_t midr; /* Hold the MIDR info of the pool */
+};
diff --git a/xen/include/asm-x86/cpupool.h b/xen/include/asm-x86/cpupool.h
new file mode 100644
index 0000000..3251709
--- /dev/null
+++ b/xen/include/asm-x86/cpupool.h
@@ -0,0 +1,16 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+struct cpupool_arch_info
+{
+    /* Nothing for now. */
+};
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index bc0e794..eb52ac7 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -8,6 +8,7 @@
 #ifndef __XEN_SCHED_IF_H__
 #define __XEN_SCHED_IF_H__
 
+#include <asm/cpupool.h>
 #include <xen/percpu.h>
 
 /* A global pointer to the initial cpupool (POOL0). */
@@ -186,6 +187,7 @@ struct cpupool
     unsigned int     n_dom;
     struct scheduler *sched;
     atomic_t         refcnt;
+    struct cpupool_arch_info info;
 };
 
 #define cpupool_online_cpumask(_pool) \
-- 
2.6.6



* [RFC 3/5] xen: cpupool: add arch cpupool hook
  2016-09-19  2:08 [RFC 0/5] xen/arm: support big.little SoC van.freenix
  2016-09-19  2:08 ` [RFC 1/5] xen/arm: domain_build: setting opt_dom0_max_vcpus according to cpupool0 info van.freenix
  2016-09-19  2:08 ` [RFC 2/5] xen: cpupool: introduce cpupool_arch_info van.freenix
@ 2016-09-19  2:08 ` van.freenix
  2016-09-19  2:08 ` [RFC 4/5] xen/arm: move vpidr from arch_domain to arch_vcpu van.freenix
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 85+ messages in thread
From: van.freenix @ 2016-09-19  2:08 UTC (permalink / raw)
  To: julien.grall, sstabellini, jbeulich, andrew.cooper3, jgross,
	dario.faggioli
  Cc: Peng Fan, xen-devel

From: Peng Fan <peng.fan@nxp.com>

Introduce arch_cpupool_create, arch_cpupool_cpu_add,
and arch_domain_cpupool_compatible.

For x86 there is nothing to do, so only empty stub functions are added.

For ARM, arch_cpupool_create initializes the cpupool midr to -1. In
arch_cpupool_cpu_add, if the cpupool midr is still uninitialized, or the
CPU is the first one to be added to the pool, the CPU's MIDR is assigned
to the cpupool midr; if the midr is already initialized and there are
valid CPUs in the pool, the MIDR of the CPU is checked for compatibility
with the cpupool's. arch_domain_cpupool_compatible checks whether a
domain may be moved into the cpupool.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
---
 xen/arch/arm/Makefile      |  1 +
 xen/arch/arm/cpupool.c     | 45 +++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/cpu/Makefile  |  1 +
 xen/arch/x86/cpu/cpupool.c | 30 ++++++++++++++++++++++++++++++
 xen/common/cpupool.c       | 30 ++++++++++++++++++++++++++++++
 xen/include/xen/sched-if.h |  3 +++
 6 files changed, 110 insertions(+)
 create mode 100644 xen/arch/arm/cpupool.c
 create mode 100644 xen/arch/x86/cpu/cpupool.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 23aaf52..1b72f66 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -9,6 +9,7 @@ obj-y += bootfdt.o
 obj-y += cpu.o
 obj-y += cpuerrata.o
 obj-y += cpufeature.o
+obj-y += cpupool.o
 obj-y += decode.o
 obj-y += device.o
 obj-y += domain.o
diff --git a/xen/arch/arm/cpupool.c b/xen/arch/arm/cpupool.c
new file mode 100644
index 0000000..74a5ef3
--- /dev/null
+++ b/xen/arch/arm/cpupool.c
@@ -0,0 +1,45 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <xen/cpumask.h>
+#include <xen/sched.h>
+#include <xen/sched-if.h>
+
+int arch_cpupool_create(struct cpupool *c)
+{
+    c->info.midr = -1;
+
+    return 0;
+}
+
+int arch_cpupool_cpu_add(struct cpupool *c, unsigned int cpu)
+{
+    if ( (c->info.midr == -1) || (cpumask_weight(c->cpu_valid) == 0) )
+    {
+        c->info.midr = cpu_data[cpu].midr.bits;
+
+        return 0;
+    }
+    else if ( c->info.midr == cpu_data[cpu].midr.bits )
+    {
+        return 0;
+    }
+    else
+    {
+        return -EINVAL;
+    }
+}
+
+bool_t arch_domain_cpupool_compatible(struct domain *d, struct cpupool *c)
+{
+    return true;
+}
diff --git a/xen/arch/x86/cpu/Makefile b/xen/arch/x86/cpu/Makefile
index 74f23ae..0d3036e 100644
--- a/xen/arch/x86/cpu/Makefile
+++ b/xen/arch/x86/cpu/Makefile
@@ -4,6 +4,7 @@ subdir-y += mtrr
 obj-y += amd.o
 obj-y += centaur.o
 obj-y += common.o
+obj-y += cpupool.o
 obj-y += intel.o
 obj-y += intel_cacheinfo.o
 obj-y += mwait-idle.o
diff --git a/xen/arch/x86/cpu/cpupool.c b/xen/arch/x86/cpu/cpupool.c
new file mode 100644
index 0000000..d897a1f
--- /dev/null
+++ b/xen/arch/x86/cpu/cpupool.c
@@ -0,0 +1,30 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <xen/cpumask.h>
+#include <xen/sched.h>
+#include <xen/sched-if.h>
+
+int arch_cpupool_create(struct cpupool *c)
+{
+    return 0;
+}
+
+int arch_cpupool_cpu_add(struct cpupool *c, unsigned int cpu)
+{
+    return 0;
+}
+
+bool_t arch_domain_cpupool_compatible(struct domain *d, struct cpupool *c)
+{
+    return true;
+}
diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 9998394..798322d 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -134,11 +134,20 @@ static struct cpupool *cpupool_create(
     struct cpupool *c;
     struct cpupool **q;
     int last = 0;
+    int ret;
 
     *perr = -ENOMEM;
     if ( (c = alloc_cpupool_struct()) == NULL )
         return NULL;
 
+    ret = arch_cpupool_create(c);
+    if ( ret )
+    {
+        *perr = ret;
+        free_cpupool_struct(c);
+        return NULL;
+    }
+
     /* One reference for caller, one reference for cpupool_destroy(). */
     atomic_set(&c->refcnt, 2);
 
@@ -498,6 +507,8 @@ static int cpupool_cpu_add(unsigned int cpu)
         {
             if ( cpumask_test_cpu(cpu, (*c)->cpu_suspended ) )
             {
+                if ( arch_cpupool_cpu_add(*c, cpu) )
+                    continue;
                 ret = cpupool_assign_cpu_locked(*c, cpu);
                 if ( ret )
                     goto out;
@@ -516,6 +527,13 @@ static int cpupool_cpu_add(unsigned int cpu)
     }
     else
     {
+        if ( arch_cpupool_cpu_add(cpupool0, cpu) )
+        {
+            /* Do not fail; just skip adding the cpu to cpupool0. */
+            ret = 0;
+            goto out;
+        }
+
         /*
          * If we are not resuming, we are hot-plugging cpu, and in which case
          * we add it to pool0, as it certainly was there when hot-unplagged
@@ -657,6 +675,9 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op)
         ret = -ENOENT;
         if ( c == NULL )
             goto addcpu_out;
+        ret = -EINVAL;
+        if ( arch_cpupool_cpu_add(c, cpu) )
+            goto addcpu_out;
         ret = cpupool_assign_cpu_locked(c, cpu);
     addcpu_out:
         spin_unlock(&cpupool_lock);
@@ -707,7 +728,16 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op)
 
         c = cpupool_find_by_id(op->cpupool_id);
         if ( (c != NULL) && cpumask_weight(c->cpu_valid) )
+        {
+            if ( !arch_domain_cpupool_compatible(d, c) )
+            {
+                ret = -EINVAL;
+                spin_unlock(&cpupool_lock);
+                rcu_unlock_domain(d);
+                break;
+            }
             ret = cpupool_move_domain_locked(d, c);
+        }
 
         spin_unlock(&cpupool_lock);
         cpupool_dprintk("cpupool move_domain(dom=%d)->pool=%d ret %d\n",
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index eb52ac7..1b2d4bf 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -203,4 +203,7 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
     return d->cpupool->cpu_valid;
 }
 
+int arch_cpupool_create(struct cpupool *c);
+int arch_cpupool_cpu_add(struct cpupool *c, unsigned int cpu);
+bool_t arch_domain_cpupool_compatible(struct domain *d, struct cpupool *c);
 #endif /* __XEN_SCHED_IF_H__ */
-- 
2.6.6



* [RFC 4/5] xen/arm: move vpidr from arch_domain to arch_vcpu
  2016-09-19  2:08 [RFC 0/5] xen/arm: support big.little SoC van.freenix
                   ` (2 preceding siblings ...)
  2016-09-19  2:08 ` [RFC 3/5] xen: cpupool: add arch cpupool hook van.freenix
@ 2016-09-19  2:08 ` van.freenix
  2016-09-19  2:08 ` [RFC 5/5] xen/arm: cpupool: implement arch_domain_cpupool_compatible van.freenix
  2016-09-19  8:09 ` [RFC 0/5] xen/arm: support big.little SoC Julien Grall
  5 siblings, 0 replies; 85+ messages in thread
From: van.freenix @ 2016-09-19  2:08 UTC (permalink / raw)
  To: julien.grall, sstabellini, jbeulich, andrew.cooper3, jgross,
	dario.faggioli
  Cc: Peng Fan, xen-devel

From: Peng Fan <peng.fan@nxp.com>

Move vpidr from arch_domain to arch_vcpu.

In order to support big.LITTLE, big CPUs and little CPUs are
assigned to different cpupools.

When a new domain is created with a cpupool specified, the domain
is first assigned to cpupool0 and then moved from cpupool0 to the
specified cpupool.

In arch_domain_create, vpidr is initialized with
boot_cpu_data.midr.bits. But big and little cpupools have different
MIDR values, so the guest vCPU MIDR should use the MIDR info from
the cpupool the domain runs in.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/domain.c        | 9 +++++----
 xen/arch/arm/traps.c         | 2 +-
 xen/include/asm-arm/domain.h | 9 ++++++---
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 20bb2ba..934c112 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -14,6 +14,7 @@
 #include <xen/init.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
+#include <xen/sched-if.h>
 #include <xen/softirq.h>
 #include <xen/wait.h>
 #include <xen/errno.h>
@@ -150,7 +151,7 @@ static void ctxt_switch_to(struct vcpu *n)
 
     p2m_restore_state(n);
 
-    WRITE_SYSREG32(n->domain->arch.vpidr, VPIDR_EL2);
+    WRITE_SYSREG32(n->arch.vpidr, VPIDR_EL2);
     WRITE_SYSREG(n->arch.vmpidr, VMPIDR_EL2);
 
     /* VGIC */
@@ -521,6 +522,9 @@ int vcpu_initialise(struct vcpu *v)
 
     v->arch.actlr = READ_SYSREG32(ACTLR_EL1);
 
+    /* The virtual ID matches the physical ID of the CPUs in the cpupool */
+    v->arch.vpidr = v->domain->cpupool->info.midr;
+
     processor_vcpu_initialise(v);
 
     if ( (rc = vcpu_vgic_init(v)) != 0 )
@@ -562,9 +566,6 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
     if ( (d->shared_info = alloc_xenheap_pages(0, 0)) == NULL )
         goto fail;
 
-    /* Default the virtual ID to match the physical */
-    d->arch.vpidr = boot_cpu_data.midr.bits;
-
     clear_page(d->shared_info);
     share_xen_page_with_guest(
         virt_to_page(d->shared_info), d, XENSHARE_writable);
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 683bcb2..c0ad97e 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1975,7 +1975,7 @@ static void do_cp14_32(struct cpu_user_regs *regs, const union hsr hsr)
          *  - Variant and Revision bits match MDIR
          */
         val = (1 << 24) | (5 << 16);
-        val |= ((d->arch.vpidr >> 20) & 0xf) | (d->arch.vpidr & 0xf);
+        val |= ((d->vcpu[0]->arch.vpidr >> 20) & 0xf) | (d->vcpu[0]->arch.vpidr & 0xf);
         set_user_reg(regs, regidx, val);
 
         break;
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 9452fcd..b998c6d 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -63,9 +63,6 @@ struct arch_domain
         RELMEM_done,
     } relmem;
 
-    /* Virtual CPUID */
-    uint32_t vpidr;
-
     struct {
         uint64_t offset;
     } phys_timer_base;
@@ -173,6 +170,12 @@ struct arch_vcpu
     uint32_t esr;
 #endif
 
+    /*
+     * Holds the value of the Virtualization Processor ID.
+     * This is the value returned by Non-secure EL1 reads of MIDR_EL1.
+     */
+    uint32_t vpidr;
+
     uint32_t ifsr; /* 32-bit guests only */
     uint32_t afsr0, afsr1;
 
-- 
2.6.6



* [RFC 5/5] xen/arm: cpupool: implement arch_domain_cpupool_compatible
  2016-09-19  2:08 [RFC 0/5] xen/arm: support big.little SoC van.freenix
                   ` (3 preceding siblings ...)
  2016-09-19  2:08 ` [RFC 4/5] xen/arm: move vpidr from arch_domain to arch_vcpu van.freenix
@ 2016-09-19  2:08 ` van.freenix
  2016-09-19  8:09 ` [RFC 0/5] xen/arm: support big.little SoC Julien Grall
  5 siblings, 0 replies; 85+ messages in thread
From: van.freenix @ 2016-09-19  2:08 UTC (permalink / raw)
  To: julien.grall, sstabellini, jbeulich, andrew.cooper3, jgross,
	dario.faggioli
  Cc: Peng Fan, xen-devel

From: Peng Fan <peng.fan@nxp.com>

When migrating a domain between cpupools, check whether the domain
is compatible with the target cpupool.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/cpupool.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/cpupool.c b/xen/arch/arm/cpupool.c
index 74a5ef3..6c1c092 100644
--- a/xen/arch/arm/cpupool.c
+++ b/xen/arch/arm/cpupool.c
@@ -41,5 +41,20 @@ int arch_cpupool_cpu_add(struct cpupool *c, unsigned int cpu)
 
 bool_t arch_domain_cpupool_compatible(struct domain *d, struct cpupool *c)
 {
-    return true;
+    if ( !d->vcpu || !d->vcpu[0] )
+    {
+        /*
+         * We are in the process of domain creation; vcpus are not yet
+         * constructed or initialized, so the domain can safely be moved.
+         */
+        return true;
+    }
+    else if ( d->vcpu[0] )
+    {
+        return d->vcpu[0]->arch.vpidr == c->info.midr;
+    }
+    else
+    {
+        return false;
+    }
 }
-- 
2.6.6



* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19  2:08 [RFC 0/5] xen/arm: support big.little SoC van.freenix
                   ` (4 preceding siblings ...)
  2016-09-19  2:08 ` [RFC 5/5] xen/arm: cpupool: implement arch_domain_cpupool_compatible van.freenix
@ 2016-09-19  8:09 ` Julien Grall
  2016-09-19  8:36   ` Peng Fan
  5 siblings, 1 reply; 85+ messages in thread
From: Julien Grall @ 2016-09-19  8:09 UTC (permalink / raw)
  To: van.freenix, sstabellini, jbeulich, andrew.cooper3, jgross,
	dario.faggioli
  Cc: Peng Fan, xen-devel

Hello Peng,

On 19/09/2016 04:08, van.freenix@gmail.com wrote:
> From: Peng Fan <peng.fan@nxp.com>
>
> This patchset is to support XEN run on big.little SoC.
> The idea of the patch is from
> "https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html"
>
> There are some changes to cpupool and add x86 stub functions to avoid build
> break. Sending The RFC patchset out is to request for comments to see whether
> this implementation is acceptable or not. Patchset have been tested based on
> xen-4.8 unstable on NXP i.MX8.
>
> I use Big/Little CPU and cpupool to explain the idea.
> A pool contains Big CPUs is called Big Pool.
> A pool contains Little CPUs is called Little Pool.
> If a pool does not contains any physical cpus, Little CPUs or Big CPUs
> can be added to the cpupool. But the cpupool can not contain both Little
> and Big CPUs. The CPUs in a cpupool must have the same cpu type(midr value for ARM).
> CPUs can not be added to the cpupool which contains cpus that have different cpu type.
> Little CPUs can not be moved to Big Pool if there are Big CPUs in Big Pool,
> and versa. Domain in Big Pool can not be migrated to Little Pool, and versa.
> When XEN tries to bringup all the CPUs, only add CPUs with the same cpu type(same midr value)
> into cpupool0.

As mentioned in the mail you pointed to above, this series is not enough
to make big.LITTLE work on Xen. Xen always uses the boot CPU to detect the
list of features; with big.LITTLE the features may not be the same.

And I would prefer to see Xen supporting big.LITTLE correctly before we
begin to think about exposing big.LITTLE to userspace (via cpupools)
automatically.

See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1).
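
Roughly, the pattern in question looks like this (a simplified sketch of
vcpu_initialise() in xen/arch/arm/domain.c, not the exact code):

    int vcpu_initialise(struct vcpu *v)
    {
        /*
         * The ACTLR value exposed to the vCPU is sampled from whichever
         * physical CPU happens to run this function. On big.LITTLE that
         * CPU may be of a different type from the CPUs the domain will
         * eventually be scheduled on.
         */
        v->arch.actlr = READ_SYSREG32(ACTLR_EL1);
        ...
    }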

>
> Thinking an SoC with 4 A53(cpu[0-3]) + 2 A72(cpu[4-5]), cpu0 is the first one
> that boots up. When XEN tries to bringup secondary CPUs, add cpu[0-3] to
> cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up,
> `xl cpupool-list -c` will show cpu[0-3] in Pool-0.
>
> Then use the following script to create a new cpupool and add cpu[4-5] to
> the cpupool.
>  #xl cpupool-create name=\"Pool-A72\" sched=\"credit2\"
>  #xl cpupool-cpu-add Pool-A72 4
>  #xl cpupool-cpu-add Pool-A72 5
>  #xl create -d /root/xen/domu-test pool=\"Pool-A72\"

I am a bit confused by these runes. They mean that only the first kind of
CPUs get a pool assigned. Why don't you create all the pools directly at
boot time?

Also, in which pool will a domain be created if none is specified?

> Now `xl cpupool-list -c` shows:
> Name            CPU list
> Pool-0          0,1,2,3
> Pool-A72        4,5
>
> `xl cpupool-list` shows:
> Name               CPUs   Sched     Active   Domain count
> Pool-0               4    credit       y          1
> Pool-A72             2   credit2       y          1
>
> `xl cpupool-cpu-remove Pool-A72 4`, then `xl cpupool-cpu-add Pool-0 4`
> not success, because Pool-0 contains A53 CPUs, but CPU4 is an A72 CPU.
>
> `xl cpupool-migrate DomU Pool-0` will also fail, because DomU is created
> in Pool-A72 with A72 vcpu, while Pool-0 have A53 physical cpus.
>
> Patch 1/5:
> use "cpumask_weight(cpupool0->cpu_valid);" to replace "num_online_cpus()",
> because num_online_cpus() counts all the online CPUs, but now we only
> need Big or Little CPUs.

So if I understand correctly, if the boot CPU is a little CPU, Dom0 will
only ever be able to use little ones. Is that right?

Regards,

-- 
Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19  8:09 ` [RFC 0/5] xen/arm: support big.little SoC Julien Grall
@ 2016-09-19  8:36   ` Peng Fan
  2016-09-19  8:53     ` Julien Grall
  0 siblings, 1 reply; 85+ messages in thread
From: Peng Fan @ 2016-09-19  8:36 UTC (permalink / raw)
  To: Julien Grall
  Cc: jgross, Peng Fan, sstabellini, andrew.cooper3, dario.faggioli,
	xen-devel, jbeulich

Hello Julien,

On Mon, Sep 19, 2016 at 10:09:06AM +0200, Julien Grall wrote:
>Hello Peng,
>
>On 19/09/2016 04:08, van.freenix@gmail.com wrote:
>>From: Peng Fan <peng.fan@nxp.com>
>>
>>This patchset is to support XEN run on big.little SoC.
>>The idea of the patch is from
>>"https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html"
>>
>>There are some changes to cpupool and add x86 stub functions to avoid build
>>break. Sending The RFC patchset out is to request for comments to see whether
>>this implementation is acceptable or not. Patchset have been tested based on
>>xen-4.8 unstable on NXP i.MX8.
>>
>>I use Big/Little CPU and cpupool to explain the idea.
>>A pool contains Big CPUs is called Big Pool.
>>A pool contains Little CPUs is called Little Pool.
>>If a pool does not contains any physical cpus, Little CPUs or Big CPUs
>>can be added to the cpupool. But the cpupool can not contain both Little
>>and Big CPUs. The CPUs in a cpupool must have the same cpu type(midr value for ARM).
>>CPUs can not be added to the cpupool which contains cpus that have different cpu type.
>>Little CPUs can not be moved to Big Pool if there are Big CPUs in Big Pool,
>>and versa. Domain in Big Pool can not be migrated to Little Pool, and versa.
>>When XEN tries to bringup all the CPUs, only add CPUs with the same cpu type(same midr value)
>>into cpupool0.
>
>As mentioned in the mail you pointed above, this series is not enough to make
>big.LITTLE working on then. Xen is always using the boot CPU to detect the
>list of features. With big.LITTLE features may not be the same.
>
>And I would prefer to see Xen supporting big.LITTLE correctly before
>beginning to think to expose big.LITTLE to the userspace (via cpupool)
>automatically.

Do you mean vcpus being scheduled freely between big and little cpus?

This patchset uses cpupools to prevent vcpus from being scheduled between
big and little cpus.

>
>See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1).

Thanks for this. I only exposed the CPU ID to the guest and missed actlr.
I'll check the A53 and A72 TRMs for the AArch64 implementation defined
registers. This actlr can be added to cpupool_arch_info like midr.
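
Something like this, extending the struct from patch 2/5 (sketch only):

    struct cpupool_arch_info
    {
        uint32_t midr;   /* MIDR of the CPUs in the pool */
        uint32_t actlr;  /* ACTLR would be recorded per pool as well */
    };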

Reading "vcpu_initialise", seems only MIDR and ACTLR needs to be handled.
Please advise if I missed anything else.

>
>>
>>Thinking an SoC with 4 A53(cpu[0-3]) + 2 A72(cpu[4-5]), cpu0 is the first one
>>that boots up. When XEN tries to bringup secondary CPUs, add cpu[0-3] to
>>cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up,
>>`xl cpupool-list -c` will show cpu[0-3] in Pool-0.
>>
>>Then use the following script to create a new cpupool and add cpu[4-5] to
>>the cpupool.
>> #xl cpupool-create name=\"Pool-A72\" sched=\"credit2\"
>> #xl cpupool-cpu-add Pool-A72 4
>> #xl cpupool-cpu-add Pool-A72 5
>> #xl create -d /root/xen/domu-test pool=\"Pool-A72\"
>
>I am a bit confused with these runes. It means that only the first kind of
>CPUs have pool assigned. Why don't you directly create all the pools at boot
>time?

To create all the pools, it is necessary to decide how many pools are
needed. I thought about this, but did not come up with a good idea.

cpupool0 is defined in xen/common/cpupool.c; to create more pools, cpupools
would need to be allocated dynamically at boot. I would not like to change
the common code a lot.

The implementation in this patchset is, I think, an easy way to let big
and little CPUs all run.

>
>Also, in which pool a domain will be created if none is specified?
>
>>Now `xl cpupool-list -c` shows:
>>Name            CPU list
>>Pool-0          0,1,2,3
>>Pool-A72        4,5
>>
>>`xl cpupool-list` shows:
>>Name               CPUs   Sched     Active   Domain count
>>Pool-0               4    credit       y          1
>>Pool-A72             2   credit2       y          1
>>
>>`xl cpupool-cpu-remove Pool-A72 4`, then `xl cpupool-cpu-add Pool-0 4`
>>not success, because Pool-0 contains A53 CPUs, but CPU4 is an A72 CPU.
>>
>>`xl cpupool-migrate DomU Pool-0` will also fail, because DomU is created
>>in Pool-A72 with A72 vcpu, while Pool-0 have A53 physical cpus.
>>
>>Patch 1/5:
>>use "cpumask_weight(cpupool0->cpu_valid);" to replace "num_online_cpus()",
>>because num_online_cpus() counts all the online CPUs, but now we only
>>need Big or Little CPUs.
>
>So if I understand correctly, if the boot CPU is a little CPU, DOM0 will
>always be able to only use little ones. Is that right?

Yeah. Dom0 only uses the little ones.

Thanks,
Peng.
>
>Regards,
>
>-- 
>Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19  8:36   ` Peng Fan
@ 2016-09-19  8:53     ` Julien Grall
  2016-09-19  9:38       ` Peng Fan
                         ` (2 more replies)
  0 siblings, 3 replies; 85+ messages in thread
From: Julien Grall @ 2016-09-19  8:53 UTC (permalink / raw)
  To: Peng Fan
  Cc: jgross, Peng Fan, sstabellini, George Dunlap, andrew.cooper3,
	dario.faggioli, xen-devel, jbeulich

Hello,

On 19/09/2016 10:36, Peng Fan wrote:
> On Mon, Sep 19, 2016 at 10:09:06AM +0200, Julien Grall wrote:
>> Hello Peng,
>>
>> On 19/09/2016 04:08, van.freenix@gmail.com wrote:
>>> From: Peng Fan <peng.fan@nxp.com>
>>>
>>> This patchset is to support XEN run on big.little SoC.
>>> The idea of the patch is from
>>> "https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html"
>>>
>>> There are some changes to cpupool and add x86 stub functions to avoid build
>>> break. Sending The RFC patchset out is to request for comments to see whether
>>> this implementation is acceptable or not. Patchset have been tested based on
>>> xen-4.8 unstable on NXP i.MX8.
>>>
>>> I use Big/Little CPU and cpupool to explain the idea.
>>> A pool contains Big CPUs is called Big Pool.
>>> A pool contains Little CPUs is called Little Pool.
>>> If a pool does not contains any physical cpus, Little CPUs or Big CPUs
>>> can be added to the cpupool. But the cpupool can not contain both Little
>>> and Big CPUs. The CPUs in a cpupool must have the same cpu type(midr value for ARM).
>>> CPUs can not be added to the cpupool which contains cpus that have different cpu type.
>>> Little CPUs can not be moved to Big Pool if there are Big CPUs in Big Pool,
>>> and versa. Domain in Big Pool can not be migrated to Little Pool, and versa.
>>> When XEN tries to bringup all the CPUs, only add CPUs with the same cpu type(same midr value)
>>> into cpupool0.
>>
>> As mentioned in the mail you pointed above, this series is not enough to make
>> big.LITTLE working on then. Xen is always using the boot CPU to detect the
>> list of features. With big.LITTLE features may not be the same.
>>
>> And I would prefer to see Xen supporting big.LITTLE correctly before
>> beginning to think to expose big.LITTLE to the userspace (via cpupool)
>> automatically.
>
> Do you mean vcpus be scheduled between big and little cpus freely?

By supporting big.LITTLE correctly I mean that today Xen thinks all the
cores have the same set of features, so feature detection is only done on
the boot CPU. See processor_setup for instance...

Moving vCPUs between big and little cores would be a hard task (cache line
issues, and possibly features) and I don't expect Xen to ever cross this.
However, I am expecting to see big.LITTLE exposed to the guest (i.e.
having big and little vCPUs).

>
> This patchset is to use cpupool to block the vcpu be scheduled between big and
> little cpus.
>
>>
>> See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1).
>
> Thanks for this. I only expose cpuid to guest, missed actlr. I'll check
> the A53 and A72 TRM about AArch64 implementationd defined registers.
> This actlr can be added to the cpupool_arch_info as midr.
>
> Reading "vcpu_initialise", seems only MIDR and ACTLR needs to be handled.
> Please advise if I missed anything else.

Have you checked the register emulation?

>
>>
>>>
>>> Thinking an SoC with 4 A53(cpu[0-3]) + 2 A72(cpu[4-5]), cpu0 is the first one
>>> that boots up. When XEN tries to bringup secondary CPUs, add cpu[0-3] to
>>> cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up,
>>> `xl cpupool-list -c` will show cpu[0-3] in Pool-0.
>>>
>>> Then use the following script to create a new cpupool and add cpu[4-5] to
>>> the cpupool.
>>> #xl cpupool-create name=\"Pool-A72\" sched=\"credit2\"
>>> #xl cpupool-cpu-add Pool-A72 4
>>> #xl cpupool-cpu-add Pool-A72 5
>>> #xl create -d /root/xen/domu-test pool=\"Pool-A72\"
>>
>> I am a bit confused with these runes. It means that only the first kind of
>> CPUs have pool assigned. Why don't you directly create all the pools at boot
>> time?
>
> If need to create all the pools, need to decided how many pools need to be created.
> I thought about this, but I do not come out a good idea.
>
> The cpupool0 is defined in xen/common/cpupool.c, if need to create many pools,
> need to alloc cpupools dynamically when booting. I would not like to change a
> lot to common code.

Why? We should avoid choosing a specific design just because the common
code does not allow it without heavy changes.

We never came across the big.LITTLE problem on x86, so it is normal that
the code needs modifying.

> The implementation in this patchset I think is an easy way to let Big and Little
> CPUs all run.

I care about having a design that allows easy use of big.LITTLE on Xen.
Your solution requires the administrator to know the underlying platform
and create the pools.

In the solution I suggested, the pools would be created by Xen (and the
info exposed to userspace for the admin).

>
>>
>> Also, in which pool a domain will be created if none is specified?
>>
>>> Now `xl cpupool-list -c` shows:
>>> Name            CPU list
>>> Pool-0          0,1,2,3
>>> Pool-A72        4,5
>>>
>>> `xl cpupool-list` shows:
>>> Name               CPUs   Sched     Active   Domain count
>>> Pool-0               4    credit       y          1
>>> Pool-A72             2   credit2       y          1
>>>
>>> `xl cpupool-cpu-remove Pool-A72 4`, then `xl cpupool-cpu-add Pool-0 4`
>>> not success, because Pool-0 contains A53 CPUs, but CPU4 is an A72 CPU.
>>>
>>> `xl cpupool-migrate DomU Pool-0` will also fail, because DomU is created
>>> in Pool-A72 with A72 vcpu, while Pool-0 have A53 physical cpus.
>>>
>>> Patch 1/5:
>>> use "cpumask_weight(cpupool0->cpu_valid);" to replace "num_online_cpus()",
>>> because num_online_cpus() counts all the online CPUs, but now we only
>>> need Big or Little CPUs.
>>
>> So if I understand correctly, if the boot CPU is a little CPU, DOM0 will
>> always be able to only use little ones. Is that right?
>
> Yeah. Dom0 only use the little ones.

This is really bad: in the normal case dom0 will have all the backends. It
may not be possible to select the boot CPU, so dom0 could always end up on
little CPUs.

Creating the pools at boot time would avoid such an issue because, unless
we expose big.LITTLE to dom0 (I would need the input of George and Dario
for this bit), we could have a parameter to specify which set of CPUs
(e.g. which pool) to allocate dom0 vCPUs from.
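
Purely as an illustration (hypothetical; no such option exists today),
such a parameter could look like:

    /* Hypothetical "dom0_pool" boot parameter selecting which boot-time
     * pool provides dom0's vCPUs (assuming cpupool_find_by_id were
     * usable from here). */
    static unsigned int __initdata opt_dom0_pool;
    integer_param("dom0_pool", opt_dom0_pool);

    struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0)
    {
        struct cpupool *pool = cpupool_find_by_id(opt_dom0_pool);

        if ( opt_dom0_max_vcpus == 0 )
            opt_dom0_max_vcpus = cpumask_weight(pool->cpu_valid);
        ...
    }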

Note that I am not asking you to implement everything. But I think we need
a coherent view of big.LITTLE support in Xen today in order to go forward.

Regards,

-- 
Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19  8:53     ` Julien Grall
@ 2016-09-19  9:38       ` Peng Fan
  2016-09-19  9:59         ` Julien Grall
  2016-09-19  9:45       ` George Dunlap
  2016-09-19 13:08       ` Peng Fan
  2 siblings, 1 reply; 85+ messages in thread
From: Peng Fan @ 2016-09-19  9:38 UTC (permalink / raw)
  To: Julien Grall
  Cc: jgross, Peng Fan, sstabellini, George Dunlap, andrew.cooper3,
	dario.faggioli, xen-devel, jbeulich

On Mon, Sep 19, 2016 at 10:53:56AM +0200, Julien Grall wrote:
>Hello,
>
>On 19/09/2016 10:36, Peng Fan wrote:
>>On Mon, Sep 19, 2016 at 10:09:06AM +0200, Julien Grall wrote:
>>>Hello Peng,
>>>
>>>On 19/09/2016 04:08, van.freenix@gmail.com wrote:
>>>>From: Peng Fan <peng.fan@nxp.com>
>>>>
>>>>This patchset is to support XEN run on big.little SoC.
>>>>The idea of the patch is from
>>>>"https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html"
>>>>
>>>>There are some changes to cpupool and add x86 stub functions to avoid build
>>>>break. Sending The RFC patchset out is to request for comments to see whether
>>>>this implementation is acceptable or not. Patchset have been tested based on
>>>>xen-4.8 unstable on NXP i.MX8.
>>>>
>>>>I use Big/Little CPU and cpupool to explain the idea.
>>>>A pool contains Big CPUs is called Big Pool.
>>>>A pool contains Little CPUs is called Little Pool.
>>>>If a pool does not contains any physical cpus, Little CPUs or Big CPUs
>>>>can be added to the cpupool. But the cpupool can not contain both Little
>>>>and Big CPUs. The CPUs in a cpupool must have the same cpu type(midr value for ARM).
>>>>CPUs can not be added to the cpupool which contains cpus that have different cpu type.
>>>>Little CPUs can not be moved to Big Pool if there are Big CPUs in Big Pool,
>>>>and versa. Domain in Big Pool can not be migrated to Little Pool, and versa.
>>>>When XEN tries to bringup all the CPUs, only add CPUs with the same cpu type(same midr value)
>>>>into cpupool0.
>>>
>>>As mentioned in the mail you pointed above, this series is not enough to make
>>>big.LITTLE working on then. Xen is always using the boot CPU to detect the
>>>list of features. With big.LITTLE features may not be the same.
>>>
>>>And I would prefer to see Xen supporting big.LITTLE correctly before
>>>beginning to think to expose big.LITTLE to the userspace (via cpupool)
>>>automatically.
>>
>>Do you mean vcpus be scheduled between big and little cpus freely?
>
>By supporting big.LITTLE correctly I meant Xen thinks that all the cores has
>the same set of features. So the feature detection is only done the boot CPU.
>See processor_setup for instance...
>
>Moving vCPUs between big and little cores would be a hard task (cache line
>issue, and possibly feature) and I don't expect to ever cross this in Xen.
>However, I am expecting to see big.LITTLE exposed to the guest (i.e having
>big and little vCPUs).

Big vCPUs scheduled on big physical CPUs and little vCPUs scheduled on
little physical CPUs, right? If so, is there a need to let Xen think all
the cores have the same set of features?

I am not sure how much effort developing big.LITTLE guest support would
take. Is it really needed?

>
>>
>>This patchset is to use cpupool to block the vcpu be scheduled between big and
>>little cpus.
>>
>>>
>>>See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1).
>>
>>Thanks for this. I only expose cpuid to guest, missed actlr. I'll check
>>the A53 and A72 TRM about AArch64 implementationd defined registers.
>>This actlr can be added to the cpupool_arch_info as midr.
>>
>>Reading "vcpu_initialise", seems only MIDR and ACTLR needs to be handled.
>>Please advise if I missed anything else.
>
>Have you check the register emulation?

I checked MIDR, but have not checked the others. I think I missed some
registers in ctxt_switch_to.

>
>>
>>>
>>>>
>>>>Thinking an SoC with 4 A53(cpu[0-3]) + 2 A72(cpu[4-5]), cpu0 is the first one
>>>>that boots up. When XEN tries to bringup secondary CPUs, add cpu[0-3] to
>>>>cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up,
>>>>`xl cpupool-list -c` will show cpu[0-3] in Pool-0.
>>>>
>>>>Then use the following script to create a new cpupool and add cpu[4-5] to
>>>>the cpupool.
>>>>#xl cpupool-create name=\"Pool-A72\" sched=\"credit2\"
>>>>#xl cpupool-cpu-add Pool-A72 4
>>>>#xl cpupool-cpu-add Pool-A72 5
>>>>#xl create -d /root/xen/domu-test pool=\"Pool-A72\"
>>>
>>>I am a bit confused with these runes. It means that only the first kind of
>>>CPUs have pool assigned. Why don't you directly create all the pools at boot
>>>time?
>>
>>If need to create all the pools, need to decided how many pools need to be created.
>>I thought about this, but I do not come out a good idea.
>>
>>The cpupool0 is defined in xen/common/cpupool.c, if need to create many pools,
>>need to alloc cpupools dynamically when booting. I would not like to change a
>>lot to common code.
>
>Why? We should avoid to choose a specific design just because the common code
>does not allow you to do it without heavy change.
>
>We never came across the big.LITTLE problem on x86, so it is normal to modify
>the code.
>
>>The implementation in this patchset I think is an easy way to let Big and Little
>>CPUs all run.
>
>I care about having a design allowing an easy use of big.LITTLE on Xen. Your
>solution requires the administrator to know the underlying platform and
>create the pool.

I suppose big.LITTLE is mainly used in embedded SoCs :). So the user
(developer?) needs to know the hardware platform.

>
>In the solution I suggested, the pools would be created by Xen (and the info
>exposed to the userspace for the admin).

I think the reason for creating cpupools to support big.LITTLE SoCs is to
avoid vcpus being scheduled between big and little physical cpus.

If big.LITTLE guests need to be supported, I think there is no need to
create more cpupools except cpupool0; we just need to make sure vcpus are
not scheduled between big and little physical cpus. All the cpus would
need to be in one cpupool.

>
>>
>>>
>>>Also, in which pool a domain will be created if none is specified?
>>>
>>>>Now `xl cpupool-list -c` shows:
>>>>Name            CPU list
>>>>Pool-0          0,1,2,3
>>>>Pool-A72        4,5
>>>>
>>>>`xl cpupool-list` shows:
>>>>Name               CPUs   Sched     Active   Domain count
>>>>Pool-0               4    credit       y          1
>>>>Pool-A72             2   credit2       y          1
>>>>
>>>>`xl cpupool-cpu-remove Pool-A72 4`, then `xl cpupool-cpu-add Pool-0 4`
>>>>not success, because Pool-0 contains A53 CPUs, but CPU4 is an A72 CPU.
>>>>
>>>>`xl cpupool-migrate DomU Pool-0` will also fail, because DomU is created
>>>>in Pool-A72 with A72 vcpu, while Pool-0 have A53 physical cpus.
>>>>
>>>>Patch 1/5:
>>>>use "cpumask_weight(cpupool0->cpu_valid);" to replace "num_online_cpus()",
>>>>because num_online_cpus() counts all the online CPUs, but now we only
>>>>need Big or Little CPUs.
>>>
>>>So if I understand correctly, if the boot CPU is a little CPU, DOM0 will
>>>always be able to only use little ones. Is that right?
>>
>>Yeah. Dom0 only use the little ones.
>
>This is really bad, dom0 on normal case will have all the backends. It may
>not be possible to select the boot CPU, and therefore always get a little
>CPU.

Dom0 runs in cpupool0, which only contains cpu[0-3] in my case.

>
>Creating the pool at boot time would have avoid a such issue because, unless
>we expose big.LITTLE to dom0 (I would need the input of George and Dario for
>this bits), we could have a parameter to specify which set of CPUs (e.g pool)
>to allocate dom0 vCPUs.

Dom0 is the control domain; I think there is no need to expose big.LITTLE
to it. Pinning vcpus to specific physical cpus may help support big.LITTLE
guests.

>
>Note, that I am not asking you to implement everything. But I think we need a
>coherent view of big.LITTLE support in Xen today to go forward.

Yeah. So you prefer supporting big.LITTLE guests?
Please advise if you have any plans/ideas, or what I can do on this.

Thanks,
Peng.

>
>Regards,
>
>-- 
>Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19  8:53     ` Julien Grall
  2016-09-19  9:38       ` Peng Fan
@ 2016-09-19  9:45       ` George Dunlap
  2016-09-19 10:06         ` Julien Grall
  2016-09-19 13:08       ` Peng Fan
  2 siblings, 1 reply; 85+ messages in thread
From: George Dunlap @ 2016-09-19  9:45 UTC (permalink / raw)
  To: Julien Grall
  Cc: Jürgen Groß,
	Peng Fan, Stefano Stabellini, Andrew Cooper, Dario Faggioli,
	xen-devel, Jan Beulich, Peng Fan

On Mon, Sep 19, 2016 at 9:53 AM, Julien Grall <julien.grall@arm.com> wrote:
>>> As mentioned in the mail you pointed above, this series is not enough to
>>> make
>>> big.LITTLE working on then. Xen is always using the boot CPU to detect
>>> the
>>> list of features. With big.LITTLE features may not be the same.
>>>
>>> And I would prefer to see Xen supporting big.LITTLE correctly before
>>> beginning to think to expose big.LITTLE to the userspace (via cpupool)
>>> automatically.
>>
>>
>> Do you mean vcpus be scheduled between big and little cpus freely?
>
>
> By supporting big.LITTLE correctly I meant Xen thinks that all the cores has
> the same set of features. So the feature detection is only done the boot
> CPU. See processor_setup for instance...
>
> Moving vCPUs between big and little cores would be a hard task (cache line
> issue, and possibly feature) and I don't expect to ever cross this in Xen.
> However, I am expecting to see big.LITTLE exposed to the guest (i.e having
> big and little vCPUs).

So it sounds like the big and LITTLE cores are architecturally
different enough that software must be aware of which one it's running
on?

Exposing varying numbers of big and LITTLE vcpus to guests seems like
a sensible approach.  But at the moment cpupools only allow a domain
to be in exactly one pool -- meaning if we use cpupools to control the
big.LITTLE placement, you won't be *able* to have guests with both big
and LITTLE vcpus.

>> If need to create all the pools, need to decided how many pools need to be
>> created.
>> I thought about this, but I do not come out a good idea.
>>
>> The cpupool0 is defined in xen/common/cpupool.c, if need to create many
>> pools,
>> need to alloc cpupools dynamically when booting. I would not like to
>> change a
>> lot to common code.
>
>
> Why? We should avoid to choose a specific design just because the common
> code does not allow you to do it without heavy change.
>
> We never came across the big.LITTLE problem on x86, so it is normal to
> modify the code.

Julien is correct; there's no reason we couldn't have multiple pools by
default on boot.

>> The implementation in this patchset I think is an easy way to let Big and
>> Little
>> CPUs all run.
>
>
> I care about having a design allowing an easy use of big.LITTLE on Xen. Your
> solution requires the administrator to know the underlying platform and
> create the pool.
>
> In the solution I suggested, the pools would be created by Xen (and the info
> exposed to the userspace for the admin).

FWIW another approach could be the one taken by "xl
cpupool-numa-split": you could have "xl cpupool-bigLITTLE-split" or
something that would automatically set up the pools.

But expanding the schedulers to know about different classes of cpus,
and having vcpus specified as running only on specific types of pcpus,
seems like a more flexible approach.
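
For example, a rough and entirely hypothetical sketch of the data model:

    /* Hypothetical: tag each pCPU with a class derived from its MIDR,
     * give each vCPU a matching class field (none of this exists today),
     * and let the scheduler only consider pCPUs of the same class. */
    DECLARE_PER_CPU(unsigned int, cpu_class);

    static inline bool_t vcpu_class_ok(const struct vcpu *v,
                                       unsigned int cpu)
    {
        return per_cpu(cpu_class, cpu) == v->cpu_class;
    }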

 -George


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19  9:38       ` Peng Fan
@ 2016-09-19  9:59         ` Julien Grall
  2016-09-19 13:15           ` Peng Fan
  0 siblings, 1 reply; 85+ messages in thread
From: Julien Grall @ 2016-09-19  9:59 UTC (permalink / raw)
  To: Peng Fan
  Cc: jgross, Peng Fan, sstabellini, George Dunlap, andrew.cooper3,
	dario.faggioli, xen-devel, jbeulich



On 19/09/2016 11:38, Peng Fan wrote:
> On Mon, Sep 19, 2016 at 10:53:56AM +0200, Julien Grall wrote:
>> Hello,
>>
>> On 19/09/2016 10:36, Peng Fan wrote:
>>> On Mon, Sep 19, 2016 at 10:09:06AM +0200, Julien Grall wrote:
>>>> Hello Peng,
>>>>
>>>> On 19/09/2016 04:08, van.freenix@gmail.com wrote:
>>>>> From: Peng Fan <peng.fan@nxp.com>
>>>>>
>>>>> This patchset is to support XEN run on big.little SoC.
>>>>> The idea of the patch is from
>>>>> "https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html"
>>>>>
>>>>> There are some changes to cpupool and add x86 stub functions to avoid build
>>>>> break. Sending The RFC patchset out is to request for comments to see whether
>>>>> this implementation is acceptable or not. Patchset have been tested based on
>>>>> xen-4.8 unstable on NXP i.MX8.
>>>>>
>>>>> I use Big/Little CPU and cpupool to explain the idea.
>>>>> A pool contains Big CPUs is called Big Pool.
>>>>> A pool contains Little CPUs is called Little Pool.
>>>>> If a pool does not contains any physical cpus, Little CPUs or Big CPUs
>>>>> can be added to the cpupool. But the cpupool can not contain both Little
>>>>> and Big CPUs. The CPUs in a cpupool must have the same cpu type(midr value for ARM).
>>>>> CPUs can not be added to the cpupool which contains cpus that have different cpu type.
>>>>> Little CPUs can not be moved to Big Pool if there are Big CPUs in Big Pool,
>>>>> and versa. Domain in Big Pool can not be migrated to Little Pool, and versa.
>>>>> When XEN tries to bringup all the CPUs, only add CPUs with the same cpu type(same midr value)
>>>>> into cpupool0.
>>>>
>>>> As mentioned in the mail you pointed above, this series is not enough to make
>>>> big.LITTLE working on then. Xen is always using the boot CPU to detect the
>>>> list of features. With big.LITTLE features may not be the same.
>>>>
>>>> And I would prefer to see Xen supporting big.LITTLE correctly before
>>>> beginning to think to expose big.LITTLE to the userspace (via cpupool)
>>>> automatically.
>>>
>>> Do you mean vcpus be scheduled between big and little cpus freely?
>>
>> By supporting big.LITTLE correctly I meant Xen thinks that all the cores has
>> the same set of features. So the feature detection is only done the boot CPU.
>> See processor_setup for instance...
>>
>> Moving vCPUs between big and little cores would be a hard task (cache line
>> issue, and possibly feature) and I don't expect to ever cross this in Xen.
>> However, I am expecting to see big.LITTLE exposed to the guest (i.e having
>> big and little vCPUs).
>
> big vCPUs scheduled on big Physical CPUs and little vCPUs scheduled on little
> physical cpus, right?
> If it is, is there is a need to let Xen think all the cores has the same set
> of features?

I think you missed my point. The feature registers on big and little
cores may be different. Currently, Xen reads the feature registers of the
boot CPU and wrongly assumes that those features exist on all CPUs. This
is not the case and should be fixed before we get in trouble.

>
> Developing big.little guest support, I am not sure how much efforts needed.
> Is this really needed?

This is not necessary at the moment, although I have seen some interest
in it. Running a guest only on little cores is a nice beginning, but a
guest may want to take advantage of big.LITTLE (running hungry apps on
the big cores and light ones on the small cores).
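
For illustration only (hypothetical syntax; nothing like this exists in
xl today), a big.LITTLE-aware guest config might look like:

    # Hypothetical guest config: two big vCPUs and two little vCPUs.
    # The "vcpu_class" key and its class names are entirely made up.
    vcpus = 4
    vcpu_class = [ "big", "big", "little", "little" ]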

>
>>
>>>
>>> This patchset is to use cpupool to block the vcpu be scheduled between big and
>>> little cpus.
>>>
>>>>
>>>> See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1).
>>>
>>> Thanks for this. I only expose cpuid to guest, missed actlr. I'll check
>>> the A53 and A72 TRM about AArch64 implementationd defined registers.
>>> This actlr can be added to the cpupool_arch_info as midr.
>>>
>>> Reading "vcpu_initialise", seems only MIDR and ACTLR needs to be handled.
>>> Please advise if I missed anything else.
>>
>> Have you check the register emulation?
>
> Checked midr. Have not checked others.
> I think I missed some registers in ctxt_switch_to.
>
>>
>>>
>>>>
>>>>>
>>>>> Thinking an SoC with 4 A53(cpu[0-3]) + 2 A72(cpu[4-5]), cpu0 is the first one
>>>>> that boots up. When XEN tries to bringup secondary CPUs, add cpu[0-3] to
>>>>> cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up,
>>>>> `xl cpupool-list -c` will show cpu[0-3] in Pool-0.
>>>>>
>>>>> Then use the following script to create a new cpupool and add cpu[4-5] to
>>>>> the cpupool.
>>>>> #xl cpupool-create name=\"Pool-A72\" sched=\"credit2\"
>>>>> #xl cpupool-cpu-add Pool-A72 4
>>>>> #xl cpupool-cpu-add Pool-A72 5
>>>>> #xl create -d /root/xen/domu-test pool=\"Pool-A72\"
>>>>
>>>> I am a bit confused with these runes. It means that only the first kind of
>>>> CPUs have pool assigned. Why don't you directly create all the pools at boot
>>>> time?
>>>
>>> If need to create all the pools, need to decided how many pools need to be created.
>>> I thought about this, but I do not come out a good idea.
>>>
>>> The cpupool0 is defined in xen/common/cpupool.c, if need to create many pools,
>>> need to alloc cpupools dynamically when booting. I would not like to change a
>>> lot to common code.
>>
>> Why? We should avoid to choose a specific design just because the common code
>> does not allow you to do it without heavy change.
>>
>> We never came across the big.LITTLE problem on x86, so it is normal to modify
>> the code.
>>
>>> The implementation in this patchset I think is an easy way to let Big and Little
>>> CPUs all run.
>>
>> I care about having a design allowing an easy use of big.LITTLE on Xen. Your
>> solution requires the administrator to know the underlying platform and
>> create the pool.
>
> I suppose big.little is mainly used in embedded SoC :). So the user(developer?)
> needs to know the hardware platform.

The user will always be happy if Xen can save them a bit of time 
creating cpupools. ;)

>
>>
>> In the solution I suggested, the pools would be created by Xen (and the info
>> exposed to the userspace for the admin).
>
> I think the reason to create cpupools to support big.little SoC is to
> avoid vcpus scheduled between big and little physical cpus.
>
> If need to support big.little guest, I think no need to create more
> cpupools expect cpupoo0. Need to make sure vcpus not be scheduled between
> big and little physical cpus. All the cpus needs to be in one cpupool.
>
>>
>>>
>>>>
>>>> Also, in which pool a domain will be created if none is specified?
>>>>
>>>>> Now `xl cpupool-list -c` shows:
>>>>> Name            CPU list
>>>>> Pool-0          0,1,2,3
>>>>> Pool-A72        4,5
>>>>>
>>>>> `xl cpupool-list` shows:
>>>>> Name               CPUs   Sched     Active   Domain count
>>>>> Pool-0               4    credit       y          1
>>>>> Pool-A72             2   credit2       y          1
>>>>>
>>>>> `xl cpupool-cpu-remove Pool-A72 4`, then `xl cpupool-cpu-add Pool-0 4`
>>>>> not success, because Pool-0 contains A53 CPUs, but CPU4 is an A72 CPU.
>>>>>
>>>>> `xl cpupool-migrate DomU Pool-0` will also fail, because DomU is created
>>>>> in Pool-A72 with A72 vcpu, while Pool-0 have A53 physical cpus.
>>>>>
>>>>> Patch 1/5:
>>>>> use "cpumask_weight(cpupool0->cpu_valid);" to replace "num_online_cpus()",
>>>>> because num_online_cpus() counts all the online CPUs, but now we only
>>>>> need Big or Little CPUs.
>>>>
>>>> So if I understand correctly, if the boot CPU is a little CPU, DOM0 will
>>>> always be able to only use little ones. Is that right?
>>>
>>> Yeah. Dom0 only use the little ones.
>>
>> This is really bad, dom0 on normal case will have all the backends. It may
>> not be possible to select the boot CPU, and therefore always get a little
>> CPU.
>
> Dom0 runs in cpupool0. cpupool0 only contains the cpu[0-3] in my case.

So the performance of dom0 will be impacted because it will only use 
little cores.

>
>>
>> Creating the pool at boot time would have avoid a such issue because, unless
>> we expose big.LITTLE to dom0 (I would need the input of George and Dario for
>> this bits), we could have a parameter to specify which set of CPUs (e.g pool)
>> to allocate dom0 vCPUs.
>
> dom0 is control domain, I think no need to expose big.little for dom0.
> Pin VCPU to specific physical cpus, this may help support big.little guest.
>
>>
>> Note, that I am not asking you to implement everything. But I think we need a
>> coherent view of big.LITTLE support in Xen today to go forward.
>
> Yeah. Then you prefer supporting big.little guest?

I have seen some interest in it.

> Please advise if you have any plan/ideas or what I can do on this.

I already gave some ideas on what could be done for big.LITTLE support. 
But, I admit, I haven't thought much about it yet, so I may be missing 
some parts.

Regards,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19  9:45       ` George Dunlap
@ 2016-09-19 10:06         ` Julien Grall
  2016-09-19 10:23           ` Juergen Gross
  2016-09-19 10:33           ` George Dunlap
  0 siblings, 2 replies; 85+ messages in thread
From: Julien Grall @ 2016-09-19 10:06 UTC (permalink / raw)
  To: George Dunlap
  Cc: Jürgen Groß,
	Peng Fan, Stefano Stabellini, Andrew Cooper, Dario Faggioli,
	xen-devel, Jan Beulich, Peng Fan

Hi George,

On 19/09/2016 11:45, George Dunlap wrote:
> On Mon, Sep 19, 2016 at 9:53 AM, Julien Grall <julien.grall@arm.com> wrote:
>>>> As mentioned in the mail you pointed above, this series is not enough to
>>>> make
>>>> big.LITTLE working on then. Xen is always using the boot CPU to detect
>>>> the
>>>> list of features. With big.LITTLE features may not be the same.
>>>>
>>>> And I would prefer to see Xen supporting big.LITTLE correctly before
>>>> beginning to think to expose big.LITTLE to the userspace (via cpupool)
>>>> automatically.
>>>
>>>
>>> Do you mean vcpus be scheduled between big and little cpus freely?
>>
>>
>> By supporting big.LITTLE correctly I meant Xen thinks that all the cores has
>> the same set of features. So the feature detection is only done the boot
>> CPU. See processor_setup for instance...
>>
>> Moving vCPUs between big and little cores would be a hard task (cache line
>> issue, and possibly feature) and I don't expect to ever cross this in Xen.
>> However, I am expecting to see big.LITTLE exposed to the guest (i.e having
>> big and little vCPUs).
>
> So it sounds like the big and LITTLE cores are architecturally
> different enough that software must be aware of which one it's running
> on?

That's correct. Each big and LITTLE core may have different errata, 
different features...

It also has the advantage of letting the guest deal with its own 
power efficiency itself, without introducing a specific Xen interface.

>
> Exposing varying numbers of big and LITTLE vcpus to guests seems like
> a sensible approach.  But at the moment cpupools only allow a domain
> to be in exactly one pool -- meaning if we use cpupools to control the
> big.LITTLE placement, you won't be *able* to have guests with both big
> and LITTLE vcpus.
>
>>> If need to create all the pools, need to decided how many pools need to be
>>> created.
>>> I thought about this, but I do not come out a good idea.
>>>
>>> The cpupool0 is defined in xen/common/cpupool.c, if need to create many
>>> pools,
>>> need to alloc cpupools dynamically when booting. I would not like to
>>> change a
>>> lot to common code.
>>
>>
>> Why? We should avoid to choose a specific design just because the common
>> code does not allow you to do it without heavy change.
>>
>> We never came across the big.LITTLE problem on x86, so it is normal to
>> modify the code.
>
> Julien is correct; there's no reason we couldn't have a default
> multiple pools on boot.
>
>>> The implementation in this patchset I think is an easy way to let Big and
>>> Little
>>> CPUs all run.
>>
>>
>> I care about having a design allowing an easy use of big.LITTLE on Xen. Your
>> solution requires the administrator to know the underlying platform and
>> create the pool.
>>
>> In the solution I suggested, the pools would be created by Xen (and the info
>> exposed to the userspace for the admin).
>
> FWIW another approach could be the one taken by "xl
> cpupool-numa-split": you could have "xl cpupool-bigLITTLE-split" or
> something that would automatically set up the pools.
>
> But expanding the schedulers to know about different classes of cpus,
> and having vcpus specified as running only on specific types of pcpus,
> seems like a more flexible approach.

So, if I understand correctly, you would not recommend extending the 
number of cpupools per domain, correct?

Regards,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19 10:06         ` Julien Grall
@ 2016-09-19 10:23           ` Juergen Gross
  2016-09-19 17:18             ` Dario Faggioli
  2016-09-19 20:55             ` Stefano Stabellini
  2016-09-19 10:33           ` George Dunlap
  1 sibling, 2 replies; 85+ messages in thread
From: Juergen Gross @ 2016-09-19 10:23 UTC (permalink / raw)
  To: Julien Grall, George Dunlap
  Cc: Peng Fan, Stefano Stabellini, Andrew Cooper, Dario Faggioli,
	xen-devel, Jan Beulich, Peng Fan

On 19/09/16 12:06, Julien Grall wrote:
> Hi George,
> 
> On 19/09/2016 11:45, George Dunlap wrote:
>> On Mon, Sep 19, 2016 at 9:53 AM, Julien Grall <julien.grall@arm.com>
>> wrote:
>>>>> As mentioned in the mail you pointed above, this series is not
>>>>> enough to
>>>>> make
>>>>> big.LITTLE working on then. Xen is always using the boot CPU to detect
>>>>> the
>>>>> list of features. With big.LITTLE features may not be the same.
>>>>>
>>>>> And I would prefer to see Xen supporting big.LITTLE correctly before
>>>>> beginning to think to expose big.LITTLE to the userspace (via cpupool)
>>>>> automatically.
>>>>
>>>>
>>>> Do you mean vcpus be scheduled between big and little cpus freely?
>>>
>>>
>>> By supporting big.LITTLE correctly I meant Xen thinks that all the
>>> cores has
>>> the same set of features. So the feature detection is only done the boot
>>> CPU. See processor_setup for instance...
>>>
>>> Moving vCPUs between big and little cores would be a hard task (cache
>>> line
>>> issue, and possibly feature) and I don't expect to ever cross this in
>>> Xen.
>>> However, I am expecting to see big.LITTLE exposed to the guest (i.e
>>> having
>>> big and little vCPUs).
>>
>> So it sounds like the big and LITTLE cores are architecturally
>> different enough that software must be aware of which one it's running
>> on?
> 
> That's correct. Each big and LITTLE cores may have different errata,
> different features...
> 
> It has also the advantage to let the guest dealing itself with its own
> power efficiency without introducing a specific Xen interface.
> 
>>
>> Exposing varying numbers of big and LITTLE vcpus to guests seems like
>> a sensible approach.  But at the moment cpupools only allow a domain
>> to be in exactly one pool -- meaning if we use cpupools to control the
>> big.LITTLE placement, you won't be *able* to have guests with both big
>> and LITTLE vcpus.
>>
>>>> If need to create all the pools, need to decided how many pools need
>>>> to be
>>>> created.
>>>> I thought about this, but I do not come out a good idea.
>>>>
>>>> The cpupool0 is defined in xen/common/cpupool.c, if need to create many
>>>> pools,
>>>> need to alloc cpupools dynamically when booting. I would not like to
>>>> change a
>>>> lot to common code.
>>>
>>>
>>> Why? We should avoid to choose a specific design just because the common
>>> code does not allow you to do it without heavy change.
>>>
>>> We never came across the big.LITTLE problem on x86, so it is normal to
>>> modify the code.
>>
>> Julien is correct; there's no reason we couldn't have a default
>> multiple pools on boot.
>>
>>>> The implementation in this patchset I think is an easy way to let
>>>> Big and
>>>> Little
>>>> CPUs all run.
>>>
>>>
>>> I care about having a design allowing an easy use of big.LITTLE on
>>> Xen. Your
>>> solution requires the administrator to know the underlying platform and
>>> create the pool.
>>>
>>> In the solution I suggested, the pools would be created by Xen (and
>>> the info
>>> exposed to the userspace for the admin).
>>
>> FWIW another approach could be the one taken by "xl
>> cpupool-numa-split": you could have "xl cpupool-bigLITTLE-split" or
>> something that would automatically set up the pools.
>>
>> But expanding the schedulers to know about different classes of cpus,
>> and having vcpus specified as running only on specific types of pcpus,
>> seems like a more flexible approach.
> 
> So, if I understand correctly, you would not recommend to extend the
> number of CPU pool per domain, correct?

Before deciding in which direction to go (multiple cpupools, sub-pools,
some kind of implicit cpu pinning) I think we should think about the
implications regarding today's interfaces:

- Do we want to be able to use different schedulers for big/little
  (this would mean some cpupool related solution)? I'd prefer to
  have only one scheduler type for each domain. :-)

- What about scheduling parameters like weight and cap? How would
  those apply (the answer probably influencing the pinning solution)?
  Remember that it was especially the downsides of pinning that led
  to the introduction of cpupools.

- Is big.LITTLE expected to be combined with NUMA?

- Do we need to support live migration for domains containing both
  types of cpus?


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19 10:06         ` Julien Grall
  2016-09-19 10:23           ` Juergen Gross
@ 2016-09-19 10:33           ` George Dunlap
  2016-09-19 13:33             ` Peng Fan
  2016-09-19 16:43             ` Dario Faggioli
  1 sibling, 2 replies; 85+ messages in thread
From: George Dunlap @ 2016-09-19 10:33 UTC (permalink / raw)
  To: Julien Grall, George Dunlap
  Cc: Jürgen Groß,
	Peng Fan, Stefano Stabellini, Andrew Cooper, Dario Faggioli,
	xen-devel, Jan Beulich, Peng Fan

On 19/09/16 11:06, Julien Grall wrote:
> Hi George,
> 
> On 19/09/2016 11:45, George Dunlap wrote:
>> On Mon, Sep 19, 2016 at 9:53 AM, Julien Grall <julien.grall@arm.com>
>> wrote:
>>>>> As mentioned in the mail you pointed above, this series is not
>>>>> enough to
>>>>> make
>>>>> big.LITTLE working on then. Xen is always using the boot CPU to detect
>>>>> the
>>>>> list of features. With big.LITTLE features may not be the same.
>>>>>
>>>>> And I would prefer to see Xen supporting big.LITTLE correctly before
>>>>> beginning to think to expose big.LITTLE to the userspace (via cpupool)
>>>>> automatically.
>>>>
>>>>
>>>> Do you mean vcpus be scheduled between big and little cpus freely?
>>>
>>>
>>> By supporting big.LITTLE correctly I meant Xen thinks that all the
>>> cores has
>>> the same set of features. So the feature detection is only done the boot
>>> CPU. See processor_setup for instance...
>>>
>>> Moving vCPUs between big and little cores would be a hard task (cache
>>> line
>>> issue, and possibly feature) and I don't expect to ever cross this in
>>> Xen.
>>> However, I am expecting to see big.LITTLE exposed to the guest (i.e
>>> having
>>> big and little vCPUs).
>>
>> So it sounds like the big and LITTLE cores are architecturally
>> different enough that software must be aware of which one it's running
>> on?
> 
> That's correct. Each big and LITTLE cores may have different errata,
> different features...
> 
> It has also the advantage to let the guest dealing itself with its own
> power efficiency without introducing a specific Xen interface.

Well in theory there would be advantages either way -- either to
allowing Xen to automatically add power-saving "smarts" to guests which
weren't programmed for them, or to exposing the power-saving abilities
to guests which were.  But it sounds like automatically migrating
between them isn't really an option (or would be a lot more trouble than
it's worth).

>>> I care about having a design allowing an easy use of big.LITTLE on
>>> Xen. Your
>>> solution requires the administrator to know the underlying platform and
>>> create the pool.
>>>
>>> In the solution I suggested, the pools would be created by Xen (and
>>> the info
>>> exposed to the userspace for the admin).
>>
>> FWIW another approach could be the one taken by "xl
>> cpupool-numa-split": you could have "xl cpupool-bigLITTLE-split" or
>> something that would automatically set up the pools.
>>
>> But expanding the schedulers to know about different classes of cpus,
>> and having vcpus specified as running only on specific types of pcpus,
>> seems like a more flexible approach.
> 
> So, if I understand correctly, you would not recommend to extend the
> number of CPU pool per domain, correct?

Well imagine trying to set the scheduling parameters, such as weight,
which in the past have been per-domain.  Now you have to specify
parameters for a domain in each of the cpupools that it's in.

No, I think it would be a lot simpler to just teach the scheduler about
different classes of cpus.  credit1 would probably need to be modified
so that its credit algorithm would be per-class rather than pool-wide;
but credit2 shouldn't need much modification at all, other than to make
sure that a given runqueue doesn't include more than one class; and to
do load-balancing only with runqueues of the same class.
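
For illustration, a sketch of the kind of guard this would add (the
cpu_class field and the helper are hypothetical, not existing credit2
code):

    /* Hypothetical: tag each runqueue with the class of its cpus
     * (e.g. derived from the MIDR on ARM)... */
    struct runq_data {
        unsigned int cpu_class;
        /* ... existing runqueue fields ... */
    };

    /* ... and only balance load between runqueues of the same class. */
    static bool runqs_compatible(const struct runq_data *a,
                                 const struct runq_data *b)
    {
        return a->cpu_class == b->cpu_class;
    }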

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19  8:53     ` Julien Grall
  2016-09-19  9:38       ` Peng Fan
  2016-09-19  9:45       ` George Dunlap
@ 2016-09-19 13:08       ` Peng Fan
  2 siblings, 0 replies; 85+ messages in thread
From: Peng Fan @ 2016-09-19 13:08 UTC (permalink / raw)
  To: Julien Grall
  Cc: jgross, Peng Fan, sstabellini, George Dunlap, andrew.cooper3,
	dario.faggioli, xen-devel, jbeulich

Hello Julien,

On Mon, Sep 19, 2016 at 10:53:56AM +0200, Julien Grall wrote:
>Hello,
>
>On 19/09/2016 10:36, Peng Fan wrote:
>>On Mon, Sep 19, 2016 at 10:09:06AM +0200, Julien Grall wrote:
>>>Hello Peng,
>>>
>>>On 19/09/2016 04:08, van.freenix@gmail.com wrote:
>>>>From: Peng Fan <peng.fan@nxp.com>
>>>>
>>>>This patchset is to support XEN run on big.little SoC.
>>>>The idea of the patch is from
>>>>"https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html"
>>>>
>>>>There are some changes to cpupool and add x86 stub functions to avoid build
>>>>break. Sending The RFC patchset out is to request for comments to see whether
>>>>this implementation is acceptable or not. Patchset have been tested based on
>>>>xen-4.8 unstable on NXP i.MX8.
>>>>
>>>>I use Big/Little CPU and cpupool to explain the idea.
>>>>A pool contains Big CPUs is called Big Pool.
>>>>A pool contains Little CPUs is called Little Pool.
>>>>If a pool does not contains any physical cpus, Little CPUs or Big CPUs
>>>>can be added to the cpupool. But the cpupool can not contain both Little
>>>>and Big CPUs. The CPUs in a cpupool must have the same cpu type(midr value for ARM).
>>>>CPUs can not be added to the cpupool which contains cpus that have different cpu type.
>>>>Little CPUs can not be moved to Big Pool if there are Big CPUs in Big Pool,
>>>>and versa. Domain in Big Pool can not be migrated to Little Pool, and versa.
>>>>When XEN tries to bringup all the CPUs, only add CPUs with the same cpu type(same midr value)
>>>>into cpupool0.
>>>
>>>As mentioned in the mail you pointed above, this series is not enough to make
>>>big.LITTLE working on then. Xen is always using the boot CPU to detect the
>>>list of features. With big.LITTLE features may not be the same.
>>>
>>>And I would prefer to see Xen supporting big.LITTLE correctly before
>>>beginning to think to expose big.LITTLE to the userspace (via cpupool)
>>>automatically.
>>
>>Do you mean vcpus be scheduled between big and little cpus freely?
>
>By supporting big.LITTLE correctly I meant Xen thinks that all the cores has
>the same set of features. So the feature detection is only done the boot CPU.
>See processor_setup for instance...
>
>Moving vCPUs between big and little cores would be a hard task (cache line
>issue, and possibly feature) and I don't expect to ever cross this in Xen.
>However, I am expecting to see big.LITTLE exposed to the guest (i.e having
>big and little vCPUs).
>
>>
>>This patchset is to use cpupool to block the vcpu be scheduled between big and
>>little cpus.
>>
>>>
>>>See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1).

Back to this.
In xen/arch/arm/traps.c, I found that
"
WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
             HCR_TWE|HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB,
             HCR_EL2);
"

HCR_TACR and the HCR_TIDx bits are not set. HCR_TIDCP is set, but that
is used to trap implementation defined registers.

So accesses to ACTLR and the cpu feature registers (excluding the
implementation defined ones) from a guest OS will not trap to Xen, right?
If this is true, the ACTLR and cpu feature registers seen by the DomU in
Pool-A72 in my case should be correct.
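
For reference, a sketch of what additionally trapping the feature ID
registers could look like, assuming an HCR_TID3 macro for the
architectural TID3 bit (illustration only, not a proposed patch):

    WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
                 HCR_TWE|HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB|
                 HCR_TID3, /* also trap the ID group 3 feature registers */
                 HCR_EL2);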

Thanks,
Peng.

>>
>>Thanks for this. I only expose cpuid to guest, missed actlr. I'll check
>>the A53 and A72 TRM about AArch64 implementationd defined registers.
>>This actlr can be added to the cpupool_arch_info as midr.
>>
>>Reading "vcpu_initialise", seems only MIDR and ACTLR needs to be handled.
>>Please advise if I missed anything else.
>
>Have you check the register emulation?


>
>>
>>>
>>>>
>>>>Thinking an SoC with 4 A53(cpu[0-3]) + 2 A72(cpu[4-5]), cpu0 is the first one
>>>>that boots up. When XEN tries to bringup secondary CPUs, add cpu[0-3] to
>>>>cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up,
>>>>`xl cpupool-list -c` will show cpu[0-3] in Pool-0.
>>>>
>>>>Then use the following script to create a new cpupool and add cpu[4-5] to
>>>>the cpupool.
>>>>#xl cpupool-create name=\"Pool-A72\" sched=\"credit2\"
>>>>#xl cpupool-cpu-add Pool-A72 4
>>>>#xl cpupool-cpu-add Pool-A72 5
>>>>#xl create -d /root/xen/domu-test pool=\"Pool-A72\"
>>>
>>>I am a bit confused with these runes. It means that only the first kind of
>>>CPUs have pool assigned. Why don't you directly create all the pools at boot
>>>time?
>>
>>If need to create all the pools, need to decided how many pools need to be created.
>>I thought about this, but I do not come out a good idea.
>>
>>The cpupool0 is defined in xen/common/cpupool.c, if need to create many pools,
>>need to alloc cpupools dynamically when booting. I would not like to change a
>>lot to common code.
>
>Why? We should avoid to choose a specific design just because the common code
>does not allow you to do it without heavy change.
>
>We never came across the big.LITTLE problem on x86, so it is normal to modify
>the code.
>
>>The implementation in this patchset I think is an easy way to let Big and Little
>>CPUs all run.
>
>I care about having a design allowing an easy use of big.LITTLE on Xen. Your
>solution requires the administrator to know the underlying platform and
>create the pool.
>
>In the solution I suggested, the pools would be created by Xen (and the info
>exposed to the userspace for the admin).
>
>>
>>>
>>>Also, in which pool a domain will be created if none is specified?
>>>
>>>>Now `xl cpupool-list -c` shows:
>>>>Name            CPU list
>>>>Pool-0          0,1,2,3
>>>>Pool-A72        4,5
>>>>
>>>>`xl cpupool-list` shows:
>>>>Name               CPUs   Sched     Active   Domain count
>>>>Pool-0               4    credit       y          1
>>>>Pool-A72             2   credit2       y          1
>>>>
>>>>`xl cpupool-cpu-remove Pool-A72 4`, then `xl cpupool-cpu-add Pool-0 4`
>>>>not success, because Pool-0 contains A53 CPUs, but CPU4 is an A72 CPU.
>>>>
>>>>`xl cpupool-migrate DomU Pool-0` will also fail, because DomU is created
>>>>in Pool-A72 with A72 vcpu, while Pool-0 have A53 physical cpus.
>>>>
>>>>Patch 1/5:
>>>>use "cpumask_weight(cpupool0->cpu_valid);" to replace "num_online_cpus()",
>>>>because num_online_cpus() counts all the online CPUs, but now we only
>>>>need Big or Little CPUs.
>>>
>>>So if I understand correctly, if the boot CPU is a little CPU, DOM0 will
>>>always be able to only use little ones. Is that right?
>>
>>Yeah. Dom0 only use the little ones.
>
>This is really bad, dom0 on normal case will have all the backends. It may
>not be possible to select the boot CPU, and therefore always get a little
>CPU.
>
>Creating the pool at boot time would have avoid a such issue because, unless
>we expose big.LITTLE to dom0 (I would need the input of George and Dario for
>this bits), we could have a parameter to specify which set of CPUs (e.g pool)
>to allocate dom0 vCPUs.
>
>Note, that I am not asking you to implement everything. But I think we need a
>coherent view of big.LITTLE support in Xen today to go forward.
>
>Regards,
>
>-- 
>Julien Grall

-- 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19  9:59         ` Julien Grall
@ 2016-09-19 13:15           ` Peng Fan
  2016-09-19 20:56             ` Stefano Stabellini
  0 siblings, 1 reply; 85+ messages in thread
From: Peng Fan @ 2016-09-19 13:15 UTC (permalink / raw)
  To: Julien Grall
  Cc: jgross, Peng Fan, sstabellini, George Dunlap, andrew.cooper3,
	dario.faggioli, xen-devel, jbeulich

On Mon, Sep 19, 2016 at 11:59:05AM +0200, Julien Grall wrote:
>
>
>On 19/09/2016 11:38, Peng Fan wrote:
>>On Mon, Sep 19, 2016 at 10:53:56AM +0200, Julien Grall wrote:
>>>Hello,
>>>
>>>On 19/09/2016 10:36, Peng Fan wrote:
>>>>On Mon, Sep 19, 2016 at 10:09:06AM +0200, Julien Grall wrote:
>>>>>Hello Peng,
>>>>>
>>>>>On 19/09/2016 04:08, van.freenix@gmail.com wrote:
>>>>>>From: Peng Fan <peng.fan@nxp.com>
>>>>>>
>>>>>>This patchset is to support XEN run on big.little SoC.
>>>>>>The idea of the patch is from
>>>>>>"https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html"
>>>>>>
>>>>>>There are some changes to cpupool and add x86 stub functions to avoid build
>>>>>>break. Sending The RFC patchset out is to request for comments to see whether
>>>>>>this implementation is acceptable or not. Patchset have been tested based on
>>>>>>xen-4.8 unstable on NXP i.MX8.
>>>>>>
>>>>>>I use Big/Little CPU and cpupool to explain the idea.
>>>>>>A pool contains Big CPUs is called Big Pool.
>>>>>>A pool contains Little CPUs is called Little Pool.
>>>>>>If a pool does not contains any physical cpus, Little CPUs or Big CPUs
>>>>>>can be added to the cpupool. But the cpupool can not contain both Little
>>>>>>and Big CPUs. The CPUs in a cpupool must have the same cpu type(midr value for ARM).
>>>>>>CPUs can not be added to the cpupool which contains cpus that have different cpu type.
>>>>>>Little CPUs can not be moved to Big Pool if there are Big CPUs in Big Pool,
>>>>>>and versa. Domain in Big Pool can not be migrated to Little Pool, and versa.
>>>>>>When XEN tries to bringup all the CPUs, only add CPUs with the same cpu type(same midr value)
>>>>>>into cpupool0.
>>>>>
>>>>>As mentioned in the mail you pointed above, this series is not enough to make
>>>>>big.LITTLE working on then. Xen is always using the boot CPU to detect the
>>>>>list of features. With big.LITTLE features may not be the same.
>>>>>
>>>>>And I would prefer to see Xen supporting big.LITTLE correctly before
>>>>>beginning to think to expose big.LITTLE to the userspace (via cpupool)
>>>>>automatically.
>>>>
>>>>Do you mean vcpus be scheduled between big and little cpus freely?
>>>
>>>By supporting big.LITTLE correctly I meant Xen thinks that all the cores has
>>>the same set of features. So the feature detection is only done the boot CPU.
>>>See processor_setup for instance...
>>>
>>>Moving vCPUs between big and little cores would be a hard task (cache line
>>>issue, and possibly feature) and I don't expect to ever cross this in Xen.
>>>However, I am expecting to see big.LITTLE exposed to the guest (i.e having
>>>big and little vCPUs).
>>
>>big vCPUs scheduled on big Physical CPUs and little vCPUs scheduled on little
>>physical cpus, right?
>>If it is, is there is a need to let Xen think all the cores has the same set
>>of features?
>
>I think you missed my point. The feature registers on big and little cores
>may be different. Currently, Xen is reading the feature registers of the CPU
>boot and wrongly assumes that those features will exists on all CPUs. This is
>not the case and should be fixed before we are getting in trouble.
>
>>
>>Developing big.little guest support, I am not sure how much efforts needed.
>>Is this really needed?
>
>This is not necessary at the moment, although I have seen some interest about
>it. Running a guest only on a little core is a nice beginning, but a guest
>may want to take advantage of big.LITTLE (running hungry app on big one and
>little on small one).
>
>>
>>>
>>>>
>>>>This patchset is to use cpupool to block the vcpu be scheduled between big and
>>>>little cpus.
>>>>
>>>>>
>>>>>See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1).
>>>>
>>>>Thanks for this. I only expose cpuid to guest, missed actlr. I'll check
>>>>the A53 and A72 TRM about AArch64 implementationd defined registers.
>>>>This actlr can be added to the cpupool_arch_info as midr.
>>>>
>>>>Reading "vcpu_initialise", seems only MIDR and ACTLR needs to be handled.
>>>>Please advise if I missed anything else.
>>>
>>>Have you check the register emulation?
>>
>>Checked midr. Have not checked others.
>>I think I missed some registers in ctxt_switch_to.
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>Thinking an SoC with 4 A53(cpu[0-3]) + 2 A72(cpu[4-5]), cpu0 is the first one
>>>>>>that boots up. When XEN tries to bringup secondary CPUs, add cpu[0-3] to
>>>>>>cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up,
>>>>>>`xl cpupool-list -c` will show cpu[0-3] in Pool-0.
>>>>>>
>>>>>>Then use the following script to create a new cpupool and add cpu[4-5] to
>>>>>>the cpupool.
>>>>>>#xl cpupool-create name=\"Pool-A72\" sched=\"credit2\"
>>>>>>#xl cpupool-cpu-add Pool-A72 4
>>>>>>#xl cpupool-cpu-add Pool-A72 5
>>>>>>#xl create -d /root/xen/domu-test pool=\"Pool-A72\"
>>>>>
>>>>>I am a bit confused with these runes. It means that only the first kind of
>>>>>CPUs have pool assigned. Why don't you directly create all the pools at boot
>>>>>time?
>>>>
>>>>If need to create all the pools, need to decided how many pools need to be created.
>>>>I thought about this, but I do not come out a good idea.
>>>>
>>>>The cpupool0 is defined in xen/common/cpupool.c, if need to create many pools,
>>>>need to alloc cpupools dynamically when booting. I would not like to change a
>>>>lot to common code.
>>>
>>>Why? We should avoid to choose a specific design just because the common code
>>>does not allow you to do it without heavy change.
>>>
>>>We never came across the big.LITTLE problem on x86, so it is normal to modify
>>>the code.
>>>
>>>>The implementation in this patchset I think is an easy way to let Big and Little
>>>>CPUs all run.
>>>
>>>I care about having a design allowing an easy use of big.LITTLE on Xen. Your
>>>solution requires the administrator to know the underlying platform and
>>>create the pool.
>>
>>I suppose big.little is mainly used in embedded SoC :). So the user(developer?)
>>needs to know the hardware platform.
>
>The user will always be happy if Xen can save him a bit of time to create
>cpupool. ;)
>
>>
>>>
>>>In the solution I suggested, the pools would be created by Xen (and the info
>>>exposed to the userspace for the admin).
>>
>>I think the reason to create cpupools to support big.little SoC is to
>>avoid vcpus scheduled between big and little physical cpus.
>>
>>If need to support big.little guest, I think no need to create more
>>cpupools expect cpupoo0. Need to make sure vcpus not be scheduled between
>>big and little physical cpus. All the cpus needs to be in one cpupool.
>>
>>>
>>>>
>>>>>
>>>>>Also, in which pool a domain will be created if none is specified?
>>>>>
>>>>>>Now `xl cpupool-list -c` shows:
>>>>>>Name            CPU list
>>>>>>Pool-0          0,1,2,3
>>>>>>Pool-A72        4,5
>>>>>>
>>>>>>`xl cpupool-list` shows:
>>>>>>Name               CPUs   Sched     Active   Domain count
>>>>>>Pool-0               4    credit       y          1
>>>>>>Pool-A72             2   credit2       y          1
>>>>>>
>>>>>>`xl cpupool-cpu-remove Pool-A72 4`, then `xl cpupool-cpu-add Pool-0 4`
>>>>>>not success, because Pool-0 contains A53 CPUs, but CPU4 is an A72 CPU.
>>>>>>
>>>>>>`xl cpupool-migrate DomU Pool-0` will also fail, because DomU is created
>>>>>>in Pool-A72 with A72 vcpu, while Pool-0 have A53 physical cpus.
>>>>>>
>>>>>>Patch 1/5:
>>>>>>use "cpumask_weight(cpupool0->cpu_valid);" to replace "num_online_cpus()",
>>>>>>because num_online_cpus() counts all the online CPUs, but now we only
>>>>>>need Big or Little CPUs.
>>>>>
>>>>>So if I understand correctly, if the boot CPU is a little CPU, DOM0 will
>>>>>always be able to only use little ones. Is that right?
>>>>
>>>>Yeah. Dom0 only use the little ones.
>>>
>>>This is really bad, dom0 on normal case will have all the backends. It may
>>>not be possible to select the boot CPU, and therefore always get a little
>>>CPU.
>>
>>Dom0 runs in cpupool0. cpupool0 only contains the cpu[0-3] in my case.
>
>So the performance of dom0 will be impacted because it will only use little
>cores.
>
>>
>>>
>>>Creating the pool at boot time would have avoid a such issue because, unless
>>>we expose big.LITTLE to dom0 (I would need the input of George and Dario for
>>>this bits), we could have a parameter to specify which set of CPUs (e.g pool)
>>>to allocate dom0 vCPUs.
>>
>>dom0 is control domain, I think no need to expose big.little for dom0.
>>Pin VCPU to specific physical cpus, this may help support big.little guest.
>>
>>>
>>>Note, that I am not asking you to implement everything. But I think we need a
>>>coherent view of big.LITTLE support in Xen today to go forward.
>>
>>Yeah. Then you prefer supporting big.little guest?
>
>I have seen some interest on it.

I have no idea about the use cases. Anyway, big.little guests do have benefits.

>
>>Please advise if you have any plan/ideas or what I can do on this.
>
>I already gave some ideas on what could be done for big.LITTLE support. But,
>I admit I haven't yet much think about it, so I may miss some part.

This raises a question about Dom0. Do we need to let Dom0 be big.little,
or add a boot argument to indicate big.little, big or little for Dom0?

Regards,
Peng.
>
>Regards,
>
>-- 
>Julien Grall

-- 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19 10:33           ` George Dunlap
@ 2016-09-19 13:33             ` Peng Fan
  2016-09-20  0:11               ` Dario Faggioli
  2016-09-19 16:43             ` Dario Faggioli
  1 sibling, 1 reply; 85+ messages in thread
From: Peng Fan @ 2016-09-19 13:33 UTC (permalink / raw)
  To: George Dunlap
  Cc: Jürgen Groß,
	Peng Fan, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Julien Grall, Jan Beulich

On Mon, Sep 19, 2016 at 11:33:58AM +0100, George Dunlap wrote:
>On 19/09/16 11:06, Julien Grall wrote:
>> Hi George,
>> 
>> On 19/09/2016 11:45, George Dunlap wrote:
>>> On Mon, Sep 19, 2016 at 9:53 AM, Julien Grall <julien.grall@arm.com>
>>> wrote:
>>>>>> As mentioned in the mail you pointed above, this series is not
>>>>>> enough to
>>>>>> make
>>>>>> big.LITTLE working on then. Xen is always using the boot CPU to detect
>>>>>> the
>>>>>> list of features. With big.LITTLE features may not be the same.
>>>>>>
>>>>>> And I would prefer to see Xen supporting big.LITTLE correctly before
>>>>>> beginning to think to expose big.LITTLE to the userspace (via cpupool)
>>>>>> automatically.
>>>>>
>>>>>
>>>>> Do you mean vcpus be scheduled between big and little cpus freely?
>>>>
>>>>
>>>> By supporting big.LITTLE correctly I meant Xen thinks that all the
>>>> cores has
>>>> the same set of features. So the feature detection is only done the boot
>>>> CPU. See processor_setup for instance...
>>>>
>>>> Moving vCPUs between big and little cores would be a hard task (cache
>>>> line
>>>> issue, and possibly feature) and I don't expect to ever cross this in
>>>> Xen.
>>>> However, I am expecting to see big.LITTLE exposed to the guest (i.e
>>>> having
>>>> big and little vCPUs).
>>>
>>> So it sounds like the big and LITTLE cores are architecturally
>>> different enough that software must be aware of which one it's running
>>> on?
>> 
>> That's correct. Each big and LITTLE cores may have different errata,
>> different features...
>> 
>> It has also the advantage to let the guest dealing itself with its own
>> power efficiency without introducing a specific Xen interface.
>
>Well in theory there would be advantages either way -- either to
>allowing Xen to automatically add power-saving "smarts" to guests which
>weren't programmed for them, or to exposing the power-saving abilities
>to guests which were.  But it sounds like automatically migrating
>between them isn't really an option (or would be a lot more trouble than
>it's worth).
>
>>>> I care about having a design allowing an easy use of big.LITTLE on
>>>> Xen. Your
>>>> solution requires the administrator to know the underlying platform and
>>>> create the pool.
>>>>
>>>> In the solution I suggested, the pools would be created by Xen (and
>>>> the info
>>>> exposed to the userspace for the admin).
>>>
>>> FWIW another approach could be the one taken by "xl
>>> cpupool-numa-split": you could have "xl cpupool-bigLITTLE-split" or
>>> something that would automatically set up the pools.
>>>
>>> But expanding the schedulers to know about different classes of cpus,
>>> and having vcpus specified as running only on specific types of pcpus,
>>> seems like a more flexible approach.
>> 
>> So, if I understand correctly, you would not recommend to extend the
>> number of CPU pool per domain, correct?
>
>Well imagine trying to set the scheduling parameters, such as weight,
>which in the past have been per-domain.  Now you have to specify
>parameters for a domain in each of the cpupools that its' in.
>
>No, I think it would be a lot simpler to just teach the scheduler about
>different classes of cpus.  credit1 would probably need to be modified
>so that its credit algorithm would be per-class rather than pool-wide;
>but credit2 shouldn't need much modification at all, other than to make
>sure that a given runqueue doesn't include more than one class; and to
>do load-balancing only with runqueues of the same class.

I try to follow.
 - the scheduler needs to be aware of the different classes of cpus (the ARM big.LITTLE cpus).
 - the scheduler schedules vcpus on the different physical cpus within one cpupool.
 - different cpu classes need to be in different runqueues.

Then for the implementation:
 - When creating a guest, specify the physical cpus that the guest will run on.
 - If those physical cpus are of different classes, that indicates the guest would like to be
   a big.little guest, with big vcpus and little vcpus.
 - If no physical cpus are specified, the guest may run on big cpus or on little cpus, but
   not both. How do we decide whether it runs on big or little physical cpus?
 - For Dom0, I am still not sure; big.little by default, or something else?

If the scheduler handles the different classes of cpus, we do not need cpupools to
prevent vcpus from being scheduled onto different classes of physical cpus. And using
the scheduler to handle this gives an opportunity to support big.little guests.

Thanks,
Peng.

>
> -George

-- 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19 10:33           ` George Dunlap
  2016-09-19 13:33             ` Peng Fan
@ 2016-09-19 16:43             ` Dario Faggioli
  1 sibling, 0 replies; 85+ messages in thread
From: Dario Faggioli @ 2016-09-19 16:43 UTC (permalink / raw)
  To: George Dunlap, Julien Grall, George Dunlap
  Cc: Jürgen Groß,
	Peng Fan, Stefano Stabellini, Andrew Cooper, xen-devel,
	Jan Beulich, Peng Fan


[-- Attachment #1.1: Type: text/plain, Size: 3116 bytes --]

On Mon, 2016-09-19 at 11:33 +0100, George Dunlap wrote:
> On 19/09/16 11:06, Julien Grall wrote:
> > So, if I understand correctly, you would not recommend to extend
> > the
> > number of CPU pool per domain, correct?
> 
> Well imagine trying to set the scheduling parameters, such as weight,
> which in the past have been per-domain.  Now you have to specify
> parameters for a domain in each of the cpupools that its' in.
> 
True, and not really convenient indeed. I think we can come up with a way
to shape the interface in such a way that it's not too bad to use
(provide sane defaults/default behavior, etc), but this should definitely
be kept in mind.

In general, I agree with Juergen that, before implementing anything, we
must come up with a design, bearing in mind both behavior and
interface.

(I'll reply in some more details directly to Juergen's email.)

> No, I think it would be a lot simpler to just teach the scheduler about
> different classes of cpus.  credit1 would probably need to be
> modified
> so that its credit algorithm would be per-class rather than pool-
> wide;
> but credit2 shouldn't need much modification at all, other than to
> make
> sure that a given runqueue doesn't include more than one class; and
> to
> do load-balancing only with runqueues of the same class.
> 
If we want to support heterogeneous CPUs, something like this is absolutely
necessary. In fact, either we set (and enforce) very strict rules on
cpupools and pinning, or we'd end up scheduling stuff built for arch A
on a processor of arch B! :-O

The "strict limits" approach may be an option --and this patch is a
first example of it-- but it's easy to see that it's very inflexible
(cpus can't move between pools, domains can't be migrated, etc). On the
other hand, as soon as we "relax" the constraints a little bit, we
absolutely need to modify the scheduler code to avoid bad things
happening.

As George is saying, both Credit1 and Credit2 need to be modified in
order to make sure that a vcpu that is meant to run on a big cpu is not
picked up for execution by a LITTLE cpu. This has to do with tweaking
the load balancing code in both of them (e.g., in Credit1, a LITTLE cpu
must not steal work from a big cpu). Whether it will also be required
to change the credit algorithm remains to be seen. The effect would be
similar to some sort of pinning, which indeed does not play well with
Credit1's accounting logic... but we can probably see about this along
the way (or just focus only on Credit2! :-P)
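
As a sketch of that Credit1 tweak (cpu_class() being a hypothetical
helper), the work-stealing loop would gain a guard along these lines:

    /* Hypothetical: while scanning peer cpus to steal work from,
     * never cross the big/LITTLE boundary. */
    if ( cpu_class(peer_cpu) != cpu_class(cpu) )
        continue; /* different class, do not steal from it */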

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19 10:23           ` Juergen Gross
@ 2016-09-19 17:18             ` Dario Faggioli
  2016-09-19 21:03               ` Stefano Stabellini
  2016-09-19 20:55             ` Stefano Stabellini
  1 sibling, 1 reply; 85+ messages in thread
From: Dario Faggioli @ 2016-09-19 17:18 UTC (permalink / raw)
  To: Juergen Gross, Julien Grall, George Dunlap
  Cc: Peng Fan, Stefano Stabellini, Andrew Cooper, xen-devel,
	Jan Beulich, Peng Fan


[-- Attachment #1.1: Type: text/plain, Size: 5025 bytes --]

On Mon, 2016-09-19 at 12:23 +0200, Juergen Gross wrote:
> On 19/09/16 12:06, Julien Grall wrote:
> > On 19/09/2016 11:45, George Dunlap wrote:
> > > But expanding the schedulers to know about different classes of
> > > cpus,
> > > and having vcpus specified as running only on specific types of
> > > pcpus,
> > > seems like a more flexible approach.
> > 
> > So, if I understand correctly, you would not recommend to extend
> > the
> > number of CPU pool per domain, correct?
> 
> Before deciding in which direction to go (multiple cpupools, sub-
> pools,
> kind of implicit cpu pinning) 
>
You mention "implicit pinning" here, and I'd like to stress this,
because basically no one (else) in the conversation seems to have
considered it. In fact, it may not necessarily be the best long term
solution, but doing something based on pinning is, IMO, a very
convenient first step (and may well become one of the 'modes' available
to the user for taking advantage of big.LITTLE).

So, if cpus 0-3 are big and cpus 4,5 are LITTLE, we can:
 - for domain X, which wants to run only on big cores, pin all its
   vcpus to pcpus 0-3
 - for domain Y, which wants to run only on LITTLE cores, pin all its
   vcpus to pcpus 4,5
 - for domain Z, which wants its vcpus 0,1 to run on big cores, and
   its vcpus 2,3 to run on LITTLE cores, pin vcpus 0,1 to pcpus 0-3, 
   and pin vcpus 2,3 to pcpus 4,5

Setting things up like this, even automatically, either in the
hypervisor or in the toolstack, is basically already possible (with all
the good and bad aspects of pinning, of course).
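
For instance, with the per-vcpu "cpus" syntax that xl already
understands, the three domains above could look like this in their
config files (cpu numbers as in the example):

    # domain X (big cores only)
    cpus = "0-3"

    # domain Y (LITTLE cores only)
    cpus = "4-5"

    # domain Z (vcpus 0,1 big; vcpus 2,3 LITTLE)
    vcpus = 4
    cpus = ["0-3", "0-3", "4-5", "4-5"]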

Then, sure (as I said when replying to George), we may want things to
be more flexible, and we also probably want to be on the safe side --if 
ever some component manages to undo our automatic pinning-- wrt the
scheduler not picking up work for the wrong architecture... But still
I'm a bit surprised this did not come up... Julien, Peng, is that
because you think this is not doable for any reason I'm missing?

> I think we should think about the
> implications regarding today's interfaces:
> 
I totally agree. These three things (at least) should be very clear
before starting to implement anything:
 - what is the behavior that we want to achieve, from the point of 
   view of both the hypervisor and the guests
 - what will be the interface
 - how this new interface will map and will interact with existing 
   interfaces

> - Do we want to be able to use different schedulers for big/little
>   (this would mean some cpupool related solution)? I'd prefer to
>   have only one scheduler type for each domain. :-)
> 
Well, this actually is, IMO, from a behavioral perspective, a nice
point in favour of supporting a split-cpupool solution. In fact, I
think I can envision scenarios and reasons for having different
schedulers for big cpus and LITTLE cpus (or the same scheduler with
different parameters).

But then, yes, if we want a domain to have both big and LITTLE
cpus, we'd need to allow a domain to live in more than one cpupool at a
time, which means a domain would have multiple schedulers.

I don't think this is impossible... almost all the scheduling happens
at the vcpu level already. The biggest challenge is probably the
interface. _HOWEVER_, I think this is something that can well come
later, like in phase 2 or 3, as an enhancement/possibility, rather
than being the foundation of big.LITTLE support in Xen.

> - What about scheduling parameters like weight and cap? How would
>   those apply (answer probably influencing pinning solution).
>   Remember that especially the downsides of pinning led to the
>   introduction of cpupools.
> 
Very important bit indeed. FWIW, a scheduler that supports per-vcpu
parameters (and hence some glue code, or code from which to take
inspiration) is there already. And scheduling happens at the vcpu
level anyway. I.e., it would not be too hard to make it possible to pass
down to Xen, say, per-vcpu weights. Then, at e.g. the xl level, you
specify a set of parameters for big cpus and another set for LITTLE
cpus, and either xl itself or libxl will do the mapping and prepare the
per-vcpu values.
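
Purely as an illustration of that interface idea (these config keys do
not exist, they are made up for the example), the xl level could look
like:

    # hypothetical syntax: xl/libxl would expand these into per-vcpu
    # weights before handing them down to Xen
    weight_big    = 512
    weight_little = 256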

Again, this is just to say that the "cpupool way" does not look
impossible, and may be interesting. However, although I'd like to think
more (and see more thoughts) about designs and possibilities, I still
continue to think it should be neither the only nor the first mode
that we implement.

> - Is big.LITTLE to be expected to be combined with NUMA?
> 
> - Do we need to support live migration for domains containing both
>   types of cpus?
> 
Interesting points too.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19 10:23           ` Juergen Gross
  2016-09-19 17:18             ` Dario Faggioli
@ 2016-09-19 20:55             ` Stefano Stabellini
  1 sibling, 0 replies; 85+ messages in thread
From: Stefano Stabellini @ 2016-09-19 20:55 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Peng Fan, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Julien Grall, Jan Beulich, Peng Fan

On Mon, 19 Sep 2016, Juergen Gross wrote:
> On 19/09/16 12:06, Julien Grall wrote:
> > Hi George,
> > 
> > On 19/09/2016 11:45, George Dunlap wrote:
> >> On Mon, Sep 19, 2016 at 9:53 AM, Julien Grall <julien.grall@arm.com>
> >> wrote:
> >>>>> As mentioned in the mail you pointed above, this series is not
> >>>>> enough to
> >>>>> make
> >>>>> big.LITTLE working on then. Xen is always using the boot CPU to detect
> >>>>> the
> >>>>> list of features. With big.LITTLE features may not be the same.
> >>>>>
> >>>>> And I would prefer to see Xen supporting big.LITTLE correctly before
> >>>>> beginning to think to expose big.LITTLE to the userspace (via cpupool)
> >>>>> automatically.
> >>>>
> >>>>
> >>>> Do you mean vcpus be scheduled between big and little cpus freely?
> >>>
> >>>
> >>> By supporting big.LITTLE correctly I meant Xen thinks that all the
> >>> cores has
> >>> the same set of features. So the feature detection is only done the boot
> >>> CPU. See processor_setup for instance...
> >>>
> >>> Moving vCPUs between big and little cores would be a hard task (cache
> >>> line
> >>> issue, and possibly feature) and I don't expect to ever cross this in
> >>> Xen.
> >>> However, I am expecting to see big.LITTLE exposed to the guest (i.e
> >>> having
> >>> big and little vCPUs).
> >>
> >> So it sounds like the big and LITTLE cores are architecturally
> >> different enough that software must be aware of which one it's running
> >> on?
> > 
> > That's correct. Each big and LITTLE cores may have different errata,
> > different features...
> > 
> > It has also the advantage to let the guest dealing itself with its own
> > power efficiency without introducing a specific Xen interface.
> > 
> >>
> >> Exposing varying numbers of big and LITTLE vcpus to guests seems like
> >> a sensible approach.  But at the moment cpupools only allow a domain
> >> to be in exactly one pool -- meaning if we use cpupools to control the
> >> big.LITTLE placement, you won't be *able* to have guests with both big
> >> and LITTLE vcpus.
> >>
> >>>> If need to create all the pools, need to decided how many pools need
> >>>> to be
> >>>> created.
> >>>> I thought about this, but I do not come out a good idea.
> >>>>
> >>>> The cpupool0 is defined in xen/common/cpupool.c, if need to create many
> >>>> pools,
> >>>> need to alloc cpupools dynamically when booting. I would not like to
> >>>> change a
> >>>> lot to common code.
> >>>
> >>>
> >>> Why? We should avoid to choose a specific design just because the common
> >>> code does not allow you to do it without heavy change.
> >>>
> >>> We never came across the big.LITTLE problem on x86, so it is normal to
> >>> modify the code.
> >>
> >> Julien is correct; there's no reason we couldn't have a default
> >> multiple pools on boot.
> >>
> >>>> The implementation in this patchset I think is an easy way to let
> >>>> Big and
> >>>> Little
> >>>> CPUs all run.
> >>>
> >>>
> >>> I care about having a design allowing an easy use of big.LITTLE on
> >>> Xen. Your
> >>> solution requires the administrator to know the underlying platform and
> >>> create the pool.
> >>>
> >>> In the solution I suggested, the pools would be created by Xen (and
> >>> the info
> >>> exposed to the userspace for the admin).
> >>
> >> FWIW another approach could be the one taken by "xl
> >> cpupool-numa-split": you could have "xl cpupool-bigLITTLE-split" or
> >> something that would automatically set up the pools.
> >>
> >> But expanding the schedulers to know about different classes of cpus,
> >> and having vcpus specified as running only on specific types of pcpus,
> >> seems like a more flexible approach.
> > 
> > So, if I understand correctly, you would not recommend to extend the
> > number of CPU pool per domain, correct?
> 
> Before deciding in which direction to go (multiple cpupools, sub-pools,
> kind of implicit cpu pinning) I think we should think about the
> implications regarding today's interfaces:
> 
> - Do we want to be able to use different schedulers for big/little
>   (this would mean some cpupool related solution)? I'd prefer to
>   have only one scheduler type for each domain. :-)
> 
> - What about scheduling parameters like weight and cap? How would
>   those apply (answer probably influencing pinning solution).
>   Remember that especially the downsides of pinning led to the
>   introduction of cpupools.

It isn't easy to answer these questions, but there might be a reason to
have different schedulers, because they are supposed to run different
classes of workloads: big cores are for cpu intensive tasks (think of V8,
the JavaScript engine), while LITTLE cores are for low-key background
applications (think of the WhatsApp daemon that runs in the background
on your phone).


> - Is big.LITTLE to be expected to be combined with NUMA?

NUMA is not a popular way to design hardware in the ARM ecosystem today.
If it did become more widespread, I would expect it to happen on server
hardware, where big.LITTLE is not used.

However if there will ever be NUMA systems combined with big.LITTLE, I
would expect each NUMA node to be big.LITTLE.


> - Do we need to support live migration for domains containing both
>   types of cpus?

I think it will happen. Once we have live migration on ARM and support
for big.LITTLE guests, then it is only natural to support live migration
for them too.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19 13:15           ` Peng Fan
@ 2016-09-19 20:56             ` Stefano Stabellini
  0 siblings, 0 replies; 85+ messages in thread
From: Stefano Stabellini @ 2016-09-19 20:56 UTC (permalink / raw)
  To: Peng Fan
  Cc: jgross, Peng Fan, sstabellini, George Dunlap, andrew.cooper3,
	dario.faggioli, xen-devel, Julien Grall, jbeulich

On Mon, 19 Sep 2016, Peng Fan wrote:
> On Mon, Sep 19, 2016 at 11:59:05AM +0200, Julien Grall wrote:
> >
> >
> >On 19/09/2016 11:38, Peng Fan wrote:
> >>On Mon, Sep 19, 2016 at 10:53:56AM +0200, Julien Grall wrote:
> >>>Hello,
> >>>
> >>>On 19/09/2016 10:36, Peng Fan wrote:
> >>>>On Mon, Sep 19, 2016 at 10:09:06AM +0200, Julien Grall wrote:
> >>>>>Hello Peng,
> >>>>>
> >>>>>On 19/09/2016 04:08, van.freenix@gmail.com wrote:
> >>>>>>From: Peng Fan <peng.fan@nxp.com>
> >>>>>>
> >>>>>>This patchset is to support XEN run on big.little SoC.
> >>>>>>The idea of the patch is from
> >>>>>>"https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html"
> >>>>>>
> >>>>>>There are some changes to cpupool and add x86 stub functions to avoid build
> >>>>>>break. Sending The RFC patchset out is to request for comments to see whether
> >>>>>>this implementation is acceptable or not. Patchset have been tested based on
> >>>>>>xen-4.8 unstable on NXP i.MX8.
> >>>>>>
> >>>>>>I use Big/Little CPU and cpupool to explain the idea.
> >>>>>>A pool contains Big CPUs is called Big Pool.
> >>>>>>A pool contains Little CPUs is called Little Pool.
> >>>>>>If a pool does not contains any physical cpus, Little CPUs or Big CPUs
> >>>>>>can be added to the cpupool. But the cpupool can not contain both Little
> >>>>>>and Big CPUs. The CPUs in a cpupool must have the same cpu type(midr value for ARM).
> >>>>>>CPUs can not be added to the cpupool which contains cpus that have different cpu type.
> >>>>>>Little CPUs can not be moved to Big Pool if there are Big CPUs in Big Pool,
> >>>>>>and versa. Domain in Big Pool can not be migrated to Little Pool, and versa.
> >>>>>>When XEN tries to bringup all the CPUs, only add CPUs with the same cpu type(same midr value)
> >>>>>>into cpupool0.
> >>>>>
> >>>>>As mentioned in the mail you pointed above, this series is not enough to make
> >>>>>big.LITTLE working on then. Xen is always using the boot CPU to detect the
> >>>>>list of features. With big.LITTLE features may not be the same.
> >>>>>
> >>>>>And I would prefer to see Xen supporting big.LITTLE correctly before
> >>>>>beginning to think to expose big.LITTLE to the userspace (via cpupool)
> >>>>>automatically.
> >>>>
> >>>>Do you mean vcpus be scheduled between big and little cpus freely?
> >>>
> >>>By supporting big.LITTLE correctly I meant Xen thinks that all the cores has
> >>>the same set of features. So the feature detection is only done the boot CPU.
> >>>See processor_setup for instance...
> >>>
> >>>Moving vCPUs between big and little cores would be a hard task (cache line
> >>>issue, and possibly feature) and I don't expect to ever cross this in Xen.
> >>>However, I am expecting to see big.LITTLE exposed to the guest (i.e having
> >>>big and little vCPUs).
> >>
> >>big vCPUs scheduled on big Physical CPUs and little vCPUs scheduled on little
> >>physical cpus, right?
> >>If it is, is there is a need to let Xen think all the cores has the same set
> >>of features?
> >
> >I think you missed my point. The feature registers on big and little cores
> >may be different. Currently, Xen is reading the feature registers of the CPU
> >boot and wrongly assumes that those features will exists on all CPUs. This is
> >not the case and should be fixed before we are getting in trouble.
> >
> >>
> >>Developing big.little guest support, I am not sure how much efforts needed.
> >>Is this really needed?
> >
> >This is not necessary at the moment, although I have seen some interest about
> >it. Running a guest only on a little core is a nice beginning, but a guest
> >may want to take advantage of big.LITTLE (running hungry app on big one and
> >little on small one).
> >
> >>
> >>>
> >>>>
> >>>>This patchset is to use cpupool to block the vcpu be scheduled between big and
> >>>>little cpus.
> >>>>
> >>>>>
> >>>>>See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1).
> >>>>
> >>>>Thanks for this. I only expose cpuid to guest, missed actlr. I'll check
> >>>>the A53 and A72 TRM about AArch64 implementationd defined registers.
> >>>>This actlr can be added to the cpupool_arch_info as midr.
> >>>>
> >>>>Reading "vcpu_initialise", seems only MIDR and ACTLR needs to be handled.
> >>>>Please advise if I missed anything else.
> >>>
> >>>Have you check the register emulation?
> >>
> >>Checked midr. Have not checked others.
> >>I think I missed some registers in ctxt_switch_to.
> >>
> >>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>>Thinking an SoC with 4 A53(cpu[0-3]) + 2 A72(cpu[4-5]), cpu0 is the first one
> >>>>>>that boots up. When XEN tries to bringup secondary CPUs, add cpu[0-3] to
> >>>>>>cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up,
> >>>>>>`xl cpupool-list -c` will show cpu[0-3] in Pool-0.
> >>>>>>
> >>>>>>Then use the following script to create a new cpupool and add cpu[4-5] to
> >>>>>>the cpupool.
> >>>>>>#xl cpupool-create name=\"Pool-A72\" sched=\"credit2\"
> >>>>>>#xl cpupool-cpu-add Pool-A72 4
> >>>>>>#xl cpupool-cpu-add Pool-A72 5
> >>>>>>#xl create -d /root/xen/domu-test pool=\"Pool-A72\"
> >>>>>
> >>>>>I am a bit confused with these runes. It means that only the first kind of
> >>>>>CPUs have pool assigned. Why don't you directly create all the pools at boot
> >>>>>time?
> >>>>
> >>>>If need to create all the pools, need to decided how many pools need to be created.
> >>>>I thought about this, but I do not come out a good idea.
> >>>>
> >>>>The cpupool0 is defined in xen/common/cpupool.c, if need to create many pools,
> >>>>need to alloc cpupools dynamically when booting. I would not like to change a
> >>>>lot to common code.
> >>>
> >>>Why? We should avoid to choose a specific design just because the common code
> >>>does not allow you to do it without heavy change.
> >>>
> >>>We never came across the big.LITTLE problem on x86, so it is normal to modify
> >>>the code.
> >>>
> >>>>The implementation in this patchset I think is an easy way to let Big and Little
> >>>>CPUs all run.
> >>>
> >>>I care about having a design allowing an easy use of big.LITTLE on Xen. Your
> >>>solution requires the administrator to know the underlying platform and
> >>>create the pool.
> >>
> >>I suppose big.little is mainly used in embedded SoC :). So the user(developer?)
> >>needs to know the hardware platform.
> >
> >The user will always be happy if Xen can save him a bit of time to create
> >cpupool. ;)
> >
> >>
> >>>
> >>>In the solution I suggested, the pools would be created by Xen (and the info
> >>>exposed to the userspace for the admin).
> >>
> >>I think the reason to create cpupools to support big.little SoC is to
> >>avoid vcpus scheduled between big and little physical cpus.
> >>
> >>If need to support big.little guest, I think no need to create more
> >>cpupools expect cpupoo0. Need to make sure vcpus not be scheduled between
> >>big and little physical cpus. All the cpus needs to be in one cpupool.
> >>
> >>>
> >>>>
> >>>>>
> >>>>>Also, in which pool a domain will be created if none is specified?
> >>>>>
> >>>>>>Now `xl cpupool-list -c` shows:
> >>>>>>Name            CPU list
> >>>>>>Pool-0          0,1,2,3
> >>>>>>Pool-A72        4,5
> >>>>>>
> >>>>>>`xl cpupool-list` shows:
> >>>>>>Name               CPUs   Sched     Active   Domain count
> >>>>>>Pool-0               4    credit       y          1
> >>>>>>Pool-A72             2   credit2       y          1
> >>>>>>
> >>>>>>`xl cpupool-cpu-remove Pool-A72 4`, then `xl cpupool-cpu-add Pool-0 4`
> >>>>>>not success, because Pool-0 contains A53 CPUs, but CPU4 is an A72 CPU.
> >>>>>>
> >>>>>>`xl cpupool-migrate DomU Pool-0` will also fail, because DomU is created
> >>>>>>in Pool-A72 with A72 vcpu, while Pool-0 have A53 physical cpus.
> >>>>>>
> >>>>>>Patch 1/5:
> >>>>>>use "cpumask_weight(cpupool0->cpu_valid);" to replace "num_online_cpus()",
> >>>>>>because num_online_cpus() counts all the online CPUs, but now we only
> >>>>>>need Big or Little CPUs.
> >>>>>
> >>>>>So if I understand correctly, if the boot CPU is a little CPU, DOM0 will
> >>>>>always be able to only use little ones. Is that right?
> >>>>
> >>>>Yeah. Dom0 only use the little ones.
> >>>
> >>>This is really bad, dom0 on normal case will have all the backends. It may
> >>>not be possible to select the boot CPU, and therefore always get a little
> >>>CPU.
> >>
> >>Dom0 runs in cpupool0. cpupool0 only contains the cpu[0-3] in my case.
> >
> >So the performance of dom0 will be impacted because it will only use little
> >cores.
> >
> >>
> >>>
> >>>Creating the pool at boot time would have avoid a such issue because, unless
> >>>we expose big.LITTLE to dom0 (I would need the input of George and Dario for
> >>>this bits), we could have a parameter to specify which set of CPUs (e.g pool)
> >>>to allocate dom0 vCPUs.
> >>
> >>dom0 is control domain, I think no need to expose big.little for dom0.
> >>Pin VCPU to specific physical cpus, this may help support big.little guest.
> >>
> >>>
> >>>Note, that I am not asking you to implement everything. But I think we need a
> >>>coherent view of big.LITTLE support in Xen today to go forward.
> >>
> >>Yeah. Then you prefer supporting big.little guest?
> >
> >I have seen some interest on it.
> 
> I have no idea about the use cases. Anyway big.little guest do have benefits.
> 
> >
> >>Please advise if you have any plan/ideas or what I can do on this.
> >
> >I already gave some ideas on what could be done for big.LITTLE support. But,
> >I admit I haven't yet much think about it, so I may miss some part.
> 
> This comes to me a question about Dom0. Do we need to let Dom0 be big.little? or
> add a bootargs to indicate big.little, big or little for Dom0?

These are my two cents: the two cpupools (one big, one LITTLE) need to
be created automatically, either by Xen or by an xl command, such as "xl
cpupool-bigLITTLE-split", as George suggested. If done with an xl
command, it needs to be called automatically at boot somehow, otherwise
we'll leave half of the cpus unused by default. As Xen needs to be able
to recognize big.LITTLE and only start Dom0 on big (or LITTLE) cpus, I
think it might make sense to create both cpupools directly in Xen.

It should be possible to configure which cpupool Dom0 belongs to, but
I would default to big for performance reasons.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19 17:18             ` Dario Faggioli
@ 2016-09-19 21:03               ` Stefano Stabellini
  2016-09-19 22:55                 ` Dario Faggioli
  0 siblings, 1 reply; 85+ messages in thread
From: Stefano Stabellini @ 2016-09-19 21:03 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
	Andrew Cooper, xen-devel, Julien Grall, Jan Beulich, Peng Fan

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2787 bytes --]

On Mon, 19 Sep 2016, Dario Faggioli wrote:
> On Mon, 2016-09-19 at 12:23 +0200, Juergen Gross wrote:
> > On 19/09/16 12:06, Julien Grall wrote:
> > > On 19/09/2016 11:45, George Dunlap wrote:
> > > > But expanding the schedulers to know about different classes of
> > > > cpus,
> > > > and having vcpus specified as running only on specific types of
> > > > pcpus,
> > > > seems like a more flexible approach.
> > > 
> > > So, if I understand correctly, you would not recommend to extend
> > > the
> > > number of CPU pool per domain, correct?
> > 
> > Before deciding in which direction to go (multiple cpupools, sub-
> > pools,
> > kind of implicit cpu pinning) 
> >
> You mention "implicit pinning" here, and I'd like to stress this,
> because basically no one (else) in the conversation seem to have
> considered it. In fact, it may not necessarily be the best long term
> solution, but doing something based on pinning is, IMO, a very
> convenient first step (and may well become one of the 'modes' available
> to the user for taking advantage of big.LITTLE.
> 
> So, if cpus 0-3 are big and cpus 4,5 are LITTLE, we can:
>  - for domain X, which wants to run only on big cores, pin all it's
>    vcpus to pcpus 0-3
>  - for domain Y, which wants to run only on LITTLE cores, pin all it's
>    vcpus to pcpus 4,5
>  - for domain Z, which wants its vcpus 0,1 to run on big cores, and
>    it's vcpus 2,3 to run on LITTLE cores, pin vcpus 0,1 to pcpus 0-3, 
>    and pin vcpus 2,3 to pcpus 4,5
> 
> Setting thing up like this, even automatically, either in hypervisor or
> toolstack, is basically already possible (with all the good and bad
> aspects of pinning, of course).
> 
> Then, sure (as I said when replying to George), we may want things to
> be more flexible, and we also probably want to be on the safe side --if 
> ever some components manages to undo our automatic pinning-- wrt the
> scheduler not picking up work for the wrong architecture... But still
> I'm a bit surprised this did not came up... Julien, Peng, is that
> because you think this is not doable for any reason I'm missing?

Let's suppose that Xen detects big.LITTLE and pins dom0 vcpus to big
automatically. How can the user know that she really needs to be careful
in the way she pins the vcpus of new VMs? Xen would also need to
automatically pin the vcpus of new VMs to either big or LITTLE cores, or
xl would have to do it.

The whole process would be more explicit and obvious if we used
cpupools. It would be easier for users to know what is going on --
they just need to issue an `xl cpupool-list' command and they would see
two clearly named pools (something like big-pool and LITTLE-pool). We
wouldn't have to pin vcpus to cpus automatically in Xen or xl, which
doesn't sound like fun.
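
To give an idea (made-up output, of course, and the pool names are just
a suggestion), after boot the admin would simply see:

 # xl cpupool-list
 Name               CPUs   Sched     Active   Domain count
 big-pool             4    credit2      y          1
 LITTLE-pool          2    credit2      y          0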

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19 21:03               ` Stefano Stabellini
@ 2016-09-19 22:55                 ` Dario Faggioli
  2016-09-20  0:01                   ` Stefano Stabellini
  0 siblings, 1 reply; 85+ messages in thread
From: Dario Faggioli @ 2016-09-19 22:55 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, George Dunlap, Andrew Cooper, xen-devel,
	Julien Grall, Jan Beulich, Peng Fan


[-- Attachment #1.1: Type: text/plain, Size: 4300 bytes --]

On Mon, 2016-09-19 at 14:03 -0700, Stefano Stabellini wrote:
> On Mon, 19 Sep 2016, Dario Faggioli wrote:
> > Setting thing up like this, even automatically, either in
> hypervisor or
> > toolstack, is basically already possible (with all the good and bad
> > aspects of pinning, of course).
> > 
> > Then, sure (as I said when replying to George), we may want things
> to
> > be more flexible, and we also probably want to be on the safe side
> --if 
> > ever some components manages to undo our automatic pinning-- wrt
> the
> > scheduler not picking up work for the wrong architecture... But
> still
> > I'm a bit surprised this did not came up... Julien, Peng, is that
> > because you think this is not doable for any reason I'm missing?
> 
> Let's suppose that Xen detects big.LITTLE and pins dom0 vcpus to big
> automatically. How can the user know that she really needs to be
> careful
> in the way she pins the vcpus of new VMs? Xen would also need to pin
> automatically vcpus of new VMs to either big or LITTLE cores, or xl
> would have to do it.
> 
Actually, doing things with what we currently have for pinning is only
something I've brought up as an example, and (potentially) useful for a
proof-of-concept, or very early-stage support.

In the long run, when thinking of the scheduler-based solution, I see
things happening the other way round: you specify in the xl config file
(and with boot parameters, for dom0) how many big and how many LITTLE
vcpus you want, and the scheduler will know that it can only schedule
the big ones on big physical cores, and the LITTLE ones on LITTLE
physical cores.

Note that we're saying 'pinning' (yeah, I know, I did it myself in the
first place :-/), but that would not be an actual 1-to-1 pinning. For
instance, if domain X has 4 big vcpus, say 1,2,3,4, and the host has
8 big pcpus, say 8-15, then dXv1, dXv2, dXv3 and dXv4 will only be run
by the scheduler on pcpus 8-15. Any of them, and with migration and
load balancing within the set possible. This is what I'm talking about.
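
If I recall the xl syntax correctly, expressing that restriction with
today's mechanisms would look something like this in the config file of
domain X (just a sketch, reusing the cpu numbers from the example above):

 vcpus = 4
 # all 4 (big) vcpus restricted to the big pcpus; the scheduler is
 # still free to move them around within 8-15
 cpus = "8-15"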

And this would work even if/when there is only one cpupool, or in
general for domains that are in a pool that has both big and LITTLE
pcpus. Furthermore, big.LITTLE support and cpupools will be orthogonal,
just like pinning and cpupools are orthogonal right now. I.e., once we
have what I described above, nothing prevents us from implementing
per-vcpu cpupool membership, and either creating the two (or more!) big
and LITTLE pools, or mixing things even more, for more complex and
specific use cases. :-)

Actually, with the cpupool solution, if you want a guest (or dom0) to
actually have both big and LITTLE vcpus, you necessarily have to
implement per-vcpu (rather than per-domain, as it is now) cpupool
membership. I said myself it's not impossible, but certainly it's some
work... with the scheduler solution you basically get that for free!

So, basically, if we use cpupools for the basics of big.LITTLE support,
there's no way out of it (apart from going and implementing scheduling
support afterwards, but that looks backwards to me, especially when
thinking about it with the code in mind).

> The whole process would be more explicit and obvious if we used
> cpupools. It would be easier for users to know what it is going on --
> they just need to issue an `xl cpupool-list' command and they would
> see
> two clearly named pools (something like big-pool and LITTLE-pool). 
>
Well, I guess that, as part of big.LITTLE support, there will be a way
to tell what pcpus are big and which are LITTLE anyway, probably both
from `xl info' and from `xl cpupool-list -c' (and most likely in other
ways too).

> We
> wouldn't have to pin vcpus to cpus automatically in Xen or xl, which
> doesn't sound like fun.
>
As I tried to say above, it will _look_ like some kind of automatic
pinning, but that does not mean it has to be implemented by means of
it, or dealt with by the user in the same way.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19 22:55                 ` Dario Faggioli
@ 2016-09-20  0:01                   ` Stefano Stabellini
  2016-09-20  0:54                     ` Dario Faggioli
  2016-09-20 10:18                     ` George Dunlap
  0 siblings, 2 replies; 85+ messages in thread
From: Stefano Stabellini @ 2016-09-20  0:01 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
	Andrew Cooper, xen-devel, Julien Grall, Jan Beulich, Peng Fan

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4391 bytes --]

On Tue, 20 Sep 2016, Dario Faggioli wrote:
> On Mon, 2016-09-19 at 14:03 -0700, Stefano Stabellini wrote:
> > On Mon, 19 Sep 2016, Dario Faggioli wrote:
> > > Setting thing up like this, even automatically, either in
> > hypervisor or
> > > toolstack, is basically already possible (with all the good and bad
> > > aspects of pinning, of course).
> > > 
> > > Then, sure (as I said when replying to George), we may want things
> > to
> > > be more flexible, and we also probably want to be on the safe side
> > --if 
> > > ever some components manages to undo our automatic pinning-- wrt
> > the
> > > scheduler not picking up work for the wrong architecture... But
> > still
> > > I'm a bit surprised this did not came up... Julien, Peng, is that
> > > because you think this is not doable for any reason I'm missing?
> > 
> > Let's suppose that Xen detects big.LITTLE and pins dom0 vcpus to big
> > automatically. How can the user know that she really needs to be
> > careful
> > in the way she pins the vcpus of new VMs? Xen would also need to pin
> > automatically vcpus of new VMs to either big or LITTLE cores, or xl
> > would have to do it.
> > 
> Actually doing things with what we currently have for pinning is only
> something I've brought up as an example, and (potentially) useful for
> proof-of-concept, or very early stage level support.
> 
> In the long run, when thinking to the scheduler based solution, I see
> things happening the other way round: you specify in xl config file
> (and with boot parameters, for dom0) how many big and how many LITTLE
> vcpus you want, and the scheduler will know that it can only schedule
> the big ones on big physical cores, and the LITTLE ones on LITTLE
> physical cores.
> 
> Note that we're saying 'pinning' (yeah, I know, I did it myself in the
> first place :-/), but that would not be an actual 1-to-1 pinning. For
> instance, if domain X has 4 big pcpus, say 0,1,2,3,4, and the host has
> 8 big pcpus, say 8-15, then dXv1, dXv2, dXv3 and dXv4 will only be run
> by the scheduler on pcpus 8-15. Any of them, and with migration and
> load balancing within the set possible. This is what I'm talking about.
> 
> And this would work even if/when there is only one cpupool, or in
> general for domains that are in a pool that has both big and LITTLE
> pcpus. Furthermore, big.LITTLE support and cpupools will be orthogonal,
> just like pinning and cpupools are orthogonal right now. I.e., once we
> will have what I described above, nothing prevents us from implementing
> per-vcpu cpupool membership, and either create the two (or more!) big
> and LITTLE pools, or from mixing things even more, for more complex and
> specific use cases. :-)

I think that everybody agrees that this is the best long term solution.


> Actually, with the cpupool solution, if you want a guest (or dom0) to
> actually have both big and LITTLE vcpus, you necessarily have to
> implement per-vcpu (rather than per-domain, as it is now) cpupool
> membership. I said myself it's not impossible, but certainly it's some
> work... with the scheduler solution you basically get that for free!
> 
> So, basically, if we use cpupools for the basics of big.LITTLE support,
> there's no way out of it (apart from going implementing scheduling
> support afterwords, but that looks backwards to me, especially when
> thinking at it with the code in mind).

The question is: what is the best short-term solution we can ask Peng to
implement that allows Xen to run on big.LITTLE systems today? Possibly
getting us closer to the long term solution, or at least not farther
from it?


> > The whole process would be more explicit and obvious if we used
> > cpupools. It would be easier for users to know what it is going on --
> > they just need to issue an `xl cpupool-list' command and they would
> > see
> > two clearly named pools (something like big-pool and LITTLE-pool). 
> >
> Well, I guess that, as part of big.LITTLE support, there will be a way
> to tell what pcpus are big and which are LITTLE anyway, probably both
> from `xl info' and from `xl cpupool-list -c' (and most likely in other
> ways too).

Sure, but it needs to be very clear. We cannot ask people to spot
architecture-specific flags among the output of `xl info' to be able to
appropriately start a guest. Even what I suggested isn't great, as `xl
cpupool-list' isn't a common command to run.

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-19 13:33             ` Peng Fan
@ 2016-09-20  0:11               ` Dario Faggioli
  2016-09-20  6:18                 ` Peng Fan
  0 siblings, 1 reply; 85+ messages in thread
From: Dario Faggioli @ 2016-09-20  0:11 UTC (permalink / raw)
  To: Peng Fan, George Dunlap
  Cc: Jürgen Groß,
	Peng Fan, Stefano Stabellini, George Dunlap, Andrew Cooper,
	xen-devel, Julien Grall, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 6371 bytes --]

On Mon, 2016-09-19 at 21:33 +0800, Peng Fan wrote:
> On Mon, Sep 19, 2016 at 11:33:58AM +0100, George Dunlap wrote:
> > 
> > No, I think it would be a lot simpler to just teach the scheduler
> > about
> > different classes of cpus.  credit1 would probably need to be
> > modified
> > so that its credit algorithm would be per-class rather than pool-
> > wide;
> > but credit2 shouldn't need much modification at all, other than to
> > make
> > sure that a given runqueue doesn't include more than one class; and
> > to
> > do load-balancing only with runqueues of the same class.
> 
> I try to follow.
>  - scheduler needs to be aware of different classes of cpus. ARM
> big.Little cpus.
>
Yes, I think this is essential.

>  - scheduler schedules vcpus on different physical cpus in one
> cpupool.
>
Yep, that's what the scheduler does. And personally, I'd start
implementing big.LITTLE support for a situation where both big and
LITTLE cpus coexist in the same pool.

>  - different cpu classes needs to be in different runqueue.
> 
Yes. So, basically, imagine using vcpu pinning to support big.LITTLE.
I've spoken briefly about this in my reply to Juergen. You probably can
even get something like this up-&-running by writing very little or no
code (you'll need --for now-- dom0_max_vcpus, dom0_vcpus_pin, and then,
in domain config files, "cpus='...'").
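
Something like this, e.g., for a platform where pcpus 0-3 are big and
4,5 are LITTLE, with dom0 on the big cores (just a sketch of the runes,
to be double checked):

 # Xen boot command line
 dom0_max_vcpus=4 dom0_vcpus_pin

 # in the config file of a domU meant to run on the LITTLE cores
 cpus = "4-5"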

Then, the real goal would be to achieve the same behavior
automatically, by acting on the runqueues' arrangement and the load
balancing logic in the scheduler(s).

Anyway, sorry for my ignorance on big.LITTLE, but there's something I'm
missing: _when_ is it that it is (or needs to be) decided whether a
vcpu will run on a big or LITTLE core?

Thinking of a bare-metal system, I think that cpu X is, for instance,
big, and will always be like that; similarly, cpu Y is LITTLE.

This makes me think that, for a virtual machine, it is ok to
choose/specify at _domain_creation_ time which vcpus are big and which
vcpus are LITTLE, is this correct?
If yes, this also means that --whatever way we find to make this
happen: cpupools, scheduler, etc.-- the vcpus that we decided are big
must only be scheduled on actual big pcpus, and the vcpus that we
decided are LITTLE must only be scheduled on actual LITTLE pcpus,
correct again?

> Then for implementation.
>  - When create a guest, specific physical cpus that the guest will be
> run on.
>
I'd actually do that the other way round. I'd ask the user to specify
how many --and, if that's important, which-- vcpus are big and how
many/which are LITTLE.

Knowing that, we also know whether the domain is a big only, LITTLE
only or big.LITTLE one. And we also know which set of pcpus each set
of vcpus should be restricted to.

So, basically (but it's just an example) something like this, in the xl
config file of a guest:

1) big.LITTLE guest, with 2 big and 2 LITTLE vcpus. User doesn't care
   which is which, so a default could be 0,1 big and 2,3 LITTLE:

 vcpus = 4
 vcpus.big = 2

2) big.LITTLE guest, with 8 vcpus, of which 0,2,4 and 6 are big:

vcpus = 8
vcpus.big = [0, 2, 4, 6]

Which would be the same as

vcpus = 8
vcpus.little = [1, 3, 5, 7]

3) guest with 4 vcpus, all big:

vcpus = 4
vcpus.big = "all"

Which would be the same as:

vcpus = 4
vcpus.little = "none"

And also the same as just:

vcpus = 4


Or something along these lines.

>  - If the physical cpus are different cpus, indicate the guest would
> like to be a big.little guest.
>    And have big vcpus and little vcpus.
>
Not liking this as _the_ way of specifying the guest topology, wrt
big.LITTLE-ness (see alternative proposal right above. :-))

However, right now we support pinning/affinity already. We certainly
need to decide what to do if, e.g., no vcpus.big or vcpus.little are
present, but the vcpus have hard or soft affinity to some specific
pcpus.

So, right now, this, in the xl config file:

cpus = [2, 8, 12, 13, 15, 17]

means that we want to pin, 1-to-1, vcpu 0 to pcpu 2, vcpu 1 to pcpu 8,
vcpu 2 to pcpu 12, vcpu 3 to pcpu 13, vcpu 4 to pcpu 15 and vcpu 5 to
pcpu 17. Now, if cores 2, 8 and 12 are big, and no vcpus.big or
vcpus.little is specified, I'd put forward the assumption that the user
wants vcpus 0, 1 and 2 to be big, and vcpus 3, 4 and 5 to be LITTLE.

If, instead, there are vcpus.big or vcpus.little specified, and there's
disagreement, I'd either error out or decide which overrides the other
(and print a WARNING about that happening).

Still right now, this:

cpus = "2-12"

means that all the vcpus of the domain have hard affinity (i.e., are
pinned) to pcpus 2-12. And in this case I'd conclude that the user
wants for all the vcpus to be big.

I'm less sure what to do if _only_ soft-affinity is specified (via
"cpus_soft="), or if hard-affinity contains both big and LITTLE pcpus,
like, e.g.:

cpus = "2-15"

>  - If no physical cpus specified, then the guest may run on big
> cpus or on little cpus. But not both.
>
Yes. If nothing (or something contradictory) is specified, we "just"
have to decide what's the sanest default.

>    How to decide runs on big or little physical cpus?
>
I'd default to big.

>  - For Dom0, I am still not sure,default big.little or else?
> 
Again, if nothing is specified, I'd probably default to:
 - give dom0 as many vcpus as there are big cores
 - restrict them to big cores

But, of course, I think we should add boot time parameters like these
ones:

 dom0_vcpus_big = 4
 dom0_vcpus_little = 2

which would mean the user wants dom0 to have 4 big and 2 LITTLE
vcpus... and then we act accordingly, as described above and in other
emails.

> If use scheduler to handle the different classes cpu, we do not need
> to use cpupool
> to block vcpus be scheduled onto different physical cpus. And using
> scheduler to handle this
> gives an opportunity to support big.little guest.
> 
Exactly, this is one strong point in favour of this solution, IMO!

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-20  0:01                   ` Stefano Stabellini
@ 2016-09-20  0:54                     ` Dario Faggioli
  2016-09-20 10:03                       ` Peng Fan
  2016-09-20 10:18                     ` George Dunlap
  1 sibling, 1 reply; 85+ messages in thread
From: Dario Faggioli @ 2016-09-20  0:54 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, George Dunlap, Andrew Cooper, xen-devel,
	Julien Grall, Jan Beulich, Peng Fan


[-- Attachment #1.1: Type: text/plain, Size: 4781 bytes --]

On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
> On Tue, 20 Sep 2016, Dario Faggioli wrote:
> > And this would work even if/when there is only one cpupool, or in
> > general for domains that are in a pool that has both big and LITTLE
> > pcpus. Furthermore, big.LITTLE support and cpupools will be
> > orthogonal,
> > just like pinning and cpupools are orthogonal right now. I.e., once
> > we
> > will have what I described above, nothing prevents us from
> > implementing
> > per-vcpu cpupool membership, and either create the two (or more!)
> > big
> > and LITTLE pools, or from mixing things even more, for more complex
> > and
> > specific use cases. :-)
> 
> I think that everybody agrees that this is the best long term
> solution.
> 
Well, no, that wasn't obvious to me. If that's the case, it's already
something! :-)

> > 
> > Actually, with the cpupool solution, if you want a guest (or dom0)
> > to
> > actually have both big and LITTLE vcpus, you necessarily have to
> > implement per-vcpu (rather than per-domain, as it is now) cpupool
> > membership. I said myself it's not impossible, but certainly it's
> > some
> > work... with the scheduler solution you basically get that for
> > free!
> > 
> > So, basically, if we use cpupools for the basics of big.LITTLE
> > support,
> > there's no way out of it (apart from going implementing scheduling
> > support afterwords, but that looks backwards to me, especially when
> > thinking at it with the code in mind).
> 
> The question is: what is the best short-term solution we can ask Peng
> to
> implement that allows Xen to run on big.LITTLE systems today?
> Possibly
> getting us closer to the long term solution, or at least not farther
> from it?
> 
So, I still have to look closely at the patches in this series. But,
with Credit2 in mind, if one:

 - takes advantage of the knowledge of which class a pcpu belongs to
   inside the code that arranges the pcpus in runqueues, which means
   we'll end up with big runqueues and LITTLE runqueues. I re-wrote
   that code, so I can provide pointers and help, if necessary;
 - tweaks the one or two instances of for_each_runqueue() [*] that
   there are in the code into a for_each_runqueue_of_same_class(),
   i.e.:

 if (is_big(this_cpu))
 {
   for_each_big_runqueue()
   {
      ..
   }
 }
 else
 {
   for_each_LITTLE_runqueue()
   {
     ..
   }
 } 

then big.LITTLE support in Credit2 would be done already, and all that
would be left is support for the syntax of the new config switches in
xl, and a way of telling, from xl/libxl down to Xen, which class a vcpu
belongs to, so that it can be associated with a runqueue of the proper
class.

Thinking of Credit1, we need to make sure that, in load_balance() and
runq_steal(), a LITTLE cpu *only* ever tries to steal work from another
LITTLE cpu, and __never__ from a big cpu (and vice versa). And also
that when a vcpu wakes up, and what it has in v->processor is a
LITTLE pcpu, only LITTLE processors are considered for being
tickled (I'm less certain of this last part, but it should be more or
less like this).
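
In code, the kind of guard I have in mind is something like this (just
a sketch: is_big() and the 'workers' mask are made up, and the loop is
a simplification of the real Credit1 work-stealing logic):

 /* Inside the work-stealing loop: never cross the class boundary. */
 for_each_cpu ( peer_cpu, &workers )
 {
     if ( is_big(peer_cpu) != is_big(cpu) )
         continue; /* a LITTLE cpu must not steal from a big one */

     /* ... existing logic: peek at peer_cpu's runqueue, etc. ... */
 }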

Then, of course, we'd need the same glue and vcpu classification code.

However, in Credit1, it's possible that a trick like that would affect
the accounting and credit algorithm, and hence produce unfair or, in
general, unexpected results. Credit2 should, OTOH, be a lot more
resilient wrt that.

> > > The whole process would be more explicit and obvious if we used
> > > cpupools. It would be easier for users to know what it is going
> > > on --
> > > they just need to issue an `xl cpupool-list' command and they
> > > would
> > > see
> > > two clearly named pools (something like big-pool and LITTLE-
> > > pool). 
> > > 
> > Well, I guess that, as part of big.LITTLE support, there will be a
> > way
> > to tell what pcpus are big and which are LITTLE anyway, probably
> > both
> > from `xl info' and from `xl cpupool-list -c' (and most likely in
> > other
> > ways too).
> 
> Sure, but it needs to be very clear. We cannot ask people to spot
> architecture specific flags among the output of `xl info' to be able
> to
> appropriately start a guest. 
>
As mentioned in a previous mail, and as drafted when replying to Peng,
the only thing that the user should know is how many big and how many
LITTLE vcpus she wants (and, potentially, which ones would be which). :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-20  0:11               ` Dario Faggioli
@ 2016-09-20  6:18                 ` Peng Fan
  0 siblings, 0 replies; 85+ messages in thread
From: Peng Fan @ 2016-09-20  6:18 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Jürgen Groß,
	Peng Fan, Stefano Stabellini, George Dunlap, Andrew Cooper,
	George Dunlap, xen-devel, Julien Grall, Jan Beulich

On Tue, Sep 20, 2016 at 02:11:04AM +0200, Dario Faggioli wrote:
>On Mon, 2016-09-19 at 21:33 +0800, Peng Fan wrote:
>> On Mon, Sep 19, 2016 at 11:33:58AM +0100, George Dunlap wrote:
>> >
>> > No, I think it would be a lot simpler to just teach the scheduler
>> > about
>> > different classes of cpus.  credit1 would probably need to be
>> > modified
>> > so that its credit algorithm would be per-class rather than pool-
>> > wide;
>> > but credit2 shouldn't need much modification at all, other than to
>> > make
>> > sure that a given runqueue doesn't include more than one class; and
>> > to
>> > do load-balancing only with runqueues of the same class.
>> 
>> I try to follow.
>>  - scheduler needs to be aware of different classes of cpus. ARM
>> big.Little cpus.
>>
>Yes, I think this is essential.
>
>>  - scheduler schedules vcpus on different physical cpus in one
>> cpupool.
>>
>Yep, that's what the scheduler does. And personally, I'd start
>implementing big.LITTLE support for a situation where both big and
>LITTLE cpus coexists in the same pool.

It's great if you have a plan to work on the scheduler part.

>
>>  - different cpu classes needs to be in different runqueue.
>> 
>Yes. So, basically, imagine to use vcpu pinning to support big.LITTLE.
>I've spoken briefly about this in my reply to Juergen. You probably can
>even get something like this up-&-running by writing very few or zero
>code (you'll need --for now-- max_dom0_vcpus, dom0_vcpus_pin, and then,
>in domain config files, "cpus='...'").
>
>Then, the real goal, would be to achieve the same behavior
>automatically, by acting on runqueues' arrangement and load balancing
>logic in the scheduler(s).
>
>Anyway, sorry for my ignorance on big.LITTLE, but there's something I'm
>missing: _when_ is it that it is (or needs to be) decided whether a
>vcpu will run on a big or LITTLE core?

Big cores are more powerful than little cores, but consume more power.
In the Linux kernel, Linaro is working on the EAS scheduler to take
advantage of big.LITTLE:
http://www.linaro.org/blog/core-dump/energy-aware-scheduling-eas-project/

As discussed, for a big.little guest OS that has big vcpus and little
vcpus, we only need to take care that big vcpus are scheduled on big
physical cpus, and little vcpus are scheduled on little physical cpus.
So a vcpu is never scheduled between big and little physical cpus.

>
>Thinking to a bare metal system, I think that cpu X is, for instance, big, and will always be like that; similarly, cpu Y is LITTLE.
>
>This makes me think that, for a virtual machine, it is ok to choose/specify at _domain_creation_ time, which vcpus are big and which vcpus are LITTLE, is this correct?
>If yes, this also means that --whatever way we find to make this happen, cpupools, scheduler, etc-- the vcpus that we decided they are big, must only be scheduled on actual big pcpus, and pcpus that we decided they are LITTLE, must only be scheduled on actual LITTLE pcpus, correct again?
>
>> Then for implementation.
>> ??- When create a guest, specific physical cpus that the guest will be
>> run on.
>>
>I'd actually do that the other way round. I'd ask the user to specify
>how many --and, if that's important-- vcpus are big and how many/which
>are LITTLE.
>
>Knowing that, we also know whether the domain is a big only, LITTLE
>only or big.LITTLE one. And we also know on which set of pcpus each set
>of vcpus should be restrict to.
>
>So, basically (but it's just an example) something like this, in the xl
>config file of a guest:
>
>1) big.LITTLE guest, with 2 big and 2 LITTLE pcpus. User doesn't care
>   which is which, so a default could be 0,1 big and 2,3 LITTLE:
>
> vcpus = 4
> vcpus.big = 2
>
>2) big.LITTLE guest, with 8 vcpus, of which 0,2,4 and 6 are big:
>
>vcpus = 8
>vcpus.big = [0, 2, 4, 6]
>
>Which would be the same as
>
>vcpus = 8
>vcpus.little = [1, 3, 5, 7]
>
>3) guest with 4 vcpus, all big:
>
>vcpus = 4
>vcpus.big = "all"
>
>Which would be the same as:
>
>vcpus = 4
>vcpus.little = "none"
>
>And also the same as just:
>
>vcpus = 4
>
>
>Or something like this
>
>>  - If the physical cpus are different cpus, indicate the guest would
>> like to be a big.little guest.
>>    And have big vcpus and little vcpus.
>>
>Not liking this as _the_ way of specifying the guest topology, wrt
>big.LITTLE-ness (see alternative proposal right above. :-))
>
>However, right now we support pinning/affinity already. We certainly
>need to decide what to do if, e.g., no vcpus.big or vcpus.little are
>present, but the vcpus have hard or soft affinity to some specific
>pcpus.
>
>So, right now, this, in the xl config file:
>
>cpus = [2, 8, 12, 13, 15, 17]
>
>means that we want to pin, 1-to-1, vcpu 0 to pcpu 2, vcpu 1 to pcpu 8,
>vcpu 2 to pcpu 12, vcpu 3 to pcpu 13, vcpu 4 to pcpu 15 and vcpu 5 to
>pcpu 17. Now, if cores 2, 8 and 12 are big, and no vcpus.big or
>vcpu.little is specified, I'd put forward the assumption that the user
>wants vcpus 0, 1 and 2 to be big, and vcpus 3, 4, and 5 to be LITTLE.
>
>If, instead, there are vcpus.big or vcpus.little specified, and there's
>disagreement, I'd either error out or decide which overrun the other
>(and print a WARNING about that happening).
>
>Still right now, this:
>
>cpus = "2-12"
>
>means that all the vcpus of the domain have hard affinity (i.e., are
>pinned) to pcpus 2-12. And in this case I'd conclude that the user
>wants for all the vcpus to be big.
>
>I'm less sure what to do if _only_ soft-affinity is specified (via
>"cpus_soft="), or if hard-affinity contains both big and LITTLE pcpus,
>like, e.g.:
>
>cpus = "2-15"
>
>>  - If no physical cpus specified, then the guest may run on big
>> cpus or on little cpus. But not both.
>>
>Yes. if nothing (or something contradictory) is specified, we "just"
>have to decide what's the sanest default.
>
>>    How to decide runs on big or little physical cpus?
>>
>I'd default to big.
>
>>  - For Dom0, I am still not sure, default big.little or else?
>> 
>Again, if nothing is specified, I'd probably default to:
> - give dom0 as many vcpus as there are big cores
> - restrict them to big cores
>
>But, of course, I think we should add boot time parameters like these
>ones:
>
> dom0_vcpus_big = 4
> dom0_vcpus_little = 2
>
>which would mean the user wants dom0 to have 4 big and 2 LITTLE
>cores... and then we act accordingly, as described above, and in other
>emails.
>
>> If use scheduler to handle the different classes cpu, we do not need
>> to use cpupool
>> to block vcpus be scheduled onto different physical cpus. And using
>> scheduler to handle this
>> gives an opportunity to support big.little guest.
>> 
>Exactly, this is one strong point in favour of this solution, IMO!

In the long run, I agree this is a good solution.

Thanks,
Peng.

>
>Regards,
>Dario
>-- 
><<This happens because I choose it to happen!>> (Raistlin Majere)
>-----------------------------------------------------------------
>Dario Faggioli, Ph.D, http://about.me/dario.faggioli
>Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



-- 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-20  0:54                     ` Dario Faggioli
@ 2016-09-20 10:03                       ` Peng Fan
  2016-09-20 10:27                         ` George Dunlap
  2016-09-21  9:45                         ` Dario Faggioli
  0 siblings, 2 replies; 85+ messages in thread
From: Peng Fan @ 2016-09-20 10:03 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
	Andrew Cooper, xen-devel, Julien Grall, Jan Beulich

Hi Dario,
On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
>On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>> On Tue, 20 Sep 2016, Dario Faggioli wrote:
>> > And this would work even if/when there is only one cpupool, or in
>> > general for domains that are in a pool that has both big and LITTLE
>> > pcpus. Furthermore, big.LITTLE support and cpupools will be
>> > orthogonal,
>> > just like pinning and cpupools are orthogonal right now. I.e., once
>> > we
>> > will have what I described above, nothing prevents us from
>> > implementing
>> > per-vcpu cpupool membership, and either create the two (or more!)
>> > big
>> > and LITTLE pools, or from mixing things even more, for more complex
>> > and
>> > specific use cases. :-)
>> 
>> I think that everybody agrees that this is the best long term
>> solution.
>> 
>Well, no, that wasn't obvious to me. If that's the case, it's already
>something! :-)
>
>> > 
>> > Actually, with the cpupool solution, if you want a guest (or dom0)
>> > to
>> > actually have both big and LITTLE vcpus, you necessarily have to
>> > implement per-vcpu (rather than per-domain, as it is now) cpupool
>> > membership. I said myself it's not impossible, but certainly it's
>> > some
>> > work... with the scheduler solution you basically get that for
>> > free!
>> > 
>> > So, basically, if we use cpupools for the basics of big.LITTLE
>> > support,
>> > there's no way out of it (apart from going implementing scheduling
>> > support afterwords, but that looks backwards to me, especially when
>> > thinking at it with the code in mind).
>> 
>> The question is: what is the best short-term solution we can ask Peng
>> to
>> implement that allows Xen to run on big.LITTLE systems today?
>> Possibly
>> getting us closer to the long term solution, or at least not farther
>> from it?
>> 
>So, I still have to look closely at the patches in these series. But,
>with Credit2 in mind, if one:
>
> - take advantage of the knowledge of what arch a pcpu belongs inside
>   the code that arrange the pcpus in runqueues, which means we'll end
>   up with big runqueues and LITTLE runqueues. I re-wrote that code, I
>   can provide pointers and help, if necessary;
> - tweak the one or two instance of for_each_runqueue() [*] that there
>   are in the code into a for_each_runqueue_of_same_class(), i.e.:

Do you have a plan to add this support for big.LITTLE?

I admit that this is the first time I have looked into the scheduler
part. If I understand anything wrongly, please correct me.

There is a runqueue for each physical cpu, and there are several vcpus
in each runqueue. The scheduler will pick a vcpu from the runqueue to
run on the physical cpu.

A vcpu is bound to a physical cpu in alloc_vcpu, but the vcpu can later
be scheduled or migrated to a different physical cpu.

Setting cpu soft affinity and hard affinity restricts vcpus to being
scheduled on specific cpus. So is there a need to introduce more
runqueues?
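
(For example, if I read the docs right, with today's interface I could
already run

 # xl vcpu-pin DomU 0 4-5

to restrict vcpu 0 of DomU to pcpus 4-5, without any new runqueue.)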

This seems more complicated than cpupool (:

>
> if (is_big(this_cpu))
> {
>   for_each_big_runqueue()
>   {
>      ..
>   }
> }
> else
> {
>   for_each_LITTLE_runqueue()
>   {
>     ..
>   }
> }
>
>then big.LITTLE support in Credit2 would be done already, and all it
>would be left is support for the syntax of new config switches in xl,
>and a way of telling, from xl/libxl down to Xen, what arch a vcpu
>belongs to, so that it can be associated with one runqueue of the
>proper class.
>
>Thinking to Credit1, we need to make sure thet, in load_balance() and
>runq_steal(), a LITTLE cpu *only* ever try to steal work from another
>LITTLE cpu, and __never__ from a big cpu (and vice versa). And also
>that when a vcpu wakes up, and what it has in its v->processor is a
>LITTLE pcpu, that only LITTLE processors are considered for being
>tickled (I'm less certain of this last part, but it should be more or
>less like this).
>
>Then, of course the the same glue and vcpu classification code.
>
>However, in Credit1, it's possible that a trick like that would affect
>the accounting and credit algorithm, and hence provide unfair, or in
>general, unexpected results. Credit2 should, OTOH, be a lot mere
>resilient, wrt that.
>
>> > > The whole process would be more explicit and obvious if we used
>> > > cpupools. It would be easier for users to know what it is going
>> > > on --
>> > > they just need to issue an `xl cpupool-list' command and they
>> > > would
>> > > see
>> > > two clearly named pools (something like big-pool and LITTLE-
>> > > pool).
>> > > 
>> > Well, I guess that, as part of big.LITTLE support, there will be a
>> > way
>> > to tell what pcpus are big and which are LITTLE anyway, probably
>> > both
>> > from `xl info' and from `xl cpupool-list -c' (and most likely in
>> > other
>> > ways too).
>> 
>> Sure, but it needs to be very clear. We cannot ask people to spot
>> architecture specific flags among the output of `xl info' to be able
>> to
>> appropriately start a guest. 
>>
>As mentioned in previous mail, and as drafted when replying to Peng,
>the only think that the user should know is how many big and how many
>LITTLE vcpus she wants (and, potentially, which one would be each). :-)

Yeah. A new question comes to me.

For big.LITTLE, how do we decide whether a physical cpu is a big cpu or
a little cpu?

I'd like to add a computing-capability table in xen/arm, like this:

struct compute_capability
{
   char *core_name;
   uint32_t rank;
   uint32_t cpu_partnum;
};

struct compute_capability cc[] =
{
  {"A72", 4, 0xd08},
  {"A57", 3, 0xxxx},
  {"A53", 2, 0xd03},
  {"A35", 1, ...},
};

Then, when identifying cpus, we decide which cpu is big and which cpu
is little according to the computing rank.
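
The lookup could be something like this (a sketch only; the helper name
is made up, and the part numbers above still need checking against the
TRMs):

 /* Return the rank of a core from the part number field of its
  * MIDR (bits [15:4]); 0 means "unknown part". */
 static uint32_t midr_to_rank(uint32_t midr)
 {
     uint32_t partnum = (midr >> 4) & 0xfff;
     unsigned int i;

     for ( i = 0; i < ARRAY_SIZE(cc); i++ )
         if ( cc[i].cpu_partnum == partnum )
             return cc[i].rank;

     return 0;
 }

A cpu with a higher rank would then be treated as "bigger".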

Any comments?

Thanks,
Peng.
>
>Regards,
>Dario
>-- 
><<This happens because I choose it to happen!>> (Raistlin Majere)
>-----------------------------------------------------------------
>Dario Faggioli, Ph.D, http://about.me/dario.faggioli
>Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



-- 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-20  0:01                   ` Stefano Stabellini
  2016-09-20  0:54                     ` Dario Faggioli
@ 2016-09-20 10:18                     ` George Dunlap
  1 sibling, 0 replies; 85+ messages in thread
From: George Dunlap @ 2016-09-20 10:18 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, Andrew Cooper, Dario Faggioli,
	xen-devel, Julien Grall, Jan Beulich, Peng Fan

On Tue, Sep 20, 2016 at 1:01 AM, Stefano Stabellini
<sstabellini@kernel.org> wrote:

>> Actually, with the cpupool solution, if you want a guest (or dom0) to
>> actually have both big and LITTLE vcpus, you necessarily have to
>> implement per-vcpu (rather than per-domain, as it is now) cpupool
>> membership. I said myself it's not impossible, but certainly it's some
>> work... with the scheduler solution you basically get that for free!
>>
>> So, basically, if we use cpupools for the basics of big.LITTLE support,
>> there's no way out of it (apart from going implementing scheduling
>> support afterwords, but that looks backwards to me, especially when
>> thinking at it with the code in mind).
>
> The question is: what is the best short-term solution we can ask Peng to
> implement that allows Xen to run on big.LITTLE systems today? Possibly
> getting us closer to the long term solution, or at least not farther
> from it?

So remember that there's the *interface* we're providing to the user
(specifying vcpus as little or BIG) and the guest OS (how does the
guest OS know whether a specific vcpu is little or BIG), and there's
the *implementation* of that.

For comparison, xl provides a "guest numa" interface; but the Xen
schedulers actually don't know anything about NUMA -- they only have
the concept of "soft affinity".  xl uses soft affinity to implement
the NUMA characteristics we want from the scheduler.

It seems to me that the best combination of functionality / simplicity
would be to provide a way to specify, in xl, which vcpus should be big
and LITTLE (something similar to what Dario mentions below); and then
to implement that initially only with cpu pinning inside of xl.

Then at some point we can extend that to tagging each vcpu with a
"pcpu class", and teaching schedulers about those classes, and making
sure each vcpu runs only within its own class.  This effectively
amounts to another layer of hard pinning, but one which is a bit more
robust (i.e., won't be confused if the user tries to set the hard
affinity of a vcpu).
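
As a sketch of what that tagging might look like (all names here are
hypothetical; nothing like this exists in the tree today):

 /* Hypothetical per-vcpu class tag, set once at domain creation. */
 enum cpu_class { CPU_CLASS_BIG, CPU_CLASS_LITTLE };

 /* A check the scheduler could apply, on top of hard affinity,
  * before considering pcpu 'cpu' for vcpu 'v'. */
 static bool vcpu_class_ok(const struct vcpu *v, unsigned int cpu)
 {
     return v->cpu_class == pcpu_class(cpu);
 }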

That said, if the goal is to get *something* up as quick as humanly
possible, implementing "xl cpupool-bigLITTLE-split" (or a
'cpupool_setup=biglittle' Xen command-line option) would probably do
the job; but only by limiting domains to having only big or LITTLE
vcpus.

> Sure, but it needs to be very clear. We cannot ask people to spot
> architecture specific flags among the output of `xl info' to be able to
> appropriately start a guest. Even what I suggested isn't great as `xl
> cpupool-list' isn't a common command to run.

Well fundamentally, given that a vcpu has to start and stay on the
same class of processors, the user will have to make some action to
deviate from the default.  I don't fundamentally see any difference
between saying "In order to use the LITTLE cpus you have to specify
the LITTLE cpupool" and "In order to use the LITTLE cpus you have to
specify some vcpus as LITTLE vcpus".

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-20 10:03                       ` Peng Fan
@ 2016-09-20 10:27                         ` George Dunlap
  2016-09-20 15:34                           ` Julien Grall
  2016-09-21  9:45                         ` Dario Faggioli
  1 sibling, 1 reply; 85+ messages in thread
From: George Dunlap @ 2016-09-20 10:27 UTC (permalink / raw)
  To: Peng Fan
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Andrew Cooper,
	Dario Faggioli, xen-devel, Julien Grall, Jan Beulich

On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan <van.freenix@gmail.com> wrote:
> Hi Dario,
> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
>>On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>>> On Tue, 20 Sep 2016, Dario Faggioli wrote:
>>> > And this would work even if/when there is only one cpupool, or in
>>> > general for domains that are in a pool that has both big and LITTLE
>>> > pcpus. Furthermore, big.LITTLE support and cpupools will be
>>> > orthogonal,
>>> > just like pinning and cpupools are orthogonal right now. I.e., once
>>> > we
>>> > will have what I described above, nothing prevents us from
>>> > implementing
>>> > per-vcpu cpupool membership, and either create the two (or more!)
>>> > big
>>> > and LITTLE pools, or from mixing things even more, for more complex
>>> > and
>>> > specific use cases. :-)
>>>
>>> I think that everybody agrees that this is the best long term
>>> solution.
>>>
>>Well, no, that wasn't obvious to me. If that's the case, it's already
>>something! :-)
>>
>>> >
>>> > Actually, with the cpupool solution, if you want a guest (or dom0)
>>> > to
>>> > actually have both big and LITTLE vcpus, you necessarily have to
>>> > implement per-vcpu (rather than per-domain, as it is now) cpupool
>>> > membership. I said myself it's not impossible, but certainly it's
>>> > some
>>> > work... with the scheduler solution you basically get that for
>>> > free!
>>> >
>>> > So, basically, if we use cpupools for the basics of big.LITTLE
>>> > support,
>>> > there's no way out of it (apart from going implementing scheduling
>>> > support afterwords, but that looks backwards to me, especially when
>>> > thinking at it with the code in mind).
>>>
>>> The question is: what is the best short-term solution we can ask Peng
>>> to
>>> implement that allows Xen to run on big.LITTLE systems today?
>>> Possibly
>>> getting us closer to the long term solution, or at least not farther
>>> from it?
>>>
>>So, I still have to look closely at the patches in these series. But,
>>with Credit2 in mind, if one:
>>
>> - take advantage of the knowledge of what arch a pcpu belongs inside
>>   the code that arrange the pcpus in runqueues, which means we'll end
>>   up with big runqueues and LITTLE runqueues. I re-wrote that code, I
>>   can provide pointers and help, if necessary;
>> - tweak the one or two instance of for_each_runqueue() [*] that there
>>   are in the code into a for_each_runqueue_of_same_class(), i.e.:
>
> Do you have a plan to add this support for big.LITTLE?
>
> I admit that this is the first time I look into the scheduler part.
> If I understand wrongly, please correct me.
>
> There is a runqueue for each physical cpu, and there are several vcpus in the runqueue.
> The scheduler will pick a vcpu in the runqueue to run on the physical cpu.
>
> A vcpu is bound to a physical cpu in alloc_vcpu, but the vcpu can be scheduled
> or migrated to a different physical cpu.
>
> Setting a vcpu's soft affinity and hard affinity restricts which cpus it can be
> scheduled on. Then is there a need to introduce more runqueues?

Runqueues are a scheduler-specific thing.  The simplest thing to do
would be, in the toolstack, to limit the hard affinity of a vcpu to
its cpu class (either big or LITTLE).  Then the scheduler will simply
do the right thing.

> This seems more complicated than cpupool (:

It's more complicated than simply making 2 cpupools and having any
given domain be entirely big or entirely LITTLE.

But it's a lot *less* complicated than trying to make a single domain
cross two different kinds of cpupools. :-)

>>As mentioned in previous mail, and as drafted when replying to Peng,
>>the only thing that the user should know is how many big and how many
>>LITTLE vcpus she wants (and, potentially, which one would be each). :-)
>
> Yeah. A new question comes to mind.
>
> For big.LITTLE, how do we decide whether a physical cpu is a big CPU or a little cpu?
>
> I'd like to add a computing capability in xen/arm, like this:
>
> struct compute_capability
> {
>    char *core_name;
>    uint32_t rank;
>    uint32_t cpu_partnum;
> };
> 
> struct compute_capability cc[] =
> {
>   {"A72", 4, 0xd08},
>   {"A57", 3, 0xxxx},
>   {"A53", 2, 0xd03},
>   {"A35", 1, ...},
> };
>
> Then when identifying cpus, we decide which cpu is big and which cpu is little
> according to the computing rank.
>
> Any comments?
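
For concreteness, a self-contained C sketch of the proposed ranking
table and lookup might look like this (the A57 and A35 part numbers are
filled in with their standard MIDR values, 0xd07 and 0xd04; the rank
values are purely illustrative, as above):

#include <stdint.h>
#include <stddef.h>

struct compute_capability {
    const char *core_name;
    uint32_t rank;          /* higher rank = "bigger" core   */
    uint32_t cpu_partnum;   /* MIDR part number, bits [15:4] */
};

static const struct compute_capability cc[] = {
    { "A72", 4, 0xd08 },
    { "A57", 3, 0xd07 },
    { "A53", 2, 0xd03 },
    { "A35", 1, 0xd04 },
};

/* Look up the rank for a pcpu's MIDR part number; 0 means unknown. */
static uint32_t cpu_rank(uint32_t partnum)
{
    size_t i;

    for ( i = 0; i < sizeof(cc) / sizeof(cc[0]); i++ )
        if ( cc[i].cpu_partnum == partnum )
            return cc[i].rank;
    return 0;
}

CPUs sharing the highest rank present on the platform would then be
treated as big, everything else as little.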

I think we definitely need to have Xen have some kind of idea of the
order between processors, so that the user doesn't need to figure out
which class / pool is big and which pool is LITTLE.  Whether this sort
of enumeration is the best way to do that I'll let Julien and Stefano
give their opinion.

 -George


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-20 10:27                         ` George Dunlap
@ 2016-09-20 15:34                           ` Julien Grall
  2016-09-20 17:24                             ` Dario Faggioli
  2016-09-20 19:09                             ` Stefano Stabellini
  0 siblings, 2 replies; 85+ messages in thread
From: Julien Grall @ 2016-09-20 15:34 UTC (permalink / raw)
  To: George Dunlap, Peng Fan
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Andrew Cooper,
	Dario Faggioli, xen-devel, Jan Beulich

Hi,

On 20/09/2016 12:27, George Dunlap wrote:
> On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan <van.freenix@gmail.com> wrote:
>> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
>>> On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>>>> On Tue, 20 Sep 2016, Dario Faggioli wrote:
>> I'd like to add a computing capability in xen/arm, like this:
>>
>> struct compute_capatiliby
>> {
>>    char *core_name;
>>    uint32_t rank;
>>    uint32_t cpu_partnum;
>> };
>>
>> struct compute_capatiliby cc=
>> {
>>   {"A72", 4, 0xd08},
>>   {"A57", 3, 0xxxx},
>>   {"A53", 2, 0xd03},
>>   {"A35", 1, ...},
>> }
>>
>> Then when identify cpu, we decide which cpu is big and which cpu is little
>> according to the computing rank.
>>
>> Any comments?
>
> I think we definitely need to have Xen have some kind of idea the
> order between processors, so that the user doesn't need to figure out
> which class / pool is big and which pool is LITTLE.  Whether this sort
> of enumeration is the best way to do that I'll let Julien and Stefano
> give their opinion.

I don't think a hardcoded list of processors in Xen is the right
solution. There are many existing processors and combinations for
big.LITTLE, so it will be nearly impossible to keep such a list updated.

I would expect the firmware tables (device tree, ACPI) to provide
relevant data for each processor and to differentiate big from LITTLE
cores. Note that I haven't looked into it so far. A good place to start
is looking at how Linux does it.

Regards,


-- 
Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-20 15:34                           ` Julien Grall
@ 2016-09-20 17:24                             ` Dario Faggioli
  2016-09-20 19:09                             ` Stefano Stabellini
  1 sibling, 0 replies; 85+ messages in thread
From: Dario Faggioli @ 2016-09-20 17:24 UTC (permalink / raw)
  To: Julien Grall, George Dunlap, Peng Fan
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Andrew Cooper,
	xen-devel, Jan Beulich


On Tue, 2016-09-20 at 17:34 +0200, Julien Grall wrote:
> On 20/09/2016 12:27, George Dunlap wrote:
> > I think we definitely need to have Xen have some kind of idea the
> > order between processors, so that the user doesn't need to figure
> > out
> > which class / pool is big and which pool is LITTLE.  Whether this
> > sort
> > of enumeration is the best way to do that I'll let Julien and
> > Stefano
> > give their opinion.
> 
> I don't think an hardcoded list of processor in Xen is the right 
> solution. There are many existing processors and combinations for 
> big.LITTLE so it will nearly be impossible to keep updated.
> 
As far as either the scheduler or cpupools go, what's necessary would
be:
 - in Xen, a function (or an array acting as a map, or whatever) to 
   call to know whether pcpu X is big or LITTLE;
 - at the toolstack level, a hypercall (or a field, bit, whatever in a
   struct already returned by an existing hypercall) to know the same
   thing, i.e., whether a pcpu is big or LITTLE.

Once we have this, we can do everything. We will probably want to
abstract things a bit, and make them as generic as practical, so that
the same interface can be used not only for ARM big.LITTLE, but for
whatever future heterogeneous cpu arch we'll support... but really, the
actual information that we need is "just" that.
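
As a minimal sketch of the Xen-side half (all names here -- cpu_class,
cpu_class_map, get_cpu_class -- are assumptions, not existing Xen code):

enum cpu_class { CPU_CLASS_BIG, CPU_CLASS_LITTLE };

/* One entry per pcpu, filled once at boot (e.g. from each pcpu's MIDR). */
static enum cpu_class cpu_class_map[NR_CPUS];

static inline enum cpu_class get_cpu_class(unsigned int pcpu)
{
    return cpu_class_map[pcpu];
}

The toolstack-facing half would then just expose the contents of
cpu_class_map, in some stable encoding, through whichever
sysctl/physinfo structure gets picked.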

I've absolutely no idea how such info could be achieved, and I have no
ARM big.LITTLE hardware to test on.

> I would expect the firmware table (device tree, ACPI) to provide 
> relevant data for each processor and differentiate big from LITTLE
> core.
> Note that I haven't looked at it for now. A good place to start is 
> looking at how Linux does.
> 
Makes sense.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-20 15:34                           ` Julien Grall
  2016-09-20 17:24                             ` Dario Faggioli
@ 2016-09-20 19:09                             ` Stefano Stabellini
  2016-09-20 19:41                               ` Julien Grall
  1 sibling, 1 reply; 85+ messages in thread
From: Stefano Stabellini @ 2016-09-20 19:09 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Dario Faggioli, xen-devel, Jan Beulich, Peng Fan

On Tue, 20 Sep 2016, Julien Grall wrote:
> Hi,
> 
> On 20/09/2016 12:27, George Dunlap wrote:
> > On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan <van.freenix@gmail.com> wrote:
> > > On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
> > > > On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
> > > > > On Tue, 20 Sep 2016, Dario Faggioli wrote:
> > > I'd like to add a computing capability in xen/arm, like this:
> > > 
> > > struct compute_capatiliby
> > > {
> > >    char *core_name;
> > >    uint32_t rank;
> > >    uint32_t cpu_partnum;
> > > };
> > > 
> > > struct compute_capatiliby cc=
> > > {
> > >   {"A72", 4, 0xd08},
> > >   {"A57", 3, 0xxxx},
> > >   {"A53", 2, 0xd03},
> > >   {"A35", 1, ...},
> > > }
> > > 
> > > Then when identify cpu, we decide which cpu is big and which cpu is little
> > > according to the computing rank.
> > > 
> > > Any comments?
> > 
> > I think we definitely need to have Xen have some kind of idea the
> > order between processors, so that the user doesn't need to figure out
> > which class / pool is big and which pool is LITTLE.  Whether this sort
> > of enumeration is the best way to do that I'll let Julien and Stefano
> > give their opinion.
> 
> I don't think an hardcoded list of processor in Xen is the right solution.
> There are many existing processors and combinations for big.LITTLE so it will
> nearly be impossible to keep updated.
> 
> I would expect the firmware table (device tree, ACPI) to provide relevant data
> for each processor and differentiate big from LITTLE core.
> Note that I haven't looked at it for now. A good place to start is looking at
> how Linux does.

That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It is
trivial to identify the two different CPU classes and which cores belong
to which class. It is harder to figure out which one is supposed to be
big and which one LITTLE. Regardless, we could default to using the
first cluster (usually big), which is also the cluster of the boot cpu,
and utilize the second cluster only when the user demands it.


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-20 19:09                             ` Stefano Stabellini
@ 2016-09-20 19:41                               ` Julien Grall
  2016-09-20 20:17                                 ` Stefano Stabellini
  0 siblings, 1 reply; 85+ messages in thread
From: Julien Grall @ 2016-09-20 19:41 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jan Beulich, Peng Fan

Hi Stefano,

On 20/09/2016 20:09, Stefano Stabellini wrote:
> On Tue, 20 Sep 2016, Julien Grall wrote:
>> Hi,
>>
>> On 20/09/2016 12:27, George Dunlap wrote:
>>> On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan <van.freenix@gmail.com> wrote:
>>>> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
>>>>> On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>>>>>> On Tue, 20 Sep 2016, Dario Faggioli wrote:
>>>> I'd like to add a computing capability in xen/arm, like this:
>>>>
>>>> struct compute_capatiliby
>>>> {
>>>>    char *core_name;
>>>>    uint32_t rank;
>>>>    uint32_t cpu_partnum;
>>>> };
>>>>
>>>> struct compute_capatiliby cc=
>>>> {
>>>>   {"A72", 4, 0xd08},
>>>>   {"A57", 3, 0xxxx},
>>>>   {"A53", 2, 0xd03},
>>>>   {"A35", 1, ...},
>>>> }
>>>>
>>>> Then when identify cpu, we decide which cpu is big and which cpu is little
>>>> according to the computing rank.
>>>>
>>>> Any comments?
>>>
>>> I think we definitely need to have Xen have some kind of idea the
>>> order between processors, so that the user doesn't need to figure out
>>> which class / pool is big and which pool is LITTLE.  Whether this sort
>>> of enumeration is the best way to do that I'll let Julien and Stefano
>>> give their opinion.
>>
>> I don't think an hardcoded list of processor in Xen is the right solution.
>> There are many existing processors and combinations for big.LITTLE so it will
>> nearly be impossible to keep updated.
>>
>> I would expect the firmware table (device tree, ACPI) to provide relevant data
>> for each processor and differentiate big from LITTLE core.
>> Note that I haven't looked at it for now. A good place to start is looking at
>> how Linux does.
>
> That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It is
> trivial to identify the two different CPU classes and which cores belong
> to which class.

The class of the CPU can be found from the MIDR; there is no need to use
the device tree/ACPI for that. Note that I don't think there is an easy
way in ACPI (i.e. not in AML) to find out the class.
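
For reference, a sketch of extracting the relevant MIDR fields on
AArch64 (the field layout is from the ARM ARM; the helper names are
made up):

#include <stdint.h>

static inline uint32_t read_midr(void)
{
    uint64_t midr;

    /* MIDR_EL1 is readable at EL1 and above (so by Xen at EL2). */
    asm volatile("mrs %0, midr_el1" : "=r" (midr));
    return (uint32_t)midr;
}

#define MIDR_IMPLEMENTER(m) (((m) >> 24) & 0xff)   /* 0x41 = ARM Ltd.    */
#define MIDR_PARTNUM(m)     (((m) >>  4) & 0xfff)  /* e.g. 0xd03 for A53 */

/* Two pcpus belong to the same class iff implementer and part match. */
static inline int same_cpu_class(uint32_t a, uint32_t b)
{
    return MIDR_IMPLEMENTER(a) == MIDR_IMPLEMENTER(b) &&
           MIDR_PARTNUM(a) == MIDR_PARTNUM(b);
}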

> It is harder to figure out which one is supposed to be
> big and which one LITTLE. Regardless, we could default to using the
> first cluster (usually big), which is also the cluster of the boot cpu,
> and utilize the second cluster only when the user demands it.

Why do you think the boot CPU will usually be a big one? In the case of
the Juno platform it is configurable, and the boot CPU is a little core
on r2 by default.

In any case, what we care about is differentiating between two sets of
CPUs. I don't think Xen should care about migrating a guest vCPU between
big and LITTLE cpus. So I am not sure why we would want to know that.

The only thing we need is an identifier for each set (it might be the
MIDR or the compatible string in the device tree).

Note that, as Peng mentioned, Linaro is working on an energy-aware 
scheduler. So there is a way (maybe not yet upstreamed) to find the CPU 
topology.

Regards,

-- 
Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-20 19:41                               ` Julien Grall
@ 2016-09-20 20:17                                 ` Stefano Stabellini
  2016-09-21  8:38                                   ` Peng Fan
  2016-09-21 10:09                                   ` Julien Grall
  0 siblings, 2 replies; 85+ messages in thread
From: Stefano Stabellini @ 2016-09-20 20:17 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Dario Faggioli, xen-devel, Jan Beulich, Peng Fan

On Tue, 20 Sep 2016, Julien Grall wrote:
> Hi Stefano,
> 
> On 20/09/2016 20:09, Stefano Stabellini wrote:
> > On Tue, 20 Sep 2016, Julien Grall wrote:
> > > Hi,
> > > 
> > > On 20/09/2016 12:27, George Dunlap wrote:
> > > > On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan <van.freenix@gmail.com>
> > > > wrote:
> > > > > On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
> > > > > > On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
> > > > > > > On Tue, 20 Sep 2016, Dario Faggioli wrote:
> > > > > I'd like to add a computing capability in xen/arm, like this:
> > > > > 
> > > > > struct compute_capatiliby
> > > > > {
> > > > >    char *core_name;
> > > > >    uint32_t rank;
> > > > >    uint32_t cpu_partnum;
> > > > > };
> > > > > 
> > > > > struct compute_capatiliby cc=
> > > > > {
> > > > >   {"A72", 4, 0xd08},
> > > > >   {"A57", 3, 0xxxx},
> > > > >   {"A53", 2, 0xd03},
> > > > >   {"A35", 1, ...},
> > > > > }
> > > > > 
> > > > > Then when identify cpu, we decide which cpu is big and which cpu is
> > > > > little
> > > > > according to the computing rank.
> > > > > 
> > > > > Any comments?
> > > > 
> > > > I think we definitely need to have Xen have some kind of idea the
> > > > order between processors, so that the user doesn't need to figure out
> > > > which class / pool is big and which pool is LITTLE.  Whether this sort
> > > > of enumeration is the best way to do that I'll let Julien and Stefano
> > > > give their opinion.
> > > 
> > > I don't think an hardcoded list of processor in Xen is the right solution.
> > > There are many existing processors and combinations for big.LITTLE so it
> > > will
> > > nearly be impossible to keep updated.
> > > 
> > > I would expect the firmware table (device tree, ACPI) to provide relevant
> > > data
> > > for each processor and differentiate big from LITTLE core.
> > > Note that I haven't looked at it for now. A good place to start is looking
> > > at
> > > how Linux does.
> > 
> > That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It is
> > trivial to identify the two different CPU classes and which cores belong
> > to which class.
> 
> The class of the CPU can be found from the MIDR, there is no need to use the
> device tree/acpi for that. Note that I don't think there is an easy way in
> ACPI (i.e not in AML) to find out the class.
> 
> > It is harder to figure out which one is supposed to be
> > big and which one LITTLE. Regardless, we could default to using the
> > first cluster (usually big), which is also the cluster of the boot cpu,
> > and utilize the second cluster only when the user demands it.
> 
> Why do you think the boot CPU will usually be a big one? In the case of Juno
> platform it is configurable, and the boot CPU is a little core on r2 by
> default.
> 
> In any case, what we care about is differentiate between two set of CPUs. I
> don't think Xen should care about migrating a guest vCPU between big and
> LITTLE cpus. So I am not sure why we would want to know that.

No, it is not about migrating (at least yet). It is about giving useful
information to the user. It would be nice if the user had to choose
between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
even "A7" or "A15".


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-20 20:17                                 ` Stefano Stabellini
@ 2016-09-21  8:38                                   ` Peng Fan
  2016-09-21  9:22                                     ` George Dunlap
  2016-09-21 10:15                                     ` Julien Grall
  2016-09-21 10:09                                   ` Julien Grall
  1 sibling, 2 replies; 85+ messages in thread
From: Peng Fan @ 2016-09-21  8:38 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Julien Grall, Jan Beulich

On Tue, Sep 20, 2016 at 01:17:04PM -0700, Stefano Stabellini wrote:
>On Tue, 20 Sep 2016, Julien Grall wrote:
>> Hi Stefano,
>> 
>> On 20/09/2016 20:09, Stefano Stabellini wrote:
>> > On Tue, 20 Sep 2016, Julien Grall wrote:
>> > > Hi,
>> > > 
>> > > On 20/09/2016 12:27, George Dunlap wrote:
>> > > > On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan <van.freenix@gmail.com>
>> > > > wrote:
>> > > > > On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
>> > > > > > On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>> > > > > > > On Tue, 20 Sep 2016, Dario Faggioli wrote:
>> > > > > I'd like to add a computing capability in xen/arm, like this:
>> > > > > 
>> > > > > struct compute_capatiliby
>> > > > > {
>> > > > >    char *core_name;
>> > > > >    uint32_t rank;
>> > > > >    uint32_t cpu_partnum;
>> > > > > };
>> > > > > 
>> > > > > struct compute_capatiliby cc=
>> > > > > {
>> > > > >   {"A72", 4, 0xd08},
>> > > > >   {"A57", 3, 0xxxx},
>> > > > >   {"A53", 2, 0xd03},
>> > > > >   {"A35", 1, ...},
>> > > > > }
>> > > > > 
>> > > > > Then when identify cpu, we decide which cpu is big and which cpu is
>> > > > > little
>> > > > > according to the computing rank.
>> > > > > 
>> > > > > Any comments?
>> > > > 
>> > > > I think we definitely need to have Xen have some kind of idea the
>> > > > order between processors, so that the user doesn't need to figure out
>> > > > which class / pool is big and which pool is LITTLE.  Whether this sort
>> > > > of enumeration is the best way to do that I'll let Julien and Stefano
>> > > > give their opinion.
>> > > 
>> > > I don't think an hardcoded list of processor in Xen is the right solution.
>> > > There are many existing processors and combinations for big.LITTLE so it
>> > > will
>> > > nearly be impossible to keep updated.
>> > > 
>> > > I would expect the firmware table (device tree, ACPI) to provide relevant
>> > > data
>> > > for each processor and differentiate big from LITTLE core.
>> > > Note that I haven't looked at it for now. A good place to start is looking
>> > > at
>> > > how Linux does.
>> > 
>> > That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It is
>> > trivial to identify the two different CPU classes and which cores belong
>> > to which class.
>> 
>> The class of the CPU can be found from the MIDR, there is no need to use the
>> device tree/acpi for that. Note that I don't think there is an easy way in
>> ACPI (i.e not in AML) to find out the class.
>> 
>> > It is harder to figure out which one is supposed to be
>> > big and which one LITTLE. Regardless, we could default to using the
>> > first cluster (usually big), which is also the cluster of the boot cpu,
>> > and utilize the second cluster only when the user demands it.
>> 
>> Why do you think the boot CPU will usually be a big one? In the case of Juno
>> platform it is configurable, and the boot CPU is a little core on r2 by
>> default.
>> 
>> In any case, what we care about is differentiate between two set of CPUs. I
>> don't think Xen should care about migrating a guest vCPU between big and
>> LITTLE cpus. So I am not sure why we would want to know that.
>
>No, it is not about migrating (at least yet). It is about giving useful
>information to the user. It would be nice if the user had to choose
>between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
>even "A7" or "A15".

As Dario mentioned in a previous email, for dom0 we could provide
something like this:

dom0_vcpus_big = 4
dom0_vcpus_little = 2

If these two are not provided, we could let dom0 run on the big pcpus
only, or on both big and LITTLE ones. Anyway, whether dom0 is big-only
or big.LITTLE is not the important point here.

For domU, provide "vcpus.big" and "vcpus.little" in the xl configuration
file. Such as:

vcpus.big = 2
vcpus.little = 4


According to George's comments, I think we could then use affinity to
restrict little vcpus to being scheduled on little pcpus, and big vcpus
on big pcpus. There seems to be no need for soft affinity here; hard
affinity is enough to handle this.

We may need to provide some interface to let xl get information such as
whether the platform is big.LITTLE or SMP and, if it is big.LITTLE,
which pcpus are big and which are little.

As for how to differentiate cpus, I am looking at the Linaro EAS cpu
topology code. The code has not been upstreamed (:, but it has been
merged into the Google Android kernel. I only plan to take some
necessary pieces, such as the device tree parsing and the cpu topology
build, because we only need to know the computing capacity of each pcpu.

Some documentation about the EAS piece, including dts node examples:
https://git.linaro.org/arm/eas/kernel.git/blob/refs/heads/lsk-v4.4-eas-v5.2:/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt

I pasted partial EAS code:
			/* Each capacity state is read from the device tree
			 * as a (capacity, power) pair of be32 cells. */
			for (i = 0, val = prop->value; i < nstates; i++) {
				cap_states[i].cap = be32_to_cpup(val++);
				cap_states[i].power = be32_to_cpup(val++);
			}

			sge->nr_cap_states = nstates;
			sge->cap_states = cap_states;

The code above gets the computing capacity and power from the device tree.

For us, I think we only need the cap entry.
Add a "cap" entry in cpuinfo_arm, and fill it in when parsing the dt.
Add a "cap" entry in arch_vcpu or vcpu.

When creating a vcpu, fill in its cap according to whether it is a big
vcpu or a little vcpu; the cap should be the same as the cap of the
corresponding physical cpus. Then set the hard affinity of the vcpu when
creating it.

The user may change the hard affinity of a vcpu, so we also need to
block a little vcpu from being scheduled on a big physical cpu. Add some
checking code in Xen: when changing the hard affinity, check whether the
cap of the vcpu is compatible with the cap of the physical cpus.
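
A rough sketch of that check (v->cap and the per-pcpu cap[] array are
hypothetical fields, not existing Xen code):

/* Reject a hard-affinity mask containing any pcpu whose computing
 * capacity differs from the vcpu's. */
static int check_affinity_cap(const struct vcpu *v, const cpumask_t *mask)
{
    unsigned int cpu;

    for_each_cpu ( cpu, mask )
        if ( cap[cpu] != v->cap )
            return -EINVAL;   /* mixing big and little pcpus: refuse */

    return 0;
}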

I am not sure, but we may also need to handle the MPIDR for ARM, because
both big and little vcpus are supported.

All the above is what I would like to implement according to the
discussion in this thread; no cpupool or scheduler changes are included.

Please comment.

Thanks,
Peng.
-- 


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21  8:38                                   ` Peng Fan
@ 2016-09-21  9:22                                     ` George Dunlap
  2016-09-21 12:35                                       ` Peng Fan
  2016-09-21 15:00                                       ` Dario Faggioli
  2016-09-21 10:15                                     ` Julien Grall
  1 sibling, 2 replies; 85+ messages in thread
From: George Dunlap @ 2016-09-21  9:22 UTC (permalink / raw)
  To: Peng Fan, Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Julien Grall, Jan Beulich

On 21/09/16 09:38, Peng Fan wrote:
> On Tue, Sep 20, 2016 at 01:17:04PM -0700, Stefano Stabellini wrote:
>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>> Hi Stefano,
>>>
>>> On 20/09/2016 20:09, Stefano Stabellini wrote:
>>>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>> Hi,
>>>>>
>>>>> On 20/09/2016 12:27, George Dunlap wrote:
>>>>>> On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan <van.freenix@gmail.com>
>>>>>> wrote:
>>>>>>> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
>>>>>>>> On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>>>>>>>>> On Tue, 20 Sep 2016, Dario Faggioli wrote:
>>>>>>> I'd like to add a computing capability in xen/arm, like this:
>>>>>>>
>>>>>>> struct compute_capatiliby
>>>>>>> {
>>>>>>>    char *core_name;
>>>>>>>    uint32_t rank;
>>>>>>>    uint32_t cpu_partnum;
>>>>>>> };
>>>>>>>
>>>>>>> struct compute_capatiliby cc=
>>>>>>> {
>>>>>>>   {"A72", 4, 0xd08},
>>>>>>>   {"A57", 3, 0xxxx},
>>>>>>>   {"A53", 2, 0xd03},
>>>>>>>   {"A35", 1, ...},
>>>>>>> }
>>>>>>>
>>>>>>> Then when identify cpu, we decide which cpu is big and which cpu is
>>>>>>> little
>>>>>>> according to the computing rank.
>>>>>>>
>>>>>>> Any comments?
>>>>>>
>>>>>> I think we definitely need to have Xen have some kind of idea the
>>>>>> order between processors, so that the user doesn't need to figure out
>>>>>> which class / pool is big and which pool is LITTLE.  Whether this sort
>>>>>> of enumeration is the best way to do that I'll let Julien and Stefano
>>>>>> give their opinion.
>>>>>
>>>>> I don't think an hardcoded list of processor in Xen is the right solution.
>>>>> There are many existing processors and combinations for big.LITTLE so it
>>>>> will
>>>>> nearly be impossible to keep updated.
>>>>>
>>>>> I would expect the firmware table (device tree, ACPI) to provide relevant
>>>>> data
>>>>> for each processor and differentiate big from LITTLE core.
>>>>> Note that I haven't looked at it for now. A good place to start is looking
>>>>> at
>>>>> how Linux does.
>>>>
>>>> That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It is
>>>> trivial to identify the two different CPU classes and which cores belong
>>>> to which class.
>>>
>>> The class of the CPU can be found from the MIDR, there is no need to use the
>>> device tree/acpi for that. Note that I don't think there is an easy way in
>>> ACPI (i.e not in AML) to find out the class.
>>>
>>>> It is harder to figure out which one is supposed to be
>>>> big and which one LITTLE. Regardless, we could default to using the
>>>> first cluster (usually big), which is also the cluster of the boot cpu,
>>>> and utilize the second cluster only when the user demands it.
>>>
>>> Why do you think the boot CPU will usually be a big one? In the case of Juno
>>> platform it is configurable, and the boot CPU is a little core on r2 by
>>> default.
>>>
>>> In any case, what we care about is differentiate between two set of CPUs. I
>>> don't think Xen should care about migrating a guest vCPU between big and
>>> LITTLE cpus. So I am not sure why we would want to know that.
>>
>> No, it is not about migrating (at least yet). It is about giving useful
>> information to the user. It would be nice if the user had to choose
>> between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
>> even "A7" or "A15".
> 
> As Dario mentioned in previous email,
> for dom0 provide like this:
> 
> dom0_vcpus_big = 4
> dom0_vcpus_little = 2
> 
> to dom0.
> 
> If these two no provided, we could let dom0 runs on big pcpus or big.little.
> Anyway this is not the important point for dom0 only big or big.little.
> 
> For domU, provide "vcpus.big" and "vcpus.little" in xl configuration file.
> Such as:
> 
> vcpus.big = 2
> vcpus.little = 4

FWIW, from a UI perspective, it would be nice if we designed the
interface such that it *can* be used simply (i.e., just "big" or
"little"), but can also be used more flexibly; for instance, specifying
"A15" or "A7" instead.

So maybe have a 'classifier' string; this could start by having just
"big" and "little", but could then be extended to allow fuller ways of
specifying specific kinds of cores.

To keep the illusion of python syntax, what about something like this:

vcpuclass=["big=2","little=4"]

Or would it be better to have a mapping of vcpu to class?

vcpuclass=["0-1:big","2-5:little"]
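
Either form would be easy to parse; as a sketch for the second one (all
names here are hypothetical, not existing libxl code):

#include <stdio.h>

struct vcpu_class_entry {
    unsigned int first, last;   /* inclusive vcpu range */
    char cls[16];               /* e.g. "big", "A53"    */
};

/* Parse one "FIRST-LAST:CLASS" item, also accepting plain "VCPU:CLASS". */
static int parse_vcpu_class(const char *s, struct vcpu_class_entry *e)
{
    if ( sscanf(s, "%u-%u:%15s", &e->first, &e->last, e->cls) == 3 )
        return e->first <= e->last ? 0 : -1;
    if ( sscanf(s, "%u:%15s", &e->first, e->cls) == 2 )
    {
        e->last = e->first;
        return 0;
    }
    return -1;
}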


> According to George's comments,
> Then, I think we could use affinity to restrict little vcpus be scheduled on little vcpus,
> and restrict big vcpus on big vcpus. Seems no need to consider soft affinity, use hard
> affinity is to handle this.
> 
> We may need to provide some interface to let xl can get the information such as
> big.little or smp. if it is big.little, which is big and which is little.

If it's possible for Xen to order the cpus by class, then there could be
a hypercall listing the different classes starting with the largest
class.  On typical big.LITTLE systems, class 0 would be "big" and class
1 would be "little".
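
Hypothetically, the information such a hypercall returns could look like
this (this is not an existing Xen interface):

#include <stdint.h>

#define XEN_MAX_CPU_CLASSES 4          /* illustrative bound */

struct xen_cpu_class {
    uint32_t id;          /* e.g. the MIDR value shared by the class */
    uint32_t nr_pcpus;    /* how many pcpus belong to this class     */
};

struct xen_cpuclassinfo {
    uint32_t nr_classes;  /* classes present, most powerful first */
    struct xen_cpu_class classes[XEN_MAX_CPU_CLASSES];
};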


> User may change the hard affinity of a vcpu, so we also need to block a little
> vcpu be scheduled to a big physical cpu. Add some checking code in xen,
> when chaning the hard affnity, check whether the cap of a vcpu is compatible
> with the cap of the physical cpus.

Dario, what do we do with vNUMA / soft affinity?

 -George


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-20 10:03                       ` Peng Fan
  2016-09-20 10:27                         ` George Dunlap
@ 2016-09-21  9:45                         ` Dario Faggioli
  1 sibling, 0 replies; 85+ messages in thread
From: Dario Faggioli @ 2016-09-21  9:45 UTC (permalink / raw)
  To: Peng Fan
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
	Andrew Cooper, xen-devel, Julien Grall, Jan Beulich


On Tue, 2016-09-20 at 18:03 +0800, Peng Fan wrote:
> Hi Dario,
> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
> > 
> > On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
> > > 
> > > On Tue, 20 Sep 2016, Dario Faggioli wrote:
> > > > 
> > > > And this would work even if/when there is only one cpupool, or
> > > > in
> > > > general for domains that are in a pool that has both big and
> > > > LITTLE
> > > > pcpus. Furthermore, big.LITTLE support and cpupools will be
> > > > orthogonal,
> > > > just like pinning and cpupools are orthogonal right now. I.e.,
> > > > once
> > > > we
> > > > will have what I described above, nothing prevents us from
> > > > implementing
> > > > per-vcpu cpupool membership, and either create the two (or
> > > > more!)
> > > > big
> > > > and LITTLE pools, or from mixing things even more, for more
> > > > complex
> > > > and
> > > > specific use cases. :-)
> > > 
> > > I think that everybody agrees that this is the best long term
> > > solution.
> > > 
> > Well, no, that wasn't obvious to me. If that's the case, it's
> > already
> > something! :-)
> > 
> > > 
> > > > 
> > > > 
> > > > Actually, with the cpupool solution, if you want a guest (or
> > > > dom0)
> > > > to
> > > > actually have both big and LITTLE vcpus, you necessarily have
> > > > to
> > > > implement per-vcpu (rather than per-domain, as it is now)
> > > > cpupool
> > > > membership. I said myself it's not impossible, but certainly
> > > > it's
> > > > some
> > > > work... with the scheduler solution you basically get that for
> > > > free!
> > > > 
> > > > So, basically, if we use cpupools for the basics of big.LITTLE
> > > > support,
> > > > there's no way out of it (apart from going implementing
> > > > scheduling
> > > > support afterwords, but that looks backwards to me, especially
> > > > when
> > > > thinking at it with the code in mind).
> > > 
> > > The question is: what is the best short-term solution we can ask
> > > Peng
> > > to
> > > implement that allows Xen to run on big.LITTLE systems today?
> > > Possibly
> > > getting us closer to the long term solution, or at least not
> > > farther
> > > from it?
> > > 
> > So, I still have to look closely at the patches in these series.
> > But,
> > with Credit2 in mind, if one:
> > 
> > - take advantage of the knowledge of what arch a pcpu belongs
> >   inside the code that arrange the pcpus in runqueues, which means
> >   we'll end up with big runqueues and LITTLE runqueues. I re-wrote
> >   that code, I can provide pointers and help, if necessary;
> > - tweak the one or two instance of for_each_runqueue() [*] that
> >   there are in the code into a for_each_runqueue_of_same_class(),
> >   i.e.:
> 
> Do you have a plan to add this support for big.LITTLE?
> 
> I admit that this is the first time I look into the scheduler part.
> If I understand wrongly, please correct me.
> 
No, I was not really planning to work on this directly myself... I was
only providing opinions and advice.

That of course may change, e.g., if we think that it is of absolutely
capital importance for Xen to gain big.LITTLE support in a matter of
days. :-)  That's a bit unlikely at this stage anyway, though, even
independently of who'll work on that, given where we stand in the Xen
4.8 release process.

In any case, I'm happy to help, though, with any kind of advice --as
I'm already trying to do-- but also in a more concrete way, on actual
code... but I strongly think that it's better if you lead the effort,
e.g., by trying to do what we agree upon, and ask immediately, as soon
as you get stuck. :-)

> There is a runqueue for each physical cpu, and there are several
> vcpus in the runqueue.
> The scheduler will pick a vcpu in the runqueue to run on the physical
> cpu.
> 
If you start by "just" using pinning, as I envisioned for early
support, and as George is also suggesting as a first step, there's
going to be nothing to do within Xen or on the scheduler's runqueues
at all.

And it won't actually even be wasted effort, because all the code for
parsing and implementing the interface in xl and libxl will be
reusable when we switch away from implicit pinning and integrate the
mechanism within the scheduler's logic.

> A vcpu is bind to a physical cpu when alloc_vcpu, but the vcpu can be
> scheduled
> or migrated to a different physical cpu.
> 
> Settings cpu soft affinity and hard affinity to restrict vcpus be
> scheduled
> on specific cpus. Then is there a need to introduce more runqueues?
> 
No, it's all more dynamic and --allow me-- more elegant than what you
describe... But I do understand that you've never looked at the
scheduling code before, so it's ok not to have this clear yet. :-)

> This seems more complicated than cpupool (:
> 
Nah, it's not... It may be a comparable amount of effort, but for a
better end result! :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-20 20:17                                 ` Stefano Stabellini
  2016-09-21  8:38                                   ` Peng Fan
@ 2016-09-21 10:09                                   ` Julien Grall
  2016-09-21 10:22                                     ` George Dunlap
  2016-09-21 12:38                                     ` Peng Fan
  1 sibling, 2 replies; 85+ messages in thread
From: Julien Grall @ 2016-09-21 10:09 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jan Beulich, Peng Fan



On 20/09/16 21:17, Stefano Stabellini wrote:
> On Tue, 20 Sep 2016, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 20/09/2016 20:09, Stefano Stabellini wrote:
>>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> On 20/09/2016 12:27, George Dunlap wrote:
>>>>> On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan <van.freenix@gmail.com>
>>>>> wrote:
>>>>>> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
>>>>>>> On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>>>>>>>> On Tue, 20 Sep 2016, Dario Faggioli wrote:
>>>>>> I'd like to add a computing capability in xen/arm, like this:
>>>>>>
>>>>>> struct compute_capatiliby
>>>>>> {
>>>>>>    char *core_name;
>>>>>>    uint32_t rank;
>>>>>>    uint32_t cpu_partnum;
>>>>>> };
>>>>>>
>>>>>> struct compute_capatiliby cc=
>>>>>> {
>>>>>>   {"A72", 4, 0xd08},
>>>>>>   {"A57", 3, 0xxxx},
>>>>>>   {"A53", 2, 0xd03},
>>>>>>   {"A35", 1, ...},
>>>>>> }
>>>>>>
>>>>>> Then when identify cpu, we decide which cpu is big and which cpu is
>>>>>> little
>>>>>> according to the computing rank.
>>>>>>
>>>>>> Any comments?
>>>>>
>>>>> I think we definitely need to have Xen have some kind of idea the
>>>>> order between processors, so that the user doesn't need to figure out
>>>>> which class / pool is big and which pool is LITTLE.  Whether this sort
>>>>> of enumeration is the best way to do that I'll let Julien and Stefano
>>>>> give their opinion.
>>>>
>>>> I don't think an hardcoded list of processor in Xen is the right solution.
>>>> There are many existing processors and combinations for big.LITTLE so it
>>>> will
>>>> nearly be impossible to keep updated.
>>>>
>>>> I would expect the firmware table (device tree, ACPI) to provide relevant
>>>> data
>>>> for each processor and differentiate big from LITTLE core.
>>>> Note that I haven't looked at it for now. A good place to start is looking
>>>> at
>>>> how Linux does.
>>>
>>> That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It is
>>> trivial to identify the two different CPU classes and which cores belong
>>> to which class.
>>
>> The class of the CPU can be found from the MIDR, there is no need to use the
>> device tree/acpi for that. Note that I don't think there is an easy way in
>> ACPI (i.e not in AML) to find out the class.
>>
>>> It is harder to figure out which one is supposed to be
>>> big and which one LITTLE. Regardless, we could default to using the
>>> first cluster (usually big), which is also the cluster of the boot cpu,
>>> and utilize the second cluster only when the user demands it.
>>
>> Why do you think the boot CPU will usually be a big one? In the case of Juno
>> platform it is configurable, and the boot CPU is a little core on r2 by
>> default.
>>
>> In any case, what we care about is differentiate between two set of CPUs. I
>> don't think Xen should care about migrating a guest vCPU between big and
>> LITTLE cpus. So I am not sure why we would want to know that.
>
> No, it is not about migrating (at least yet). It is about giving useful
> information to the user. It would be nice if the user had to choose
> between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
> even "A7" or "A15".

I don't think it is wise to assume that we may have only 2 kinds of CPUs
on the platform. We may have more in the future; if so, how would you
name them?

IMHO, asking the user to specify the type of CPUs he wants would be the
easiest way (though a bit harder for the user) and would avoid relying
on non-upstreamed bindings.

Regards,

-- 
Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21  8:38                                   ` Peng Fan
  2016-09-21  9:22                                     ` George Dunlap
@ 2016-09-21 10:15                                     ` Julien Grall
  2016-09-21 12:28                                       ` Peng Fan
  2016-09-22  9:45                                       ` Peng Fan
  1 sibling, 2 replies; 85+ messages in thread
From: Julien Grall @ 2016-09-21 10:15 UTC (permalink / raw)
  To: Peng Fan, Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jan Beulich

Hello Peng,

On 21/09/16 09:38, Peng Fan wrote:
> On Tue, Sep 20, 2016 at 01:17:04PM -0700, Stefano Stabellini wrote:
>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>> On 20/09/2016 20:09, Stefano Stabellini wrote:
>>>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>> On 20/09/2016 12:27, George Dunlap wrote:
>>>>>> On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan <van.freenix@gmail.com>
>>>>>> wrote:
>>>>>>> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
>>>>>>>> On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>>>>>>>>> On Tue, 20 Sep 2016, Dario Faggioli wrote:
>>>> It is harder to figure out which one is supposed to be
>>>> big and which one LITTLE. Regardless, we could default to using the
>>>> first cluster (usually big), which is also the cluster of the boot cpu,
>>>> and utilize the second cluster only when the user demands it.
>>>
>>> Why do you think the boot CPU will usually be a big one? In the case of Juno
>>> platform it is configurable, and the boot CPU is a little core on r2 by
>>> default.
>>>
>>> In any case, what we care about is differentiate between two set of CPUs. I
>>> don't think Xen should care about migrating a guest vCPU between big and
>>> LITTLE cpus. So I am not sure why we would want to know that.
>>
>> No, it is not about migrating (at least yet). It is about giving useful
>> information to the user. It would be nice if the user had to choose
>> between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
>> even "A7" or "A15".
>
> As Dario mentioned in previous email,
> for dom0 provide like this:
>
> dom0_vcpus_big = 4
> dom0_vcpus_little = 2
>
> to dom0.
>
> If these two no provided, we could let dom0 runs on big pcpus or big.little.
> Anyway this is not the important point for dom0 only big or big.little.
>
> For domU, provide "vcpus.big" and "vcpus.little" in xl configuration file.
> Such as:
>
> vcpus.big = 2
> vcpus.little = 4
>
>
> According to George's comments,
> Then, I think we could use affinity to restrict little vcpus be scheduled on little vcpus,
> and restrict big vcpus on big vcpus. Seems no need to consider soft affinity, use hard
> affinity is to handle this.
>
> We may need to provide some interface to let xl can get the information such as
> big.little or smp. if it is big.little, which is big and which is little.
>
> For how to differentiate cpus, I am looking the linaro eas cpu topology code,
> The code has not been upstreamed (:, but merged into google android kernel.
> I only plan to take some necessary code, such as device tree parse and
> cpu topology build, because we only need to know the computing capacity of each pcpu.
>
> Some doc about eas piece, including dts node examples:
> https://git.linaro.org/arm/eas/kernel.git/blob/refs/heads/lsk-v4.4-eas-v5.2:/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt

I am reluctant to take any non-upstreamed bindings in Xen. There is a 
similar series going on the lkml [1].

But it sounds like a lot of work for little benefit (i.e. giving a
better name to the sets of CPUs). The naming will also not fit if
future hardware has more than 2 kinds of CPUs.

[...]

> I am not sure, but we may also need to handle mpidr for ARM, because big and little vcpus are supported.

I am not sure to understand what you mean here.

Regards,

[1] https://lwn.net/Articles/699569/

-- 
Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 10:09                                   ` Julien Grall
@ 2016-09-21 10:22                                     ` George Dunlap
  2016-09-21 13:06                                       ` Julien Grall
  2016-09-21 12:38                                     ` Peng Fan
  1 sibling, 1 reply; 85+ messages in thread
From: George Dunlap @ 2016-09-21 10:22 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jan Beulich, Peng Fan

On 21/09/16 11:09, Julien Grall wrote:
> 
> 
> On 20/09/16 21:17, Stefano Stabellini wrote:
>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>> Hi Stefano,
>>>
>>> On 20/09/2016 20:09, Stefano Stabellini wrote:
>>>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>> Hi,
>>>>>
>>>>> On 20/09/2016 12:27, George Dunlap wrote:
>>>>>> On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan <van.freenix@gmail.com>
>>>>>> wrote:
>>>>>>> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
>>>>>>>> On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>>>>>>>>> On Tue, 20 Sep 2016, Dario Faggioli wrote:
>>>>>>> I'd like to add a computing capability in xen/arm, like this:
>>>>>>>
>>>>>>> struct compute_capatiliby
>>>>>>> {
>>>>>>>    char *core_name;
>>>>>>>    uint32_t rank;
>>>>>>>    uint32_t cpu_partnum;
>>>>>>> };
>>>>>>>
>>>>>>> struct compute_capatiliby cc=
>>>>>>> {
>>>>>>>   {"A72", 4, 0xd08},
>>>>>>>   {"A57", 3, 0xxxx},
>>>>>>>   {"A53", 2, 0xd03},
>>>>>>>   {"A35", 1, ...},
>>>>>>> }
>>>>>>>
>>>>>>> Then when identify cpu, we decide which cpu is big and which cpu is
>>>>>>> little
>>>>>>> according to the computing rank.
>>>>>>>
>>>>>>> Any comments?
>>>>>>
>>>>>> I think we definitely need to have Xen have some kind of idea the
>>>>>> order between processors, so that the user doesn't need to figure out
>>>>>> which class / pool is big and which pool is LITTLE.  Whether this
>>>>>> sort
>>>>>> of enumeration is the best way to do that I'll let Julien and Stefano
>>>>>> give their opinion.
>>>>>
>>>>> I don't think an hardcoded list of processor in Xen is the right
>>>>> solution.
>>>>> There are many existing processors and combinations for big.LITTLE
>>>>> so it
>>>>> will
>>>>> nearly be impossible to keep updated.
>>>>>
>>>>> I would expect the firmware table (device tree, ACPI) to provide
>>>>> relevant
>>>>> data
>>>>> for each processor and differentiate big from LITTLE core.
>>>>> Note that I haven't looked at it for now. A good place to start is
>>>>> looking
>>>>> at
>>>>> how Linux does.
>>>>
>>>> That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It is
>>>> trivial to identify the two different CPU classes and which cores
>>>> belong
>>>> to which class.
>>>
>>> The class of the CPU can be found from the MIDR, there is no need to
>>> use the
>>> device tree/acpi for that. Note that I don't think there is an easy
>>> way in
>>> ACPI (i.e not in AML) to find out the class.
>>>
>>>> It is harder to figure out which one is supposed to be
>>>> big and which one LITTLE. Regardless, we could default to using the
>>>> first cluster (usually big), which is also the cluster of the boot cpu,
>>>> and utilize the second cluster only when the user demands it.
>>>
>>> Why do you think the boot CPU will usually be a big one? In the case
>>> of Juno
>>> platform it is configurable, and the boot CPU is a little core on r2 by
>>> default.
>>>
>>> In any case, what we care about is differentiate between two set of
>>> CPUs. I
>>> don't think Xen should care about migrating a guest vCPU between big and
>>> LITTLE cpus. So I am not sure why we would want to know that.
>>
>> No, it is not about migrating (at least yet). It is about giving useful
>> information to the user. It would be nice if the user had to choose
>> between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
>> even "A7" or "A15".
> 
> I don't think it is wise to assume that we may have only 2 kind of CPUs
> on the platform. We may have more in the future, if so how would you
> name them?

I would suggest that internally Xen recognize an arbitrary number of
processor "classes", ordered from more powerful to less powerful.  Then
if at some point someone makes a platform with three classes of
processors, you can say "class 0", "class 1" or "class 2".  "big" would
be an alias for "class 0" and "little" would be an alias for "class 1".

And in my suggestion, we allow a richer set of labels, so that the user
could also be more specific -- e.g., asking for "A15" specifically, for
example, and failing to build if there are no A15 cores present, while
allowing users to simply write "big" or "little" if they want simplicity
/ things which work across different platforms.
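
A sketch of how the toolstack could resolve such labels (all names
hypothetical; classes[] is assumed to be sorted from most to least
powerful):

#include <string.h>

struct cpu_class_info {
    char name[8];                /* e.g. "A72", "A53" */
};

static struct cpu_class_info classes[4];   /* classes[0] = most powerful */
static unsigned int nr_classes;

/* Map a user-supplied label to a class index; -1 if unsatisfiable. */
static int class_index(const char *label)
{
    unsigned int i;

    if ( !strcmp(label, "big") )
        return 0;
    if ( !strcmp(label, "little") )
        return nr_classes > 1 ? 1 : -1;
    for ( i = 0; i < nr_classes; i++ )
        if ( !strcmp(label, classes[i].name) )
            return (int)i;
    return -1;   /* e.g. "A15" requested where no A15 cores exist */
}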

 -George


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 10:15                                     ` Julien Grall
@ 2016-09-21 12:28                                       ` Peng Fan
  2016-09-21 15:06                                         ` Dario Faggioli
  2016-09-22  9:45                                       ` Peng Fan
  1 sibling, 1 reply; 85+ messages in thread
From: Peng Fan @ 2016-09-21 12:28 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Dario Faggioli, xen-devel, Jan Beulich

On Wed, Sep 21, 2016 at 11:15:35AM +0100, Julien Grall wrote:
>Hello Peng,
>
>On 21/09/16 09:38, Peng Fan wrote:
>>On Tue, Sep 20, 2016 at 01:17:04PM -0700, Stefano Stabellini wrote:
>>>On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>On 20/09/2016 20:09, Stefano Stabellini wrote:
>>>>>On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>>>On 20/09/2016 12:27, George Dunlap wrote:
>>>>>>>On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan <van.freenix@gmail.com>
>>>>>>>wrote:
>>>>>>>>On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
>>>>>>>>>On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>>>>>>>>>>On Tue, 20 Sep 2016, Dario Faggioli wrote:
>>>>>It is harder to figure out which one is supposed to be
>>>>>big and which one LITTLE. Regardless, we could default to using the
>>>>>first cluster (usually big), which is also the cluster of the boot cpu,
>>>>>and utilize the second cluster only when the user demands it.
>>>>
>>>>Why do you think the boot CPU will usually be a big one? In the case of Juno
>>>>platform it is configurable, and the boot CPU is a little core on r2 by
>>>>default.
>>>>
>>>>In any case, what we care about is differentiate between two set of CPUs. I
>>>>don't think Xen should care about migrating a guest vCPU between big and
>>>>LITTLE cpus. So I am not sure why we would want to know that.
>>>
>>>No, it is not about migrating (at least yet). It is about giving useful
>>>information to the user. It would be nice if the user had to choose
>>>between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
>>>even "A7" or "A15".
>>
>>As Dario mentioned in previous email,
>>for dom0 provide like this:
>>
>>dom0_vcpus_big = 4
>>dom0_vcpus_little = 2
>>
>>to dom0.
>>
>>If these two no provided, we could let dom0 runs on big pcpus or big.little.
>>Anyway this is not the important point for dom0 only big or big.little.
>>
>>For domU, provide "vcpus.big" and "vcpus.little" in xl configuration file.
>>Such as:
>>
>>vcpus.big = 2
>>vcpus.little = 4
>>
>>
>>According to George's comments,
>>Then, I think we could use affinity to restrict little vcpus be scheduled on little vcpus,
>>and restrict big vcpus on big vcpus. Seems no need to consider soft affinity, use hard
>>affinity is to handle this.
>>
>>We may need to provide some interface to let xl can get the information such as
>>big.little or smp. if it is big.little, which is big and which is little.
>>
>>For how to differentiate cpus, I am looking the linaro eas cpu topology code,
>>The code has not been upstreamed (:, but merged into google android kernel.
>>I only plan to take some necessary code, such as device tree parse and
>>cpu topology build, because we only need to know the computing capacity of each pcpu.
>>
>>Some doc about eas piece, including dts node examples:
>>https://git.linaro.org/arm/eas/kernel.git/blob/refs/heads/lsk-v4.4-eas-v5.2:/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt
>
>I am reluctant to take any non-upstreamed bindings in Xen. There is a similar

Yeah. I understand :)

>series going on the lkml [1].

I'll have a look at this series; it seems simpler than the code in the
Linaro tree.

Whether it is the EAS cpu topology code or the series you listed, the
point is to let us differentiate the cpu classes. That is not the hard
part; it is just a matter of what information to get from the dts.

We need to agree on how to arrange the different cpu classes, I think.

Say we get dmips/cap from the dts for each cpu; do we then put the info
into cpu_data for each cpu?

>
>But it sounds like it is a lot of works for little benefits (i.e giving a
>better name to the set of CPUs). The naming will also not fit if in the
>future hardware will have more than 2 kind of CPUs.

Oh. Yeah. There is possibility that an soc contains such as A35 + A53 + A72..
Then xx.big and xx.little seems not enough.

On such SoC, we still need to support big.little guest? We may not call it
big.little guest, if guest also needs A35 + A53 + A72 vcpu..

Use this in xl cfg file?
vcpuclass=["0-1:A35","2-5:A53", "6-7:A72"] ?

I am not sure. If there are more kinds of CPUs, how do we handle guest
vcpus? As we discussed in this thread, we tend towards supporting
different classes of vcpu for a guest. But if there are many kinds of
physical CPUs, do we also need to let the guest have that many kinds of
virtual cpus?

Anyway, the first step for me is to differentiate the physical cpus and
add the info to cpu_data or somewhere similar.

>
>[...]
>
>>I am not sure, but we may also need to handle mpidr for ARM, because big and little vcpus are supported.
>
>I am not sure to understand what you mean here.

For a big.LITTLE guest, we need to know which vcpu is in cluster 0 and
which is in cluster 1, and also need to fill in the related MPIDR values
for the guest accordingly.
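
For example, something along these lines (illustrative only: the helper
name and the choice of putting big vcpus in virtual cluster 0 are my
assumptions; the shifts are the architectural MPIDR affinity fields):

#define MPIDR_AFF0_SHIFT 0
#define MPIDR_AFF1_SHIFT 8

static uint64_t vcpu_vmpidr(unsigned int vcpu_id, bool is_little)
{
    uint64_t cluster = is_little ? 1 : 0; /* Aff1: virtual cluster */
    /* Aff0: core within the cluster (a real implementation would number
     * cores within each cluster rather than reusing the global vcpu id). */
    uint64_t core = vcpu_id;

    return (cluster << MPIDR_AFF1_SHIFT) | (core << MPIDR_AFF0_SHIFT);
}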

Regards,
Peng.
>
>Regards,
>
>[1] https://lwn.net/Articles/699569/
>
>-- 
>Julien Grall

-- 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21  9:22                                     ` George Dunlap
@ 2016-09-21 12:35                                       ` Peng Fan
  2016-09-21 15:00                                       ` Dario Faggioli
  1 sibling, 0 replies; 85+ messages in thread
From: Peng Fan @ 2016-09-21 12:35 UTC (permalink / raw)
  To: George Dunlap
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Dario Faggioli, xen-devel, Julien Grall,
	Jan Beulich

On Wed, Sep 21, 2016 at 10:22:14AM +0100, George Dunlap wrote:
>On 21/09/16 09:38, Peng Fan wrote:
>> On Tue, Sep 20, 2016 at 01:17:04PM -0700, Stefano Stabellini wrote:
>>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>>> Hi Stefano,
>>>>
>>>> On 20/09/2016 20:09, Stefano Stabellini wrote:
>>>>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 20/09/2016 12:27, George Dunlap wrote:
>>>>>>> On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan <van.freenix@gmail.com>
>>>>>>> wrote:
>>>>>>>> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
>>>>>>>>> On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>>>>>>>>>> On Tue, 20 Sep 2016, Dario Faggioli wrote:
>>>>>>>> I'd like to add a computing capability in xen/arm, like this:
>>>>>>>>
>>>>>>>> struct compute_capability
>>>>>>>> {
>>>>>>>>    char *core_name;
>>>>>>>>    uint32_t rank;
>>>>>>>>    uint32_t cpu_partnum;
>>>>>>>> };
>>>>>>>>
>>>>>>>> struct compute_capability cc[] =
>>>>>>>> {
>>>>>>>>   {"A72", 4, 0xd08},
>>>>>>>>   {"A57", 3, 0xxxx},
>>>>>>>>   {"A53", 2, 0xd03},
>>>>>>>>   {"A35", 1, ...},
>>>>>>>> };
>>>>>>>>
>>>>>>>> Then, when identifying a cpu, we decide which cpu is big and which
>>>>>>>> cpu is little according to the computing rank.
>>>>>>>>
>>>>>>>> Any comments?
>>>>>>>
>>>>>>> I think we definitely need to have Xen have some kind of idea of the
>>>>>>> order between processors, so that the user doesn't need to figure out
>>>>>>> which class / pool is big and which pool is LITTLE.  Whether this sort
>>>>>>> of enumeration is the best way to do that I'll let Julien and Stefano
>>>>>>> give their opinion.
>>>>>>
>>>>>> I don't think a hardcoded list of processors in Xen is the right
>>>>>> solution. There are many existing processors and combinations for
>>>>>> big.LITTLE, so it will be nearly impossible to keep it updated.
>>>>>>
>>>>>> I would expect the firmware tables (device tree, ACPI) to provide
>>>>>> relevant data for each processor and differentiate big from LITTLE
>>>>>> cores. Note that I haven't looked at it for now. A good place to
>>>>>> start is looking at how Linux does it.
>>>>>
>>>>> That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It is
>>>>> trivial to identify the two different CPU classes and which cores belong
>>>>> to which class.
>>>>
>>>> The class of the CPU can be found from the MIDR; there is no need to use
>>>> the device tree/acpi for that. Note that I don't think there is an easy
>>>> way in ACPI (i.e. not in AML) to find out the class.
>>>>
>>>>> It is harder to figure out which one is supposed to be
>>>>> big and which one LITTLE. Regardless, we could default to using the
>>>>> first cluster (usually big), which is also the cluster of the boot cpu,
>>>>> and utilize the second cluster only when the user demands it.
>>>>
>>>> Why do you think the boot CPU will usually be a big one? In the case of
>>>> the Juno platform it is configurable, and the boot CPU is a little core
>>>> on r2 by default.
>>>>
>>>> In any case, what we care about is differentiating between two sets of
>>>> CPUs. I don't think Xen should care about migrating a guest vCPU between
>>>> big and LITTLE cpus. So I am not sure why we would want to know that.
>>>
>>> No, it is not about migrating (at least yet). It is about giving useful
>>> information to the user. It would be nice if the user had to choose
>>> between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
>>> even "A7" or "A15".
>> 
>> As Dario mentioned in a previous email, for dom0 we could provide
>> options like this:
>> 
>> dom0_vcpus_big = 4
>> dom0_vcpus_little = 2
>> 
>> If these two are not provided, we could let dom0 run on the big pcpus
>> only, or on both big and little. Anyway, whether dom0 is big-only or
>> big.LITTLE is not the important point here.
>> 
>> For domU, provide "vcpus.big" and "vcpus.little" in the xl configuration
>> file. Such as:
>> 
>> vcpus.big = 2
>> vcpus.little = 4
>
>FWIW, from a UI perspective, it would be nice if we designed the
>interface such that it *can* be used simply (i.e., just "big" or
>"little"), but can also be used more flexibly; for instance, specifying
>"A15" or "A7" instead.
>
>So maybe have a 'classifier' string; this could start by having just
>"big" and "little", but could then be extended to allow fuller ways of
>specifying specific kinds of cores.
>
>To keep the illusion of python syntax, what about something like this:
>
>vcpuclass=["big=2","little=4"]
>
>Or would it be better to have a mapping of vcpu to class?
>
>vcpuclass=["0-1:big","2-5:little"]


Both are good -:)

>
>
>> According to George's comments, I think we could then use affinity to
>> restrict little vcpus to being scheduled on little pcpus, and big vcpus
>> on big pcpus. There seems to be no need to consider soft affinity; hard
>> affinity is enough to handle this.
>> 
>> We may need to provide some interface so that xl can get information such
>> as whether the platform is big.LITTLE or SMP, and if it is big.LITTLE,
>> which cpus are big and which are little.
>
>If it's possible for Xen to order the cpus by class, then there could be
>a hypercall listing the different classes starting with the largest
>class.  On typical big.LITTLE systems, class 0 would be "big" and class
>1 would be "little".

From class 0 to class X, the computing capacity (or DMIPS) would be decreasing -:)
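
Just to make that concrete, the hypercall could return, per class,
something like the sketch below (an invented interface for illustration,
not an existing Xen sysctl):

struct xen_sysctl_cpuclassinfo {
    uint32_t class_id; /* 0 = most powerful, increasing = less powerful */
    uint32_t midr;     /* MIDR value shared by the pcpus of this class */
    uint32_t nr_cpus;  /* number of pcpus in this class */
};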

Regards,
Peng.
>
>
>> The user may change the hard affinity of a vcpu, so we also need to
>> prevent a little vcpu from being scheduled on a big physical cpu. Add
>> some checking code in xen: when changing the hard affinity, check whether
>> the capacity of the vcpu is compatible with the capacity of the physical
>> cpus.
>
>Dario, what do we do with vNUMA / soft affinity?
>
> -George

-- 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 10:09                                   ` Julien Grall
  2016-09-21 10:22                                     ` George Dunlap
@ 2016-09-21 12:38                                     ` Peng Fan
  1 sibling, 0 replies; 85+ messages in thread
From: Peng Fan @ 2016-09-21 12:38 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Dario Faggioli, xen-devel, Jan Beulich

On Wed, Sep 21, 2016 at 11:09:11AM +0100, Julien Grall wrote:
>
>
>On 20/09/16 21:17, Stefano Stabellini wrote:
>>On Tue, 20 Sep 2016, Julien Grall wrote:
>>>[...]
>>
>>No, it is not about migrating (at least yet). It is about giving useful
>>information to the user. It would be nice if the user had to choose
>>between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
>>even "A7" or "A15".
>
>I don't think it is wise to assume that we may have only 2 kinds of CPUs on
>the platform. We may have more in the future; if so, how would you name them?

Considering more than 2 kinds of physical cpus,
"vcpuclass=["0-1:A35","2-5:A53","6-7:A72"]" seems easier to handle.

Regards,
Peng.

>
>IMHO, asking the user to specify the type of CPUs he wants would be the
>easiest way (though a bit difficult for the user) and would avoid relying
>on non-upstreamed bindings.
>
>Regards,
>
>-- 
>Julien Grall

-- 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 10:22                                     ` George Dunlap
@ 2016-09-21 13:06                                       ` Julien Grall
  2016-09-21 15:45                                         ` Dario Faggioli
  2016-09-21 18:13                                         ` Stefano Stabellini
  0 siblings, 2 replies; 85+ messages in thread
From: Julien Grall @ 2016-09-21 13:06 UTC (permalink / raw)
  To: George Dunlap, Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, Steve Capper, George Dunlap,
	Andrew Cooper, Dario Faggioli, Punit Agrawal, xen-devel,
	Jan Beulich, Peng Fan

(CC a couple of ARM folks)

On 21/09/16 11:22, George Dunlap wrote:
> On 21/09/16 11:09, Julien Grall wrote:
>>
>>
>> On 20/09/16 21:17, Stefano Stabellini wrote:
>>> [...]
>>>
>>> No, it is not about migrating (at least yet). It is about giving useful
>>> information to the user. It would be nice if the user had to choose
>>> between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
>>> even "A7" or "A15".
>>
>> I don't think it is wise to assume that we may have only 2 kinds of CPUs
>> on the platform. We may have more in the future; if so, how would you
>> name them?
>
> I would suggest that internally Xen recognize an arbitrary number of
> processor "classes", and order them according to more powerful -> less
> powerful.  Then if at some point someone makes a platform with three
> processors, you can say "class 0", "class 1" or "class 2".  "big" would
> be an alias for "class 0" and "little" would be an alias for "class 1".

As mentioned earlier, there are no upstreamed device tree bindings yet to 
know the "power" of a CPU (see [1]).

>
> And in my suggestion, we allow a richer set of labels, so that the user
> could also be more specific -- e.g., asking for "A15" specifically, for
> example, and failing to build if there are no A15 cores present, while
> allowing users to simply write "big" or "little" if they want simplicity
> / things which work across different platforms.

Well, before trying to do something clever like that (i.e. naming "big" 
and "little"), we need to have upstreamed bindings available to 
acknowledge the difference. AFAICT, they are not yet upstreamed for Device 
Tree (see [1]), and I don't know of any static ACPI tables providing 
similar information.

I have had a few discussions and more thoughts about big.LITTLE support 
in Xen. The main goal of big.LITTLE is power efficiency, by moving tasks 
around and being able to idle one cluster. All the solutions suggested 
(including mine) so far can be replicated by hand (except the VPIDR), so 
they are mostly an automatic way of doing that. This will also remove the 
real benefit of big.LITTLE, because Xen will not be able to migrate vCPUs 
across clusters for power efficiency.
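
To be concrete, "replicated by hand" means something along these lines
(pcpu numbers and the domain name are picked arbitrarily for the example,
assuming pcpus 0-3 are LITTLE and pcpus 4-5 are big):

 #xl vcpu-pin domu-test 0 4
 #xl vcpu-pin domu-test 1 5
 #xl vcpu-pin domu-test 2 0-3
 #xl vcpu-pin domu-test 3 0-3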

If we care about power efficiency, we would have to handle big.LITTLE 
seamlessly in Xen (i.e. a guest would only see one kind of CPU). This 
raises quite a few problems, nothing insurmountable, similar to migration 
across two platforms with different micro-architectures (e.g. processors): 
errata, features supported... The guest would have to know the union of 
all the errata (this is done so far via the MIDR, so we would need a PV 
way to do it), and only the intersection of the features would be exposed 
to the guest. This also means the scheduler would have to be modified to 
handle power efficiency (not strictly necessary at the beginning).
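
To illustrate the union/intersection point (the array names below are
made up for the example):

uint64_t feats = ~0ULL, errata = 0;
unsigned int i;

for ( i = 0; i < nr_classes; i++ )
{
    feats &= class_features[i];  /* intersection: only common features */
    errata |= class_errata[i];   /* union: all workarounds are needed */
}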

I agree that such a solution would require some work to implement, 
although Xen would then have better control of the energy consumption of 
the platform.

So the question here is: what do we want to achieve with big.LITTLE?

Regards,

[1] https://lwn.net/Articles/699569/

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21  9:22                                     ` George Dunlap
  2016-09-21 12:35                                       ` Peng Fan
@ 2016-09-21 15:00                                       ` Dario Faggioli
  1 sibling, 0 replies; 85+ messages in thread
From: Dario Faggioli @ 2016-09-21 15:00 UTC (permalink / raw)
  To: George Dunlap, Peng Fan, Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, George Dunlap, Andrew Cooper, xen-devel,
	Julien Grall, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 1925 bytes --]

On Wed, 2016-09-21 at 10:22 +0100, George Dunlap wrote:
> On 21/09/16 09:38, Peng Fan wrote:
> > The user may change the hard affinity of a vcpu, so we also need to
> > prevent a little vcpu from being scheduled on a big physical cpu.
> > Add some checking code in xen: when changing the hard affinity, check
> > whether the capacity of the vcpu is compatible with the capacity of
> > the physical cpus.
> 
Yes, restricting affinity changes will indeed be necessary. Note that
this is not a limit of the 'pinning based' implementation. Even if/when
we have in-scheduler support, trying to pin a LITTLE vcpu to a big pcpu
will, AFAIUI, have to fail.

I was thinking of some parameter that we could set from xl (applicable
also on non-big.LITTLE or non-heterogeneous configurations) for asking
Xen to make the hard affinity 'immutable'. That would be rather simple
to do.

But I like Peng's idea of validating hard-affinity against the class even better! :-)
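
Something like the following sketch, say (vcpu_class() and
cpu_class_id() are assumed helpers, not existing Xen functions):

static int check_affinity_class(const struct vcpu *v,
                                const cpumask_t *new_mask)
{
    unsigned int cpu;

    /* Reject a mask containing any pcpu of a different class. */
    for_each_cpu ( cpu, new_mask )
        if ( cpu_class_id(cpu) != vcpu_class(v) )
            return -EINVAL;

    return 0;
}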

> Dario, what do we do with vNUMA / soft affinity?
> 
We do nothing, actually. I mean, for now, we just accept whatever the
user asks for, which might well be setting the soft, or even hard,
affinity of the whole domain to a set of nodes when the domain itself
does not have any memory on them.

This is actually another case where either immutability, or a restriction
of the changes that we allow to the (soft, in this case) affinity, would
be useful. I've always wanted to introduce logic that would at least
print a warning if something that looks really bad is being done, but
never got around to actually doing that.

Maybe this will be the chance to improve wrt this too! :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 12:28                                       ` Peng Fan
@ 2016-09-21 15:06                                         ` Dario Faggioli
  0 siblings, 0 replies; 85+ messages in thread
From: Dario Faggioli @ 2016-09-21 15:06 UTC (permalink / raw)
  To: Peng Fan, Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
	Andrew Cooper, xen-devel, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 1254 bytes --]

On Wed, 2016-09-21 at 20:28 +0800, Peng Fan wrote:
> Use this in the xl cfg file?
> vcpuclass=["0-1:A35","2-5:A53", "6-7:A72"] ?
> 
> I am not sure. If there are more kinds of CPUs, how do we handle guest
> vcpus? As we discussed in this thread, we tend towards supporting
> different classes of vcpu for a guest. But if there are many kinds of
> physical CPUs, do we also need to let the guest have that many kinds
> of virtual cpus?
> 
We don't necessarily _need_ to do that, at least not right now.

**However**, this is the main point of spending time designing things
and/or having the kind of conversation we're having here: i.e., if the
design, and the resulting implementation, is generic enough, we may get
that for free, which would be great.

This seems to me to be the case if we go for George's "vcpuclass=[]"
suggestion, and, even better, it doesn't look like it would make the
code much more difficult to write or more complex (wrt just allowing
"vcpus_big" and "vcpus_little").

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 13:06                                       ` Julien Grall
@ 2016-09-21 15:45                                         ` Dario Faggioli
  2016-09-21 19:28                                           ` Julien Grall
  2016-09-21 18:13                                         ` Stefano Stabellini
  1 sibling, 1 reply; 85+ messages in thread
From: Dario Faggioli @ 2016-09-21 15:45 UTC (permalink / raw)
  To: Julien Grall, George Dunlap, Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, Steve Capper, George Dunlap,
	Andrew Cooper, Punit Agrawal, xen-devel, Jan Beulich, Peng Fan


[-- Attachment #1.1: Type: text/plain, Size: 5096 bytes --]

On Wed, 2016-09-21 at 14:06 +0100, Julien Grall wrote:
> (CC a couple of ARM folks)
> 
Yay, thanks for this! :-)

> I have had a few discussions and more thoughts about big.LITTLE
> support in Xen. The main goal of big.LITTLE is power efficiency, by
> moving tasks around and being able to idle one cluster. All the
> solutions suggested (including mine) so far can be replicated by hand
> (except the VPIDR), so they are mostly an automatic way of doing that.
>
I'm sorry, how is this (going to be) handled in Linux? Is it that any
arbitrary task executing any arbitrary binary code can be run on both
big and LITTLE pcpus, depending on the scheduler's and energy
management's decisions?

This does not seem to match what has been said at some point in
this thread... And if it's like that, how is that possible, if the
pcpus' ISAs are (even only slightly) different?

> This will also remove the real benefit of big.LITTLE, because Xen
> will not be able to migrate vCPUs across clusters for power
> efficiency.
> 
> If we care about power efficiency, we would have to handle big.LITTLE
> seamlessly in Xen (i.e. a guest would only see one kind of CPU).
>
Well, I'm a big fan of an approach that leaves the guests' scheduler
dumb about things like these (i.e., load balancing, energy efficiency,
etc), and hence puts Xen in charge. In fact, on a Xen system, it is
only Xen that has all the info necessary to make wise decisions (e.g.,
the load of the _whole_ host, the effect of any decisions on the
_whole_ host, etc).

But this case may be a LITTLE.bit ( :-PP ) different.

Anyway, I guess I'll wait for your reply to my question above before
commenting more.

> This raises quite a few problems, nothing insurmountable, similar to
> migration across two platforms with different micro-architectures
> (e.g. processors): errata, features supported... The guest would have
> to know the union of all the errata (this is done so far via the
> MIDR, so we would need a PV way to do it), and only the intersection
> of the features would be exposed to the guest. This also means the
> scheduler would have to be modified to handle power efficiency (not
> strictly necessary at the beginning).
> 
> I agree that such a solution would require some work to implement,
> although Xen would then have better control of the energy consumption
> of the platform.
> 
> So the question here is: what do we want to achieve with big.LITTLE?
> 
Just thinking out loud here. So, instead of "just", as George
suggested:

 vcpuclass=["0-1:A35","2-5:A53", "6-7:A72"]

we can allow something like the following (note that I'm tossing out
random numbers next to the 'A's):

 vcpuclass = ["0-1:A35", "2-5:A53,A17", "6-7:A72,A24,A31", "12-13:A8"]

with the following meaning:
 - vcpus 0, 1 can only run on pcpus of class A35
 - vcpus 2,3,4,5 can run on pcpus of class A53 _and_ on pcpus of class 
   A17
 - vcpus 6,7 can run on pcpus of class A72, A24, A31
 - vcpus 8,9,10,11, since they're not mentioned, can run on pcpus of 
   any class
 - vcpus 12,13 can only run on pcpus of class A8

This will set the "boundaries", for each vcpu. Then, within these
boundaries, once in the (Xen's) scheduler, we can implement whatever
complex/magic/silly logic we want, e.g.:
 - only use a pcpu of class A53 for vcpus that have an average load 
   above 50%
 - only use a pcpu of class A31 if there are no idle pcpus of class A24
 - only use a pcpu of class A17 for a vcpu if the total system load 
   divided by the vcpu ID gives 42 as a result
 - whatever

This allows us to achieve both the following goals:
 - allow Xen to take smart decisions, considering the load and the 
   efficiency of the host as a whole
 - allow the guest to take smart decisions, like running lightweight 
   tasks on low power vcpus (which then Xen will run on low 
   power pcpus, at least on a properly configured system)

Of course this **requires** that, for instance, vcpu 6 must be able to
run on A72, A24 and A31 just fine, i.e., it must be possible for it to
block on I/O when executing on an A72 pcpu, and, later, after wakeup,
restart executing on an A24 pcpu.

If that is not possible, and doing such a vcpu movement requires, rather
than just calling schedule.c:vcpu_migrate() (or equivalent), some more
complex fiddling, involving local migration or similar techniques, then
I honestly don't think this is something that can be solved at the
scheduler level anyway... :-O
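
To make the "boundaries" part concrete, here is a minimal sketch of how
they could translate into a hard affinity mask (cpu_class_id() is an
assumed per-pcpu class lookup, not an existing Xen function; the cpumask
helpers are Xen's existing ones):

static void build_hard_affinity(cpumask_t *mask,
                                const unsigned int *classes,
                                unsigned int nr_classes)
{
    unsigned int cpu, i;

    cpumask_clear(mask);
    for_each_online_cpu ( cpu )
        for ( i = 0; i < nr_classes; i++ )
            if ( cpu_class_id(cpu) == classes[i] )
                cpumask_set_cpu(cpu, mask);
}

Vcpus not mentioned in vcpuclass=[] would simply get all pcpus in their
mask.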

> [1] https://lwn.net/Articles/699569/
> 
I tried to have a quick look, but I don't have the time right now, and
furthermore, it's all about ARM, and I still speak too little ARM to
properly understand what's going on... :-(

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 13:06                                       ` Julien Grall
  2016-09-21 15:45                                         ` Dario Faggioli
@ 2016-09-21 18:13                                         ` Stefano Stabellini
  2016-09-21 19:11                                           ` Julien Grall
  1 sibling, 1 reply; 85+ messages in thread
From: Stefano Stabellini @ 2016-09-21 18:13 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Dario Faggioli, Punit Agrawal,
	George Dunlap, xen-devel, Jan Beulich, Peng Fan

On Wed, 21 Sep 2016, Julien Grall wrote:
> (CC a couple of ARM folks)
> 
> On 21/09/16 11:22, George Dunlap wrote:
> > On 21/09/16 11:09, Julien Grall wrote:
> > > 
> > > 
> > > On 20/09/16 21:17, Stefano Stabellini wrote:
> > > > [...]
> > > > 
> > > > No, it is not about migrating (at least yet). It is about giving useful
> > > > information to the user. It would be nice if the user had to choose
> > > > between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
> > > > even "A7" or "A15".
> > > 
> > > I don't think it is wise to assume that we may have only 2 kinds of CPUs
> > > on the platform. We may have more in the future; if so, how would you
> > > name them?
> > 
> > I would suggest that internally Xen recognize an arbitrary number of
> > processor "classes", and order them according to more powerful -> less
> > powerful.  Then if at some point someone makes a platform with three
> > processors, you can say "class 0", "class 1" or "class 2".  "big" would
> > be an alias for "class 0" and "little" would be an alias for "class 1".
> 
> As mentioned earlier, there are no upstreamed device tree bindings yet to
> know the "power" of a CPU (see [1]).
> 
> > 
> > And in my suggestion, we allow a richer set of labels, so that the user
> > could also be more specific -- e.g., asking for "A15" specifically, for
> > example, and failing to build if there are no A15 cores present, while
> > allowing users to simply write "big" or "little" if they want simplicity
> > / things which work across different platforms.
> 
> Well, before trying to do something clever like that (i.e. naming "big" and
> "little"), we need to have upstreamed bindings available to acknowledge the
> difference. AFAICT, they are not yet upstreamed for Device Tree (see [1]),
> and I don't know of any static ACPI tables providing similar information.

I like George's idea that "big" and "little" could be just convenience
aliases. Of course they are predicated on the necessary device tree
bindings being upstream. We don't need [1] to be upstream in Linux, just
the binding:

http://marc.info/?l=linux-arm-kernel&m=147308556729426&w=2

which has already been acked by the relevant maintainers.


> I have had a few discussions and more thoughts about big.LITTLE support in
> Xen. The main goal of big.LITTLE is power efficiency, by moving tasks
> around and being able to idle one cluster. All the solutions suggested
> (including mine) so far can be replicated by hand (except the VPIDR), so
> they are mostly an automatic way of doing that. This will also remove the
> real benefit of big.LITTLE, because Xen will not be able to migrate vCPUs
> across clusters for power efficiency.

The goal of the architects of big.LITTLE might have been power
efficiency, but of course we are free to use any features that the
hardware provides in the best way for Xen and the Xen community.


> If we care about power efficiency, we would have to handle big.LITTLE
> seamlessly in Xen (i.e. a guest would only see one kind of CPU). This
> raises quite a few problems, nothing insurmountable, similar to migration
> across two platforms with different micro-architectures (e.g. processors):
> errata, features supported... The guest would have to know the union of
> all the errata (this is done so far via the MIDR, so we would need a PV
> way to do it), and only the intersection of the features would be exposed
> to the guest. This also means the scheduler would have to be modified to
> handle power efficiency (not strictly necessary at the beginning).
> 
> I agree that such a solution would require some work to implement, although
> Xen would then have better control of the energy consumption of the platform.
> 
> So the question here is: what do we want to achieve with big.LITTLE?

I don't think that handling big.LITTLE seamlessly in Xen is the best way
to do it in the scenarios where Xen on ARM is being used today. I
understand the principles behind it, but I don't think it will lead to
good results in a virtualized environment, where there is more activity
and there are more vcpus than pcpus.

What we have discussed in this thread so far is actionable, and gives us
big.LITTLE support in a short time frame. It is a good fit for Xen on
ARM use cases and still leads to lower power consumption with a wise
allocation of big and LITTLE vcpus and pcpus to guests.

I would start from this approach, then if somebody comes along with a
plan to implement a big.LITTLE switcher in Xen, I welcome her to do it
and I would be happy to accept the code in Xen. We'll just make it
optional.


> Regards,
> 
> [1] https://lwn.net/Articles/699569/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 18:13                                         ` Stefano Stabellini
@ 2016-09-21 19:11                                           ` Julien Grall
  2016-09-21 19:21                                             ` Julien Grall
                                                               ` (2 more replies)
  0 siblings, 3 replies; 85+ messages in thread
From: Julien Grall @ 2016-09-21 19:11 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, Steve Capper, George Dunlap,
	Andrew Cooper, Dario Faggioli, Punit Agrawal, George Dunlap,
	xen-devel, Jan Beulich, Peng Fan

Hi Stefano,

On 21/09/2016 19:13, Stefano Stabellini wrote:
> On Wed, 21 Sep 2016, Julien Grall wrote:
>> (CC a couple of ARM folks)
>>
>> On 21/09/16 11:22, George Dunlap wrote:
>>>>> [...]
>>>>>
>>>>> No, it is not about migrating (at least yet). It is about giving useful
>>>>> information to the user. It would be nice if the user had to choose
>>>>> between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
>>>>> even "A7" or "A15".
>>>>
>>>> I don't think it is wise to assume that we may have only 2 kinds of CPUs
>>>> on the platform. We may have more in the future; if so, how would you
>>>> name them?
>>>
>>> I would suggest that internally Xen recognize an arbitrary number of
>>> processor "classes", and order them according to more powerful -> less
>>> powerful.  Then if at some point someone makes a platform with three
>>> processors, you can say "class 0", "class 1" or "class 2".  "big" would
>>> be an alias for "class 0" and "little" would be an alias for "class 1".
>>
>> As mentioned earlier, there are no upstreamed device tree bindings yet to
>> know the "power" of a CPU (see [1]).
>>
>>>
>>> And in my suggestion, we allow a richer set of labels, so that the user
>>> could also be more specific -- e.g., asking for "A15" specifically, for
>>> example, and failing to build if there are no A15 cores present, while
>>> allowing users to simply write "big" or "little" if they want simplicity
>>> / things which work across different platforms.
>>
>> Well, before trying to do something clever like that (i.e. naming "big" and
>> "little"), we need to have upstreamed bindings available to acknowledge the
>> difference. AFAICT, they are not yet upstreamed for Device Tree (see [1]),
>> and I don't know of any static ACPI tables providing similar information.
>
> I like George's idea that "big" and "little" could be just convenience
> aliases. Of course they are predicated on the necessary device tree
> bindings being upstream. We don't need [1] to be upstream in Linux, just
> the binding:
>
> http://marc.info/?l=linux-arm-kernel&m=147308556729426&w=2
>
> which has already been acked by the relevant maintainers.

This is device tree only. What about ACPI?

>
>
>> I have had a few discussions and more thoughts about big.LITTLE support
>> in Xen. The main goal of big.LITTLE is power efficiency, by moving tasks
>> around and being able to idle one cluster. All the solutions suggested
>> (including mine) so far can be replicated by hand (except the VPIDR), so
>> they are mostly an automatic way of doing that. This will also remove the
>> real benefit of big.LITTLE, because Xen will not be able to migrate vCPUs
>> across clusters for power efficiency.
>
> The goal of the architects of big.LITTLE might have been power
> efficiency, but of course we are free to use any features that the
> hardware provides in the best way for Xen and the Xen community.

This is very dependent on how big.LITTLE has been implemented by the 
hardware. Some platforms cannot run both big and LITTLE cores at the 
same time; you need a proper switcher in the firmware/hypervisor.

>
>> If we care about power efficiency, we would have to handle big.LITTLE
>> seamlessly in Xen (i.e. a guest would only see one kind of CPU). This
>> raises quite a few problems, nothing insurmountable, similar to migration
>> across two platforms with different micro-architectures (e.g. processors):
>> errata, features supported... The guest would have to know the union of
>> all the errata (this is done so far via the MIDR, so we would need a PV
>> way to do it), and only the intersection of the features would be exposed
>> to the guest. This also means the scheduler would have to be modified to
>> handle power efficiency (not strictly necessary at the beginning).
>>
>> I agree that such a solution would require some work to implement, although
>> Xen would then have better control of the energy consumption of the platform.
>>
>> So the question here is: what do we want to achieve with big.LITTLE?
>
> I don't think that handling big.LITTLE seamlessly in Xen is the best way
> to do it in the scenarios where Xen on ARM is being used today. I
> understand the principles behind it, but I don't think it will lead to
> good results in a virtualized environment, where there is more activity
> and there are more vcpus than pcpus.

Can you detail why you don't think it will give good results?

>
> What we have discussed in this thread so far is actionable, and gives us
> big.LITTLE support in a short time frame. It is a good fit for Xen on
> ARM use cases and still leads to lower power consumption with a wise
> allocation of big and LITTLE vcpus and pcpus to guests.

How would this lead to lower power consumption? If there is nothing 
running on a processor, we would have a wfi loop, which will never put 
the physical CPU into deep sleep. The main advantage of big.LITTLE is to 
be able to switch off a cluster/cpu when it is not used.

Without any such knowledge in Xen (e.g. CPU freq), I am afraid the power 
consumption will still be the same.

>
> I would start from this approach, then if somebody comes along with a
> plan to implement a big.LITTLE switcher in Xen, I welcome her to do it
> and I would be happy to accept the code in Xen. We'll just make it
> optional.

I think we are discussing a simple design for big.LITTLE here. I never 
asked Peng to do all the work. I am worried that if we start to expose 
big.LITTLE to userspace, it will be hard to step back from it in the 
future.

Regards,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 19:11                                           ` Julien Grall
@ 2016-09-21 19:21                                             ` Julien Grall
  2016-09-21 23:45                                             ` Stefano Stabellini
  2016-09-22  6:49                                             ` Peng Fan
  2 siblings, 0 replies; 85+ messages in thread
From: Julien Grall @ 2016-09-21 19:21 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, Steve Capper, George Dunlap,
	Andrew Cooper, Dario Faggioli, Punit Agrawal, George Dunlap,
	xen-devel, Jan Beulich, Peng Fan



On 21/09/2016 20:11, Julien Grall wrote:
> Hi Stefano,
>
> On 21/09/2016 19:13, Stefano Stabellini wrote:
>> On Wed, 21 Sep 2016, Julien Grall wrote:
>>> (CC a couple of ARM folks)
>>>
>>> On 21/09/16 11:22, George Dunlap wrote:
>>>> On 21/09/16 11:09, Julien Grall wrote:
>>>>>
>>>>>
>>>>> On 20/09/16 21:17, Stefano Stabellini wrote:
>>>>>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>>>> Hi Stefano,
>>>>>>>
>>>>>>> On 20/09/2016 20:09, Stefano Stabellini wrote:
>>>>>>>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 20/09/2016 12:27, George Dunlap wrote:
>>>>>>>>>> On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan
>>>>>>>>>> <van.freenix@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli
>>>>>>>>>>> wrote:
>>>>>>>>>>>> On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>>>>>>>>>>>>> On Tue, 20 Sep 2016, Dario Faggioli wrote:
>>>>>>>>>>> I'd like to add a computing capability in xen/arm, like this:
>>>>>>>>>>>
>>>>>>>>>>> struct compute_capatiliby
>>>>>>>>>>> {
>>>>>>>>>>>    char *core_name;
>>>>>>>>>>>    uint32_t rank;
>>>>>>>>>>>    uint32_t cpu_partnum;
>>>>>>>>>>> };
>>>>>>>>>>>
>>>>>>>>>>> struct compute_capatiliby cc=
>>>>>>>>>>> {
>>>>>>>>>>>   {"A72", 4, 0xd08},
>>>>>>>>>>>   {"A57", 3, 0xxxx},
>>>>>>>>>>>   {"A53", 2, 0xd03},
>>>>>>>>>>>   {"A35", 1, ...},
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Then when identify cpu, we decide which cpu is big and which
>>>>>>>>>>> cpu is
>>>>>>>>>>> little
>>>>>>>>>>> according to the computing rank.
>>>>>>>>>>>
>>>>>>>>>>> Any comments?
>>>>>>>>>>
>>>>>>>>>> I think we definitely need to have Xen have some kind of idea
>>>>>>>>>> the
>>>>>>>>>> order between processors, so that the user doesn't need to
>>>>>>>>>> figure out
>>>>>>>>>> which class / pool is big and which pool is LITTLE.  Whether
>>>>>>>>>> this
>>>>>>>>>> sort
>>>>>>>>>> of enumeration is the best way to do that I'll let Julien and
>>>>>>>>>> Stefano
>>>>>>>>>> give their opinion.
>>>>>>>>>
>>>>>>>>> I don't think an hardcoded list of processor in Xen is the right
>>>>>>>>> solution.
>>>>>>>>> There are many existing processors and combinations for big.LITTLE
>>>>>>>>> so it
>>>>>>>>> will
>>>>>>>>> nearly be impossible to keep updated.
>>>>>>>>>
>>>>>>>>> I would expect the firmware table (device tree, ACPI) to provide
>>>>>>>>> relevant
>>>>>>>>> data
>>>>>>>>> for each processor and differentiate big from LITTLE core.
>>>>>>>>> Note that I haven't looked at it for now. A good place to start is
>>>>>>>>> looking
>>>>>>>>> at
>>>>>>>>> how Linux does.
>>>>>>>>
>>>>>>>> That's right, see
>>>>>>>> Documentation/devicetree/bindings/arm/cpus.txt. It
>>>>>>>> is
>>>>>>>> trivial to identify the two different CPU classes and which cores
>>>>>>>> belong
>>>>>>>> to which class.t, as
>>>>>>>
>>>>>>> The class of the CPU can be found from the MIDR, there is no need to
>>>>>>> use the
>>>>>>> device tree/acpi for that. Note that I don't think there is an easy
>>>>>>> way in
>>>>>>> ACPI (i.e not in AML) to find out the class.
>>>>>>>
>>>>>>>> It is harder to figure out which one is supposed to be
>>>>>>>> big and which one LITTLE. Regardless, we could default to using the
>>>>>>>> first cluster (usually big), which is also the cluster of the boot
>>>>>>>> cpu,
>>>>>>>> and utilize the second cluster only when the user demands it.
>>>>>>>
>>>>>>> Why do you think the boot CPU will usually be a big one? In the case
>>>>>>> of Juno
>>>>>>> platform it is configurable, and the boot CPU is a little core on r2
>>>>>>> by
>>>>>>> default.
>>>>>>>
>>>>>>> In any case, what we care about is differentiate between two set of
>>>>>>> CPUs. I
>>>>>>> don't think Xen should care about migrating a guest vCPU between big
>>>>>>> and
>>>>>>> LITTLE cpus. So I am not sure why we would want to know that.
>>>>>>
>>>>>> No, it is not about migrating (at least yet). It is about giving
>>>>>> useful
>>>>>> information to the user. It would be nice if the user had to choose
>>>>>> between "big" and "LITTLE" rather than "class 0x1" and "class
>>>>>> 0x100", or
>>>>>> even "A7" or "A15".
>>>>>
>>>>> I don't think it is wise to assume that we may have only 2 kind of
>>>>> CPUs
>>>>> on the platform. We may have more in the future, if so how would you
>>>>> name them?
>>>>
>>>> I would suggest that internally Xen recognize an arbitrary number of
>>>> processor "classes", and order them according to more powerful -> less
>>>> powerful.  Then if at some point someone makes a platform with three
>>>> processors, you can say "class 0", "class 1" or "class 2".  "big" would
>>>> be an alias for "class 0" and "little" would be an alias for "class 1".
>>>
>>> As mentioned earlier, there is no upstreamed yet device tree bindings
>>> to know
>>> the "power" of a CPU (see [1]
>>>
>>>>
>>>> And in my suggestion, we allow a richer set of labels, so that the user
>>>> could also be more specific -- e.g., asking for "A15" specifically, for
>>>> example, and failing to build if there are no A15 cores present, while
>>>> allowing users to simply write "big" or "little" if they want
>>>> simplicity
>>>> / things which work across different platforms.
>>>
>>> Well, before trying to do something clever like that (i.e naming
>>> "big" and
>>> "little"), we need to have upstreamed bindings available to
>>> acknowledge the
>>> difference. AFAICT, it is not yet upstreamed for Device Tree (see
>>> [1]) and I
>>> don't know any static ACPI tables providing the similar information.
>>
>> I like George's idea that "big" and "little" could be just convenience
>> aliases. Of course they are predicated on the necessary device tree
>> bindings being upstream. We don't need [1] to be upstream in Linux, just
>> the binding:
>>
>> http://marc.info/?l=linux-arm-kernel&m=147308556729426&w=2
>>
>> which has already been acked by the relevant maintainers.
>
> This is device tree only. What about ACPI?
>
>>
>>
>>> I had few discussions and  more thought about big.LITTLE support in
>>> Xen. The
>>> main goal of big.LITTLE is power efficiency by moving task around and
>>> been
>>> able to idle one cluster. All the solutions suggested (including
>>> mine) so far,
>>> can be replicated by hand (except the VPIDR) so they are mostly an
>>> automatic
>>> way. This will also remove the real benefits of big.LITTLE because
>>> Xen will
>>> not be able to migrate vCPU across cluster for power efficiency.
>>
>> The goal of the architects of big.LITTLE might have been power
>> efficiency, but of course we are free to use any features that the
>> hardware provides in the best way for Xen and the Xen community.
>
> This is very dependent on how the big.LITTLE has been implemented by the
> hardware. Some platform can not run both big and LITTLE cores at the
> same time. You need a proper switch in the firmware/hypervisor.
>
>>
>>> If we care about power efficiency, we would have to handle big.LITTLE
>>> seamlessly in Xen (i.e. a guest would only see one kind of CPU). This
>>> raises quite a
>>> few problems, nothing insurmountable, similar to migration across two
>>> platforms
>>> with different micro-architectures (e.g. processors): errata, features
>>> supported... The guest would have to know the union of all the errata
>>> (this is
>>> done so far via the MIDR, so we would need a PV way to do it), and only
>>> the
>>> intersection of features would be exposed to the guest. This also
>>> means the
>>> scheduler would have to be modified to handle power efficiency (not
>>> strictly
>>> necessary at the beginning).
>>>
>>> I agree that such a solution would require some work to implement,
>>> although
>>> Xen would have better control of the energy consumption of the
>>> platform.
>>>
>>> So the question here is: what do we want to achieve with big.LITTLE?
>>
>> I don't think that handling big.LITTLE seamlessly in Xen is the best way
>> to do it in the scenarios where Xen on ARM is being used today. I
>> understand the principles behind it, but I don't think that it will lead
>> to good results in a virtualized environment, where there is more
>> activity and more vcpus than pcpus.
>
> Can you detail why you don't think it will give good results?
>
>>
>> What we discussed in this thread so far is actionable, and gives us
>> big.LITTLE support in a short time frame. It is a good fit for Xen on
>> ARM use cases and still leads to lower power consumption with a wise
>> allocation of big and LITTLE vcpus and pcpus to guests.
>
> How would this lead to lower power consumption? If there is nothing
> running on the processor we would have a wfi loop which will never put
> the physical CPU into deep sleep. The main advantage of big.LITTLE is to
> be able to switch off a cluster/cpu when it is not used.
>
> Without any knowledge in Xen (such as CPU freq), I am afraid the
> power consumption will still be the same.
>
>>
>> I would start from this approach, then if somebody comes along with a
>> plan to implement a big.LITTLE switcher in Xen, I welcome her to do it
>> and I would be happy to accept the code in Xen. We'll just make it
>> optional.
>
> I think we are discussing here a simple design for big.LITTLE. I never
> asked Peng to do all the work. I am worried that if we start to expose
> big.LITTLE to userspace it will be hard in the future to step back
> from it.

Thinking a bit more about this, after looking at Dario's mail [1], his
suggestion could be seen as a high-level use of soft/hard affinity.

I will reply to that mail.

>
> Regards,
>

[1] https://lists.xen.org/archives/html/xen-devel/2016-09/msg02293.html

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 15:45                                         ` Dario Faggioli
@ 2016-09-21 19:28                                           ` Julien Grall
  2016-09-22  6:16                                             ` Peng Fan
  2016-09-22  8:43                                             ` Dario Faggioli
  0 siblings, 2 replies; 85+ messages in thread
From: Julien Grall @ 2016-09-21 19:28 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap, Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, Steve Capper, George Dunlap,
	Andrew Cooper, Punit Agrawal, xen-devel, Jan Beulich, Peng Fan

Hi Dario,

On 21/09/2016 16:45, Dario Faggioli wrote:
> On Wed, 2016-09-21 at 14:06 +0100, Julien Grall wrote:
>> (CC a couple of ARM folks)
>>
> Yay, thanks for this! :-)
>
>> I have had a few discussions and more thoughts about big.LITTLE support
>> in Xen.
>> The main goal of big.LITTLE is power efficiency by moving tasks
>> around
>> and being able to idle one cluster. All the solutions suggested
>> (including mine) so far can be replicated by hand (except the VPIDR),
>> so
>> they are mostly an automatic way.
>>
> I'm sorry, how is this (going to be) handled in Linux? Is it that any
> arbitrary task executing any arbitrary binary code can be run on both
> big and LITTLE pcpus, depending on the scheduler's and energy
> management's decisions?
>
> This does not seem to match with what has been said at some point in
> this thread... And if it's like that, how's that possible, if the
> pcpus' ISAs are (even only slightly) different?

Right, at some point I mentioned that the set of errata and features 
will be different between processors.

However, it is possible to sanitize the feature registers to expose a 
common set to the guest. This is what is done in Linux at boot time:
only the features common to all the CPUs will be enabled.

This allows a task to migrate between big and LITTLE CPUs seamlessly.
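
As an illustration only (this is not the actual Linux or Xen code, and it
assumes every 4-bit ID register field means "higher value = more features",
which is not true of every real field), the sanitization boils down to
keeping the per-field minimum across all CPUs:

#include <stdint.h>

/* Reduce each 4-bit field of an ID register to the lowest value seen
 * on any CPU, so only the common feature set remains visible. */
static uint64_t sanitize_id_reg(const uint64_t *per_cpu_val, int nr_cpus)
{
    uint64_t common = per_cpu_val[0];
    int cpu, field;

    for ( cpu = 1; cpu < nr_cpus; cpu++ )
        for ( field = 0; field < 64; field += 4 )
        {
            uint64_t a = (common >> field) & 0xf;
            uint64_t b = (per_cpu_val[cpu] >> field) & 0xf;

            common &= ~(0xfULL << field);
            common |= (a < b ? a : b) << field;
        }

    return common;
}

The sanitized value is then what the guest would see when its reads of
the corresponding ID register are trapped.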

>
>> This will also remove the real
>> benefits of big.LITTLE because Xen will not be able to migrate vCPUs
>> across clusters for power efficiency.
>>
>> If we care about power efficiency, we would have to handle
>> big.LITTLE seamlessly
>> in Xen (i.e. a guest would only see one kind of CPU).
>>
> Well, I'm a big fan of an approach that leaves the guests' scheduler
> dumb about things like these (i.e., load balancing, energy efficiency,
> etc), and hence puts Xen in charge. In fact, on a Xen system, it is
> only Xen that has all the info necessary to make wise decisions (e.g.,
> the load of the _whole_ host, the effect of any decisions on the
> _whole_ host, etc).
>
> But this case may be a LITTLE.bit ( :-PP ) different.
>
> Anyway, I guess I'll wait for your reply to my question above before
> commenting more.
>
>> This raises
>> quite a few problems, nothing insurmountable, similar to migration
>> across
>> two platforms with different micro-architectures (e.g. processors):
>> errata, features supported... The guest would have to know the union
>> of
>> all the errata (this is done so far via the MIDR, so we would need a
>> PV way
>> to do it), and only the intersection of features would be exposed to
>> the
>> guest. This also means the scheduler would have to be modified to
>> handle
>> power efficiency (not strictly necessary at the beginning).
>>
>> I agree that such a solution would require some work to implement,
>> although Xen would have better control of the energy consumption of
>> the
>> platform.
>>
>> So the question here is: what do we want to achieve with big.LITTLE?
>>
> Just thinking out loud here. So, instead of "just", as George
> suggested:
>
>  vcpuclass=["0-1:A35","2-5:A53", "6-7:A72"]
>
> we can allow something like the following (note that I'm tossing out
> random numbers next to the 'A's):
>
>  vcpuclass = ["0-1:A35", "2-5:A53,A17", "6-7:A72,A24,A31", "12-13:A8"]
>
> with the following meaning:
>  - vcpus 0, 1 can only run on pcpus of class A35
>  - vcpus 2,3,4,5 can run on pcpus of class A53 _and_ on pcpus of class
>    A17
>  - vcpus 6,7 can run on pcpus of class A72, A24, A31
>  - vcpus 8,9,10,11 --since they're not mentioned, can run on pcpus of
>    any class
>  - vcpus 12,13 can only run on pcpus of class A8
>
> This will set the "boundaries", for each vcpu. Then, within these
> boundaries, once in the (Xen's) scheduler, we can implement whatever
> complex/magic/silly logic we want, e.g.:
>  - only use a pcpu of class A53 for vcpus that have an average load
>    above 50%
>  - only use a pcpu of class A31 if there are no idle pcpus of class A24
>  - only use a pcpu of class A17 for a vcpu if the total system load
>    divided by the vcpu ID gives 42 as the result
>  - whatever
>
> This allows us to achieve both the following goals:
>  - allow Xen to take smart decisions, considering the load and the
>    efficiency of the host as a whole
>  - allow the guest to take smart decisions, like running lightweight
>    tasks on low power vcpus (which then Xen will run on low
>    power pcpus, at least on a properly configured system)
>
> Of course this **requires** that, for instance, vcpu 6 must be able to
> run on A72, A24 and A31 just fine, i.e., it must be possible for it to
> block on I/O when executing on an A72 pcpu, and, later, after wakeup,
> restart executing on an A24 pcpu.

With a bit of work in Xen, it would be possible to move the vCPU
between big and LITTLE cpus. As mentioned above, we could sanitize the
features to only enable a common set. You can view the big.LITTLE
problem as a local live migration between two kinds of CPUs.

In your suggestion you don't mention what would happen if the guest
configuration does not contain the affinity. Does it mean the vCPU can
be scheduled anywhere? Will a pCPU/class be chosen randomly?

To be honest, I quite like this idea. It could be used as soft/hard
affinity for the moment, but it can be extended in the future if/when
the scheduler gains knowledge of power efficiency and vCPUs can migrate
between big and LITTLE.

Regards,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 19:11                                           ` Julien Grall
  2016-09-21 19:21                                             ` Julien Grall
@ 2016-09-21 23:45                                             ` Stefano Stabellini
  2016-09-22  6:49                                             ` Peng Fan
  2 siblings, 0 replies; 85+ messages in thread
From: Stefano Stabellini @ 2016-09-21 23:45 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Dario Faggioli, Punit Agrawal,
	George Dunlap, xen-devel, Jan Beulich, Peng Fan

On Wed, 21 Sep 2016, Julien Grall wrote:
> > > > And in my suggestion, we allow a richer set of labels, so that the user
> > > > could also be more specific -- e.g., asking for "A15" specifically, for
> > > > example, and failing to build if there are no A15 cores present, while
> > > > allowing users to simply write "big" or "little" if they want simplicity
> > > > / things which work across different platforms.
> > > 
> > > Well, before trying to do something clever like that (i.e. naming "big" and
> > > "little"), we need to have upstreamed bindings available to acknowledge
> > > the
> > > difference. AFAICT, it is not yet upstreamed for Device Tree (see [1]) and
> > > I
> > > don't know of any static ACPI tables providing similar information.
> > 
> > I like George's idea that "big" and "little" could be just convenience
> > aliases. Of course they are predicated on the necessary device tree
> > bindings being upstream. We don't need [1] to be upstream in Linux, just
> > the binding:
> > 
> > http://marc.info/?l=linux-arm-kernel&m=147308556729426&w=2
> > 
> > which has already been acked by the relevant maintainers.
> 
> This is device tree only. What about ACPI?

ACPI will come along with similar information at some point. When we
have it, we'll use it.


> > > I have had a few discussions and more thoughts about big.LITTLE support in
> > > Xen. The
> > > main goal of big.LITTLE is power efficiency, by moving tasks around and
> > > being
> > > able to idle one cluster. All the solutions suggested (including mine) so
> > > far
> > > can be replicated by hand (except the VPIDR), so they are mostly an
> > > automatic
> > > way. This also removes the real benefit of big.LITTLE, because Xen
> > > will
> > > not be able to migrate vCPUs across clusters for power efficiency.
> > 
> > The goal of the architects of big.LITTLE might have been power
> > efficiency, but of course we are free to use any features that the
> > hardware provides in the best way for Xen and the Xen community.
> 
> This is very dependent on how the big.LITTLE has been implemented by the
> hardware. Some platforms cannot run both big and LITTLE cores at the same
> time. You need a proper switch in the firmware/hypervisor.
 
Fair enough, that hardware wouldn't benefit from this work.


> > > If we care about power efficiency, we would have to handle big.LITTLE
> > > seamlessly in Xen (i.e. a guest would only see one kind of CPU). This
> > > raises quite a
> > > few problems, nothing insurmountable, similar to migration across two
> > > platforms
> > > with different micro-architectures (e.g. processors): errata, features
> > > supported... The guest would have to know the union of all the errata
> > > (this is
> > > done so far via the MIDR, so we would need a PV way to do it), and only the
> > > intersection of features would be exposed to the guest. This also means
> > > the
> > > scheduler would have to be modified to handle power efficiency (not
> > > strictly
> > > necessary at the beginning).
> > > 
> > > I agree that such a solution would require some work to implement,
> > > although
> > > Xen would have better control of the energy consumption of the platform.
> > > 
> > > So the question here is: what do we want to achieve with big.LITTLE?
> > 
> > I don't think that handling big.LITTLE seamlessly in Xen is the best way
> > to do it in the scenarios where Xen on ARM is being used today. I
> > understand the principles behind it, but I don't think that it will lead
> > to good results in a virtualized environment, where there is more
> > activity and more vcpus than pcpus.
> 
> Can you detail why you don't think it will give good results?

I think big.LITTLE works well for cases where you have short, clear bursts
of activity while most of the time the system is quasi-idle (but not
completely idle). Basically like a smartphone. For other scenarios with
more uniform activity patterns, like a server or an infotainment system,
big.LITTLE is too big of a hammer to be used for dynamic power saving.
In those cases it is more flexible to expose all cores to VMs, so that
they can exploit all resources when necessary and idle them when they
can (with wfi or deeper sleep states if possible).


> > What we discussed in this thread so far is actionable, and gives us
> > big.LITTLE support in a short time frame. It is a good fit for Xen on
> > ARM use cases and still leads to lower power consumption with a wise
> > allocation of big and LITTLE vcpus and pcpus to guests.
> 
> How would this lead to lower power consumption?  If there is nothing
> running on the processor we would have a wfi loop which will never put
> the physical CPU into deep sleep.

I expect that by assigning appropriate tasks to big and LITTLE cores,
some big cores will be left idle, which will lead to some power
saving, especially if idle cores are put into deep sleep (maybe using PSCI?).
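
To make "deep sleep" concrete: PSCI 0.2 defines CPU_OFF (function ID
0x84000002) to power down the calling core. A minimal sketch, assuming
an smc_call() helper that issues the SMC to the firmware:

#define PSCI_0_2_FN_CPU_OFF  0x84000002U

/* Assumed helper: issue an SMC with up to three arguments. */
extern void smc_call(unsigned int fid, unsigned long a0,
                     unsigned long a1, unsigned long a2);

static void park_idle_core(void)
{
    /* CPU_OFF takes no arguments and does not return on success. */
    smc_call(PSCI_0_2_FN_CPU_OFF, 0, 0, 0);
    /* Reaching this point means the firmware denied the request. */
}

Xen could invoke something like this from its idle path for big cores it
decides to park.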


> The main advantage of big.LITTLE is to be able to switch off a
> cluster/cpu when it is not used.

To me the main advantage is having double the number of cores, each of them
better suited for different kinds of tasks :-)


> Without any knowledge in Xen (such as CPU freq), I am afraid the power
> consumption will still be the same.
>
> > I would start from this approach, then if somebody comes along with a
> > plan to implement a big.LITTLE switcher in Xen, I welcome her to do it
> > and I would be happy to accept the code in Xen. We'll just make it
> > optional.
> 
> I think we are discussing here a simple design for big.LITTLE. I never asked
> Peng to do all the work. I am worried that if we start to expose big.LITTLE
> to userspace it will be hard in the future to step back from it.

I don't think so: I think we can make both approaches work without
issue, but you seem to have come to the same conclusion in the
follow-up emails.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 19:28                                           ` Julien Grall
@ 2016-09-22  6:16                                             ` Peng Fan
  2016-09-22  8:43                                             ` Dario Faggioli
  1 sibling, 0 replies; 85+ messages in thread
From: Peng Fan @ 2016-09-22  6:16 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Dario Faggioli, Punit Agrawal,
	George Dunlap, xen-devel, Jan Beulich

On Wed, Sep 21, 2016 at 08:28:32PM +0100, Julien Grall wrote:
>Hi Dario,
>
>On 21/09/2016 16:45, Dario Faggioli wrote:
>>On Wed, 2016-09-21 at 14:06 +0100, Julien Grall wrote:
>>>(CC a couple of ARM folks)
>>>
>>Yay, thanks for this! :-)
>>
>>>I have had a few discussions and more thoughts about big.LITTLE support in
>>>Xen.
>>>The main goal of big.LITTLE is power efficiency by moving tasks
>>>around
>>>and being able to idle one cluster. All the solutions suggested
>>>(including mine) so far can be replicated by hand (except the VPIDR),
>>>so
>>>they are mostly an automatic way.
>>>
>>I'm sorry, how is this (going to be) handled in Linux? Is it that any
>>arbitrary task executing any arbitrary binary code can be run on both
>>big and LITTLE pcpus, depending on the scheduler's and energy
>>management's decisions?
>>
>>This does not seem to match with what has been said at some point in
>>this thread... And if it's like that, how's that possible, if the
>>pcpus' ISAs are (even only slightly) different?
>
>Right, at some point I mentioned that the set of errata and features will be
>different between processors.
>
>However, it is possible to sanitize the feature registers to expose a common
>set to the guest. This is what is done in Linux at boot time: only the
>features common to all the CPUs will be enabled.
>
>This allows a task to migrate between big and LITTLE CPUs seamlessly.
>
>>
>>>This will also remove the real
>>>benefits of big.LITTLE because Xen will not be able to migrate vCPUs
>>>across clusters for power efficiency.
>>>
>>>If we care about power efficiency, we would have to handle
>>>big.LITTLE seamlessly
>>>in Xen (i.e. a guest would only see one kind of CPU).
>>>
>>Well, I'm a big fan of an approach that leaves the guests' scheduler
>>dumb about things like these (i.e., load balancing, energy efficiency,
>>etc), and hence puts Xen in charge. In fact, on a Xen system, it is
>>only Xen that has all the info necessary to make wise decisions (e.g.,
>>the load of the _whole_ host, the effect of any decisions on the
>>_whole_ host, etc).
>>
>>But this case may be a LITTLE.bit ( :-PP ) different.
>>
>>Anyway, I guess I'll wait for your reply to my question above before
>>commenting more.
>>
>>>This raises
>>>quite a few problems, nothing insurmountable, similar to migration
>>>across
>>>two platforms with different micro-architectures (e.g. processors):
>>>errata, features supported... The guest would have to know the union
>>>of
>>>all the errata (this is done so far via the MIDR, so we would need a PV
>>>way
>>>to do it), and only the intersection of features would be exposed to
>>>the
>>>guest. This also means the scheduler would have to be modified to
>>>handle
>>>power efficiency (not strictly necessary at the beginning).
>>>
>>>I agree that such a solution would require some work to implement,
>>>although Xen would have better control of the energy consumption of
>>>the
>>>platform.
>>>
>>>So the question here is: what do we want to achieve with big.LITTLE?
>>>
>>Just thinking out loud here. So, instead of "just", as George
>>suggested:
>>
>> vcpuclass=["0-1:A35","2-5:A53", "6-7:A72"]
>>
>>we can allow something like the following (note that I'm tossing out
>>random numbers next to the 'A's):
>>
>> vcpuclass = ["0-1:A35", "2-5:A53,A17", "6-7:A72,A24,A31", "12-13:A8"]
>>
>>with the following meaning:
>> - vcpus 0, 1 can only run on pcpus of class A35
>> - vcpus 2,3,4,5 can run on pcpus of class A53 _and_ on pcpus of class
>>   A17
>> - vcpus 6,7 can run on pcpus of class A72, A24, A31
>> - vcpus 8,9,10,11 --since they're not mentioned, can run on pcpus of
>>   any class
>> - vcpus 12,13 can only run on pcpus of class A8
>>
>>This will set the "boundaries", for each vcpu. Then, within these
>>boundaries, once in the (Xen's) scheduler, we can implement whatever
>>complex/magic/silly logic we want, e.g.:
>> - only use a pcpu of class A53 for vcpus that have an average load
>>   above 50%
>> - only use a pcpu of class A31 if there are no idle pcpus of class A24
>> - only use a pcpu of class A17 for a vcpu if the total system load
>>   divided by the vcpu ID gives 42 as the result
>> - whatever
>>
>>This allows us to achieve both the following goals:
>> - allow Xen to take smart decisions, considering the load and the
>>   efficiency of the host as a whole
>> - allow the guest to take smart decisions, like running lightweight
>>   tasks on low power vcpus (which then Xen will run on low
>>   power pcpus, at least on a properly configured system)
>>
>>Of course this **requires** that, for instance, vcpu 6 must be able to
>>run on A72, A24 and A31 just fine, i.e., it must be possible for it to
>>block on I/O when executing on an A72 pcpu, and, later, after wakeup,
>>restart executing on an A24 pcpu.
>
>With a bit of work in Xen, it would be possible to move the vCPU between
>big and LITTLE cpus. As mentioned above, we could sanitize the features to
>only enable a common set. You can view the big.LITTLE problem as a local live
>migration between two kinds of CPUs.
>
>In your suggestion you don't mention what would happen if the guest
>configuration does not contain the affinity. Does it mean the vCPU can be
>scheduled anywhere? Will a pCPU/class be chosen randomly?

From the doc I read (https://wiki.xen.org/wiki/Tuning_Xen_for_Performance),
the default hard affinity is all 1s, so vcpus can be scheduled on all pcpus,
but the scheduler will prefer pcpus according to the soft affinity.
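
For example (assuming a Xen new enough for xl to accept a soft affinity,
i.e. 4.5+), both masks can be set per vcpu:

 #xl vcpu-pin DomU 0 all 0-3

which keeps vcpu 0's hard affinity at "all" while telling the scheduler
to prefer pcpus 0-3.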

>
>To be honest, I quite like this idea. It could be used as soft/hard affinity
>for the moment, but it can be extended in the future if/when the scheduler gains
>knowledge of power efficiency and vCPUs can migrate between big and LITTLE.

To the guest, all vCPUs have the same vcpu type, but they could be scheduled
on both big and LITTLE pcpus. I cannot foresee how much effort is needed for
this; it is a different direction from what we discussed earlier in the thread.

For power efficiency (cpufreq etc.), it seems little has been done for
xen/arm so far. It would be good if Xen could take advantage of big and
LITTLE in the future, but I am not sure whether a high load on a Linux task
means a high vCPU load, such that Xen should migrate the vCPU to a big pcpu.

Regards,
Peng.

>
>Regards,
>
>-- 
>Julien Grall

-- 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 19:11                                           ` Julien Grall
  2016-09-21 19:21                                             ` Julien Grall
  2016-09-21 23:45                                             ` Stefano Stabellini
@ 2016-09-22  6:49                                             ` Peng Fan
  2016-09-22  8:50                                               ` Dario Faggioli
  2 siblings, 1 reply; 85+ messages in thread
From: Peng Fan @ 2016-09-22  6:49 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Dario Faggioli, Punit Agrawal,
	George Dunlap, xen-devel, Jan Beulich

On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>Hi Stefano,
>
>On 21/09/2016 19:13, Stefano Stabellini wrote:
>>On Wed, 21 Sep 2016, Julien Grall wrote:
>>>(CC a couple of ARM folks)
>>>
>>>On 21/09/16 11:22, George Dunlap wrote:
>>>>On 21/09/16 11:09, Julien Grall wrote:
>>>>>
>>>>>
>>>>>On 20/09/16 21:17, Stefano Stabellini wrote:
>>>>>>On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>>>>Hi Stefano,
>>>>>>>
>>>>>>>On 20/09/2016 20:09, Stefano Stabellini wrote:
>>>>>>>>On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>>>>>>Hi,
>>>>>>>>>
>>>>>>>>>On 20/09/2016 12:27, George Dunlap wrote:
>>>>>>>>>>On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan
>>>>>>>>>><van.freenix@gmail.com>
>>>>>>>>>>wrote:
>>>>>>>>>>>On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli
>>>>>>>>>>>wrote:
>>>>>>>>>>>>On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>>>>>>>>>>>>>On Tue, 20 Sep 2016, Dario Faggioli wrote:
>>>>>>>>>>>I'd like to add a computing capability in xen/arm, like this:
>>>>>>>>>>>
>>>>>>>>>>>struct compute_capability
>>>>>>>>>>>{
>>>>>>>>>>>   char *core_name;
>>>>>>>>>>>   uint32_t rank;
>>>>>>>>>>>   uint32_t cpu_partnum;
>>>>>>>>>>>};
>>>>>>>>>>>
>>>>>>>>>>>struct compute_capability cc=
>>>>>>>>>>>{
>>>>>>>>>>>  {"A72", 4, 0xd08},
>>>>>>>>>>>  {"A57", 3, 0xxxx},
>>>>>>>>>>>  {"A53", 2, 0xd03},
>>>>>>>>>>>  {"A35", 1, ...},
>>>>>>>>>>>}
>>>>>>>>>>>
>>>>>>>>>>>Then when identifying a cpu, we decide which cpu is big and which
>>>>>>>>>>>cpu is
>>>>>>>>>>>little
>>>>>>>>>>>according to the computing rank.
>>>>>>>>>>>
>>>>>>>>>>>Any comments?
>>>>>>>>>>
>>>>>>>>>>I think we definitely need to have Xen have some kind of idea of
>>>>>>>>>>the
>>>>>>>>>>order between processors, so that the user doesn't need to
>>>>>>>>>>figure out
>>>>>>>>>>which class / pool is big and which pool is LITTLE.  Whether
>>>>>>>>>>this
>>>>>>>>>>sort
>>>>>>>>>>of enumeration is the best way to do that I'll let Julien and
>>>>>>>>>>Stefano
>>>>>>>>>>give their opinion.
>>>>>>>>>
>>>>>>>>>I don't think a hardcoded list of processors in Xen is the right
>>>>>>>>>solution.
>>>>>>>>>There are many existing processors and combinations for big.LITTLE
>>>>>>>>>so it
>>>>>>>>>will
>>>>>>>>>be nearly impossible to keep it updated.
>>>>>>>>>
>>>>>>>>>I would expect the firmware table (device tree, ACPI) to provide
>>>>>>>>>relevant
>>>>>>>>>data
>>>>>>>>>for each processor and differentiate big from LITTLE cores.
>>>>>>>>>Note that I haven't looked at it for now. A good place to start is
>>>>>>>>>looking
>>>>>>>>>at
>>>>>>>>>how Linux does it.
>>>>>>>>
>>>>>>>>That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It
>>>>>>>>is
>>>>>>>>trivial to identify the two different CPU classes and which cores
>>>>>>>>belong
>>>>>>>>to which class.
>>>>>>>
>>>>>>>The class of the CPU can be found from the MIDR; there is no need to
>>>>>>>use the
>>>>>>>device tree/acpi for that. Note that I don't think there is an easy
>>>>>>>way in
>>>>>>>ACPI (i.e. not in AML) to find out the class.
>>>>>>>
>>>>>>>>It is harder to figure out which one is supposed to be
>>>>>>>>big and which one LITTLE. Regardless, we could default to using the
>>>>>>>>first cluster (usually big), which is also the cluster of the boot
>>>>>>>>cpu,
>>>>>>>>and utilize the second cluster only when the user demands it.
>>>>>>>
>>>>>>>Why do you think the boot CPU will usually be a big one? In the case
>>>>>>>of Juno
>>>>>>>platform it is configurable, and the boot CPU is a little core on r2
>>>>>>>by
>>>>>>>default.
>>>>>>>
>>>>>>>In any case, what we care about is differentiating between two sets of
>>>>>>>CPUs. I
>>>>>>>don't think Xen should care about migrating a guest vCPU between big
>>>>>>>and
>>>>>>>LITTLE cpus. So I am not sure why we would want to know that.
>>>>>>
>>>>>>No, it is not about migrating (at least yet). It is about giving useful
>>>>>>information to the user. It would be nice if the user had to choose
>>>>>>between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
>>>>>>even "A7" or "A15".
>>>>>
>>>>>I don't think it is wise to assume that we may have only 2 kinds of CPUs
>>>>>on the platform. We may have more in the future, if so how would you
>>>>>name them?
>>>>
>>>>I would suggest that internally Xen recognize an arbitrary number of
>>>>processor "classes", and order them according to more powerful -> less
>>>>powerful.  Then if at some point someone makes a platform with three
>>>>processors, you can say "class 0", "class 1" or "class 2".  "big" would
>>>>be an alias for "class 0" and "little" would be an alias for "class 1".
>>>
>>>As mentioned earlier, there are no upstreamed device tree bindings yet to know
>>>the "power" of a CPU (see [1]).
>>>
>>>>
>>>>And in my suggestion, we allow a richer set of labels, so that the user
>>>>could also be more specific -- e.g., asking for "A15" specifically, for
>>>>example, and failing to build if there are no A15 cores present, while
>>>>allowing users to simply write "big" or "little" if they want simplicity
>>>>/ things which work across different platforms.
>>>
>>>Well, before trying to do something clever like that (i.e. naming "big" and
>>>"little"), we need to have upstreamed bindings available to acknowledge the
>>>difference. AFAICT, it is not yet upstreamed for Device Tree (see [1]) and I
>>>don't know of any static ACPI tables providing similar information.
>>
>>I like George's idea that "big" and "little" could be just convenience
>>aliases. Of course they are predicated on the necessary device tree
>>bindings being upstream. We don't need [1] to be upstream in Linux, just
>>the binding:
>>
>>http://marc.info/?l=linux-arm-kernel&m=147308556729426&w=2
>>
>>which has already been acked by the relevant maintainers.
>
>This is device tree only. What about ACPI?
>
>>
>>
>>>I have had a few discussions and more thoughts about big.LITTLE support in Xen.
>>>The main goal of big.LITTLE is power efficiency, by moving tasks around and
>>>being able to idle one cluster. All the solutions suggested (including mine)
>>>so far can be replicated by hand (except the VPIDR), so they are mostly an
>>>automatic way. This also removes the real benefit of big.LITTLE, because Xen
>>>will not be able to migrate vCPUs across clusters for power efficiency.
>>
>>The goal of the architects of big.LITTLE might have been power
>>efficiency, but of course we are free to use any features that the
>>hardware provides in the best way for Xen and the Xen community.
>
>This is very dependent on how the big.LITTLE has been implemented by the
>hardware. Some platforms cannot run both big and LITTLE cores at the same
>time. You need a proper switch in the firmware/hypervisor.
>
>>
>>>If we care about power efficiency, we would have to handle big.LITTLE
>>>seamlessly in Xen (i.e. a guest would only see one kind of CPU). This raises
>>>quite a few problems, nothing insurmountable, similar to migration across two
>>>platforms with different micro-architectures (e.g. processors): errata,
>>>features supported... The guest would have to know the union of all the errata
>>>(this is done so far via the MIDR, so we would need a PV way to do it), and
>>>only the intersection of features would be exposed to the guest. This also
>>>means the scheduler would have to be modified to handle power efficiency (not
>>>strictly necessary at the beginning).
>>>
>>>I agree that such a solution would require some work to implement, although
>>>Xen would have better control of the energy consumption of the platform.
>>>
>>>So the question here is: what do we want to achieve with big.LITTLE?
>>
>>I don't think that handling big.LITTLE seamlessly in Xen is the best way
>>to do it in the scenarios where Xen on ARM is being used today. I
>>understand the principles behind it, but I don't think that it will lead
>>to good results in a virtualized environment, where there is more
>>activity and more vcpus than pcpus.
>
>Can you detail why you don't think it will give good results?
>
>>
>>What we discussed in this thread so far is actionable, and gives us
>>big.LITTLE support in a short time frame. It is a good fit for Xen on
>>ARM use cases and still leads to lower power consumption with a wise
>>allocation of big and LITTLE vcpus and pcpus to guests.
>
>How would this lead to lower power consumption? If there is nothing running
>on the processor we would have a wfi loop which will never put the physical
>CPU into deep sleep. The main advantage of big.LITTLE is to be able to switch
>off a cluster/cpu when it is not used.
>
>Without any knowledge in Xen (such as CPU freq), I am afraid the power
>consumption will still be the same.
>
>>
>>I would start from this approach, then if somebody comes along with a
>>plan to implement a big.LITTLE switcher in Xen, I welcome her to do it
>>and I would be happy to accept the code in Xen. We'll just make it
>>optional.
>
>I think we are discussing here a simple design for big.LITTLE. I never asked
>Peng to do all the work. I am worried that if we start to expose big.LITTLE
>to userspace it will be hard in the future to step back from it.

Hello Julien,


I prefer the simple, doable method which, as Stefano said, is actionable.

 - introduce a hypercall interface to let xl get the info on the different cpu classes.
 - use vcpuclass in the xl cfg file to let the user create different classes of vcpus.
 - extract the cpu computing capacity from the dts to differentiate cpus. As you said,
   the bindings are not upstreamed yet, but this is not the hard point; we could change
   the info (whether dmips or capacity) in the future.
 - use cpu hard affinity to block vcpu scheduling between little and big pcpus.
 - block the user from setting vcpu hard affinity across big and LITTLE (see the
   sketch after this list).
 - hard affinity alone seems enough; no need for soft affinity.
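
A rough sketch of that check (cpumask_subset() is a real Xen helper; the
per-class bookkeeping, class_cpumask() and nr_cpu_classes, is assumed here):

/* Assumed per-class bookkeeping, filled in at boot from the MIDRs. */
extern unsigned int nr_cpu_classes;
extern const cpumask_t *class_cpumask(unsigned int c);

/* Accept a new hard affinity only if it stays within a single class. */
static int check_hard_affinity_one_class(const cpumask_t *new_hard)
{
    unsigned int c;

    for ( c = 0; c < nr_cpu_classes; c++ )
        if ( cpumask_subset(new_hard, class_cpumask(c)) )
            return 0;   /* entirely within one class: accept */

    return -EINVAL;     /* spans big and LITTLE: reject */
}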

Anyway, for vcpu scheduling between big and LITTLE, if this is the right
direction and you have an idea of how to implement it, I could follow you on
enabling this feature, with you leading the work. I do not have many ideas on this.

I checked the code at http://www.linux-arm.org/git?p=linux-skp.git;a=shortlog;h=refs/heads/cpu-ftr/v3-4.3-rc4
which is nearly the same as the patches merged into Linux.
It tries to expose the common safe cpu features to userspace/kvm/debug, mostly for userspace.

But for the MIDR, it says:
MIDR_EL1 is exposed to help identify the processor. On a heterogeneous
system, this could be racy (just like getcpu()). The process could be
migrated to another CPU by the time it uses the register value, unless the
CPU affinity is set. Hence, there is no guarantee that the value reflects the
processor that it is currently executing on. The REVIDR is not exposed due
to this constraint, as REVIDR makes sense only in conjunction with the MIDR.

I think we may meet the same issue for vCPUs.
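
For a vCPU the value can at least be made stable: assuming a per-vCPU vpidr
field (as this RFC proposes), the virtual MIDR can be reloaded into VPIDR_EL2
on every context switch, so the guest reads the same value no matter which
pcpu it happens to run on. Sketch only:

/* On switch-in: guest reads of MIDR_EL1 then return VPIDR_EL2. */
static void ctxt_switch_to(struct vcpu *v)
{
    WRITE_SYSREG32(v->arch.vpidr, VPIDR_EL2);
    /* ... rest of the context switch ... */
}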

Thanks,
Peng.

>
>Regards,
>
>-- 
>Julien Grall

-- 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 19:28                                           ` Julien Grall
  2016-09-22  6:16                                             ` Peng Fan
@ 2016-09-22  8:43                                             ` Dario Faggioli
  2016-09-22 11:24                                               ` Julien Grall
  1 sibling, 1 reply; 85+ messages in thread
From: Dario Faggioli @ 2016-09-22  8:43 UTC (permalink / raw)
  To: Julien Grall, George Dunlap, Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, Steve Capper, George Dunlap,
	Andrew Cooper, Punit Agrawal, xen-devel, Jan Beulich, Peng Fan


[-- Attachment #1.1: Type: text/plain, Size: 4391 bytes --]

On Wed, 2016-09-21 at 20:28 +0100, Julien Grall wrote:
> On 21/09/2016 16:45, Dario Faggioli wrote:
> > This does not seem to match with what has been said at some point
> > in
> > this thread... And if it's like that, how's that possible, if the
> > pcpus' ISAs are (even only slightly) different?
> 
> Right, at some point I mentioned that the set of errata and features 
> will be different between processors.
> 
Yes, I read that, but I wasn't (and still am not) sure whether that
meant a vcpu can move freely between classes, with the scheduler doing
the moving.

In fact, you say:

> With a bit of work in Xen, it would be possible to move the vCPU
> between big and LITTLE cpus. As mentioned above, we could sanitize
> the 
> features to only enable a common set. 
> You can view the big.LITTLE 
> problem as a local live migration between two kinds of CPUs.
> 
Local migration basically --from the vcpu perspective-- means creating a
new vcpu, stopping the original vcpu, copying the state from the original
to the new one, destroying the original vcpu and starting the new one. My
point is that this is not something that can be done within, nor initiated
by, the scheduler, e.g., during a context switch or a vcpu wakeup!
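
In pseudo-code (every name here is hypothetical, just to make the
sequence concrete):

/* "Local migration": too heavyweight for the context switch path,
 * as it allocates, copies state and may sleep. */
static int migrate_vcpu_across_classes(struct vcpu *old, unsigned int cls)
{
    struct vcpu *new = create_vcpu_in_class(old->domain, cls);

    if ( new == NULL )
        return -ENOMEM;

    pause_vcpu(old);             /* stop the original vcpu       */
    copy_vcpu_state(new, old);   /* registers, timers, vGIC, ... */
    destroy_vcpu(old);           /* tear down the original       */
    unpause_vcpu(new);           /* start the new one            */

    return 0;
}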

And I'm saying this because...

> In your suggestion you don't mention what would happen if the guest
> configuration does not contain the affinity. Does it mean the vCPU
> can
> be scheduled anywhere? Will a pCPU/class be chosen randomly?
> 
...in my example there were vcpus for which no set of classes was
specified, and I said that it meant those vcpus can run on any pcpu of
any class. And this would be what I think we should do even in cases
where no "vcpuclass" parameter is specified at all.

*BUT* that is only possible if moving a vcpu from a pcpu of class A to
a pcpu of class B does *NOT* require the steps described above, similar
to local migration. IOW, this is only possible if moving a vcpu from a
pcpu of class A to a pcpu of class B *ONLY* requires a context switch.

If changing class requires local migration, the scheduler must be told
that it should never move vcpus between classes (or sets of classes made
of sufficiently homogeneous pcpus, for which a context switch is enough).
If changing class is --or can be made to be, with some work in Xen--
just a context switch, then we can have the scheduler moving vcpus
between (sets of) classes.

It's probably not too big of a deal, wrt the end result (see below),
but it changes the implementation a lot.

But, yeah, if changing class can be made simple with some work in Xen,
but is not simple/possible **right now**, then this means that,
_for_now_, vcpus for which a class is not specified must be assigned to
a class (or a set of classes within which the scheduler can freely move
vcpus). In the future, we can change this, broadening the "default class"
as much as seamless migration within its pcpus allows.

Hope I made myself clear enough. :-D

> To be honest, I quite like this idea. 
>
:-)

> It could be used as soft/hard
> affinity for the moment, but it can be extended in the future if/when
> the
> scheduler gains knowledge of power efficiency and vCPUs can migrate
> between big and LITTLE.
> 
Yes, exactly, and this is, I think, true in both of the above outlined
cases. What I meant when I said it is the implementation, rather than
the end result that changes, is that:
 - if complex migration-like operations are necessary for changing 
   class, migrating between classes (e.g., between big and LITTLE) 
   will have to happen, e.g., in a load and energy management and
   balancing component implemented above the scheduler itself
 - if just plain context switch is enough, the scheduler can do
   everything by itself.

But yes, in any case, the model we're coming up with looks to be a very
good starting point, because it is orthogonal to and independent from
other components and solutions (e.g., cpupools) and is pretty simple and
basic, and leaves room for future extensions.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22  6:49                                             ` Peng Fan
@ 2016-09-22  8:50                                               ` Dario Faggioli
  2016-09-22  9:27                                                 ` Peng Fan
  2016-09-22 10:05                                                 ` Peng Fan
  0 siblings, 2 replies; 85+ messages in thread
From: Dario Faggioli @ 2016-09-22  8:50 UTC (permalink / raw)
  To: Peng Fan, Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Punit Agrawal, George Dunlap,
	xen-devel, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 14473 bytes --]

On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
> On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
> > 
> > Hi Stefano,
> > 
> > On 21/09/2016 19:13, Stefano Stabellini wrote:
> > > 
> > > On Wed, 21 Sep 2016, Julien Grall wrote:
> > > > 
> > > > (CC a couple of ARM folks)
> > > > 
> > > > On 21/09/16 11:22, George Dunlap wrote:
> > > > > 
> > > > > On 21/09/16 11:09, Julien Grall wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > On 20/09/16 21:17, Stefano Stabellini wrote:
> > > > > > > 
> > > > > > > On Tue, 20 Sep 2016, Julien Grall wrote:
> > > > > > > > 
> > > > > > > > Hi Stefano,
> > > > > > > > 
> > > > > > > > On 20/09/2016 20:09, Stefano Stabellini wrote:
> > > > > > > > > 
> > > > > > > > > On Tue, 20 Sep 2016, Julien Grall wrote:
> > > > > > > > > > 
> > > > > > > > > > Hi,
> > > > > > > > > > 
> > > > > > > > > > On 20/09/2016 12:27, George Dunlap wrote:
> > > > > > > > > > > 
> > > > > > > > > > > On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan
> > > > > > > > > > > <van.freenix@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario
> > > > > > > > > > > > Faggioli
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Mon, 2016-09-19 at 17:01 -0700, Stefano
> > > > > > > > > > > > > Stabellini wrote:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Tue, 20 Sep 2016, Dario Faggioli wrote:
> > > > > > > > > > > > I'd like to add a computing capability in
> > > > > > > > > > > > xen/arm, like this:
> > > > > > > > > > > > 
> > > > > > > > > > > > struct compute_capability
> > > > > > > > > > > > {
> > > > > > > > > > > >   char *core_name;
> > > > > > > > > > > >   uint32_t rank;
> > > > > > > > > > > >   uint32_t cpu_partnum;
> > > > > > > > > > > > };
> > > > > > > > > > > > 
> > > > > > > > > > > > struct compute_capability cc=
> > > > > > > > > > > > {
> > > > > > > > > > > >  {"A72", 4, 0xd08},
> > > > > > > > > > > >  {"A57", 3, 0xxxx},
> > > > > > > > > > > >  {"A53", 2, 0xd03},
> > > > > > > > > > > >  {"A35", 1, ...},
> > > > > > > > > > > > }
> > > > > > > > > > > > 
> > > > > > > > > > > > Then when identifying a cpu, we decide which cpu is
> > > > > > > > > > > > big and which
> > > > > > > > > > > > cpu is
> > > > > > > > > > > > little
> > > > > > > > > > > > according to the computing rank.
> > > > > > > > > > > > 
> > > > > > > > > > > > Any comments?
> > > > > > > > > > > 
> > > > > > > > > > > I think we definitely need to have Xen have some
> > > > > > > > > > > kind of idea of
> > > > > > > > > > > the
> > > > > > > > > > > order between processors, so that the user
> > > > > > > > > > > doesn't need to
> > > > > > > > > > > figure out
> > > > > > > > > > > which class / pool is big and which pool is
> > > > > > > > > > > LITTLE.  Whether
> > > > > > > > > > > this
> > > > > > > > > > > sort
> > > > > > > > > > > of enumeration is the best way to do that I'll
> > > > > > > > > > > let Julien and
> > > > > > > > > > > Stefano
> > > > > > > > > > > give their opinion.
> > > > > > > > > > 
> > > > > > > > > > I don't think a hardcoded list of processors in Xen
> > > > > > > > > > is the right
> > > > > > > > > > solution.
> > > > > > > > > > There are many existing processors and combinations
> > > > > > > > > > for big.LITTLE
> > > > > > > > > > so it
> > > > > > > > > > will
> > > > > > > > > > be nearly impossible to keep it updated.
> > > > > > > > > > 
> > > > > > > > > > I would expect the firmware table (device tree,
> > > > > > > > > > ACPI) to provide
> > > > > > > > > > relevant
> > > > > > > > > > data
> > > > > > > > > > for each processor and differentiate big from
> > > > > > > > > > LITTLE cores.
> > > > > > > > > > Note that I haven't looked at it for now. A good
> > > > > > > > > > place to start is
> > > > > > > > > > looking
> > > > > > > > > > at
> > > > > > > > > > how Linux does it.
> > > > > > > > > 
> > > > > > > > > That's right, see
> > > > > > > > > Documentation/devicetree/bindings/arm/cpus.txt. It
> > > > > > > > > is
> > > > > > > > > trivial to identify the two different CPU classes and
> > > > > > > > > which cores
> > > > > > > > > belong
> > > > > > > > > to which class.
> > > > > > > > 
> > > > > > > > The class of the CPU can be found from the MIDR; there
> > > > > > > > is no need to
> > > > > > > > use the
> > > > > > > > device tree/acpi for that. Note that I don't think
> > > > > > > > there is an easy
> > > > > > > > way in
> > > > > > > > ACPI (i.e. not in AML) to find out the class.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > It is harder to figure out which one is supposed to
> > > > > > > > > be
> > > > > > > > > big and which one LITTLE. Regardless, we could
> > > > > > > > > default to using the
> > > > > > > > > first cluster (usually big), which is also the
> > > > > > > > > cluster of the boot
> > > > > > > > > cpu,
> > > > > > > > > and utilize the second cluster only when the user
> > > > > > > > > demands it.
> > > > > > > > 
> > > > > > > > Why do you think the boot CPU will usually be a big
> > > > > > > > one? In the case
> > > > > > > > of Juno
> > > > > > > > platform it is configurable, and the boot CPU is a
> > > > > > > > little core on r2
> > > > > > > > by
> > > > > > > > default.
> > > > > > > > 
> > > > > > > > In any case, what we care about is differentiating
> > > > > > > > between two sets of
> > > > > > > > CPUs. I
> > > > > > > > don't think Xen should care about migrating a guest
> > > > > > > > vCPU between big
> > > > > > > > and
> > > > > > > > LITTLE cpus. So I am not sure why we would want to know
> > > > > > > > that.
> > > > > > > 
> > > > > > > No, it is not about migrating (at least yet). It is about
> > > > > > > giving useful
> > > > > > > information to the user. It would be nice if the user had
> > > > > > > to choose
> > > > > > > between "big" and "LITTLE" rather than "class 0x1" and
> > > > > > > "class 0x100", or
> > > > > > > even "A7" or "A15".
> > > > > > 
> > > > > > I don't think it is wise to assume that we may have only 2
> > > > > > kinds of CPUs
> > > > > > on the platform. We may have more in the future, if so how
> > > > > > would you
> > > > > > name them?
> > > > > 
> > > > > I would suggest that internally Xen recognize an arbitrary
> > > > > number of
> > > > > processor "classes", and order them according to more
> > > > > powerful -> less
> > > > > powerful.  Then if at some point someone makes a platform
> > > > > with three
> > > > > processors, you can say "class 0", "class 1" or "class
> > > > > 2".  "big" would
> > > > > be an alias for "class 0" and "little" would be an alias for
> > > > > "class 1".
> > > > 
> > > > As mentioned earlier, there are no upstreamed device tree
> > > > bindings yet to know
> > > > the "power" of a CPU (see [1]).
> > > > 
> > > > > 
> > > > > 
> > > > > And in my suggestion, we allow a richer set of labels, so
> > > > > that the user
> > > > > could also be more specific -- e.g., asking for "A15"
> > > > > specifically, for
> > > > > example, and failing to build if there are no A15 cores
> > > > > present, while
> > > > > allowing users to simply write "big" or "little" if they want
> > > > > simplicity
> > > > > / things which work across different platforms.
> > > > 
> > > > Well, before trying to do something clever like that (i.e.
> > > > naming "big" and
> > > > "little"), we need to have upstreamed bindings available to
> > > > acknowledge the
> > > > difference. AFAICT, it is not yet upstreamed for Device Tree
> > > > (see [1]) and I
> > > > don't know of any static ACPI tables providing similar
> > > > information.
> > > 
> > > I like George's idea that "big" and "little" could be just
> > > convenience
> > > aliases. Of course they are predicated on the necessary device
> > > tree
> > > bindings being upstream. We don't need [1] to be upstream in
> > > Linux, just
> > > the binding:
> > > 
> > > http://marc.info/?l=linux-arm-kernel&m=147308556729426&w=2
> > > 
> > > which has already been acked by the relevant maintainers.
> > 
> > This is device tree only. What about ACPI?
> > 
> > > 
> > > 
> > > 
> > > > 
> > > > I have had a few discussions and more thoughts about big.LITTLE
> > > > support in Xen. The
> > > > main goal of big.LITTLE is power efficiency, by moving tasks
> > > > around and being
> > > > able to idle one cluster. All the solutions suggested
> > > > (including mine) so far
> > > > can be replicated by hand (except the VPIDR), so they are mostly
> > > > an automatic
> > > > way. This also removes the real benefit of big.LITTLE,
> > > > because Xen will
> > > > not be able to migrate vCPUs across clusters for power
> > > > efficiency.
> > > 
> > > The goal of the architects of big.LITTLE might have been power
> > > efficiency, but of course we are free to use any features that
> > > the
> > > hardware provides in the best way for Xen and the Xen community.
> > 
> > This is very dependent on how the big.LITTLE has been implemented
> > by the
> > hardware. Some platforms cannot run both big and LITTLE cores at
> > the same
> > time. You need a proper switch in the firmware/hypervisor.
> > 
> > > 
> > > 
> > > > 
> > > > If we care about power efficiency, we would have to handle
> > > > big.LITTLE seamlessly
> > > > in Xen (i.e. a guest would only see one kind of CPU).
> > > > This raises quite a
> > > > few problems, nothing insurmountable, similar to migration
> > > > across two platforms
> > > > with different micro-architectures (e.g. processors): errata,
> > > > features
> > > > supported... The guest would have to know the union of all the
> > > > errata (this is
> > > > done so far via the MIDR, so we would need a PV way to do it), and
> > > > only the
> > > > intersection of features would be exposed to the guest. This
> > > > also means the
> > > > scheduler would have to be modified to handle power efficiency
> > > > (not strictly
> > > > necessary at the beginning).
> > > > 
> > > > I agree that such a solution would require some work to
> > > > implement, although
> > > > Xen would have better control of the energy consumption of the
> > > > platform.
> > > > 
> > > > So the question here is: what do we want to achieve with
> > > > big.LITTLE?
> > > 
> > > I don't think that handling big.LITTLE seamlessly in Xen is the
> > > best way
> > > to do it in the scenarios where Xen on ARM is being used today. I
> > > understand the principles behind it, but I don't think that it
> > > will lead
> > > to good results in a virtualized environment, where there is more
> > > activity and more vcpus than pcpus.
> > 
> > Can you detail why you don't think it will give good results?
> > 
> > > 
> > > 
> > > What we discussed in this thread so far is actionable, and gives
> > > us
> > > big.LITTLE support in a short time frame. It is a good fit for
> > > Xen on
> > > ARM use cases and still leads to lower power consumption with a
> > > wise
> > > allocation of big and LITTLE vcpus and pcpus to guests.
> > 
> > How would this lead to lower power consumption? If there is nothing
> > running
> > on the processor we would have a wfi loop which will never put the
> > physical
> > CPU into deep sleep. The main advantage of big.LITTLE is to be able
> > to switch
> > off a cluster/cpu when it is not used.
> > 
> > Without any knowledge in Xen (such as CPU freq), I am afraid
> > the power
> > consumption will still be the same.
> > 
> > > 
> > > 
> > > I would start from this approach, then if somebody comes along
> > > with a
> > > plan to implement a big.LITTLE switcher in Xen, I welcome her to
> > > do it
> > > and I would be happy to accept the code in Xen. We'll just make
> > > it
> > > optional.
> > 
> > I think we are discussing here a simple design for big.LITTLE. I
> > never asked
> > Peng to do all the work. I am worried that if we start to expose
> > big.LITTLE
> > to userspace it will be hard in the future to step back from
> > it.
> 
> Hello Julien,
> 
> 
> I prefer the simple, doable method which, as Stefano said, is actionable.
> 
Yep, this is a very good starting point, IMO.

>  - introduce a hypercall interface to let xl get the info on the
> different cpu classes.
>
+1

>  - use vcpuclass in the xl cfg file to let the user create different classes of vcpus
>
Yep.

>  - extract the cpu computing capacity from the dts to differentiate cpus. As you
> said, the bindings are
>    not upstreamed yet, but this is not the hard point; we could change the
> info (whether
>    dmips or capacity) in the future.
>
Yes (or I should say, "whatever", as I know nothing about all
this! :-P)

>  - use cpu hard affinity to block vcpu scheduling between little and
> big pcpus.
>
"to block vcpu scheduling within the specified classes, for each vcpu"

But, again, yes.

>  - block the user from setting vcpu hard affinity across big and LITTLE.
>
"between the specified class"

Indeed.

>  - hard affinity alone seems enough; no need for soft affinity.
> 
Correct. Just don't care at all and don't touch soft affinity for now.

> Anyway, for vcpu scheduling between big and LITTLE, if this is the
> right
> direction and you have an idea of how to implement it, I could follow
> you on
> enabling this feature, with you leading the work. I do not have many
> ideas on this.
> 
This can come later, either as an enhancement of this affinity-based
solution, or be implemented on top of it.

In any case, let's start with the affinity-based solution for now. It's
a good level of support already, and a nice first step toward future
improvements.

Oh, btw, please don't throw away this cpupool patch series either!

A feature like `xl cpupool-biglittle-split' can still be interesting,
completely orthogonally and independently from the affinity-based work,
and this series looks like it can be used to implement that. :-)

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22  8:50                                               ` Dario Faggioli
@ 2016-09-22  9:27                                                 ` Peng Fan
  2016-09-22  9:51                                                   ` George Dunlap
                                                                     ` (2 more replies)
  2016-09-22 10:05                                                 ` Peng Fan
  1 sibling, 3 replies; 85+ messages in thread
From: Peng Fan @ 2016-09-22  9:27 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Punit Agrawal, George Dunlap,
	xen-devel, Julien Grall, Jan Beulich

On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
>> On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>> > 
>> > Hi Stefano,
>> > 
>> > On 21/09/2016 19:13, Stefano Stabellini wrote:
>> > > 
>> > > On Wed, 21 Sep 2016, Julien Grall wrote:
>> > > > 
>> > > > (CC a couple of ARM folks)
>> > > > 
>> > > > On 21/09/16 11:22, George Dunlap wrote:
>> > > > > 
>> > > > > On 21/09/16 11:09, Julien Grall wrote:
>> > > > > > 
>> > > > > > 
>> > > > > > 
>> > > > > > On 20/09/16 21:17, Stefano Stabellini wrote:
>> > > > > > > 
>> > > > > > > On Tue, 20 Sep 2016, Julien Grall wrote:
>> > > > > > > > 
>> > > > > > > > Hi Stefano,
>> > > > > > > > 
>> > > > > > > > On 20/09/2016 20:09, Stefano Stabellini wrote:
>> > > > > > > > > 
>> > > > > > > > > On Tue, 20 Sep 2016, Julien Grall wrote:
>> > > > > > > > > > 
>> > > > > > > > > > Hi,
>> > > > > > > > > > 
>> > > > > > > > > > On 20/09/2016 12:27, George Dunlap wrote:
>> > > > > > > > > > > 
>> > > > > > > > > > > On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan
>> > > > > > > > > > > <van.freenix@gmail.com>
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > > 
>> > > > > > > > > > > > On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario
>> > > > > > > > > > > > Faggioli
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > 
>> > > > > > > > > > > > > On Mon, 2016-09-19 at 17:01 -0700, Stefano
>> > > > > > > > > > > > > Stabellini wrote:
>> > > > > > > > > > > > > > 
>> > > > > > > > > > > > > > On Tue, 20 Sep 2016, Dario Faggioli wrote:
>> > > > > > > > > > > > I'd like to add a computing capability in
>> > > > > > > > > > > > xen/arm, like this:
>> > > > > > > > > > > > 
>> > > > > > > > > > > > struct compute_capatiliby
>> > > > > > > > > > > > {
>> > > > > > > > > > > >  char *core_name;
>> > > > > > > > > > > >  uint32_t rank;
>> > > > > > > > > > > >  uint32_t cpu_partnum;
>> > > > > > > > > > > > };
>> > > > > > > > > > > > 
>> > > > > > > > > > > > struct compute_capatiliby cc=
>> > > > > > > > > > > > {
>> > > > > > > > > > > > ??{"A72", 4, 0xd08},
>> > > > > > > > > > > > ??{"A57", 3, 0xxxx},
>> > > > > > > > > > > > ??{"A53", 2, 0xd03},
>> > > > > > > > > > > > ??{"A35", 1, ...},
>> > > > > > > > > > > > }
>> > > > > > > > > > > > 
>> > > > > > > > > > > > Then when identify cpu, we decide which cpu is
>> > > > > > > > > > > > big and which
>> > > > > > > > > > > > cpu is
>> > > > > > > > > > > > little
>> > > > > > > > > > > > according to the computing rank.
>> > > > > > > > > > > > 
>> > > > > > > > > > > > Any comments?
>> > > > > > > > > > > 
>> > > > > > > > > > > I think we definitely need to have Xen have some
>> > > > > > > > > > > kind of idea
>> > > > > > > > > > > the
>> > > > > > > > > > > order between processors, so that the user
>> > > > > > > > > > > doesn't need to
>> > > > > > > > > > > figure out
>> > > > > > > > > > > which class / pool is big and which pool is
>> > > > > > > > > > > LITTLE.  Whether
>> > > > > > > > > > > this
>> > > > > > > > > > > sort
>> > > > > > > > > > > of enumeration is the best way to do that I'll
>> > > > > > > > > > > let Julien and
>> > > > > > > > > > > Stefano
>> > > > > > > > > > > give their opinion.
>> > > > > > > > > > 
>> > > > > > > > > > I don't think an hardcoded list of processor in Xen
>> > > > > > > > > > is the right
>> > > > > > > > > > solution.
>> > > > > > > > > > There are many existing processors and combinations
>> > > > > > > > > > for big.LITTLE
>> > > > > > > > > > so it
>> > > > > > > > > > will
>> > > > > > > > > > nearly be impossible to keep updated.
>> > > > > > > > > > 
>> > > > > > > > > > I would expect the firmware table (device tree,
>> > > > > > > > > > ACPI) to provide
>> > > > > > > > > > relevant
>> > > > > > > > > > data
>> > > > > > > > > > for each processor and differentiate big from
>> > > > > > > > > > LITTLE core.
>> > > > > > > > > > Note that I haven't looked at it for now. A good
>> > > > > > > > > > place to start is
>> > > > > > > > > > looking
>> > > > > > > > > > at
>> > > > > > > > > > how Linux does.
>> > > > > > > > > 
>> > > > > > > > > That's right, see
>> > > > > > > > > Documentation/devicetree/bindings/arm/cpus.txt. It
>> > > > > > > > > is
>> > > > > > > > > trivial to identify the two different CPU classes and
>> > > > > > > > > which cores
>> > > > > > > > > belong
>> > > > > > > > > to which class.
>> > > > > > > > 
>> > > > > > > > The class of the CPU can be found from the MIDR, there
>> > > > > > > > is no need to
>> > > > > > > > use the
>> > > > > > > > device tree/acpi for that. Note that I don't think
>> > > > > > > > there is an easy
>> > > > > > > > way in
>> > > > > > > > ACPI (i.e not in AML) to find out the class.
>> > > > > > > > 
>> > > > > > > > > 
>> > > > > > > > > It is harder to figure out which one is supposed to
>> > > > > > > > > be
>> > > > > > > > > big and which one LITTLE. Regardless, we could
>> > > > > > > > > default to using the
>> > > > > > > > > first cluster (usually big), which is also the
>> > > > > > > > > cluster of the boot
>> > > > > > > > > cpu,
>> > > > > > > > > and utilize the second cluster only when the user
>> > > > > > > > > demands it.
>> > > > > > > > 
>> > > > > > > > Why do you think the boot CPU will usually be a big
>> > > > > > > > one? In the case
>> > > > > > > > of Juno
>> > > > > > > > platform it is configurable, and the boot CPU is a
>> > > > > > > > little core on r2
>> > > > > > > > by
>> > > > > > > > default.
>> > > > > > > > 
>> > > > > > > > In any case, what we care about is differentiate
>> > > > > > > > between two set of
>> > > > > > > > CPUs. I
>> > > > > > > > don't think Xen should care about migrating a guest
>> > > > > > > > vCPU between big
>> > > > > > > > and
>> > > > > > > > LITTLE cpus. So I am not sure why we would want to know
>> > > > > > > > that.
>> > > > > > > 
>> > > > > > > No, it is not about migrating (at least yet). It is about
>> > > > > > > giving useful
>> > > > > > > information to the user. It would be nice if the user had
>> > > > > > > to choose
>> > > > > > > between "big" and "LITTLE" rather than "class 0x1" and
>> > > > > > > "class 0x100", or
>> > > > > > > even "A7" or "A15".
>> > > > > > 
>> > > > > > I don't think it is wise to assume that we may have only 2
>> > > > > > kind of CPUs
>> > > > > > on the platform. We may have more in the future, if so how
>> > > > > > would you
>> > > > > > name them?
>> > > > > 
>> > > > > I would suggest that internally Xen recognize an arbitrary
>> > > > > number of
>> > > > > processor "classes", and order them according to more
>> > > > > powerful -> less
>> > > > > powerful.  Then if at some point someone makes a platform
>> > > > > with three
>> > > > > processors, you can say "class 0", "class 1" or "class
>> > > > > 2".????"big" would
>> > > > > be an alias for "class 0" and "little" would be an alias for
>> > > > > "class 1".
>> > > > 
>> > > > As mentioned earlier, there is no upstreamed yet device tree
>> > > > bindings to know
>> > > > the "power" of a CPU (see [1]
>> > > > 
>> > > > > 
>> > > > > 
>> > > > > And in my suggestion, we allow a richer set of labels, so
>> > > > > that the user
>> > > > > could also be more specific -- e.g., asking for "A15"
>> > > > > specifically, for
>> > > > > example, and failing to build if there are no A15 cores
>> > > > > present, while
>> > > > > allowing users to simply write "big" or "little" if they want
>> > > > > simplicity
>> > > > > / things which work across different platforms.
>> > > > 
>> > > > Well, before trying to do something clever like that (i.e
>> > > > naming "big" and
>> > > > "little"), we need to have upstreamed bindings available to
>> > > > acknowledge the
>> > > > difference. AFAICT, it is not yet upstreamed for Device Tree
>> > > > (see [1]) and I
>> > > > don't know any static ACPI tables providing the similar
>> > > > information.
>> > > 
>> > > I like George's idea that "big" and "little" could be just
>> > > convenience
>> > > aliases. Of course they are predicated on the necessary device
>> > > tree
>> > > bindings being upstream. We don't need [1] to be upstream in
>> > > Linux, just
>> > > the binding:
>> > > 
>> > > http://marc.info/?l=linux-arm-kernel&m=147308556729426&w=2
>> > > 
>> > > which has already been acked by the relevant maintainers.
>> > 
>> > This is device tree only. What about ACPI?
>> > 
>> > > 
>> > > 
>> > > 
>> > > > 
>> > > > I had few discussions and more thought about big.LITTLE
>> > > > support in Xen. The
>> > > > main goal of big.LITTLE is power efficiency by moving task
>> > > > around and been
>> > > > able to idle one cluster. All the solutions suggested
>> > > > (including mine) so far,
>> > > > can be replicated by hand (except the VPIDR) so they are mostly
>> > > > an automatic
>> > > > way. This will also remove the real benefits of big.LITTLE
>> > > > because Xen will
>> > > > not be able to migrate vCPU across cluster for power
>> > > > efficiency.
>> > > 
>> > > The goal of the architects of big.LITTLE might have been power
>> > > efficiency, but of course we are free to use any features that
>> > > the
>> > > hardware provides in the best way for Xen and the Xen community.
>> > 
>> > This is very dependent on how the big.LITTLE has been implemented
>> > by the
>> > hardware. Some platform can not run both big and LITTLE cores at
>> > the same
>> > time. You need a proper switch in the firmware/hypervisor.
>> > 
>> > > 
>> > > 
>> > > > 
>> > > > If we care about power efficiency, we would have to handle
>> > > > seamlessly
>> > > > big.LITTLE in Xen (i.e a guest would only see a kind of CPU).
>> > > > This arise quite
>> > > > few problem, nothing insurmountable, similar to migration
>> > > > across two platforms
>> > > > with different micro-architecture (e.g processors): errata,
>> > > > features
>> > > > supported... The guest would have to know the union of all the
>> > > > errata (this is
>> > > > done so far via the MIDR, so we would need a PV way to do it), and
>> > > > only the
>> > > > intersection of features would be exposed to the guest. This
>> > > > also means the
>> > > > scheduler would have to be modified to handle power efficiency
>> > > > (not strictly
>> > > > necessary at the beginning).
>> > > > 
>> > > > I agree that a such solution would require some work to
>> > > > implement, although
>> > > > Xen will have a better control of the energy consumption of the
>> > > > platform.
>> > > > 
>> > > > So the question here, is what do we want to achieve with
>> > > > big.LITTLE?
>> > > 
>> > > I don't think that handling seamlessly big.LITTLE in Xen is the
>> > > best way
>> > > to do it in the scenarios where Xen on ARM is being used today. I
>> > > understand the principles behind it, but I don't think that it
>> > > will lead
>> > > to good results in a virtualized environment, where there is more
>> > > activity and more vcpus than pcpus.
>> > 
>> > Can you detail why you don't think it will give good results?
>> > 
>> > > 
>> > > 
>> > > What we discussed in this thread so far is actionable, and gives
>> > > us
>> > > big.LITTLE support in a short time frame. It is a good fit for
>> > > Xen on
>> > > ARM use cases and still leads to lower power consumption with an
>> > > wise
>> > > allocation of big and LITTLE vcpus and pcpus to guests.
>> > 
>> > How this would lead to lower power consumption? If there is nothing
>> > running
>> > of the processor we would have a wfi loop which will never put the
>> > physical
>> > CPU in deep sleep. The main advantage of big.LITTLE is to be able
>> > to switch
>> > off a cluster/cpu when it is not used.
>> > 
>> > Without any knowledge in Xen (such as CPU freq), I am afraid the
>> > the power
>> > consumption will still be the same.
>> > 
>> > > 
>> > > 
>> > > I would start from this approach, then if somebody comes along
>> > > with a
>> > > plan to implement a big.LITTLE switcher in Xen, I welcome her to
>> > > do it
>> > > and I would be happy to accept the code in Xen. We'll just make
>> > > it
>> > > optional.
>> > 
>> > I think we are discussing here a simple design for big.LITTLE. I
>> > never asked
>> > Peng to do all the work. I am worry that if we start to expose the
>> > big.LITTLE
>> > to the userspace it will be hard in the future to step back from
>> > it.
>> 
>> Hello Julien,
>> 
>> 
>> I prefer the simple doable method, and As Stefano said, actionable.
>> 
>Yep, this is a very good starting point, IMO.
>
>> - introduce a hypercall interface to let xl get the info of the
>> different cpu classes.
>>
>+1
>
>> - Use vcpuclass in xl cfg file to let user create different vcpus
>>
>Yep.
>
>> - extract cpu computing cap from dts to differentiate cpus. As you
>> said, bindings
>>   not upstreamed. But this is not the hardpoint, we could change the
>> info, whether
>>   dmips or cap in future.
>>
>Yes (or I should say, "whatever", as I know nothing about all
>this! :-P)
>
>> - use cpu hard affinity to block vcpu scheduling between little and
>> big pcpu.
>>
>"to block vcpu scheduling within the specified classes, for each vcpu"
>
>But, again, yes.
>
>> - block user setting vcpu hard affinity between big and LITTLE.
>>
>"between the specified class"
>
>Indeed.
>
>> - only hard affinity seems enough, no need soft affinity.
>> 
>Correct. Just don't care at all and don't touch soft affinity for now.
>
>> Anyway, for vcpu scheduling between big and LITTLE, if this is the
>> right
>> direction and you have an idea on how to implement, I could follow
>> you on
>> enabling this feature with you leading the work. I do not have much
>> idea on this.
>> 
>This can come later, either as an enhancement of this affinity based
>solution, or being implemented on top of it.
>
>In any case, let's start with the affinity based solution for now. It's
>a good level of support already, and a nice first step toward future
>improvements.
>
>Oh, btw, please don't throw away this cpupool patch series either!

Ok -:)

>
>A feature like `xl cpupool-biglittle-split' can still be interesting,

"cpupool-cluster-split" maybe a better name?

>completely orthogonally and independently from the affinity based work,
>and this series looks like it can be used to implement that. :-)

Agree. Based on the affinity work, all pcpus can be assigned to cpupool0 by default.
We could then add something like "cpupool-numa-split" to split the different
CPU classes into different cpupools.
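
Just to illustrate the idea, a rough sketch of such a split is below
(every function name in it is made up, only mirroring the way
"cpupool-numa-split" walks the NUMA nodes; a real implementation would
go through the libxl cpupool interfaces):

    /* Hypothetical sketch: create one cpupool per CPU class and move
     * every pcpu into the pool of its own class. class_of(),
     * pool_exists(), pool_create(), pool_add_cpu() are invented names. */
    void split_pools_by_class(unsigned int nr_cpus)
    {
        unsigned int cpu;

        for ( cpu = 0; cpu < nr_cpus; cpu++ )
        {
            unsigned int cls = class_of(cpu);   /* e.g. from the MIDR */

            if ( !pool_exists(cls) )
                pool_create(cls);               /* e.g. "Pool-class0" */
            pool_add_cpu(cls, cpu);             /* move out of Pool-0 */
        }
    }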

Thanks,
Peng.

>
>Thanks and Regards,
>Dario
>-- 
><<This happens because I choose it to happen!>> (Raistlin Majere)
>-----------------------------------------------------------------
>Dario Faggioli, Ph.D, http://about.me/dario.faggioli
>Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



-- 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-21 10:15                                     ` Julien Grall
  2016-09-21 12:28                                       ` Peng Fan
@ 2016-09-22  9:45                                       ` Peng Fan
  2016-09-22 11:21                                         ` Julien Grall
  1 sibling, 1 reply; 85+ messages in thread
From: Peng Fan @ 2016-09-22  9:45 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Dario Faggioli, xen-devel, Jan Beulich

On Wed, Sep 21, 2016 at 11:15:35AM +0100, Julien Grall wrote:
>Hello Peng,
>
>On 21/09/16 09:38, Peng Fan wrote:
>>On Tue, Sep 20, 2016 at 01:17:04PM -0700, Stefano Stabellini wrote:
>>>On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>On 20/09/2016 20:09, Stefano Stabellini wrote:
>>>>>On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>>>On 20/09/2016 12:27, George Dunlap wrote:
>>>>>>>On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan <van.freenix@gmail.com>
>>>>>>>wrote:
>>>>>>>>On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
>>>>>>>>>On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>>>>>>>>>>On Tue, 20 Sep 2016, Dario Faggioli wrote:
>>>>>It is harder to figure out which one is supposed to be
>>>>>big and which one LITTLE. Regardless, we could default to using the
>>>>>first cluster (usually big), which is also the cluster of the boot cpu,
>>>>>and utilize the second cluster only when the user demands it.
>>>>
>>>>Why do you think the boot CPU will usually be a big one? In the case of Juno
>>>>platform it is configurable, and the boot CPU is a little core on r2 by
>>>>default.
>>>>
>>>>In any case, what we care about is differentiate between two set of CPUs. I
>>>>don't think Xen should care about migrating a guest vCPU between big and
>>>>LITTLE cpus. So I am not sure why we would want to know that.
>>>
>>>No, it is not about migrating (at least yet). It is about giving useful
>>>information to the user. It would be nice if the user had to choose
>>>between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
>>>even "A7" or "A15".
>>
>>As Dario mentioned in previous email,
>>for dom0 provide like this:
>>
>>dom0_vcpus_big = 4
>>dom0_vcpus_little = 2
>>
>>to dom0.
>>
>>If these two no provided, we could let dom0 runs on big pcpus or big.little.
>>Anyway this is not the important point for dom0 only big or big.little.
>>
>>For domU, provide "vcpus.big" and "vcpus.little" in xl configuration file.
>>Such as:
>>
>>vcpus.big = 2
>>vcpus.litle = 4
>>
>>
>>According to George's comments,
>>Then, I think we could use affinity to restrict little vcpus be scheduled on little vcpus,
>>and restrict big vcpus on big vcpus. Seems no need to consider soft affinity, use hard
>>affinity is to handle this.
>>
>>We may need to provide some interface to let xl can get the information such as
>>big.little or smp. if it is big.little, which is big and which is little.
>>
>>For how to differentiate cpus, I am looking the linaro eas cpu topology code,
>>The code has not been upstreamed (:, but merged into google android kernel.
>>I only plan to take some necessary code, such as device tree parse and
>>cpu topology build, because we only need to know the computing capacity of each pcpu.
>>
>>Some doc about eas piece, including dts node examples:
>>https://git.linaro.org/arm/eas/kernel.git/blob/refs/heads/lsk-v4.4-eas-v5.2:/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt
>
>I am reluctant to take any non-upstreamed bindings in Xen. There is a similar
>series going on the lkml [1].

For differentiating the cpu classes, how about directly using the
compatible property of each cpu node?

	A57_0: cpu@0 {
		compatible = "arm,cortex-a57","arm,armv8";
		reg = <0x0 0x0>;
		...
	};

	A53_0: cpu@100 {
		compatible = "arm,cortex-a53","arm,armv8";
		reg = <0x0 0x100>;
		...
	};
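
Just to illustrate, a minimal sketch of how Xen could turn that into
class ids (this assumes the existing dt_get_property() helper; the
classes[] table and class_of_cpu_node() are made-up names, and only the
first string of each "compatible" list is compared):

	/* Hypothetical sketch: the first unseen "compatible" string
	 * defines a new class; later cpu nodes carrying the same
	 * string get the same class id. */
	static const char *classes[NR_CPUS];
	static unsigned int nr_classes;

	static unsigned int class_of_cpu_node(const struct dt_device_node *cpu)
	{
	    const char *compat = dt_get_property(cpu, "compatible", NULL);
	    unsigned int i;

	    if ( !compat )
	        return 0;                  /* no compatible: use class 0 */

	    for ( i = 0; i < nr_classes; i++ )
	        if ( !strcmp(classes[i], compat) )
	            return i;

	    classes[nr_classes] = compat;  /* register a new class */
	    return nr_classes++;
	}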

Thanks,
Peng.

>
>But it sounds like it is a lot of works for little benefits (i.e giving a
>better name to the set of CPUs). The naming will also not fit if in the
>future hardware will have more than 2 kind of CPUs.
>
>[...]
>
>>I am not sure, but we may also need to handle mpidr for ARM, because big and little vcpus are supported.
>
>I am not sure to understand what you mean here.
>
>Regards,
>
>[1] https://lwn.net/Articles/699569/
>
>-- 
>Julien Grall

-- 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22  9:27                                                 ` Peng Fan
@ 2016-09-22  9:51                                                   ` George Dunlap
  2016-09-22 10:09                                                     ` Peng Fan
  2016-09-22 10:13                                                     ` Juergen Gross
  2016-09-22  9:52                                                   ` Dario Faggioli
  2016-09-22 11:29                                                   ` Julien Grall
  2 siblings, 2 replies; 85+ messages in thread
From: George Dunlap @ 2016-09-22  9:51 UTC (permalink / raw)
  To: Peng Fan, Dario Faggioli
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Punit Agrawal, xen-devel,
	Julien Grall, Jan Beulich

On 22/09/16 10:27, Peng Fan wrote:
> On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>> On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
>>> On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>>>>
>>>> Hi Stefano,
>>>>
>>>> On 21/09/2016 19:13, Stefano Stabellini wrote:
>>>>>
>>>>> On Wed, 21 Sep 2016, Julien Grall wrote:
>>>>>>
>>>>>> (CC a couple of ARM folks)
>>>>>>
>>>>>> On 21/09/16 11:22, George Dunlap wrote:
>>>>>>>
>>>>>>> On 21/09/16 11:09, Julien Grall wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 20/09/16 21:17, Stefano Stabellini wrote:
>>>>>>>>>
>>>>>>>>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Stefano,
>>>>>>>>>>
>>>>>>>>>> On 20/09/2016 20:09, Stefano Stabellini wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> On 20/09/2016 12:27, George Dunlap wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan
>>>>>>>>>>>>> <van.freenix@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario
>>>>>>>>>>>>>> Faggioli
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, 2016-09-19 at 17:01 -0700, Stefano
>>>>>>>>>>>>>>> Stabellini wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, 20 Sep 2016, Dario Faggioli wrote:
>>>>>>>>>>>>>> I'd like to add a computing capability in
>>>>>>>>>>>>>> xen/arm, like this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> struct compute_capatiliby
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>  char *core_name;
>>>>>>>>>>>>>>  uint32_t rank;
>>>>>>>>>>>>>>  uint32_t cpu_partnum;
>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> struct compute_capatiliby cc=
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> ??{"A72", 4, 0xd08},
>>>>>>>>>>>>>> ??{"A57", 3, 0xxxx},
>>>>>>>>>>>>>> ??{"A53", 2, 0xd03},
>>>>>>>>>>>>>> ??{"A35", 1, ...},
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then when identify cpu, we decide which cpu is
>>>>>>>>>>>>>> big and which
>>>>>>>>>>>>>> cpu is
>>>>>>>>>>>>>> little
>>>>>>>>>>>>>> according to the computing rank.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any comments?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think we definitely need to have Xen have some
>>>>>>>>>>>>> kind of idea
>>>>>>>>>>>>> the
>>>>>>>>>>>>> order between processors, so that the user
>>>>>>>>>>>>> doesn't need to
>>>>>>>>>>>>> figure out
>>>>>>>>>>>>> which class / pool is big and which pool is
>>>>>>>>>>>>> LITTLE.  Whether
>>>>>>>>>>>>> this
>>>>>>>>>>>>> sort
>>>>>>>>>>>>> of enumeration is the best way to do that I'll
>>>>>>>>>>>>> let Julien and
>>>>>>>>>>>>> Stefano
>>>>>>>>>>>>> give their opinion.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think an hardcoded list of processor in Xen
>>>>>>>>>>>> is the right
>>>>>>>>>>>> solution.
>>>>>>>>>>>> There are many existing processors and combinations
>>>>>>>>>>>> for big.LITTLE
>>>>>>>>>>>> so it
>>>>>>>>>>>> will
>>>>>>>>>>>> nearly be impossible to keep updated.
>>>>>>>>>>>>
>>>>>>>>>>>> I would expect the firmware table (device tree,
>>>>>>>>>>>> ACPI) to provide
>>>>>>>>>>>> relevant
>>>>>>>>>>>> data
>>>>>>>>>>>> for each processor and differentiate big from
>>>>>>>>>>>> LITTLE core.
>>>>>>>>>>>> Note that I haven't looked at it for now. A good
>>>>>>>>>>>> place to start is
>>>>>>>>>>>> looking
>>>>>>>>>>>> at
>>>>>>>>>>>> how Linux does.
>>>>>>>>>>>
>>>>>>>>>>> That's right, see
>>>>>>>>>>> Documentation/devicetree/bindings/arm/cpus.txt. It
>>>>>>>>>>> is
>>>>>>>>>>> trivial to identify the two different CPU classes and
>>>>>>>>>>> which cores
>>>>>>>>>>> belong
>>>>>>>>>>> to which class.
>>>>>>>>>>
>>>>>>>>>> The class of the CPU can be found from the MIDR, there
>>>>>>>>>> is no need to
>>>>>>>>>> use the
>>>>>>>>>> device tree/acpi for that. Note that I don't think
>>>>>>>>>> there is an easy
>>>>>>>>>> way in
>>>>>>>>>> ACPI (i.e not in AML) to find out the class.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It is harder to figure out which one is supposed to
>>>>>>>>>>> be
>>>>>>>>>>> big and which one LITTLE. Regardless, we could
>>>>>>>>>>> default to using the
>>>>>>>>>>> first cluster (usually big), which is also the
>>>>>>>>>>> cluster of the boot
>>>>>>>>>>> cpu,
>>>>>>>>>>> and utilize the second cluster only when the user
>>>>>>>>>>> demands it.
>>>>>>>>>>
>>>>>>>>>> Why do you think the boot CPU will usually be a big
>>>>>>>>>> one? In the case
>>>>>>>>>> of Juno
>>>>>>>>>> platform it is configurable, and the boot CPU is a
>>>>>>>>>> little core on r2
>>>>>>>>>> by
>>>>>>>>>> default.
>>>>>>>>>>
>>>>>>>>>> In any case, what we care about is differentiate
>>>>>>>>>> between two set of
>>>>>>>>>> CPUs. I
>>>>>>>>>> don't think Xen should care about migrating a guest
>>>>>>>>>> vCPU between big
>>>>>>>>>> and
>>>>>>>>>> LITTLE cpus. So I am not sure why we would want to know
>>>>>>>>>> that.
>>>>>>>>>
>>>>>>>>> No, it is not about migrating (at least yet). It is about
>>>>>>>>> giving useful
>>>>>>>>> information to the user. It would be nice if the user had
>>>>>>>>> to choose
>>>>>>>>> between "big" and "LITTLE" rather than "class 0x1" and
>>>>>>>>> "class 0x100", or
>>>>>>>>> even "A7" or "A15".
>>>>>>>>
>>>>>>>> I don't think it is wise to assume that we may have only 2
>>>>>>>> kind of CPUs
>>>>>>>> on the platform. We may have more in the future, if so how
>>>>>>>> would you
>>>>>>>> name them?
>>>>>>>
>>>>>>> I would suggest that internally Xen recognize an arbitrary
>>>>>>> number of
>>>>>>> processor "classes", and order them according to more
>>>>>>> powerful -> less
>>>>>>> powerful.  Then if at some point someone makes a platform
>>>>>>> with three
>>>>>>> processors, you can say "class 0", "class 1" or "class
>>>>>>> 2".????"big" would
>>>>>>> be an alias for "class 0" and "little" would be an alias for
>>>>>>> "class 1".
>>>>>>
>>>>>> As mentioned earlier, there is no upstreamed yet device tree
>>>>>> bindings to know
>>>>>> the "power" of a CPU (see [1]
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> And in my suggestion, we allow a richer set of labels, so
>>>>>>> that the user
>>>>>>> could also be more specific -- e.g., asking for "A15"
>>>>>>> specifically, for
>>>>>>> example, and failing to build if there are no A15 cores
>>>>>>> present, while
>>>>>>> allowing users to simply write "big" or "little" if they want
>>>>>>> simplicity
>>>>>>> / things which work across different platforms.
>>>>>>
>>>>>> Well, before trying to do something clever like that (i.e
>>>>>> naming "big" and
>>>>>> "little"), we need to have upstreamed bindings available to
>>>>>> acknowledge the
>>>>>> difference. AFAICT, it is not yet upstreamed for Device Tree
>>>>>> (see [1]) and I
>>>>>> don't know any static ACPI tables providing the similar
>>>>>> information.
>>>>>
>>>>> I like George's idea that "big" and "little" could be just
>>>>> convenience
>>>>> aliases. Of course they are predicated on the necessary device
>>>>> tree
>>>>> bindings being upstream. We don't need [1] to be upstream in
>>>>> Linux, just
>>>>> the binding:
>>>>>
>>>>> http://marc.info/?l=linux-arm-kernel&m=147308556729426&w=2
>>>>>
>>>>> which has already been acked by the relevant maintainers.
>>>>
>>>> This is device tree only. What about ACPI?
>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> I had few discussions and more thought about big.LITTLE
>>>>>> support in Xen. The
>>>>>> main goal of big.LITTLE is power efficiency by moving task
>>>>>> around and been
>>>>>> able to idle one cluster. All the solutions suggested
>>>>>> (including mine) so far,
>>>>>> can be replicated by hand (except the VPIDR) so they are mostly
>>>>>> an automatic
>>>>>> way. This will also remove the real benefits of big.LITTLE
>>>>>> because Xen will
>>>>>> not be able to migrate vCPU across cluster for power
>>>>>> efficiency.
>>>>>
>>>>> The goal of the architects of big.LITTLE might have been power
>>>>> efficiency, but of course we are free to use any features that
>>>>> the
>>>>> hardware provides in the best way for Xen and the Xen community.
>>>>
>>>> This is very dependent on how the big.LITTLE has been implemented
>>>> by the
>>>> hardware. Some platform can not run both big and LITTLE cores at
>>>> the same
>>>> time. You need a proper switch in the firmware/hypervisor.
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> If we care about power efficiency, we would have to handle
>>>>>> seamlessly
>>>>>> big.LITTLE in Xen (i.e a guest would only see a kind of CPU).
>>>>>> This arise quite
>>>>>> few problem, nothing insurmountable, similar to migration
>>>>>> across two platforms
>>>>>> with different micro-architecture (e.g processors): errata,
>>>>>> features
>>>>>> supported... The guest would have to know the union of all the
>>>>>> errata (this is
>>>>>> done so far via the MIDR, so we would need a PV way to do it), and
>>>>>> only the
>>>>>> intersection of features would be exposed to the guest. This
>>>>>> also means the
>>>>>> scheduler would have to be modified to handle power efficiency
>>>>>> (not strictly
>>>>>> necessary at the beginning).
>>>>>>
>>>>>> I agree that a such solution would require some work to
>>>>>> implement, although
>>>>>> Xen will have a better control of the energy consumption of the
>>>>>> platform.
>>>>>>
>>>>>> So the question here, is what do we want to achieve with
>>>>>> big.LITTLE?
>>>>>
>>>>> I don't think that handling seamlessly big.LITTLE in Xen is the
>>>>> best way
>>>>> to do it in the scenarios where Xen on ARM is being used today. I
>>>>> understand the principles behind it, but I don't think that it
>>>>> will lead
>>>>> to good results in a virtualized environment, where there is more
>>>>> activity and more vcpus than pcpus.
>>>>
>>>> Can you detail why you don't think it will give good results?
>>>>
>>>>>
>>>>>
>>>>> What we discussed in this thread so far is actionable, and gives
>>>>> us
>>>>> big.LITTLE support in a short time frame. It is a good fit for
>>>>> Xen on
>>>>> ARM use cases and still leads to lower power consumption with an
>>>>> wise
>>>>> allocation of big and LITTLE vcpus and pcpus to guests.
>>>>
>>>> How this would lead to lower power consumption? If there is nothing
>>>> running
>>>> of the processor we would have a wfi loop which will never put the
>>>> physical
>>>> CPU in deep sleep. The main advantage of big.LITTLE is to be able
>>>> to switch
>>>> off a cluster/cpu when it is not used.
>>>>
>>>> Without any knowledge in Xen (such as CPU freq), I am afraid the
>>>> the power
>>>> consumption will still be the same.
>>>>
>>>>>
>>>>>
>>>>> I would start from this approach, then if somebody comes along
>>>>> with a
>>>>> plan to implement a big.LITTLE switcher in Xen, I welcome her to
>>>>> do it
>>>>> and I would be happy to accept the code in Xen. We'll just make
>>>>> it
>>>>> optional.
>>>>
>>>> I think we are discussing here a simple design for big.LITTLE. I
>>>> never asked
>>>> Peng to do all the work. I am worry that if we start to expose the
>>>> big.LITTLE
>>>> to the userspace it will be hard in the future to step back from
>>>> it.
>>>
>>> Hello Julien,
>>>
>>>
>>> I prefer the simple doable method, and As Stefano said, actionable.
>>>
>> Yep, this is a very good starting point, IMO.
>>
>>> - introduce a hypercall interface to let xl get the info of the
>>> different cpu classes.
>>>
>> +1
>>
>>> - Use vcpuclass in xl cfg file to let user create different vcpus
>>>
>> Yep.
>>
>>> - extract cpu computing cap from dts to differentiate cpus. As you
>>> said, bindings
>>>   not upstreamed. But this is not the hardpoint, we could change the
>>> info, whether
>>>   dmips or cap in future.
>>>
>> Yes (or I should say, "whatever", as I know nothing about all
>> this! :-P)
>>
>>> - use cpu hard affinity to block vcpu scheduling between little and
>>> big pcpu.
>>>
>> "to block vcpu scheduling within the specified classes, for each vcpu"
>>
>> But, again, yes.
>>
>>> - block user setting vcpu hard affinity between big and LITTLE.
>>>
>> "between the specified class"
>>
>> Indeed.
>>
>>> - only hard affinity seems enough, no need soft affinity.
>>>
>> Correct. Just don't care at all and don't touch soft affinity for now.
>>
>>> Anyway, for vcpu scheduling between big and LITTLE, if this is the
>>> right
>>> direction and you have an idea on how to implement, I could follow
>>> you on
>>> enabling this feature with you leading the work. I do not have much
>>> idea on this.
>>>
>> This can come later, either as an enhancement of this affinity based
>> solution, or being implemented on top of it.
>>
>> In any case, let's start with the affinity based solution for now. It's
>> a good level of support already, and a nice first step toward future
>> improvements.
>>
>> Oh, btw, please don't throw away this cpupool patch series either!
> 
> Ok -:)
> 
>>
>> A feature like `xl cpupool-biglittle-split' can still be interesting,
> 
> "cpupool-cluster-split" maybe a better name?

I think we should name this however we name the different types of cpus.
 i.e., if we're going to call these "cpu classes", then we should call
this "cpupool-cpuclass-split" or something.

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22  9:27                                                 ` Peng Fan
  2016-09-22  9:51                                                   ` George Dunlap
@ 2016-09-22  9:52                                                   ` Dario Faggioli
  2016-09-22 11:29                                                   ` Julien Grall
  2 siblings, 0 replies; 85+ messages in thread
From: Dario Faggioli @ 2016-09-22  9:52 UTC (permalink / raw)
  To: Peng Fan
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Punit Agrawal, George Dunlap,
	xen-devel, Julien Grall, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 1192 bytes --]

On Thu, 2016-09-22 at 17:27 +0800, Peng Fan wrote:
> On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
> > A feature like `xl cpupool-biglittle-split' can still be
> > interesting,
> 
> "cpupool-cluster-split" maybe a better name?
> 
Yeah, sure, whatever! :-D

> > 
> > completely orthogonally and independently from the affinity based
> > work,
> > and this series looks like it can be used to implement that. :-)
> 
> Agree. Based on the affinity work, all pcpus can be assigned to
> cpupool0 by default.
>
Exactly. If we work on affinity, this cpupool splitting will not be
necessary at all, and must not be done at boot. It will be something
that the user can do at any time, moving/creating domains
inside the various pools, if that's what he wants.

> We could then add something like "cpupool-numa-split" to split the
> different CPU classes into different cpupools.
> 
Yep.

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22  8:50                                               ` Dario Faggioli
  2016-09-22  9:27                                                 ` Peng Fan
@ 2016-09-22 10:05                                                 ` Peng Fan
  2016-09-22 16:26                                                   ` Dario Faggioli
  1 sibling, 1 reply; 85+ messages in thread
From: Peng Fan @ 2016-09-22 10:05 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Punit Agrawal, George Dunlap,
	xen-devel, Julien Grall, Jan Beulich

On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
>> On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>> > 
>> > Hi Stefano,
>> > 
>> > On 21/09/2016 19:13, Stefano Stabellini wrote:
>> > > 
>> > > On Wed, 21 Sep 2016, Julien Grall wrote:
>> > > > 
>> > > > (CC a couple of ARM folks)
>> > > > 
>> > > > On 21/09/16 11:22, George Dunlap wrote:
>> > > > > 
>> > > > > On 21/09/16 11:09, Julien Grall wrote:
>> > > > > > 
>> > > > > > 
>> > > > > > 
>> > > > > > On 20/09/16 21:17, Stefano Stabellini wrote:
>> > > > > > > 
>> > > > > > > On Tue, 20 Sep 2016, Julien Grall wrote:
>> > > > > > > > 
>> > > > > > > > Hi Stefano,
>> > > > > > > > 
>> > > > > > > > On 20/09/2016 20:09, Stefano Stabellini wrote:
>> > > > > > > > > 
>> > > > > > > > > On Tue, 20 Sep 2016, Julien Grall wrote:
>> > > > > > > > > > 
>> > > > > > > > > > Hi,
>> > > > > > > > > > 
>> > > > > > > > > > On 20/09/2016 12:27, George Dunlap wrote:
>> > > > > > > > > > > 
>> > > > > > > > > > > On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan
>> > > > > > > > > > > <van.freenix@gmail.com>
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > > 
>> > > > > > > > > > > > On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario
>> > > > > > > > > > > > Faggioli
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > 
>> > > > > > > > > > > > > On Mon, 2016-09-19 at 17:01 -0700, Stefano
>> > > > > > > > > > > > > Stabellini wrote:
>> > > > > > > > > > > > > > 
>> > > > > > > > > > > > > > On Tue, 20 Sep 2016, Dario Faggioli wrote:
>> > > > > > > > > > > > I'd like to add a computing capability in
>> > > > > > > > > > > > xen/arm, like this:
>> > > > > > > > > > > > 
>> > > > > > > > > > > > struct compute_capatiliby
>> > > > > > > > > > > > {
>> > > > > > > > > > > >  char *core_name;
>> > > > > > > > > > > >  uint32_t rank;
>> > > > > > > > > > > >  uint32_t cpu_partnum;
>> > > > > > > > > > > > };
>> > > > > > > > > > > > 
>> > > > > > > > > > > > struct compute_capatiliby cc=
>> > > > > > > > > > > > {
>> > > > > > > > > > > > ??{"A72", 4, 0xd08},
>> > > > > > > > > > > > ??{"A57", 3, 0xxxx},
>> > > > > > > > > > > > ??{"A53", 2, 0xd03},
>> > > > > > > > > > > > ??{"A35", 1, ...},
>> > > > > > > > > > > > }
>> > > > > > > > > > > > 
>> > > > > > > > > > > > Then when identify cpu, we decide which cpu is
>> > > > > > > > > > > > big and which
>> > > > > > > > > > > > cpu is
>> > > > > > > > > > > > little
>> > > > > > > > > > > > according to the computing rank.
>> > > > > > > > > > > > 
>> > > > > > > > > > > > Any comments?
>> > > > > > > > > > > 
>> > > > > > > > > > > I think we definitely need to have Xen have some
>> > > > > > > > > > > kind of idea
>> > > > > > > > > > > the
>> > > > > > > > > > > order between processors, so that the user
>> > > > > > > > > > > doesn't need to
>> > > > > > > > > > > figure out
>> > > > > > > > > > > which class / pool is big and which pool is
>> > > > > > > > > > > LITTLE.  Whether
>> > > > > > > > > > > this
>> > > > > > > > > > > sort
>> > > > > > > > > > > of enumeration is the best way to do that I'll
>> > > > > > > > > > > let Julien and
>> > > > > > > > > > > Stefano
>> > > > > > > > > > > give their opinion.
>> > > > > > > > > > 
>> > > > > > > > > > I don't think an hardcoded list of processor in Xen
>> > > > > > > > > > is the right
>> > > > > > > > > > solution.
>> > > > > > > > > > There are many existing processors and combinations
>> > > > > > > > > > for big.LITTLE
>> > > > > > > > > > so it
>> > > > > > > > > > will
>> > > > > > > > > > nearly be impossible to keep updated.
>> > > > > > > > > > 
>> > > > > > > > > > I would expect the firmware table (device tree,
>> > > > > > > > > > ACPI) to provide
>> > > > > > > > > > relevant
>> > > > > > > > > > data
>> > > > > > > > > > for each processor and differentiate big from
>> > > > > > > > > > LITTLE core.
>> > > > > > > > > > Note that I haven't looked at it for now. A good
>> > > > > > > > > > place to start is
>> > > > > > > > > > looking
>> > > > > > > > > > at
>> > > > > > > > > > how Linux does.
>> > > > > > > > > 
>> > > > > > > > > That's right, see
>> > > > > > > > > Documentation/devicetree/bindings/arm/cpus.txt. It
>> > > > > > > > > is
>> > > > > > > > > trivial to identify the two different CPU classes and
>> > > > > > > > > which cores
>> > > > > > > > > belong
>> > > > > > > > > to which class.
>> > > > > > > > 
>> > > > > > > > The class of the CPU can be found from the MIDR, there
>> > > > > > > > is no need to
>> > > > > > > > use the
>> > > > > > > > device tree/acpi for that. Note that I don't think
>> > > > > > > > there is an easy
>> > > > > > > > way in
>> > > > > > > > ACPI (i.e not in AML) to find out the class.
>> > > > > > > > 
>> > > > > > > > > 
>> > > > > > > > > It is harder to figure out which one is supposed to
>> > > > > > > > > be
>> > > > > > > > > big and which one LITTLE. Regardless, we could
>> > > > > > > > > default to using the
>> > > > > > > > > first cluster (usually big), which is also the
>> > > > > > > > > cluster of the boot
>> > > > > > > > > cpu,
>> > > > > > > > > and utilize the second cluster only when the user
>> > > > > > > > > demands it.
>> > > > > > > > 
>> > > > > > > > Why do you think the boot CPU will usually be a big
>> > > > > > > > one? In the case
>> > > > > > > > of Juno
>> > > > > > > > platform it is configurable, and the boot CPU is a
>> > > > > > > > little core on r2
>> > > > > > > > by
>> > > > > > > > default.
>> > > > > > > > 
>> > > > > > > > In any case, what we care about is differentiate
>> > > > > > > > between two set of
>> > > > > > > > CPUs. I
>> > > > > > > > don't think Xen should care about migrating a guest
>> > > > > > > > vCPU between big
>> > > > > > > > and
>> > > > > > > > LITTLE cpus. So I am not sure why we would want to know
>> > > > > > > > that.
>> > > > > > > 
>> > > > > > > No, it is not about migrating (at least yet). It is about
>> > > > > > > giving useful
>> > > > > > > information to the user. It would be nice if the user had
>> > > > > > > to choose
>> > > > > > > between "big" and "LITTLE" rather than "class 0x1" and
>> > > > > > > "class 0x100", or
>> > > > > > > even "A7" or "A15".
>> > > > > > 
>> > > > > > I don't think it is wise to assume that we may have only 2
>> > > > > > kind of CPUs
>> > > > > > on the platform. We may have more in the future, if so how
>> > > > > > would you
>> > > > > > name them?
>> > > > > 
>> > > > > I would suggest that internally Xen recognize an arbitrary
>> > > > > number of
>> > > > > processor "classes", and order them according to more
>> > > > > powerful -> less
>> > > > > powerful.  Then if at some point someone makes a platform
>> > > > > with three
>> > > > > processors, you can say "class 0", "class 1" or "class
>> > > > > 2".????"big" would
>> > > > > be an alias for "class 0" and "little" would be an alias for
>> > > > > "class 1".
>> > > > 
>> > > > As mentioned earlier, there is no upstreamed yet device tree
>> > > > bindings to know
>> > > > the "power" of a CPU (see [1]
>> > > > 
>> > > > > 
>> > > > > 
>> > > > > And in my suggestion, we allow a richer set of labels, so
>> > > > > that the user
>> > > > > could also be more specific -- e.g., asking for "A15"
>> > > > > specifically, for
>> > > > > example, and failing to build if there are no A15 cores
>> > > > > present, while
>> > > > > allowing users to simply write "big" or "little" if they want
>> > > > > simplicity
>> > > > > / things which work across different platforms.
>> > > > 
>> > > > Well, before trying to do something clever like that (i.e
>> > > > naming "big" and
>> > > > "little"), we need to have upstreamed bindings available to
>> > > > acknowledge the
>> > > > difference. AFAICT, it is not yet upstreamed for Device Tree
>> > > > (see [1]) and I
>> > > > don't know any static ACPI tables providing the similar
>> > > > information.
>> > > 
>> > > I like George's idea that "big" and "little" could be just
>> > > convenience
>> > > aliases. Of course they are predicated on the necessary device
>> > > tree
>> > > bindings being upstream. We don't need [1] to be upstream in
>> > > Linux, just
>> > > the binding:
>> > > 
>> > > http://marc.info/?l=linux-arm-kernel&m=147308556729426&w=2
>> > > 
>> > > which has already been acked by the relevant maintainers.
>> > 
>> > This is device tree only. What about ACPI?
>> > 
>> > > 
>> > > 
>> > > 
>> > > > 
>> > > > I had few discussions and more thought about big.LITTLE
>> > > > support in Xen. The
>> > > > main goal of big.LITTLE is power efficiency by moving task
>> > > > around and been
>> > > > able to idle one cluster. All the solutions suggested
>> > > > (including mine) so far,
>> > > > can be replicated by hand (except the VPIDR) so they are mostly
>> > > > an automatic
>> > > > way. This will also remove the real benefits of big.LITTLE
>> > > > because Xen will
>> > > > not be able to migrate vCPU across cluster for power
>> > > > efficiency.
>> > > 
>> > > The goal of the architects of big.LITTLE might have been power
>> > > efficiency, but of course we are free to use any features that
>> > > the
>> > > hardware provides in the best way for Xen and the Xen community.
>> > 
>> > This is very dependent on how the big.LITTLE has been implemented
>> > by the
>> > hardware. Some platform can not run both big and LITTLE cores at
>> > the same
>> > time. You need a proper switch in the firmware/hypervisor.
>> > 
>> > > 
>> > > 
>> > > > 
>> > > > If we care about power efficiency, we would have to handle
>> > > > seamlessly
>> > > > big.LITTLE in Xen (i.e a guest would only see a kind of CPU).
>> > > > This arise quite
>> > > > few problem, nothing insurmountable, similar to migration
>> > > > across two platforms
>> > > > with different micro-architecture (e.g processors): errata,
>> > > > features
>> > > > supported... The guest would have to know the union of all the
>> > > > errata (this is
>> > > > done so far via the MIDR, so we would need a PV way to do it), and
>> > > > only the
>> > > > intersection of features would be exposed to the guest. This
>> > > > also means the
>> > > > scheduler would have to be modified to handle power efficiency
>> > > > (not strictly
>> > > > necessary at the beginning).
>> > > > 
>> > > > I agree that a such solution would require some work to
>> > > > implement, although
>> > > > Xen will have a better control of the energy consumption of the
>> > > > platform.
>> > > > 
>> > > > So the question here, is what do we want to achieve with
>> > > > big.LITTLE?
>> > > 
>> > > I don't think that handling seamlessly big.LITTLE in Xen is the
>> > > best way
>> > > to do it in the scenarios where Xen on ARM is being used today. I
>> > > understand the principles behind it, but I don't think that it
>> > > will lead
>> > > to good results in a virtualized environment, where there is more
>> > > activity and more vcpus than pcpus.
>> > 
>> > Can you detail why you don't think it will give good results?
>> > 
>> > > 
>> > > 
>> > > What we discussed in this thread so far is actionable, and gives
>> > > us
>> > > big.LITTLE support in a short time frame. It is a good fit for
>> > > Xen on
>> > > ARM use cases and still leads to lower power consumption with an
>> > > wise
>> > > allocation of big and LITTLE vcpus and pcpus to guests.
>> > 
>> > How this would lead to lower power consumption? If there is nothing
>> > running
>> > of the processor we would have a wfi loop which will never put the
>> > physical
>> > CPU in deep sleep. The main advantage of big.LITTLE is to be able
>> > to switch
>> > off a cluster/cpu when it is not used.
>> > 
>> > Without any knowledge in Xen (such as CPU freq), I am afraid the
>> > the power
>> > consumption will still be the same.
>> > 
>> > > 
>> > > 
>> > > I would start from this approach, then if somebody comes along
>> > > with a
>> > > plan to implement a big.LITTLE switcher in Xen, I welcome her to
>> > > do it
>> > > and I would be happy to accept the code in Xen. We'll just make
>> > > it
>> > > optional.
>> > 
>> > I think we are discussing here a simple design for big.LITTLE. I
>> > never asked
>> > Peng to do all the work. I am worry that if we start to expose the
>> > big.LITTLE
>> > to the userspace it will be hard in the future to step back from
>> > it.
>> 
>> Hello Julien,
>> 
>> 
>> I prefer the simple doable method, and As Stefano said, actionable.
>> 
>Yep, this is a very good starting point, IMO.
>
>> - introduce a hypercall interface to let xl get the info of the
>> different cpu classes.
>>
>+1
>
>> - Use vcpuclass in xl cfg file to let user create different vcpus
>>
>Yep.
>
>> - extract cpu computing cap from dts to differentiate cpus. As you
>> said, bindings
>>   not upstreamed. But this is not the hardpoint, we could change the
>> info, whether
>>   dmips or cap in future.
>>
>Yes (or I should say, "whatever", as I know nothing about all
>this! :-P)

One more thing I'd like to ask: do you prefer cpu classes to be
ARM-specific, or common to ARM and x86?
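
For instance (purely illustrative, every name below is made up), a common
definition could keep the arch-specific identifier opaque, so that the
common code only deals with the ordering of the classes:

    /* Hypothetical sketch of an arch-neutral cpu class descriptor:
     * common code only orders the classes; what identifies a class
     * (the MIDR on ARM, nothing defined on x86 yet) stays arch-specific. */
    struct cpu_class {
        unsigned int id;       /* class index, 0 = most powerful */
        uint64_t arch_id;      /* e.g. the MIDR value on ARM */
        unsigned int nr_cpus;  /* pcpus currently in this class */
    };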

Thanks,
Peng.

>
>> - use cpu hard affinity to block vcpu scheduling between little and
>> big pcpu.
>>
>"to block vcpu scheduling within the specified classes, for each vcpu"
>
>But, again, yes.
>
>> - block user setting vcpu hard affinity between big and LITTLE.
>>
>"between the specified class"
>
>Indeed.
>
>> - only hard affinity seems enough, no need soft affinity.
>> 
>Correct. Just don't care at all and don't touch soft affinity for now.
>
>> Anyway, for vcpu scheduling between big and LITTLE, if this is the
>> right
>> direction and you have an idea on how to implement, I could follow
>> you on
>> enabling this feature with you leading the work. I do not have much
>> idea on this.
>> 
>This can come later, either as an enhancement of this affinity based
>solution, or being implemented on top of it.
>
>In any case, let's start with the affinity based solution for now. It's
>a good level of support already, and a nice first step toward future
>improvements.
>
>Oh, btw, please don't throw away this cpupool patch series either!
>
>A feature like `xl cpupool-biglittle-split' can still be interesting,
>completely orthogonally and independently from the affinity based work,
>and this series looks like it can be used to implement that. :-)
>
>Thanks and Regards,
>Dario
>-- 
><<This happens because I choose it to happen!>> (Raistlin Majere)
>-----------------------------------------------------------------
>Dario Faggioli, Ph.D, http://about.me/dario.faggioli
>Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



-- 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22  9:51                                                   ` George Dunlap
@ 2016-09-22 10:09                                                     ` Peng Fan
  2016-09-22 10:39                                                       ` Dario Faggioli
  2016-09-22 10:13                                                     ` Juergen Gross
  1 sibling, 1 reply; 85+ messages in thread
From: Peng Fan @ 2016-09-22 10:09 UTC (permalink / raw)
  To: George Dunlap
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Dario Faggioli, Punit Agrawal,
	xen-devel, Julien Grall, Jan Beulich

On Thu, Sep 22, 2016 at 10:51:04AM +0100, George Dunlap wrote:
>On 22/09/16 10:27, Peng Fan wrote:
>> On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>>> On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
>>>> On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>>>>>
>>>>> Hi Stefano,
>>>>>
>>>>> On 21/09/2016 19:13, Stefano Stabellini wrote:
>>>>>>
>>>>>> On Wed, 21 Sep 2016, Julien Grall wrote:
>>>>>>>
>>>>>>> (CC a couple of ARM folks)
>>>>>>>
>>>>>>> On 21/09/16 11:22, George Dunlap wrote:
>>>>>>>>
>>>>>>>> On 21/09/16 11:09, Julien Grall wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 20/09/16 21:17, Stefano Stabellini wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Stefano,
>>>>>>>>>>>
>>>>>>>>>>> On 20/09/2016 20:09, Stefano Stabellini wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 20/09/2016 12:27, George Dunlap wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan
>>>>>>>>>>>>>> <van.freenix@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario
>>>>>>>>>>>>>>> Faggioli
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, 2016-09-19 at 17:01 -0700, Stefano
>>>>>>>>>>>>>>>> Stabellini wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, 20 Sep 2016, Dario Faggioli wrote:
>>>>>>>>>>>>>>> I'd like to add a computing capability in
>>>>>>>>>>>>>>> xen/arm, like this:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> struct compute_capatiliby
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>  char *core_name;
>>>>>>>>>>>>>>>  uint32_t rank;
>>>>>>>>>>>>>>>  uint32_t cpu_partnum;
>>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> struct compute_capatiliby cc=
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> ??{"A72", 4, 0xd08},
>>>>>>>>>>>>>>> ??{"A57", 3, 0xxxx},
>>>>>>>>>>>>>>> ??{"A53", 2, 0xd03},
>>>>>>>>>>>>>>> ??{"A35", 1, ...},
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then when identify cpu, we decide which cpu is
>>>>>>>>>>>>>>> big and which
>>>>>>>>>>>>>>> cpu is
>>>>>>>>>>>>>>> little
>>>>>>>>>>>>>>> according to the computing rank.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any comments?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think we definitely need to have Xen have some
>>>>>>>>>>>>>> kind of idea
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> order between processors, so that the user
>>>>>>>>>>>>>> doesn't need to
>>>>>>>>>>>>>> figure out
>>>>>>>>>>>>>> which class / pool is big and which pool is
>>>>>>>>>>>>>> LITTLE.  Whether
>>>>>>>>>>>>>> this
>>>>>>>>>>>>>> sort
>>>>>>>>>>>>>> of enumeration is the best way to do that I'll
>>>>>>>>>>>>>> let Julien and
>>>>>>>>>>>>>> Stefano
>>>>>>>>>>>>>> give their opinion.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think an hardcoded list of processor in Xen
>>>>>>>>>>>>> is the right
>>>>>>>>>>>>> solution.
>>>>>>>>>>>>> There are many existing processors and combinations
>>>>>>>>>>>>> for big.LITTLE
>>>>>>>>>>>>> so it
>>>>>>>>>>>>> will
>>>>>>>>>>>>> nearly be impossible to keep updated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would expect the firmware table (device tree,
>>>>>>>>>>>>> ACPI) to provide
>>>>>>>>>>>>> relevant
>>>>>>>>>>>>> data
>>>>>>>>>>>>> for each processor and differentiate big from
>>>>>>>>>>>>> LITTLE core.
>>>>>>>>>>>>> Note that I haven't looked at it for now. A good
>>>>>>>>>>>>> place to start is
>>>>>>>>>>>>> looking
>>>>>>>>>>>>> at
>>>>>>>>>>>>> how Linux does.
>>>>>>>>>>>>
>>>>>>>>>>>> That's right, see
>>>>>>>>>>>> Documentation/devicetree/bindings/arm/cpus.txt. It
>>>>>>>>>>>> is
>>>>>>>>>>>> trivial to identify the two different CPU classes and
>>>>>>>>>>>> which cores
>>>>>>>>>>>> belong
>>>>>>>>>>>> to which class.
>>>>>>>>>>>
>>>>>>>>>>> The class of the CPU can be found from the MIDR, there
>>>>>>>>>>> is no need to
>>>>>>>>>>> use the
>>>>>>>>>>> device tree/acpi for that. Note that I don't think
>>>>>>>>>>> there is an easy
>>>>>>>>>>> way in
>>>>>>>>>>> ACPI (i.e not in AML) to find out the class.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> It is harder to figure out which one is supposed to
>>>>>>>>>>>> be
>>>>>>>>>>>> big and which one LITTLE. Regardless, we could
>>>>>>>>>>>> default to using the
>>>>>>>>>>>> first cluster (usually big), which is also the
>>>>>>>>>>>> cluster of the boot
>>>>>>>>>>>> cpu,
>>>>>>>>>>>> and utilize the second cluster only when the user
>>>>>>>>>>>> demands it.
>>>>>>>>>>>
>>>>>>>>>>> Why do you think the boot CPU will usually be a big
>>>>>>>>>>> one? In the case
>>>>>>>>>>> of Juno
>>>>>>>>>>> platform it is configurable, and the boot CPU is a
>>>>>>>>>>> little core on r2
>>>>>>>>>>> by
>>>>>>>>>>> default.
>>>>>>>>>>>
>>>>>>>>>>> In any case, what we care about is differentiate
>>>>>>>>>>> between two set of
>>>>>>>>>>> CPUs. I
>>>>>>>>>>> don't think Xen should care about migrating a guest
>>>>>>>>>>> vCPU between big
>>>>>>>>>>> and
>>>>>>>>>>> LITTLE cpus. So I am not sure why we would want to know
>>>>>>>>>>> that.
>>>>>>>>>>
>>>>>>>>>> No, it is not about migrating (at least yet). It is about
>>>>>>>>>> giving useful
>>>>>>>>>> information to the user. It would be nice if the user had
>>>>>>>>>> to choose
>>>>>>>>>> between "big" and "LITTLE" rather than "class 0x1" and
>>>>>>>>>> "class 0x100", or
>>>>>>>>>> even "A7" or "A15".
>>>>>>>>>
>>>>>>>>> I don't think it is wise to assume that we may have only 2
>>>>>>>>> kind of CPUs
>>>>>>>>> on the platform. We may have more in the future, if so how
>>>>>>>>> would you
>>>>>>>>> name them?
>>>>>>>>
>>>>>>>> I would suggest that internally Xen recognize an arbitrary
>>>>>>>> number of
>>>>>>>> processor "classes", and order them according to more
>>>>>>>> powerful -> less
>>>>>>>> powerful.  Then if at some point someone makes a platform
>>>>>>>> with three
>>>>>>>> processors, you can say "class 0", "class 1" or "class
>>>>>>>> 2".????"big" would
>>>>>>>> be an alias for "class 0" and "little" would be an alias for
>>>>>>>> "class 1".
>>>>>>>
>>>>>>> As mentioned earlier, there is no upstreamed yet device tree
>>>>>>> bindings to know
>>>>>>> the "power" of a CPU (see [1]
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> And in my suggestion, we allow a richer set of labels, so
>>>>>>>> that the user
>>>>>>>> could also be more specific -- e.g., asking for "A15"
>>>>>>>> specifically, for
>>>>>>>> example, and failing to build if there are no A15 cores
>>>>>>>> present, while
>>>>>>>> allowing users to simply write "big" or "little" if they want
>>>>>>>> simplicity
>>>>>>>> / things which work across different platforms.
>>>>>>>
>>>>>>> Well, before trying to do something clever like that (i.e
>>>>>>> naming "big" and
>>>>>>> "little"), we need to have upstreamed bindings available to
>>>>>>> acknowledge the
>>>>>>> difference. AFAICT, it is not yet upstreamed for Device Tree
>>>>>>> (see [1]) and I
>>>>>>> don't know of any static ACPI tables providing similar
>>>>>>> information.
>>>>>>
>>>>>> I like George's idea that "big" and "little" could be just
>>>>>> convenience
>>>>>> aliases. Of course they are predicated on the necessary device
>>>>>> tree
>>>>>> bindings being upstream. We don't need [1] to be upstream in
>>>>>> Linux, just
>>>>>> the binding:
>>>>>>
>>>>>> http://marc.info/?l=linux-arm-kernel&m=147308556729426&w=2
>>>>>>
>>>>>> which has already been acked by the relevant maintainers.
>>>>>
>>>>> This is device tree only. What about ACPI?
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I had a few discussions and more thought about big.LITTLE support
>>>>>>> in Xen. The main goal of big.LITTLE is power efficiency, by moving
>>>>>>> tasks around and being able to idle one cluster. All the solutions
>>>>>>> suggested (including mine) so far can be replicated by hand (except
>>>>>>> the VPIDR), so they are mostly an automatic way to do the same.
>>>>>>> This will also remove the real benefit of big.LITTLE, because Xen
>>>>>>> will not be able to migrate vCPUs across clusters for power
>>>>>>> efficiency.
>>>>>>
>>>>>> The goal of the architects of big.LITTLE might have been power
>>>>>> efficiency, but of course we are free to use any features that
>>>>>> the
>>>>>> hardware provides in the best way for Xen and the Xen community.
>>>>>
>>>>> This is very dependent on how big.LITTLE has been implemented by the
>>>>> hardware. Some platforms cannot run both big and LITTLE cores at the
>>>>> same time. You need a proper switch in the firmware/hypervisor.
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> If we care about power efficiency, we would have to handle
>>>>>>> big.LITTLE seamlessly in Xen (i.e. a guest would only see one kind
>>>>>>> of CPU). This raises quite a few problems, nothing insurmountable,
>>>>>>> similar to migration across two platforms with different
>>>>>>> micro-architectures (e.g. processors): errata, features
>>>>>>> supported... The guest would have to know the union of all the
>>>>>>> errata (this is done so far via the MIDR, so we would need a PV way
>>>>>>> to do it), and only the intersection of features would be exposed
>>>>>>> to the guest. This also means the scheduler would have to be
>>>>>>> modified to handle power efficiency (not strictly necessary at the
>>>>>>> beginning).
>>>>>>>
>>>>>>> I agree that such a solution would require some work to implement,
>>>>>>> although Xen would have better control of the energy consumption of
>>>>>>> the platform.
>>>>>>>
>>>>>>> So the question here, is what do we want to achieve with
>>>>>>> big.LITTLE?
>>>>>>
>>>>>> I don't think that handling big.LITTLE seamlessly in Xen is the best
>>>>>> way to do it in the scenarios where Xen on ARM is being used today. I
>>>>>> understand the principles behind it, but I don't think that it
>>>>>> will lead
>>>>>> to good results in a virtualized environment, where there is more
>>>>>> activity and more vcpus than pcpus.
>>>>>
>>>>> Can you detail why you don't think it will give good results?
>>>>>
>>>>>>
>>>>>>
>>>>>> What we discussed in this thread so far is actionable, and gives
>>>>>> us
>>>>>> big.LITTLE support in a short time frame. It is a good fit for
>>>>>> Xen on
>>>>>> ARM use cases and still leads to lower power consumption with a wise
>>>>>> allocation of big and LITTLE vcpus and pcpus to guests.
>>>>>
>>>>> How would this lead to lower power consumption? If there is nothing
>>>>> running on the processor we would have a wfi loop, which will never
>>>>> put the physical CPU in deep sleep. The main advantage of big.LITTLE
>>>>> is to be able to switch off a cluster/cpu when it is not used.
>>>>>
>>>>> Without any such knowledge in Xen (such as CPU freq), I am afraid the
>>>>> power consumption will still be the same.
>>>>>
>>>>>>
>>>>>>
>>>>>> I would start from this approach, then if somebody comes along
>>>>>> with a
>>>>>> plan to implement a big.LITTLE switcher in Xen, I welcome her to
>>>>>> do it
>>>>>> and I would be happy to accept the code in Xen. We'll just make
>>>>>> it
>>>>>> optional.
>>>>>
>>>>> I think we are discussing here a simple design for big.LITTLE. I
>>>>> never asked Peng to do all the work. I am worried that if we start to
>>>>> expose big.LITTLE to the userspace it will be hard in the future to
>>>>> step back from it.
>>>>
>>>> Hello Julien,
>>>>
>>>>
>>>> I prefer the simple, doable and, as Stefano said, actionable method.
>>>>
>>> Yep, this is a very good starting point, IMO.
>>>
>>>> - introduce a hypercall interface to let xl get the info about the
>>>> different cpu classes.
>>>>
>>> +1
>>>
>>>> - use vcpuclass in the xl cfg file to let the user create different vcpus.
>>>>
>>> Yep.
>>>
>>>> - extract the cpu computing capacity from the dts to differentiate
>>>> cpus. As you said, the bindings are not upstreamed yet. But this is
>>>> not the hard point; we could change the info, whether dmips or
>>>> capacity, in the future.
>>>>
>>> Yes (or I should say, "whatever", as I know nothing about all
>>> this! :-P)
>>>
>>>> - use cpu hard affinity to block vcpu scheduling between little and
>>>> big pcpus.
>>>>
>>> "to block vcpu scheduling within the specified classes, for each vcpu"
>>>
>>> But, again, yes.
>>>
>>>> - block the user from setting vcpu hard affinity between big and LITTLE.
>>>>
>>> "between the specified class"
>>>
>>> Indeed.
>>>
>>>> - only hard affinity seems enough, no need for soft affinity.
>>>>
>>> Correct. Just don't care at all and don't touch soft affinity for now.
>>>
>>>> Anyway, for vcpu scheduling between big and LITTLE, if this is the
>>>> right direction and you have an idea on how to implement it, I could
>>>> work on enabling this feature with you leading the work. I do not
>>>> have much idea on this.
>>>>
>>> This can come later, either as an enhancement of this affinity based
>>> solution, or being implemented on top of it.
>>>
>>> In any case, let's start with the affinity based solution for now. It's
>>> a good level support already, and a nice first step toward future
>>> improvements.
>>>
>>> Oh, btw, please don't throw away this cpupool patch series either!
>> 
>> Ok -:)
>> 
>>>
>>> A feature like `xl cpupool-biglittle-split' can still be interesting,
>> 
>> "cpupool-cluster-split" maybe a better name?
>
>I think we should name this however we name the different types of cpus.
> i.e., if we're going to call these "cpu classes", then we should call
>this "cpupool-cpuclass-split" or something.

Ok. Got it.

Thanks,
Peng.

>
> -George
>


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22  9:51                                                   ` George Dunlap
  2016-09-22 10:09                                                     ` Peng Fan
@ 2016-09-22 10:13                                                     ` Juergen Gross
  1 sibling, 0 replies; 85+ messages in thread
From: Juergen Gross @ 2016-09-22 10:13 UTC (permalink / raw)
  To: George Dunlap, Peng Fan, Dario Faggioli
  Cc: Peng Fan, Stefano Stabellini, Steve Capper, George Dunlap,
	Andrew Cooper, Punit Agrawal, xen-devel, Julien Grall,
	Jan Beulich

On 22/09/16 11:51, George Dunlap wrote:
> On 22/09/16 10:27, Peng Fan wrote:
>> On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>> "cpupool-cluster-split" maybe a better name?
> 
> I think we should name this however we name the different types of cpus.
>  i.e., if we're going to call these "cpu classes", then we should call
> this "cpupool-cpuclass-split" or something.

I'd go with "cpupool-split feature=cpuclass". This can be extended later
to e.g.:

cpupool-split feature=cpuclass,numa

in order to combine it with "cpupool-numa-split" (which will be the same
as "cpupool-split feature=numa").


Juergen



* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22 10:09                                                     ` Peng Fan
@ 2016-09-22 10:39                                                       ` Dario Faggioli
  0 siblings, 0 replies; 85+ messages in thread
From: Dario Faggioli @ 2016-09-22 10:39 UTC (permalink / raw)
  To: Peng Fan, George Dunlap; +Cc: xen-devel



[Trimming the Cc-list quite a bit!]

On Thu, 2016-09-22 at 18:09 +0800, Peng Fan wrote:
> On Thu, Sep 22, 2016 at 10:51:04AM +0100, George Dunlap wrote:
> > I think we should name this however we name the different types of
> > cpus.
> > i.e., if we're going to call these "cpu classes", then we should
> > call
> > this "cpupool-cpuclass-split" or something.
> 
> Ok. Got it.
> 
Hey, Peng... a non-technical thing: can you trim your quotes when
replying to emails?

What I mean by that is exactly what I've done in this very message (and
in any message I write, unless I forget :-D), i.e., remove all the
content coming from previous emails in the conversation that is not
relevant for what you are actually talking about and replying to.

Of course, it's a matter of balance, and there is the risk of removing
too much, which then means one would have to open old emails to follow
the conversation. But, for instance in this case, I had to hit PgDown
_15_ times in my MUA, just to figure out you were saying "Ok. Got it",
which is certainly not ideal. :-/

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22  9:45                                       ` Peng Fan
@ 2016-09-22 11:21                                         ` Julien Grall
  2016-09-23  2:38                                           ` Peng Fan
  0 siblings, 1 reply; 85+ messages in thread
From: Julien Grall @ 2016-09-22 11:21 UTC (permalink / raw)
  To: Peng Fan
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Dario Faggioli, xen-devel, Jan Beulich

Hello Peng,

On 22/09/16 10:45, Peng Fan wrote:
> On Wed, Sep 21, 2016 at 11:15:35AM +0100, Julien Grall wrote:
>> Hello Peng,
>>
>> On 21/09/16 09:38, Peng Fan wrote:
>>> On Tue, Sep 20, 2016 at 01:17:04PM -0700, Stefano Stabellini wrote:
>>>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>> On 20/09/2016 20:09, Stefano Stabellini wrote:
>>>>>> On Tue, 20 Sep 2016, Julien Grall wrote:
>>>>>>> On 20/09/2016 12:27, George Dunlap wrote:
>>>>>>>> On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan <van.freenix@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote:
>>>>>>>>>> On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote:
>>>>>>>>>>> On Tue, 20 Sep 2016, Dario Faggioli wrote:
>>>>>> It is harder to figure out which one is supposed to be
>>>>>> big and which one LITTLE. Regardless, we could default to using the
>>>>>> first cluster (usually big), which is also the cluster of the boot cpu,
>>>>>> and utilize the second cluster only when the user demands it.
>>>>>
>>>>> Why do you think the boot CPU will usually be a big one? In the case of Juno
>>>>> platform it is configurable, and the boot CPU is a little core on r2 by
>>>>> default.
>>>>>
>>>>> In any case, what we care about is differentiate between two set of CPUs. I
>>>>> don't think Xen should care about migrating a guest vCPU between big and
>>>>> LITTLE cpus. So I am not sure why we would want to know that.
>>>>
>>>> No, it is not about migrating (at least yet). It is about giving useful
>>>> information to the user. It would be nice if the user had to choose
>>>> between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or
>>>> even "A7" or "A15".
>>>
>>> As Dario mentioned in previous email,
>>> for dom0 provide like this:
>>>
>>> dom0_vcpus_big = 4
>>> dom0_vcpus_little = 2
>>>
>>> to dom0.
>>>
>>> If these two are not provided, we could let dom0 run on big pcpus or big.LITTLE.
>>> Anyway this is not the important point, whether dom0 is big-only or big.LITTLE.
>>>
>>> For domU, provide "vcpus.big" and "vcpus.little" in xl configuration file.
>>> Such as:
>>>
>>> vcpus.big = 2
>>> vcpus.little = 4
>>>
>>>
>>> According to George's comments,
>>> Then, I think we could use affinity to restrict little vcpus to be scheduled on little pcpus,
>>> and restrict big vcpus to big pcpus. Seems no need to consider soft affinity; hard
>>> affinity is enough to handle this.
>>>
>>> We may need to provide some interface to let xl get information such as
>>> big.LITTLE or smp; if it is big.LITTLE, which is big and which is little.
>>>
>>> For how to differentiate cpus, I am looking at the linaro eas cpu topology code.
>>> The code has not been upstreamed :(, but merged into the google android kernel.
>>> I only plan to take some necessary code, such as the device tree parsing and
>>> cpu topology build, because we only need to know the computing capacity of each pcpu.
>>>
>>> Some doc about eas piece, including dts node examples:
>>> https://git.linaro.org/arm/eas/kernel.git/blob/refs/heads/lsk-v4.4-eas-v5.2:/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt
>>
>> I am reluctant to take any non-upstreamed bindings in Xen. There is a similar
>> series going on lkml [1].
>
> For how to differentiate cpu classes, how about directly using the
> compatible property of each cpu node?

What do you mean by cpu classes? If it is power, then the compatible
will not help here. You may have a platform with the same core (e.g.
Cortex A53) but a different silicon implementation, so the power
efficiency will be different.

Regards,

-- 
Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22  8:43                                             ` Dario Faggioli
@ 2016-09-22 11:24                                               ` Julien Grall
  2016-09-22 16:31                                                 ` Dario Faggioli
  0 siblings, 1 reply; 85+ messages in thread
From: Julien Grall @ 2016-09-22 11:24 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap, Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, Steve Capper, George Dunlap,
	Andrew Cooper, Punit Agrawal, xen-devel, Jan Beulich, Peng Fan

Hi Dario,

On 22/09/16 09:43, Dario Faggioli wrote:
> On Wed, 2016-09-21 at 20:28 +0100, Julien Grall wrote:
>> On 21/09/2016 16:45, Dario Faggioli wrote:
>>> This does not seem to match with what has been said at some point
>>> in
>>> this thread... And if it's like that, how's that possible, if the
>>> pcpus' ISAs are (even only slightly) different?
>>
>> Right, at some point I mentioned that the set of errata and features
>> will be different between processor.
>>
> Yes, I read that, but wasn't (and still am not) sure about whether that
> meant a vcpu can move freely between classes, in the way that the
> scheduler does.
>
> In fact, you say:
>
>> With a bit of work in Xen, it would be possible to do move the vCPU
>> between big and LITTLE cpus. As mentioned above, we could sanitize
>> the
>> features to only enable a common set.
>> You can view the big.LITTLE
>> problem as a local live migration between two kind of CPUs.
>>
> Local migration basically --from the vcpu perspective-- means create a
> new vcpu, stop the original vcpu, copy the state from original to new,
> destroy the original vcpu and start the new one. My point is that this
> is not something that can be done within nor initiated by the
> scheduler, e.g., during a context switch or a vcpu wakeup!

By local migration, I meant from the perspective of the hypervisor. In 
the hypervisor you have to trap feature registers and other 
implementation defined registers to show the same value across all the 
physical CPUs.

You don't need to recreate the vCPU every time you move from one set of 
CPUs to another one. Sorry for the confusion.

Regards,
-- 
Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22  9:27                                                 ` Peng Fan
  2016-09-22  9:51                                                   ` George Dunlap
  2016-09-22  9:52                                                   ` Dario Faggioli
@ 2016-09-22 11:29                                                   ` Julien Grall
  2016-09-22 17:31                                                     ` Stefano Stabellini
  2016-09-23  2:03                                                     ` Peng Fan
  2 siblings, 2 replies; 85+ messages in thread
From: Julien Grall @ 2016-09-22 11:29 UTC (permalink / raw)
  To: Peng Fan, Dario Faggioli
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Punit Agrawal, George Dunlap,
	xen-devel, Jan Beulich

Hello Peng,

On 22/09/16 10:27, Peng Fan wrote:
> On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>> On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
>>> On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>> A feature like `xl cpupool-biglittle-split' can still be interesting,
>
> "cpupool-cluster-split" maybe a better name?

You seem to assume that a cluster, from the MPIDR point of view, can 
only contain the same set of CPUs. I don't think this is part of the 
architecture, so this may not be true in the future.

>
>> completely orthogonally and independently from the affinity based work,
>> and this series looks like it can be used to implement that. :-)
>
> Agree. All pcpus can by default be assigned into cpupool0, based on the affinity work.

What do you mean by affinity? From MPIDR?

> We could add one like "cpupool-numa-split" to split cpus of different
> classes into different cpupools.

Regards,

-- 
Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22 10:05                                                 ` Peng Fan
@ 2016-09-22 16:26                                                   ` Dario Faggioli
  2016-09-22 17:33                                                     ` Stefano Stabellini
  0 siblings, 1 reply; 85+ messages in thread
From: Dario Faggioli @ 2016-09-22 16:26 UTC (permalink / raw)
  To: Peng Fan
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Punit Agrawal, George Dunlap,
	xen-devel, Julien Grall, Jan Beulich



On Thu, 2016-09-22 at 18:05 +0800, Peng Fan wrote:
> On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
> > Yes (or I should say, "whatever", as I know nothing about all
> > this! :-P)
> 
> One more thing I'd like to ask, do you prefer cpu classes to be ARM
> specific or ARM/X86
> common?
> 
I'm not sure. I'd say that it depends on where we are. I mean, in Xen,
names can be rather specific, like some codename of the
chip/core/family/etc.

I'm not sure what this means for you, on ARM, but I guess it would
depend on what you, Julien and Stefano will come up with and agree on.

Then, at the toolstack level (xl and libxl) we can have aliases for the
various classes, and/or names for specific groups of classes, arranged
according to whatever criteria.

I also like George's idea of allowing the user to pick a class by its
order in the hypervisor hierarchy, if/when we put classes in a
hierarchy within the hypervisor.

But I'd like to hear others... In the meanwhile, if I were you, I'd
start with either "class 0", "class 1", etc., or just use the codename
of the chip ("A7", "A15", etc.)
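
Something like this, say (completely made-up spelling, reusing the
vcpuclass cfg key floated earlier in the thread):

 # hypothetical: pick by hypervisor-ordered class, or by chip codename
 vcpuclass = ["0,1:class 0", "2,3:class 1"]
 vcpuclass = ["0,1:A15", "2,3:A7"]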

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22 11:24                                               ` Julien Grall
@ 2016-09-22 16:31                                                 ` Dario Faggioli
  2016-09-23 13:56                                                   ` Julien Grall
  0 siblings, 1 reply; 85+ messages in thread
From: Dario Faggioli @ 2016-09-22 16:31 UTC (permalink / raw)
  To: Julien Grall, George Dunlap, Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, Steve Capper, George Dunlap,
	Andrew Cooper, Punit Agrawal, xen-devel, Jan Beulich, Peng Fan



On Thu, 2016-09-22 at 12:24 +0100, Julien Grall wrote:
> On 22/09/16 09:43, Dario Faggioli wrote:
> > Local migration basically --from the vcpu perspective-- means
> > create a
> > new vcpu, stop the original vcpu, copy the state from original to
> > new,
> > destroy the original vcpu and start the new one. My point is that
> > this
> > is not something that can be done within nor initiated by the
> > scheduler, e.g., during a context switch or a vcpu wakeup!
> 
> By local migration, I meant from the perspective of the hypervisor.
> In 
> the hypervisor you have to trap feature registers and other 
> implementation defined registers to show the same value across all
> the 
> physical CPUs.
> 
You mean we trap feature registers during the (normal) execution of a
vcpu, because we want Xen to vet what's returned to the guest itself.
And that migration support, and hence the possibility that the guest
has been migrated to a cpu different from the one where it was
created, is already one of the reasons why this is necessary... right?

If yes, and if that's "all" we need, I think it should be fine.

> You don't need to recreate the vCPU every time you move from one set
> of 
> CPUs to another one. Sorry for the confusion.
> 
No, I am sorry... it's not you creating confusion, it's probably me
knowing too little about ARM and not thinking of the above when you
said "migration". :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22 11:29                                                   ` Julien Grall
@ 2016-09-22 17:31                                                     ` Stefano Stabellini
  2016-09-22 18:54                                                       ` Julien Grall
  2016-09-23  2:03                                                     ` Peng Fan
  1 sibling, 1 reply; 85+ messages in thread
From: Stefano Stabellini @ 2016-09-22 17:31 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Dario Faggioli, Punit Agrawal,
	George Dunlap, xen-devel, Jan Beulich, Peng Fan

On Thu, 22 Sep 2016, Julien Grall wrote:
> Hello Peng,
> 
> On 22/09/16 10:27, Peng Fan wrote:
> > On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
> > > On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
> > > > On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
> > > A feature like `xl cpupool-biglittle-split' can still be interesting,
> > 
> > "cpupool-cluster-split" maybe a better name?
> 
> You seem to assume that a cluster, from the MPIDR point of view, can only
> contain the same set of CPUs. I don't think this is part of the architecture,
> so this may not be true in the future.

Interesting. I also understood that a cluster can only have one kind of
cpus. Honestly it would be a little insane for it to be otherwise :-)


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22 16:26                                                   ` Dario Faggioli
@ 2016-09-22 17:33                                                     ` Stefano Stabellini
  0 siblings, 0 replies; 85+ messages in thread
From: Stefano Stabellini @ 2016-09-22 17:33 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Punit Agrawal, George Dunlap,
	xen-devel, Julien Grall, Jan Beulich, Peng Fan

On Thu, 22 Sep 2016, Dario Faggioli wrote:
> On Thu, 2016-09-22 at 18:05 +0800, Peng Fan wrote:
> > On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
> > > Yes (or I should say, "whatever", as I know nothing about all
> > > this! :-P)
> > 
> > One more thing I'd like to ask, do you prefer cpu classes to be ARM
> > specific or ARM/X86
> > common?
> > 
> I'm not sure. I'd say that it depends on where we are. I mean, in Xen,
> names can be rather specific, like some codename of the
> chip/core/family/etc.
> 
> I'm not sure what this means for you, on ARM, but I guess it would
> depend on what you, Julien and Stefano will come up and agree on.

Actually it depends on what the x86 maintainers think. For us (ARM
maintainers) it makes little difference whether the concept of cpu
classes is ARM specific or common.


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22 17:31                                                     ` Stefano Stabellini
@ 2016-09-22 18:54                                                       ` Julien Grall
  2016-09-23  2:14                                                         ` Peng Fan
  2016-09-24  1:35                                                         ` Stefano Stabellini
  0 siblings, 2 replies; 85+ messages in thread
From: Julien Grall @ 2016-09-22 18:54 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, Steve Capper, George Dunlap,
	Andrew Cooper, Dario Faggioli, Punit Agrawal, George Dunlap,
	xen-devel, Jan Beulich, Peng Fan

Hi Stefano,

On 22/09/2016 18:31, Stefano Stabellini wrote:
> On Thu, 22 Sep 2016, Julien Grall wrote:
>> Hello Peng,
>>
>> On 22/09/16 10:27, Peng Fan wrote:
>>> On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>>>> On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
>>>>> On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>>>> A feature like `xl cpupool-biglittle-split' can still be interesting,
>>>
>>> "cpupool-cluster-split" maybe a better name?
>>
>> You seem to assume that a cluster, from the MPIDR point of view, can only
>> contain the same set of CPUs. I don't think this is part of the architecture,
>> so this may not be true in the future.
>
> Interesting. I also understood that a cluster can only have one kind if
> cpus. Honestly it would be a little insane for it to be otherwise :-)

I don't think this is insane (or maybe I am insane :)). Clusters usually 
don't share all of the L2 cache (assuming L1 is local to each core) and 
an L3 cache may not be present, so if you move a task from one cluster 
to another you will add latency because the new L2 cache has to be refilled.

The use case of big.LITTLE is that big cores are used for short periods 
of burst and little cores are used for the rest (e.g. listening to audio, 
fetching mail...). If you want to reduce latency when switching between 
big and little CPUs, you may want to put them within the same cluster.

Also, as mentioned in another thread, you may have a platform with the 
same micro-architecture (e.g. Cortex A53) but a different silicon 
implementation (e.g. a different frequency or power efficiency). Here 
the concept of big.LITTLE is more blurred.

That's why I am quite reluctant to name the different CPU sets "big" and 
"little" (even if it may be handier for the user).

Cheers,

-- 
Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22 11:29                                                   ` Julien Grall
  2016-09-22 17:31                                                     ` Stefano Stabellini
@ 2016-09-23  2:03                                                     ` Peng Fan
  1 sibling, 0 replies; 85+ messages in thread
From: Peng Fan @ 2016-09-23  2:03 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Dario Faggioli, Punit Agrawal,
	George Dunlap, xen-devel, Jan Beulich

On Thu, Sep 22, 2016 at 12:29:53PM +0100, Julien Grall wrote:
>Hello Peng,
>
>On 22/09/16 10:27, Peng Fan wrote:
>>On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>>>On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
>>>>On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>>>A feature like `xl cpupool-biglittle-split' can still be interesting,
>>
>>"cpupool-cluster-split" maybe a better name?
>
>You seem to assume that a cluster, from the MPIDR point of view, can only
>contain the same set of CPUs. I don't think this is part of the architecture,
>so this may not be true in the future.
>
>>
>>>completely orthogonally and independently from the affinity based work,
>>>and this series looks like it can be used to implement that. :-)
>>
>>Agree. All pcpus default can be assigned into cpupool0 based on the affinity work.
>
>What do you mean by affinity? From MPIDR?

vcpu hard affinity. When allocating or initializing a vcpu, the hard affinity
needs to be initialized.
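
For instance, the same effect can already be had by hand with today's xl,
assuming pcpus 0-3 are the little cores and 4-5 the big ones (the domain
name is just an example):

 #xl vcpu-pin DomU 0 4-5   # "big" vcpu 0, hard-pinned to the big pcpus
 #xl vcpu-pin DomU 1 0-3   # "little" vcpu 1, hard-pinned to the little pcpus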

Thanks,
Peng.

>
>>We could add one like "cpupool-numa-split" to split different classes cpu
>>into different cpupools.
>
>Regards,
>
>-- 
>Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22 18:54                                                       ` Julien Grall
@ 2016-09-23  2:14                                                         ` Peng Fan
  2016-09-23  9:24                                                           ` Julien Grall
  2016-09-24  1:35                                                         ` Stefano Stabellini
  1 sibling, 1 reply; 85+ messages in thread
From: Peng Fan @ 2016-09-23  2:14 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Dario Faggioli, Punit Agrawal,
	George Dunlap, xen-devel, Jan Beulich

On Thu, Sep 22, 2016 at 07:54:02PM +0100, Julien Grall wrote:
>Hi Stefano,
>
>On 22/09/2016 18:31, Stefano Stabellini wrote:
>>On Thu, 22 Sep 2016, Julien Grall wrote:
>>>Hello Peng,
>>>
>>>On 22/09/16 10:27, Peng Fan wrote:
>>>>On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>>>>>On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
>>>>>>On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>>>>>A feature like `xl cpupool-biglittle-split' can still be interesting,
>>>>
>>>>"cpupool-cluster-split" maybe a better name?
>>>
>>>You seem to assume that a cluster, from the MPIDR point of view, can only
>>>contain the same set of CPUs. I don't think this is part of the architecture,
>>>so this may not be true in the future.
>>
>>Interesting. I also understood that a cluster can only have one kind if
>>cpus. Honestly it would be a little insane for it to be otherwise :-)
>
>I don't think this is insane (or maybe I am insane :)). Cluster usually
>doesn't share all L2 cache (assuming L1 is local to each core) and L3 cache
>may not be present, so if you move a task from one cluster to another you
>will add latency because the new L2 cache has to be refilled.
>
>The use case of big.LITTLE is big cores are used for short period of burst
>and little core are used for the rest (e.g listening audio, fetching
>mail...). If you want to reduce latency when switch between big and little
>CPUs, you may want to put them within the same cluster.
>
>Also, as mentioned in another thread, you may have a platform with the same
>micro-architecture (e.g Cortex A-53) but different silicon implementation
>(e.g to have a different frequency, power efficiency). Here the concept of
>big.LITTLE is more blurred.

It is possible that, in one cluster, different pcpus run at different cpu
frequencies. This depends on the hardware design. Some designs may require all
the cores in one cluster to run at the same frequency; some may be more
complicated and support different cores running at different frequencies.

This is just like having an smp system where different cores can run at
different cpu frequencies. I think this is not what big.LITTLE means.

For the pcpus in one cluster, xen needs to choose which pcpu to run a vcpu
on, for power efficiency etc.


Thanks,
Peng.

>
>That's why I am quite reluctant to name (even if it may be more handy to the
>user) "big" and "little" the different CPU set.
>
>Cheers,
>
>-- 
>Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22 11:21                                         ` Julien Grall
@ 2016-09-23  2:38                                           ` Peng Fan
  0 siblings, 0 replies; 85+ messages in thread
From: Peng Fan @ 2016-09-23  2:38 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Dario Faggioli, xen-devel, Jan Beulich

On Thu, Sep 22, 2016 at 12:21:00PM +0100, Julien Grall wrote:
>>>>
>>>>According to George's comments,
>>>>Then, I think we could use affinity to restrict little vcpus be scheduled on little vcpus,
>>>>and restrict big vcpus on big vcpus. Seems no need to consider soft affinity, use hard
>>>>affinity is to handle this.
>>>>
>>>>We may need to provide some interface to let xl can get the information such as
>>>>big.little or smp. if it is big.little, which is big and which is little.
>>>>
>>>>For how to differentiate cpus, I am looking the linaro eas cpu topology code,
>>>>The code has not been upstreamed (:, but merged into google android kernel.
>>>>I only plan to take some necessary code, such as device tree parse and
>>>>cpu topology build, because we only need to know the computing capacity of each pcpu.
>>>>
>>>>Some doc about eas piece, including dts node examples:
>>>>https://git.linaro.org/arm/eas/kernel.git/blob/refs/heads/lsk-v4.4-eas-v5.2:/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt
>>>
>>>I am reluctant to take any non-upstreamed bindings in Xen. There is a similar
>>>series going on the lklm [1].
>>
>>For how to differentiate cpu classes, how about directly use
>>compatible property of each cpu node?
>
>What do you mean by cpu classes? If it is power, then the compatible will not
>help here. You may have a platform with the same core (e.g cortex A53) but
>different silicon implementation, so the power efficiency will be different.

By cpu classes, I mean cpu clusters. I checked the cpu capacity code [1] you
listed; it uses dmips from Dhrystone. But for now what I plan to implement is
to block vcpus from being scheduled across big and LITTLE.

In my case, vcpus will be restricted to A53 or A72.
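
With the cfg keys floated earlier in this thread, that could look like
(hypothetical syntax, nothing of this is implemented yet):

 vcpus.big = 2      # these vcpus restricted to the A72 cluster
 vcpus.little = 4   # these vcpus restricted to the A53 cluster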

In the same cluster, different cores may run at different cpu frequencies, or
all the cores may run at the same frequency. This depends on the soc
implementation.

This needs xen to choose which pcpu to run the vcpu on; scheduling a vcpu
among the cpus of one cluster involves no local migration.

Considering power for the future, dmips needs to be used, but we also need to
differentiate cpus from different clusters. So "dmips + compatible" both need
to be considered.

For the cpus in one cluster, we also need the dmips info so that xen can
schedule vcpus between pcpus effectively.

Thanks,
Peng.

[1] https://lwn.net/Articles/699569/
>
>Regards,
>
>-- 
>Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-23  2:14                                                         ` Peng Fan
@ 2016-09-23  9:24                                                           ` Julien Grall
  2016-09-23 10:05                                                             ` Peng Fan
  0 siblings, 1 reply; 85+ messages in thread
From: Julien Grall @ 2016-09-23  9:24 UTC (permalink / raw)
  To: Peng Fan
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Dario Faggioli, Punit Agrawal,
	George Dunlap, xen-devel, Jan Beulich

Hello Peng,

On 23/09/16 03:14, Peng Fan wrote:
> On Thu, Sep 22, 2016 at 07:54:02PM +0100, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 22/09/2016 18:31, Stefano Stabellini wrote:
>>> On Thu, 22 Sep 2016, Julien Grall wrote:
>>>> Hello Peng,
>>>>
>>>> On 22/09/16 10:27, Peng Fan wrote:
>>>>> On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>>>>>> On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
>>>>>>> On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>>>>>> A feature like `xl cpupool-biglittle-split' can still be interesting,
>>>>>
>>>>> "cpupool-cluster-split" maybe a better name?
>>>>
>>>> You seem to assume that a cluster, from the MPIDR point of view, can only
>>>> contain the same set of CPUs. I don't think this is part of the architecture,
>>>> so this may not be true in the future.
>>>
>>> Interesting. I also understood that a cluster can only have one kind if
>>> cpus. Honestly it would be a little insane for it to be otherwise :-)
>>
>> I don't think this is insane (or maybe I am insane :)). Cluster usually
>> doesn't share all L2 cache (assuming L1 is local to each core) and L3 cache
>> may not be present, so if you move a task from one cluster to another you
>> will add latency because the new L2 cache has to be refilled.
>>
>> The use case of big.LITTLE is big cores are used for short period of burst
>> and little core are used for the rest (e.g listening audio, fetching
>> mail...). If you want to reduce latency when switch between big and little
>> CPUs, you may want to put them within the same cluster.
>>
>> Also, as mentioned in another thread, you may have a platform with the same
>> micro-architecture (e.g Cortex A-53) but different silicon implementation
>> (e.g to have a different frequency, power efficiency). Here the concept of
>> big.LITTLE is more blurred.
>
> That is possible that in one cluster, different pcpus runs with different cpu
> frequency. This depends on hardware design. Some may require all the cores in
> one cluster runs at the same frequency, some may have more complicated design that
> supports different cores runs at different frequency.
>
> This is just like you have a smp system, but different cores can run at
> different cpu frequency. I think this is not what bit.LITTLE means.

big.LITTLE is a generic term for pairing power-hungry, powerful cores 
(big) with slower, battery-saving cores (LITTLE).

It is not mandatory to have different micro-architectures between big 
and LITTLE cores.

In any case, the interface should not be big.LITTLE specific. We don't 
want to tie ourselves to one specific architecture.

Regards,

-- 
Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-23  9:24                                                           ` Julien Grall
@ 2016-09-23 10:05                                                             ` Peng Fan
  2016-09-23 10:15                                                               ` Julien Grall
  2016-09-23 13:52                                                               ` Dario Faggioli
  0 siblings, 2 replies; 85+ messages in thread
From: Peng Fan @ 2016-09-23 10:05 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Dario Faggioli, Punit Agrawal,
	George Dunlap, xen-devel, Jan Beulich

On Fri, Sep 23, 2016 at 10:24:37AM +0100, Julien Grall wrote:
>Hello Peng,
>
>On 23/09/16 03:14, Peng Fan wrote:
>>On Thu, Sep 22, 2016 at 07:54:02PM +0100, Julien Grall wrote:
>>>Hi Stefano,
>>>
>>>On 22/09/2016 18:31, Stefano Stabellini wrote:
>>>>On Thu, 22 Sep 2016, Julien Grall wrote:
>>>>>Hello Peng,
>>>>>
>>>>>On 22/09/16 10:27, Peng Fan wrote:
>>>>>>On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>>>>>>>On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
>>>>>>>>On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>>>>>>>A feature like `xl cpupool-biglittle-split' can still be interesting,
>>>>>>
>>>>>>"cpupool-cluster-split" maybe a better name?
>>>>>
>>>>>You seem to assume that a cluster, from the MPIDR point of view, can only
>>>>>contain the same set of CPUs. I don't think this is part of the architecture,
>>>>>so this may not be true in the future.
>>>>
>>>>Interesting. I also understood that a cluster can only have one kind if
>>>>cpus. Honestly it would be a little insane for it to be otherwise :-)
>>>
>>>I don't think this is insane (or maybe I am insane :)). Cluster usually
>>>doesn't share all L2 cache (assuming L1 is local to each core) and L3 cache
>>>may not be present, so if you move a task from one cluster to another you
>>>will add latency because the new L2 cache has to be refilled.
>>>
>>>The use case of big.LITTLE is big cores are used for short period of burst
>>>and little core are used for the rest (e.g listening audio, fetching
>>>mail...). If you want to reduce latency when switch between big and little
>>>CPUs, you may want to put them within the same cluster.
>>>
>>>Also, as mentioned in another thread, you may have a platform with the same
>>>micro-architecture (e.g Cortex A-53) but different silicon implementation
>>>(e.g to have a different frequency, power efficiency). Here the concept of
>>>big.LITTLE is more blurred.
>>
>>That is possible that in one cluster, different pcpus runs with different cpu
>>frequency. This depends on hardware design. Some may require all the cores in
>>one cluster runs at the same frequency, some may have more complicated design that
>>supports different cores runs at different frequency.
>>
>>This is just like you have a smp system, but different cores can run at
>>different cpu frequency. I think this is not what bit.LITTLE means.
>
>big.LITTLE is a generic term to have "power hungry and powerful core
>powerful" (big) with slower and battery-saving cores (LITTLE).
>
>It is not mandatory to have different micro-architectures between big and
>LITTLE cores.
>
>In any case, the interface should not be big.LITTLE specific. We don't want
>to tie us to one specific architecture.

All the cores may have the same micro-architecture but, for some reason, be
put in different clusters; or the cpus in one cluster may support running at
different cpu frequencies.

We can still introduce cpupool-cluster-split or, as Juergen suggested, use
"cpupool-split feature=xx" to split the clusters or cpu classes into
different cpupools. This is just a feature that is better to have, I think.

The reason to include cpupool-cluster-split or similar is to split the big
and little cores into different cpupools. And today big and little cores are
in different cpu clusters on the hardware [1] I can see. I think assigning
cores from different clusters into one cpupool is not a good idea.

I have no idea about future hardware.


If cluster is not preferred, cpuclass may be a choice, but I personally
prefer a "cluster" split for ARM.

Thanks,
Peng.

[1] https://en.wikipedia.org/wiki/ARM_big.LITTLE

>
>Regards,
>
>-- 
>Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-23 10:05                                                             ` Peng Fan
@ 2016-09-23 10:15                                                               ` Julien Grall
  2016-09-23 13:36                                                                 ` Dario Faggioli
  2016-09-23 13:52                                                               ` Dario Faggioli
  1 sibling, 1 reply; 85+ messages in thread
From: Julien Grall @ 2016-09-23 10:15 UTC (permalink / raw)
  To: Peng Fan
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Dario Faggioli, Punit Agrawal,
	George Dunlap, xen-devel, Jan Beulich



On 23/09/16 11:05, Peng Fan wrote:
> On Fri, Sep 23, 2016 at 10:24:37AM +0100, Julien Grall wrote:
>> Hello Peng,
>>
>> On 23/09/16 03:14, Peng Fan wrote:
>>> On Thu, Sep 22, 2016 at 07:54:02PM +0100, Julien Grall wrote:
>>>> Hi Stefano,
>>>>
>>>> On 22/09/2016 18:31, Stefano Stabellini wrote:
>>>>> On Thu, 22 Sep 2016, Julien Grall wrote:
>>>>>> Hello Peng,
>>>>>>
>>>>>> On 22/09/16 10:27, Peng Fan wrote:
>>>>>>> On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>>>>>>>> On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
>>>>>>>>> On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>>>>>>>> A feature like `xl cpupool-biglittle-split' can still be interesting,
>>>>>>>
>>>>>>> "cpupool-cluster-split" maybe a better name?
>>>>>>
>>>>>> You seem to assume that a cluster, from the MPIDR point of view, can only
>>>>>> contain the same set of CPUs. I don't think this is part of the architecture,
>>>>>> so this may not be true in the future.
>>>>>
>>>>> Interesting. I also understood that a cluster can only have one kind if
>>>>> cpus. Honestly it would be a little insane for it to be otherwise :-)
>>>>
>>>> I don't think this is insane (or maybe I am insane :)). Cluster usually
>>>> doesn't share all L2 cache (assuming L1 is local to each core) and L3 cache
>>>> may not be present, so if you move a task from one cluster to another you
>>>> will add latency because the new L2 cache has to be refilled.
>>>>
>>>> The use case of big.LITTLE is big cores are used for short period of burst
>>>> and little core are used for the rest (e.g listening audio, fetching
>>>> mail...). If you want to reduce latency when switch between big and little
>>>> CPUs, you may want to put them within the same cluster.
>>>>
>>>> Also, as mentioned in another thread, you may have a platform with the same
>>>> micro-architecture (e.g Cortex A-53) but different silicon implementation
>>>> (e.g to have a different frequency, power efficiency). Here the concept of
>>>> big.LITTLE is more blurred.
>>>
>>> That is possible that in one cluster, different pcpus runs with different cpu
>>> frequency. This depends on hardware design. Some may require all the cores in
>>> one cluster runs at the same frequency, some may have more complicated design that
>>> supports different cores runs at different frequency.
>>>
>>> This is just like you have a smp system, but different cores can run at
>>> different cpu frequency. I think this is not what bit.LITTLE means.
>>
>> big.LITTLE is a generic term to have "power hungry and powerful core
>> powerful" (big) with slower and battery-saving cores (LITTLE).
>>
>> It is not mandatory to have different micro-architectures between big and
>> LITTLE cores.
>>
>> In any case, the interface should not be big.LITTLE specific. We don't want
>> to tie us to one specific architecture.
>
> If all the cores have the same micro-architecture, but for some reason,
> they are put in different clusters or cpus in one cluster support running
> at different cpu freq.
>
> We still can introduce cpupool-cluster-split or as Juergen suggested,
> use "cpupool-slit feature=xx"  to split the cluster or cpuclasses
> into different cpupools. This is just a feature that better to have, I think.
>
> The reason to include cpupool-cluster-split or else is to split the big and little
> cores into different cpupools. And now big and little cores are in different cpu
> clusters from the hardware[1] I can see. I think assigning cores from
> different clusters into one cpupool is not a good idea.
>
> I have no idea about future hardware.
>
>
> If cluster is not prefered, cpuclass maybe a choice, but I personally perfer
> "cluster" split for ARM.
>
> Thanks,
> Peng.
>
> [1] https://en.wikipedia.org/wiki/ARM_big.LITTLE

Let me be clear here: the ARM ARM is authoritative, not Wikipedia. The 
latter will only reflect what is done today, not what could be done.

If the ARM ARM does not forbid it, nothing prevents a silicon vendor from 
doing it. I gave an example in the mail you answered.

Please try to think about all the use cases, and not only yours.

Regards,

-- 
Julien Grall


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-23 10:15                                                               ` Julien Grall
@ 2016-09-23 13:36                                                                 ` Dario Faggioli
  2016-09-24  1:57                                                                   ` Stefano Stabellini
  0 siblings, 1 reply; 85+ messages in thread
From: Dario Faggioli @ 2016-09-23 13:36 UTC (permalink / raw)
  To: Julien Grall, Peng Fan
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Punit Agrawal, George Dunlap,
	xen-devel, Jan Beulich



On Fri, 2016-09-23 at 11:15 +0100, Julien Grall wrote:
> On 23/09/16 11:05, Peng Fan wrote:
> > If cluster is not prefered, cpuclass maybe a choice, but I
> > personally perfer
> > "cluster" split for ARM.
> > 
> > Thanks,
> > Peng.
> > 
> > [1] https://en.wikipedia.org/wiki/ARM_big.LITTLE
> 
> Please try to have a think on all the use case and not only yours.
> 
This last line is absolutely true and very important!

That being said, I am a bit lost.

So, AFAICT, in order to act properly when the user asks for:

 vcpuclass = ["1,2:foo", "0,3:bar"]

we need to decide what "foo" and "bar" are at the xl and libxl level,
and whether they are the same all the way down to Xen (and if not,
what's the mapping).

We also said it would be nice to support:

 xl cpupool-split --feature=foobar

and hence we also need to decide what's foobar, whether it is in the
same namespace of foo and bar (i.e., it can be foobar==foo, or
foobar==bar, etc), or it is something else, or both.

Can someone list what are the various alternative approaches on the
table?

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-23 10:05                                                             ` Peng Fan
  2016-09-23 10:15                                                               ` Julien Grall
@ 2016-09-23 13:52                                                               ` Dario Faggioli
  1 sibling, 0 replies; 85+ messages in thread
From: Dario Faggioli @ 2016-09-23 13:52 UTC (permalink / raw)
  To: Peng Fan, Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Punit Agrawal, George Dunlap,
	xen-devel, Jan Beulich



On Fri, 2016-09-23 at 18:05 +0800, Peng Fan wrote:
> We still can introduce cpupool-cluster-split or as Juergen suggested,
> use "cpupool-slit feature=xx"  to split the cluster or cpuclasses
> into different cpupools. This is just a feature that better to have,
> I think.
> 
> The reason to include cpupool-cluster-split or else is to split the
> big and little
> cores into different cpupools. And now big and little cores are in
> different cpu
> clusters from the hardware[1] I can see. 
>
Note that this `cpupool-split' thing is meant to be an aid for the user,
to quickly put the system in a state that we think is a common or
relevant setup.

For instance, cpupools can be used to partition big NUMA systems, and we
thought users may be interested in having one pool per NUMA node, so a
helper for doing that quickly (i.e., with just one command) has been
provided.

That does not mean that it's the only use of cpupools, nor that it's
the only --or the only sane-- way to use cpupools on NUMA systems...
it's just a speculation, in an attempt to make life easier for users.

In a similar way, if we think that, for instance, creating a 'big pool'
and a 'LITTLE pool' would be something common, and/or we (Peng?
Stefano?) already have a use case for this, we can well implement a
`cpupool-split' variant that does that.

*BUT* that does not mean that people must use it, or that they can't do
anything else or different with cpupools on ARM! In fact, on a NUMA
system, one can completely ignore `cpupool-numa-split', and create
whatever pools and assign pcpus to them at will. Or she can actually
use `cpupool-numa-split' as a basis, i.e., issue the command, then
manually alter the resulting status by doing some more movement of
pcpus among the pools the command created.

All this to say that, especially when thinking about this
cpupool-split thing, we "only" need to come up with something that we
think makes sense, either to be used as is or as a basis; it is not
meant to be the one and only way cpupools and big.LITTLE --or ARM in
general-- should interact.

In fact:
> I think assigning cores from
> different clusters into one cpupool is not a good idea.
> 
I'd be perfectly fine with this, and with cpupool-split on big.LITTLE
cutting pools around cluster boundaries. But I definitely would not
want to forbid the user from manually shuffling things around,
including ending up in a situation where there are pcpus from
different classes/clusters/whatever in the same pool... If that is
shooting herself in the foot, then so be it!
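
To make it concrete, this is the kind of session I would not want to
forbid (cpupool-biglittle-split and the pool names are purely
hypothetical here; the other two commands exist today):

 #xl cpupool-biglittle-split
 #xl cpupool-cpu-remove Pool-LITTLE 3
 #xl cpupool-cpu-add Pool-big 3

i.e., after splitting along class boundaries, the user deliberately
builds a mixed pool.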

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22 16:31                                                 ` Dario Faggioli
@ 2016-09-23 13:56                                                   ` Julien Grall
  0 siblings, 0 replies; 85+ messages in thread
From: Julien Grall @ 2016-09-23 13:56 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap, Stefano Stabellini
  Cc: Juergen Gross, Peng Fan, Steve Capper, George Dunlap,
	Andrew Cooper, Punit Agrawal, xen-devel, Jan Beulich, Peng Fan

Hi Dario,

On 22/09/16 17:31, Dario Faggioli wrote:
> On Thu, 2016-09-22 at 12:24 +0100, Julien Grall wrote:
>> On 22/09/16 09:43, Dario Faggioli wrote:
>>> Local migration basically --from the vcpu perspective-- means
>>> create a new vcpu, stop the original vcpu, copy the state from
>>> original to new, destroy the original vcpu and start the new one.
>>> My point is that this is not something that can be done within nor
>>> initiated by the scheduler, e.g., during a context switch or a vcpu
>>> wakeup!
>>
>> By local migration, I meant from the perspective of the hypervisor.
>> In the hypervisor you have to trap feature registers and other
>> implementation defined registers to show the same value across all
>> the physical CPUs.
>>
> You mean we trap feature registers during the (normal) execution of a
> vcpu, because we want Xen to vet what's returned to the guest itself.
> And that migration support, and hence the possibility that the guest
> has been migrated to a cpu different from the one where it was
> created, is already one of the reasons why this is necessary... right?

That's correct.

Regards,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-22 18:54                                                       ` Julien Grall
  2016-09-23  2:14                                                         ` Peng Fan
@ 2016-09-24  1:35                                                         ` Stefano Stabellini
  1 sibling, 0 replies; 85+ messages in thread
From: Stefano Stabellini @ 2016-09-24  1:35 UTC (permalink / raw)
  To: Julien Grall
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Dario Faggioli, Punit Agrawal,
	George Dunlap, xen-devel, Jan Beulich, Peng Fan

On Thu, 22 Sep 2016, Julien Grall wrote:
> Hi Stefano,
> 
> On 22/09/2016 18:31, Stefano Stabellini wrote:
> > On Thu, 22 Sep 2016, Julien Grall wrote:
> > > Hello Peng,
> > > 
> > > On 22/09/16 10:27, Peng Fan wrote:
> > > > On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
> > > > > On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
> > > > > > On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
> > > > > A feature like `xl cpupool-biglittle-split' can still be interesting,
> > > > 
> > > > "cpupool-cluster-split" maybe a better name?
> > > 
> > > You seem to assume that a cluster, from the MPIDR point of view, can
> > > only contain CPUs of the same type. I don't think this is part of the
> > > architecture, so this may not be true in the future.
> > 
> > Interesting. I also understood that a cluster can only have one kind of
> > cpus. Honestly it would be a little insane for it to be otherwise :-)
> 
> I don't think this is insane (or maybe I am insane :)). Clusters
> usually don't share the L2 cache with each other (assuming L1 is local
> to each core) and an L3 cache may not be present, so if you move a task
> from one cluster to another you will add latency because the new L2
> cache has to be refilled.
> 
> The use case of big.LITTLE is that big cores are used for short bursts
> and little cores are used for the rest (e.g. listening to audio,
> fetching mail...). If you want to reduce latency when switching between
> big and little CPUs, you may want to put them within the same cluster.
> 
> Also, as mentioned in another thread, you may have a platform with the
> same micro-architecture (e.g. Cortex-A53) but different silicon
> implementations (e.g. different frequencies or power efficiency). Here
> the concept of big.LITTLE is more blurred.

Different frequencies are fine: we have been able to set per-core
frequencies on x86 cpus for a long time now. If they are cores of the
same micro-architecture, the cpu frequency doesn't matter; we can deal
with them as usual.

To me big.LITTLE means: it is technically possible, but very difficult
(currently unimplemented), and slower than usual, to move a vcpu
across big and LITTLE pcpus. That's why they need to be dealt with in a
different way.

If we had big.LITTLE cores in the same cluster, sharing L2 caches, with
the same cache line sizes, maybe we could also deal with them as usual
because it wouldn't be much of an issue to migrate a vcpu across big and
LITTLE cores. If/when we come across such an architecture we'll deal
with it.


> That's why I am quite reluctant to name the different CPU sets "big"
> and "little" (even if it may be more handy for the user).
 
Technically you might be right, but "big.LITTLE" is how the architecture
has been advertised to people, so unfortunately we are stuck with the
name. We have to deal with it in those terms, at least at the xl level.
Of course in Xen we are free to do whatever we want.


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFC 0/5] xen/arm: support big.little SoC
  2016-09-23 13:36                                                                 ` Dario Faggioli
@ 2016-09-24  1:57                                                                   ` Stefano Stabellini
  0 siblings, 0 replies; 85+ messages in thread
From: Stefano Stabellini @ 2016-09-24  1:57 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Steve Capper,
	George Dunlap, Andrew Cooper, Punit Agrawal, George Dunlap,
	xen-devel, Julien Grall, Jan Beulich, Peng Fan

On Fri, 23 Sep 2016, Dario Faggioli wrote:
> On Fri, 2016-09-23 at 11:15 +0100, Julien Grall wrote:
> > On 23/09/16 11:05, Peng Fan wrote:
> > > If cluster is not prefered, cpuclass maybe a choice, but I
> > > personally perfer
> > > "cluster" split for ARM.
> > > 
> > > Thanks,
> > > Peng.
> > > 
> > > [1] https://en.wikipedia.org/wiki/ARM_big.LITTLE
> > 
> > Please try to think about all the use cases, not only yours.
> > 
> This last line is absolutely true and very important!
> 
> That being said, I am a bit lost.
> 
> So, AFAICT, in order to act properly when the user asks for:
> 
>  vcpuclass = ["1,2:foo", "0,3:bar"]
> 
> we need to decide what "foo" and "bar" are at the xl and libxl level,
> and whether they are the same all the way down to Xen (and if not,
> what's the mapping).

I think "foo" and "bar" need to be "big" and "LITTLE" at the xl level.

Given that Xen is the one with the information about which core is big
and which is LITTLE, I think the hypervisor should provide the mapping
between labels and cpu and cluster indexes.
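
Concretely, at the xl level I would then expect a guest config fragment
along these lines (reusing the vcpuclass syntax proposed above; none of
this is implemented yet):

 vcpus = 4
 vcpuclass = ["0,1:big", "2,3:LITTLE"]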


> We also said it would be nice to support:
> 
>  xl cpupool-split --feature=foobar
> 
> and hence we also need to decide what's foobar, whether it is in the
> same namespace as foo and bar (i.e., it can be foobar==foo, or
> foobar==bar, etc.), or whether it is something else, or both.

I would be consistent and always use foobar=bigLITTLE at the xl level.


> Can someone list the various alternative approaches on the table?

The info available is:
http://lxr.free-electrons.com/source/Documentation/devicetree/bindings/arm/cpus.txt
plus:
http://marc.info/?l=linux-arm-kernel&m=147308556729426&w=2

We have cpu and cluster indexes. We have cpu compatible strings which
tell us whether a cpu is an "a53" or an "a15". We have
an optional property that tells us the cpu "capacity". Higher capacity
means "big", lower capacity means "LITTLE".

Xen could always deal with cpu and cluster indexes, but provide
convenient labels to libxl.

For example:

        cpus {
                #size-cells = <0>;
                #address-cells = <1>;

                cpu@0 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a15";
                        reg = <0x0>;
                        capacity-dmips-mhz = <1024>;
                };

                cpu@1 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a15";
                        reg = <0x1>;
                        capacity-dmips-mhz = <1024>;
                };

                cpu@100 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a7";
                        reg = <0x100>;
                        capacity-dmips-mhz = <512>;
                };

                cpu@101 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a7";
                        reg = <0x101>;
                        capacity-dmips-mhz = <512>;
                };
        };

The reg property encodes the cpu number and cluster number and matches
the value in the MPIDR register (e.g., reg = <0x101> means core 1 in
cluster 1). That is what Xen could take as a parameter.

The mapping between reg, "big" or "LITTLE", and the cpu compatible
name, such as "a7", could be returned by a hypercall such as
xen_arch_domainconfig.
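
Assuming "big"/"LITTLE" is derived from the capacity-dmips-mhz values
as described above, the mapping for this device tree could look
something like (purely illustrative):

 reg      cluster  core  class    compatible
 0x000    0        0     big      arm,cortex-a15
 0x001    0        1     big      arm,cortex-a15
 0x100    1        0     LITTLE   arm,cortex-a7
 0x101    1        1     LITTLE   arm,cortex-a7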


^ permalink raw reply	[flat|nested] 85+ messages in thread

end of thread, other threads:[~2016-09-24  1:57 UTC | newest]

Thread overview: 85+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-19  2:08 [RFC 0/5] xen/arm: support big.little SoC van.freenix
2016-09-19  2:08 ` [RFC 1/5] xen/arm: domain_build: setting opt_dom0_max_vcpus according to cpupool0 info van.freenix
2016-09-19  2:08 ` [RFC 2/5] xen: cpupool: introduce cpupool_arch_info van.freenix
2016-09-19  2:08 ` [RFC 3/5] xen: cpupool: add arch cpupool hook van.freenix
2016-09-19  2:08 ` [RFC 4/5] xen/arm: move vpidr from arch_domain to arch_vcpu van.freenix
2016-09-19  2:08 ` [RFC 5/5] xen/arm: cpupool: implement arch_domain_cpupool_compatible van.freenix
2016-09-19  8:09 ` [RFC 0/5] xen/arm: support big.little SoC Julien Grall
2016-09-19  8:36   ` Peng Fan
2016-09-19  8:53     ` Julien Grall
2016-09-19  9:38       ` Peng Fan
2016-09-19  9:59         ` Julien Grall
2016-09-19 13:15           ` Peng Fan
2016-09-19 20:56             ` Stefano Stabellini
2016-09-19  9:45       ` George Dunlap
2016-09-19 10:06         ` Julien Grall
2016-09-19 10:23           ` Juergen Gross
2016-09-19 17:18             ` Dario Faggioli
2016-09-19 21:03               ` Stefano Stabellini
2016-09-19 22:55                 ` Dario Faggioli
2016-09-20  0:01                   ` Stefano Stabellini
2016-09-20  0:54                     ` Dario Faggioli
2016-09-20 10:03                       ` Peng Fan
2016-09-20 10:27                         ` George Dunlap
2016-09-20 15:34                           ` Julien Grall
2016-09-20 17:24                             ` Dario Faggioli
2016-09-20 19:09                             ` Stefano Stabellini
2016-09-20 19:41                               ` Julien Grall
2016-09-20 20:17                                 ` Stefano Stabellini
2016-09-21  8:38                                   ` Peng Fan
2016-09-21  9:22                                     ` George Dunlap
2016-09-21 12:35                                       ` Peng Fan
2016-09-21 15:00                                       ` Dario Faggioli
2016-09-21 10:15                                     ` Julien Grall
2016-09-21 12:28                                       ` Peng Fan
2016-09-21 15:06                                         ` Dario Faggioli
2016-09-22  9:45                                       ` Peng Fan
2016-09-22 11:21                                         ` Julien Grall
2016-09-23  2:38                                           ` Peng Fan
2016-09-21 10:09                                   ` Julien Grall
2016-09-21 10:22                                     ` George Dunlap
2016-09-21 13:06                                       ` Julien Grall
2016-09-21 15:45                                         ` Dario Faggioli
2016-09-21 19:28                                           ` Julien Grall
2016-09-22  6:16                                             ` Peng Fan
2016-09-22  8:43                                             ` Dario Faggioli
2016-09-22 11:24                                               ` Julien Grall
2016-09-22 16:31                                                 ` Dario Faggioli
2016-09-23 13:56                                                   ` Julien Grall
2016-09-21 18:13                                         ` Stefano Stabellini
2016-09-21 19:11                                           ` Julien Grall
2016-09-21 19:21                                             ` Julien Grall
2016-09-21 23:45                                             ` Stefano Stabellini
2016-09-22  6:49                                             ` Peng Fan
2016-09-22  8:50                                               ` Dario Faggioli
2016-09-22  9:27                                                 ` Peng Fan
2016-09-22  9:51                                                   ` George Dunlap
2016-09-22 10:09                                                     ` Peng Fan
2016-09-22 10:39                                                       ` Dario Faggioli
2016-09-22 10:13                                                     ` Juergen Gross
2016-09-22  9:52                                                   ` Dario Faggioli
2016-09-22 11:29                                                   ` Julien Grall
2016-09-22 17:31                                                     ` Stefano Stabellini
2016-09-22 18:54                                                       ` Julien Grall
2016-09-23  2:14                                                         ` Peng Fan
2016-09-23  9:24                                                           ` Julien Grall
2016-09-23 10:05                                                             ` Peng Fan
2016-09-23 10:15                                                               ` Julien Grall
2016-09-23 13:36                                                                 ` Dario Faggioli
2016-09-24  1:57                                                                   ` Stefano Stabellini
2016-09-23 13:52                                                               ` Dario Faggioli
2016-09-24  1:35                                                         ` Stefano Stabellini
2016-09-23  2:03                                                     ` Peng Fan
2016-09-22 10:05                                                 ` Peng Fan
2016-09-22 16:26                                                   ` Dario Faggioli
2016-09-22 17:33                                                     ` Stefano Stabellini
2016-09-21 12:38                                     ` Peng Fan
2016-09-21  9:45                         ` Dario Faggioli
2016-09-20 10:18                     ` George Dunlap
2016-09-19 20:55             ` Stefano Stabellini
2016-09-19 10:33           ` George Dunlap
2016-09-19 13:33             ` Peng Fan
2016-09-20  0:11               ` Dario Faggioli
2016-09-20  6:18                 ` Peng Fan
2016-09-19 16:43             ` Dario Faggioli
2016-09-19 13:08       ` Peng Fan
