* [PATCH v10 0/6] enable Cache QoS Monitoring (CQM) feature
@ 2014-03-26  6:35 Dongxiao Xu
  2014-03-26  6:35 ` [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature Dongxiao Xu
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Dongxiao Xu @ 2014-03-26  6:35 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, JBeulich, dgdegra

Changes from v9:
 - Revise the readonly mapping mechanism to share data between Xen and
   userspace. We create L3C buffer for each socket, share both buffer
   address MFNs and buffer MFNs to userspace.
 - Split the pqos.c into pqos.c and cqm.c for better code structure.
 - Show the total L3 cache size when issuing the xl pqos-list cqm command.
 - Abstract a libxl_getcqminfo() function to fetch cqm data from Xen.
 - Several coding style fixes.

Changes from v8:
 - Address comments from Ian Campbell, including:
   * Modify the return handling for xc_sysctl();
   * Add man page items for platform QoS related commands.
   * Fix typo in commit message.

Changes from v7:
 - Address comments from Andrew Cooper, including:
   * Check CQM capability before allocating cpumask memory.
   * Move one function declaration into the correct patch.

Changes from v6:
 - Address comments from Jan Beulich, including:
   * Remove the unnecessary CPUID feature check.
   * Remove the unnecessary socket_cpu_map.
   * Spin_lock related changes, avoid spin_lock_irqsave().
   * Use readonly mapping to pass cqm data between Xen/Userspace,
     to avoid data copying.
   * Optimize RDMSR/WRMSR logic to avoid unnecessary calls.
   * Misc fixes including __read_mostly prefix, return value, etc.

Changes from v5:
 - Address comments from Dario Faggioli, including:
   * Define a new libxl_cqminfo structure to avoid reference of xc
     structure in libxl functions.
   * Use LOGE() instead of the LIBXL__LOG() functions.

Changes from v4:
 - When comparing the xl cqm parameter, use strcmp instead of strncmp;
   otherwise "xl pqos-attach cqmabcd domid" would be accepted as a valid
   command line.
 - Address comments from Andrew Cooper, including:
   * Adjust the pqos parameter parsing function.
   * Modify the pqos related documentation.
   * Add a check for opt_cqm_max_rmid in initialization code.
   * Do not IPI CPU that is in same socket with current CPU.
 - Address comments from Dario Faggioli, including:
   * Fix a typo in exported symbols.
   * Return correct libxl error code for qos related functions.
   * Abstract the error printing logic into a function.
 - Address comment from Daniel De Graaf, including:
   * Add a return value for the pqos related check.
 - Address comments from Konrad Rzeszutek Wilk, including:
   * Modify the GPLv2 related file header, remove the address.

Changes from v3:
 - Use structure to better organize CQM related global variables.
 - Address comments from Andrew Cooper, including:
   * Remove the domain creation flag for CQM RMID allocation.
   * Adjust the boot parameter format, use custom_param().
   * Add documentation for the newly added boot parameter.
   * Change QoS type flag to be uint64_t.
   * Initialize the per socket cpu bitmap in system boot time.
   * Remove get_cqm_avail() function.
   * Misc format changes.
 - Address comment from Daniel De Graaf, including:
   * Use avc_current_has_perm() for XEN2__PQOS_OP that belongs to SECCLASS_XEN2.

Changes from v2:
 - Address comments from Andrew Cooper, including:
   * Merging tools stack changes into one patch.
   * Reduce the IPI number to one per socket.
   * Change structures for CQM data exchange between tools and Xen.
   * Misc format/variable/function name changes.
 - Address comments from Konrad Rzeszutek Wilk, including:
   * Simplify the error printing logic.
   * Add an xsm check for the newly added hypercalls.

Changes from v1:
 - Address comments from Andrew Cooper, including:
   * Change function names, e.g., alloc_cqm_rmid(), system_supports_cqm(), etc.
   * Change some structure element order to save packing cost.
   * Correct some function's return value.
   * Some programming style changes.
   * ...

Future generations of Intel Xeon processors may offer a monitoring capability
in each logical processor to measure specific quality-of-service metrics,
for example Cache QoS Monitoring (CQM) to report L3 cache occupancy.
For detailed information, please refer to Intel SDM chapter 17.14.

Cache QoS Monitoring provides a layer of abstraction between applications and
logical processors through the use of Resource Monitoring IDs (RMIDs).
In the Xen design, each guest in the system can be assigned an RMID
independently, while RMID=0 is reserved for domains that do not enable the
CQM service. When any of a domain's vcpus is scheduled on a logical processor,
the domain's RMID is activated by programming its value into a specific MSR;
when the vcpu is scheduled out, RMID=0 is programmed into that MSR instead.
The Cache QoS hardware tracks the cache utilization of memory accesses
according to the RMIDs and reports the monitored data via a counter register.
With this solution, we can learn how much L3 cache a certain guest uses.

To attach the CQM service to a certain guest, two approaches are provided:
1) Create the guest with "pqos_cqm=1" set in the configuration file.
2) Use "xl pqos-attach cqm domid" for a running guest.

To detach the CQM service from a guest, users can:
1) Use "xl pqos-detach cqm domid" for a running guest.
2) Destroy the guest, which also detaches the CQM service.

To get the L3 cache usage, users can use the following command:
$ xl pqos-list cqm

The data below is just an example showing how the CQM related data is exposed
to the end user.

[root@localhost]# xl pqos-list cqm
Name                                        ID  SocketID        L3C_Usage       SocketID        L3C_Usage
Domain-0                                     0         0         19096 KB              1         13608 KB
ExampleHVMDomain1                            1         0             0 KB              1          5992 KB
ExampleHVMDomain2                            2         0          5600 KB              1             0 KB
ExampleHVMDomain3                            3         0             0 KB              1          6048 KB
ExampleHVMDomain4                            4         0          4928 KB              1             0 KB
ExampleHVMDomain5                            5         0             0 KB              1          5488 KB
ExampleHVMDomain6                            6         0          4704 KB              1          1512 KB

Socket L3 Cache Size: 35840 KB
RMID count    56        RMID available    48

Dongxiao Xu (6):
  x86: detect and initialize Cache QoS Monitoring feature
  x86: dynamically attach/detach CQM service for a guest
  x86: collect CQM information from all sockets
  x86: enable CQM monitoring for each domain RMID
  xsm: add platform QoS related xsm policies
  tools: enable Cache QoS Monitoring feature for libxl/libxc

 docs/man/xl.pod.1                            |   23 ++
 docs/misc/xen-command-line.markdown          |    7 +
 tools/flask/policy/policy/modules/xen/xen.if |    2 +-
 tools/flask/policy/policy/modules/xen/xen.te |    5 +-
 tools/libxc/xc_domain.c                      |   41 ++++
 tools/libxc/xenctrl.h                        |   13 ++
 tools/libxl/Makefile                         |    3 +-
 tools/libxl/libxl.h                          |   12 ++
 tools/libxl/libxl_pqos.c                     |  190 +++++++++++++++++
 tools/libxl/libxl_types.idl                  |    9 +
 tools/libxl/xl.h                             |    3 +
 tools/libxl/xl_cmdimpl.c                     |  109 ++++++++++
 tools/libxl/xl_cmdtable.c                    |   15 ++
 xen/arch/x86/Makefile                        |    1 +
 xen/arch/x86/domain.c                        |    8 +
 xen/arch/x86/domctl.c                        |   24 +++
 xen/arch/x86/pqos/Makefile                   |    2 +
 xen/arch/x86/pqos/cqm.c                      |  292 ++++++++++++++++++++++++++
 xen/arch/x86/pqos/pqos.c                     |   94 +++++++++
 xen/arch/x86/setup.c                         |    3 +
 xen/arch/x86/sysctl.c                        |   70 ++++++
 xen/include/asm-x86/cpufeature.h             |    1 +
 xen/include/asm-x86/domain.h                 |    2 +
 xen/include/asm-x86/msr-index.h              |    5 +
 xen/include/asm-x86/pqos.h                   |   68 ++++++
 xen/include/public/domctl.h                  |   11 +
 xen/include/public/sysctl.h                  |   12 ++
 xen/xsm/flask/hooks.c                        |    8 +
 xen/xsm/flask/policy/access_vectors          |   17 +-
 29 files changed, 1044 insertions(+), 6 deletions(-)
 create mode 100644 tools/libxl/libxl_pqos.c
 create mode 100644 xen/arch/x86/pqos/Makefile
 create mode 100644 xen/arch/x86/pqos/cqm.c
 create mode 100644 xen/arch/x86/pqos/pqos.c
 create mode 100644 xen/include/asm-x86/pqos.h

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-26  6:35 [PATCH v10 0/6] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
@ 2014-03-26  6:35 ` Dongxiao Xu
  2014-03-31 14:21   ` Jan Beulich
  2014-03-26  6:35 ` [PATCH v10 2/6] x86: dynamically attach/detach CQM service for a guest Dongxiao Xu
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Dongxiao Xu @ 2014-03-26  6:35 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, JBeulich, dgdegra

Detect the platform QoS feature status and enumerate the resource types,
one of which is monitoring of L3 cache occupancy.

Also introduce a Xen grub command line parameter to control the
QoS feature status.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Jiongxi Li <jiongxi.li@intel.com>
---
 docs/misc/xen-command-line.markdown |    7 ++
 xen/arch/x86/Makefile               |    1 +
 xen/arch/x86/pqos/Makefile          |    2 +
 xen/arch/x86/pqos/cqm.c             |  179 +++++++++++++++++++++++++++++++++++
 xen/arch/x86/pqos/pqos.c            |   94 ++++++++++++++++++
 xen/arch/x86/setup.c                |    3 +
 xen/include/asm-x86/cpufeature.h    |    1 +
 xen/include/asm-x86/pqos.h          |   47 +++++++++
 8 files changed, 334 insertions(+)
 create mode 100644 xen/arch/x86/pqos/Makefile
 create mode 100644 xen/arch/x86/pqos/cqm.c
 create mode 100644 xen/arch/x86/pqos/pqos.c
 create mode 100644 xen/include/asm-x86/pqos.h

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 689ffe6..36620ad 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -788,6 +788,13 @@ This option can be specified more than once (up to 8 times at present).
 ### ple\_window
 > `= <integer>`
 
+### pqos (Intel)
+> `= List of ( <boolean> | cqm:<boolean> | cqm_max_rmid:<integer> )`
+
+> Default: `pqos=1,cqm:1,cqm_max_rmid:255`
+
+Configure platform QoS services.
+
 ### reboot
 > `= t[riple] | k[bd] | n[o] [, [w]arm | [c]old]`
 
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index d502bdf..87b329b 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -4,6 +4,7 @@ subdir-y += genapic
 subdir-y += hvm
 subdir-y += mm
 subdir-y += oprofile
+subdir-y += pqos
 
 subdir-$(x86_64) += x86_64
 
diff --git a/xen/arch/x86/pqos/Makefile b/xen/arch/x86/pqos/Makefile
new file mode 100644
index 0000000..92f9b3d
--- /dev/null
+++ b/xen/arch/x86/pqos/Makefile
@@ -0,0 +1,2 @@
+obj-y += pqos.o
+obj-y += cqm.o
diff --git a/xen/arch/x86/pqos/cqm.c b/xen/arch/x86/pqos/cqm.c
new file mode 100644
index 0000000..b46f25b
--- /dev/null
+++ b/xen/arch/x86/pqos/cqm.c
@@ -0,0 +1,179 @@
+/*
+ * pqos.c: Platform QoS related service for guest.
+ *
+ * Copyright (c) 2014, Intel Corporation
+ * Author: Dongxiao Xu <dongxiao.xu@intel.com>
+ * Author: Jiongxi Li  <jiongxi.li@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include <asm/processor.h>
+#include <xen/init.h>
+#include <xen/mm.h>
+#include <xen/spinlock.h>
+#include <xen/notifier.h>
+#include <xen/cpu.h>
+#include <asm/pqos.h>
+
+struct pqos_cqm *__read_mostly cqm = NULL;
+
+static int cqm_add_socket(int socket)
+{
+    unsigned int i, order;
+
+    if ( cqm->l3c[socket] )
+        return 0;
+
+    /* Allocate per-socket CQM LLC buffer */
+    order = get_order_from_bytes((cqm->rmid_max + 1) * sizeof(unsigned long));
+    cqm->l3c[socket] = alloc_xenheap_pages(order, 0);
+    if ( !cqm->l3c[socket] )
+    {
+        printk(XENLOG_WARNING
+               "Failed to alloc CQM buffer when adding a new socket.\n");
+        return -1;
+    }
+    /* Track the per-socket CQM LLC buffer MFN */
+    cqm->socket_l3c_mfn[socket] = virt_to_mfn(cqm->l3c[socket]);
+    for ( i = 0; i < (1 << order); i++ )
+        share_xen_page_with_privileged_guests(
+            virt_to_page((void *)cqm->l3c[socket] + i * PAGE_SIZE),
+            XENSHARE_readonly);
+
+    return 0;
+}
+
+static void cqm_remove_socket(int socket)
+{
+    unsigned int order;
+
+    if ( cqm->l3c[socket] )
+    {
+        order = get_order_from_bytes((cqm->rmid_max + 1) * sizeof(unsigned long));
+        free_xenheap_pages(cqm->l3c[socket], order);
+    }
+
+    cqm->socket_l3c_mfn[socket] = 0;
+}
+
+/* Always return NOTIFY_DONE to avoid CPU online/offline failure by CQM */
+static int cpu_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    unsigned int cpu = (unsigned long)hcpu;
+    int socket = cpu_to_socket(cpu);
+
+    if ( socket < 0 )
+        return NOTIFY_DONE;
+
+    switch ( action )
+    {
+    case CPU_ONLINE:
+        cqm_add_socket(socket);
+        break;
+    case CPU_DEAD:
+        if ( !cpumask_weight(per_cpu(cpu_core_mask, cpu)) )
+            cqm_remove_socket(socket);
+        break;
+    default:
+        break;
+    }
+
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_nfb = {
+    .notifier_call = cpu_callback
+};
+
+void __init init_cqm(unsigned int rmid_max, unsigned long rmid_mask)
+{
+    unsigned int rmid, cpu, socket;
+    unsigned int eax, edx;
+    unsigned int i, order = 0;
+
+    if ( !rmid_max )
+        return;
+
+    cqm = xzalloc(struct pqos_cqm);
+    if ( !cqm )
+        return;
+
+    cpuid_count(0xf, 1, &eax, &cqm->upscaling_factor, &cqm->rmid_max, &edx);
+    if ( !(edx & QOS_MONITOR_EVTID_L3) )
+        goto out;
+
+    cqm->rmid_mask = rmid_mask;
+    cqm->rmid_inuse = 0;
+    cqm->rmid_min = 1;
+    cqm->rmid_max = min(rmid_max, cqm->rmid_max);
+
+    spin_lock_init(&cqm->cqm_lock);
+
+    /* According to Intel SDM, the possible maximum rmid number is 2^10 = 1024,
+     * thus one page is enough to hold cqm->rmid_to_dom structure */
+    cqm->rmid_to_dom = alloc_xenheap_page();
+    if ( !cqm->rmid_to_dom )
+        goto out;
+    /* Reserve RMID 0 for all domains not being monitored */
+    cqm->rmid_to_dom[0] = DOMID_XEN;
+    for ( rmid = 1; rmid < PAGE_SIZE/sizeof(domid_t); rmid++ )
+        cqm->rmid_to_dom[rmid] = DOMID_INVALID;
+    /* Dom0 tool stack needs to know the RMID to DOMID mapping */
+    share_xen_page_with_privileged_guests(
+        virt_to_page((void *)cqm->rmid_to_dom), XENSHARE_readonly);
+
+    /* Allocate the buffer that holds MFNs of per-socket CQM LLC */
+    order = get_order_from_bytes(sizeof(unsigned long) * NR_CPUS);
+    cqm->socket_l3c_mfn = alloc_xenheap_pages(order, 0);
+    if ( !cqm->socket_l3c_mfn )
+        goto out;
+    memset(cqm->socket_l3c_mfn, 0, PAGE_SIZE * (1 << order));
+    /* Dom0 tool stack will use these MFNs to map each CQM LLC buffer */
+    for ( i = 0; i < (1 << order); i++ )
+        share_xen_page_with_privileged_guests(
+            virt_to_page((void *)cqm->socket_l3c_mfn + i * PAGE_SIZE),
+            XENSHARE_readonly);
+
+    for ( cpu = 0; cpu < NR_CPUS; cpu++ )
+    {
+        socket = cpu_to_socket(cpu);
+        if ( socket < 0 )
+            continue;
+        if ( cqm_add_socket(socket) < 0 )
+            goto out;
+    }
+
+    register_cpu_notifier(&cpu_nfb);
+
+    printk(XENLOG_INFO "Cache QoS Monitoring Enabled.\n");
+
+    return;
+
+out:
+    for ( socket = 0; socket < NR_CPUS; socket++ )
+        cqm_remove_socket(socket);
+    if ( cqm->socket_l3c_mfn )
+        free_xenheap_pages(cqm->socket_l3c_mfn, order);
+    if ( cqm->rmid_to_dom )
+        free_xenheap_page(cqm->rmid_to_dom);
+    xfree(cqm);
+    cqm = NULL;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/pqos/pqos.c b/xen/arch/x86/pqos/pqos.c
new file mode 100644
index 0000000..3f430ba
--- /dev/null
+++ b/xen/arch/x86/pqos/pqos.c
@@ -0,0 +1,94 @@
+/*
+ * pqos.c: Platform QoS related service for guest.
+ *
+ * Copyright (c) 2014, Intel Corporation
+ * Author: Dongxiao Xu <dongxiao.xu@intel.com>
+ * Author: Jiongxi Li  <jiongxi.li@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include <xen/init.h>
+#include <xen/cpu.h>
+#include <asm/pqos.h>
+
+static bool_t __initdata opt_pqos = 1;
+static bool_t __initdata opt_cqm = 1;
+static unsigned int __initdata opt_cqm_max_rmid = 255;
+
+static void __init parse_pqos_param(char *s)
+{
+    char *ss, *val_str;
+    int val;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        val = parse_bool(s);
+        if ( val >= 0 )
+            opt_pqos = val;
+        else
+        {
+            val = !!strncmp(s, "no-", 3);
+            if ( !val )
+                s += 3;
+
+            val_str = strchr(s, ':');
+            if ( val_str )
+                *val_str++ = '\0';
+
+            if ( val_str && !strcmp(s, "cqm") &&
+                 (val = parse_bool(val_str)) >= 0 )
+                opt_cqm = val;
+            else if ( val_str && !strcmp(s, "cqm_max_rmid") )
+                opt_cqm_max_rmid = simple_strtoul(val_str, NULL, 0);
+        }
+
+        s = ss + 1;
+    } while ( ss );
+}
+
+custom_param("pqos", parse_pqos_param);
+
+static void __init init_qos_monitor(void)
+{
+    unsigned int qm_features;
+    unsigned int eax, ebx, ecx;
+    uint64_t rmid_mask;
+
+    if ( !boot_cpu_has(X86_FEATURE_QOSM) )
+        return;
+
+    cpuid_count(0xf, 0, &eax, &ebx, &ecx, &qm_features);
+
+    rmid_mask = ~(~0ull << get_count_order(ebx));
+
+    if ( opt_cqm && (qm_features & QOS_MONITOR_TYPE_L3) )
+        init_cqm(opt_cqm_max_rmid, rmid_mask);
+}
+
+void __init init_platform_qos(void)
+{
+    if ( !opt_pqos )
+        return;
+
+    init_qos_monitor();
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 4dbf2b7..5c522d2 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -48,6 +48,7 @@
 #include <asm/setup.h>
 #include <xen/cpu.h>
 #include <asm/nmi.h>
+#include <asm/pqos.h>
 
 /* opt_nosmp: If true, secondary processors are ignored. */
 static bool_t __initdata opt_nosmp;
@@ -1412,6 +1413,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     domain_unpause_by_systemcontroller(dom0);
 
+    init_platform_qos();
+
     reset_stack_and_jump(init_done);
 }
 
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 0c4d6c1..37cd8b7 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -145,6 +145,7 @@
 #define X86_FEATURE_ERMS	(7*32+ 9) /* Enhanced REP MOVSB/STOSB */
 #define X86_FEATURE_INVPCID	(7*32+10) /* Invalidate Process Context ID */
 #define X86_FEATURE_RTM 	(7*32+11) /* Restricted Transactional Memory */
+#define X86_FEATURE_QOSM	(7*32+12) /* Platform QoS monitoring capability */
 #define X86_FEATURE_NO_FPU_SEL 	(7*32+13) /* FPU CS/DS stored as zero */
 #define X86_FEATURE_MPX		(7*32+14) /* Memory Protection Extensions */
 #define X86_FEATURE_RDSEED	(7*32+18) /* RDSEED instruction */
diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
new file mode 100644
index 0000000..9baa621
--- /dev/null
+++ b/xen/include/asm-x86/pqos.h
@@ -0,0 +1,47 @@
+/*
+ * pqos.h: Platform QoS related service for guest.
+ *
+ * Copyright (c) 2014, Intel Corporation
+ * Author: Dongxiao Xu <dongxiao.xu@intel.com>
+ * Author: Jiongxi Li  <jiongxi.li@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#ifndef ASM_PQOS_H
+#define ASM_PQOS_H
+
+#include <public/xen.h>
+#include <xen/spinlock.h>
+
+/* QoS Resource Type Enumeration */
+#define QOS_MONITOR_TYPE_L3            0x2
+
+/* QoS Monitoring Event ID */
+#define QOS_MONITOR_EVTID_L3           0x1
+
+struct pqos_cqm {
+    unsigned long rmid_mask;
+    unsigned int rmid_min;
+    unsigned int rmid_max;
+    unsigned int rmid_inuse;
+    unsigned int upscaling_factor;
+    domid_t *rmid_to_dom;
+    /* socket_l3c_mfn stores cqm->l3c MFNs of each socket */
+    unsigned long *socket_l3c_mfn;
+    /* NR_CPUS is big enough to cover all sockets */
+    unsigned long *l3c[NR_CPUS];
+    spinlock_t cqm_lock;
+};
+extern struct pqos_cqm *cqm;
+
+void __init init_platform_qos(void);
+void __init init_cqm(unsigned int rmid_max, unsigned long rmid_mask);
+
+#endif
-- 
1.7.9.5


* [PATCH v10 2/6] x86: dynamically attach/detach CQM service for a guest
  2014-03-26  6:35 [PATCH v10 0/6] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
  2014-03-26  6:35 ` [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature Dongxiao Xu
@ 2014-03-26  6:35 ` Dongxiao Xu
  2014-04-02 15:24   ` Jan Beulich
  2014-03-26  6:35 ` [PATCH v10 3/6] x86: collect CQM information from all sockets Dongxiao Xu
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Dongxiao Xu @ 2014-03-26  6:35 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, JBeulich, dgdegra

Add hypervisor-side support for dynamically attaching and detaching the
CQM service for a certain guest.

When the CQM service is attached to a guest, the system allocates an RMID
for it. When the service is detached or the guest is shut down, the RMID
is reclaimed for future use.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Jiongxi Li <jiongxi.li@intel.com>
---
 xen/arch/x86/domain.c        |    3 +++
 xen/arch/x86/domctl.c        |   24 +++++++++++++++++
 xen/arch/x86/pqos/cqm.c      |   59 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/domain.h |    2 ++
 xen/include/asm-x86/pqos.h   |    9 +++++++
 xen/include/public/domctl.h  |   11 ++++++++
 6 files changed, 108 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 0d563de..3ea9402 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -60,6 +60,7 @@
 #include <xen/numa.h>
 #include <xen/iommu.h>
 #include <compat/vcpu.h>
+#include <asm/pqos.h>
 
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
 DEFINE_PER_CPU(unsigned long, cr4);
@@ -611,6 +612,8 @@ void arch_domain_destroy(struct domain *d)
 
     free_xenheap_page(d->shared_info);
     cleanup_domain_irq_mapping(d);
+
+    free_cqm_rmid(d);
 }
 
 unsigned long pv_guest_cr4_fixup(const struct vcpu *v, unsigned long guest_cr4)
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 26635ff..d5835c3 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -35,6 +35,7 @@
 #include <asm/mem_sharing.h>
 #include <asm/xstate.h>
 #include <asm/debugger.h>
+#include <asm/pqos.h>
 
 static int gdbsx_guest_mem_io(
     domid_t domid, struct xen_domctl_gdbsx_memio *iop)
@@ -1256,6 +1257,29 @@ long arch_do_domctl(
     }
     break;
 
+    case XEN_DOMCTL_attach_pqos:
+        if ( !(domctl->u.qos_type.flags & XEN_DOMCTL_pqos_cqm) )
+            ret = -EINVAL;
+        else if ( !system_supports_cqm() )
+            ret = -ENODEV;
+        else
+            ret = alloc_cqm_rmid(d);
+        break;
+
+    case XEN_DOMCTL_detach_pqos:
+        if ( !(domctl->u.qos_type.flags & XEN_DOMCTL_pqos_cqm) )
+            ret = -EINVAL;
+        else if ( !system_supports_cqm() )
+            ret = -ENODEV;
+        else if ( d->arch.pqos_cqm_rmid > 0 )
+        {
+            free_cqm_rmid(d);
+            ret = 0;
+        }
+        else
+            ret = -ENOENT;
+        break;
+
     default:
         ret = iommu_do_domctl(domctl, d, u_domctl);
         break;
diff --git a/xen/arch/x86/pqos/cqm.c b/xen/arch/x86/pqos/cqm.c
index b46f25b..0f492a4 100644
--- a/xen/arch/x86/pqos/cqm.c
+++ b/xen/arch/x86/pqos/cqm.c
@@ -168,6 +168,65 @@ out:
     cqm = NULL;
 }
 
+int alloc_cqm_rmid(struct domain *d)
+{
+    int rc = 0;
+    unsigned int rmid;
+
+    ASSERT(system_supports_cqm());
+
+    spin_lock(&cqm->cqm_lock);
+
+    if ( d->arch.pqos_cqm_rmid > 0 )
+    {
+        rc = -EEXIST;
+        goto out;
+    }
+
+    for ( rmid = cqm->rmid_min; rmid <= cqm->rmid_max; rmid++ )
+    {
+        if ( cqm->rmid_to_dom[rmid] != DOMID_INVALID)
+            continue;
+
+        cqm->rmid_to_dom[rmid] = d->domain_id;
+        break;
+    }
+
+    /* No CQM RMID available, assign RMID=0 by default */
+    if ( rmid > cqm->rmid_max )
+    {
+        rmid = 0;
+        rc = -EUSERS;
+    }
+    else
+        cqm->rmid_inuse++;
+
+    d->arch.pqos_cqm_rmid = rmid;
+
+out:
+    spin_unlock(&cqm->cqm_lock);
+
+    return rc;
+}
+
+void free_cqm_rmid(struct domain *d)
+{
+    unsigned int rmid;
+
+    spin_lock(&cqm->cqm_lock);
+    rmid = d->arch.pqos_cqm_rmid;
+    /* We do not free system reserved "RMID=0" */
+    if ( rmid == 0 )
+        goto out;
+
+    cqm->rmid_to_dom[rmid] = DOMID_INVALID;
+    d->arch.pqos_cqm_rmid = 0;
+    cqm->rmid_inuse--;
+
+out:
+    spin_unlock(&cqm->cqm_lock);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 4ff89f0..006fc19 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -313,6 +313,8 @@ struct arch_domain
     spinlock_t e820_lock;
     struct e820entry *e820;
     unsigned int nr_e820;
+
+    unsigned int pqos_cqm_rmid;       /* CQM RMID assigned to the domain */
 } __cacheline_aligned;
 
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
index 9baa621..01fa7f2 100644
--- a/xen/include/asm-x86/pqos.h
+++ b/xen/include/asm-x86/pqos.h
@@ -16,6 +16,7 @@
  */
 #ifndef ASM_PQOS_H
 #define ASM_PQOS_H
+#include <xen/sched.h>
 
 #include <public/xen.h>
 #include <xen/spinlock.h>
@@ -41,7 +42,15 @@ struct pqos_cqm {
 };
 extern struct pqos_cqm *cqm;
 
+static inline bool_t system_supports_cqm(void)
+{
+    return !!cqm;
+}
+
 void __init init_platform_qos(void);
 void __init init_cqm(unsigned int rmid_max, unsigned long rmid_mask);
 
+int alloc_cqm_rmid(struct domain *d);
+void free_cqm_rmid(struct domain *d);
+
 #endif
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index f22fe2e..93435ce 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -896,6 +896,14 @@ struct xen_domctl_cacheflush {
 typedef struct xen_domctl_cacheflush xen_domctl_cacheflush_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_cacheflush_t);
 
+struct xen_domctl_qos_type {
+#define _XEN_DOMCTL_pqos_cqm      0
+#define XEN_DOMCTL_pqos_cqm       (1U<<_XEN_DOMCTL_pqos_cqm)
+    uint64_t flags;
+};
+typedef struct xen_domctl_qos_type xen_domctl_qos_type_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_qos_type_t);
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -966,6 +974,8 @@ struct xen_domctl {
 #define XEN_DOMCTL_getnodeaffinity               69
 #define XEN_DOMCTL_set_max_evtchn                70
 #define XEN_DOMCTL_cacheflush                    71
+#define XEN_DOMCTL_attach_pqos                   72
+#define XEN_DOMCTL_detach_pqos                   73
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1027,6 +1037,7 @@ struct xen_domctl {
         struct xen_domctl_cacheflush        cacheflush;
         struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu;
         struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
+        struct xen_domctl_qos_type          qos_type;
         uint8_t                             pad[128];
     } u;
 };
-- 
1.7.9.5


* [PATCH v10 3/6] x86: collect CQM information from all sockets
  2014-03-26  6:35 [PATCH v10 0/6] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
  2014-03-26  6:35 ` [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature Dongxiao Xu
  2014-03-26  6:35 ` [PATCH v10 2/6] x86: dynamically attach/detach CQM service for a guest Dongxiao Xu
@ 2014-03-26  6:35 ` Dongxiao Xu
  2014-04-02 15:35   ` Jan Beulich
  2014-03-26  6:35 ` [PATCH v10 4/6] x86: enable CQM monitoring for each domain RMID Dongxiao Xu
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Dongxiao Xu @ 2014-03-26  6:35 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, JBeulich, dgdegra

Collect CQM information (L3 cache occupancy) from all sockets.
Upper layer applications can parse the data structure to get the
information on a guest's L3 cache occupancy on certain sockets.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Jiongxi Li <jiongxi.li@intel.com>
---
 xen/arch/x86/pqos/cqm.c         |   33 ++++++++++++++++++
 xen/arch/x86/sysctl.c           |   70 +++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/msr-index.h |    4 +++
 xen/include/asm-x86/pqos.h      |    6 ++++
 xen/include/public/sysctl.h     |   12 +++++++
 5 files changed, 125 insertions(+)

diff --git a/xen/arch/x86/pqos/cqm.c b/xen/arch/x86/pqos/cqm.c
index 0f492a4..4913975 100644
--- a/xen/arch/x86/pqos/cqm.c
+++ b/xen/arch/x86/pqos/cqm.c
@@ -15,6 +15,7 @@
  * more details.
  */
 #include <asm/processor.h>
+#include <asm/msr.h>
 #include <xen/init.h>
 #include <xen/mm.h>
 #include <xen/spinlock.h>
@@ -227,6 +228,38 @@ out:
     spin_unlock(&cqm->cqm_lock);
 }
 
+static void read_cqm_data(void *arg)
+{
+    uint64_t cqm_data;
+    unsigned int rmid;
+    int socket = cpu_to_socket(smp_processor_id());
+
+    ASSERT(system_supports_cqm());
+
+    if ( socket < 0 )
+        return;
+
+    for ( rmid = cqm->rmid_min; rmid <= cqm->rmid_max; rmid++ )
+    {
+        if ( cqm->rmid_to_dom[rmid] == DOMID_INVALID )
+            continue;
+
+        wrmsr(MSR_IA32_QOSEVTSEL, QOS_MONITOR_EVTID_L3, rmid);
+        rdmsrl(MSR_IA32_QMC, cqm_data);
+
+        if ( !(cqm_data & IA32_QM_CTR_ERROR_MASK) )
+            cqm->l3c[socket][rmid] = cqm_data * cqm->upscaling_factor;
+    }
+}
+
+void get_cqm_info(const cpumask_t *cpu_cqmdata_map)
+{
+    /* Read CQM data in current CPU */
+    read_cqm_data(NULL);
+    /* Issue IPI to other CPUs to read CQM data */
+    on_selected_cpus(cpu_cqmdata_map, read_cqm_data, NULL, 1);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index 15d4b91..30dcde3 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -28,6 +28,7 @@
 #include <xen/nodemask.h>
 #include <xen/cpu.h>
 #include <xsm/xsm.h>
+#include <asm/pqos.h>
 
 #define get_xen_guest_handle(val, hnd)  do { val = (hnd).p; } while (0)
 
@@ -66,6 +67,36 @@ void arch_do_physinfo(xen_sysctl_physinfo_t *pi)
         pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm_directio;
 }
 
+/* Select one arbitrary CPU from each socket; the current CPU's socket is excluded */
+static int select_socket_cpu(cpumask_t *cpu_bitmap)
+{
+    int i;
+    unsigned int cpu;
+    int socket, socket_curr = cpu_to_socket(smp_processor_id());
+    cpumask_var_t sockets;
+
+    if ( !zalloc_cpumask_var(&sockets) )
+        return -ENOMEM;
+
+    if ( socket_curr >= 0 )
+        set_bit(socket_curr, sockets);
+
+    cpumask_clear(cpu_bitmap);
+    for ( i = 0; i < NR_CPUS; i++ )
+    {
+        socket = cpu_to_socket(i);
+        if ( socket < 0 || test_and_set_bit(socket, sockets) )
+            continue;
+        cpu = cpumask_any(per_cpu(cpu_core_mask, i));
+        if ( cpu < nr_cpu_ids )
+            cpumask_set_cpu(cpu, cpu_bitmap);
+    }
+
+    free_cpumask_var(sockets);
+
+    return 0;
+}
+
 long arch_do_sysctl(
     struct xen_sysctl *sysctl, XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
@@ -101,6 +132,45 @@ long arch_do_sysctl(
     }
     break;
 
+    case XEN_SYSCTL_getcqminfo:
+    {
+        struct cpuinfo_x86 *c = &boot_cpu_data;
+        cpumask_var_t cpu_cqmdata_map;
+
+        if ( !system_supports_cqm() )
+        {
+            ret = -ENODEV;
+            break;
+        }
+
+        if ( !zalloc_cpumask_var(&cpu_cqmdata_map) )
+        {
+            ret = -ENOMEM;
+            break;
+        }
+
+        ret = select_socket_cpu(cpu_cqmdata_map);
+        if ( ret < 0 )
+        {
+            free_cpumask_var(cpu_cqmdata_map);
+            break;
+        }
+
+        get_cqm_info(cpu_cqmdata_map);
+
+        sysctl->u.getcqminfo.socket_l3c_mfn = virt_to_mfn(cqm->socket_l3c_mfn);
+        sysctl->u.getcqminfo.rmid_dom_mfn = virt_to_mfn(cqm->rmid_to_dom);
+        sysctl->u.getcqminfo.nr_rmids = cqm->rmid_max + 1;
+        sysctl->u.getcqminfo.nr_sockets = cpumask_weight(cpu_cqmdata_map) + 1;
+        sysctl->u.getcqminfo.l3c_total = c->x86_cache_size;
+
+        if ( __copy_to_guest(u_sysctl, sysctl, 1) )
+            ret = -EFAULT;
+
+        free_cpumask_var(cpu_cqmdata_map);
+    }
+    break;
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index ff86ed9..847aeb9 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -482,4 +482,8 @@
 /* Geode defined MSRs */
 #define MSR_GEODE_BUSCONT_CONF0		0x00001900
 
+/* Platform QoS register */
+#define MSR_IA32_QOSEVTSEL             0x00000c8d
+#define MSR_IA32_QMC                   0x00000c8e
+
 #endif /* __ASM_MSR_INDEX_H */
diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
index 01fa7f2..55b5187 100644
--- a/xen/include/asm-x86/pqos.h
+++ b/xen/include/asm-x86/pqos.h
@@ -17,6 +17,8 @@
 #ifndef ASM_PQOS_H
 #define ASM_PQOS_H
 #include <xen/sched.h>
+#include <xen/cpumask.h>
+#include <public/domctl.h>
 
 #include <public/xen.h>
 #include <xen/spinlock.h>
@@ -27,6 +29,9 @@
 /* QoS Monitoring Event ID */
 #define QOS_MONITOR_EVTID_L3           0x1
 
+/* IA32_QM_CTR */
+#define IA32_QM_CTR_ERROR_MASK         (0x3ul << 62)
+
 struct pqos_cqm {
     unsigned long rmid_mask;
     unsigned int rmid_min;
@@ -52,5 +57,6 @@ void __init init_cqm(unsigned int rmid_max, unsigned long rmid_mask);
 
 int alloc_cqm_rmid(struct domain *d);
 void free_cqm_rmid(struct domain *d);
+void get_cqm_info(const cpumask_t *cpu_cqmdata_map);
 
 #endif
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 8ae6870..d2d9a64 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -629,6 +629,16 @@ struct xen_sysctl_coverage_op {
 typedef struct xen_sysctl_coverage_op xen_sysctl_coverage_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_coverage_op_t);
 
+struct xen_sysctl_getcqminfo {
+    uint64_aligned_t socket_l3c_mfn;
+    uint64_aligned_t rmid_dom_mfn;
+    uint32_t nr_rmids;
+    uint32_t nr_sockets;
+    uint32_t l3c_total;
+};
+typedef struct xen_sysctl_getcqminfo xen_sysctl_getcqminfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_getcqminfo_t);
+
 
 struct xen_sysctl {
     uint32_t cmd;
@@ -651,6 +661,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_cpupool_op                    18
 #define XEN_SYSCTL_scheduler_op                  19
 #define XEN_SYSCTL_coverage_op                   20
+#define XEN_SYSCTL_getcqminfo                    21
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -672,6 +683,7 @@ struct xen_sysctl {
         struct xen_sysctl_cpupool_op        cpupool_op;
         struct xen_sysctl_scheduler_op      scheduler_op;
         struct xen_sysctl_coverage_op       coverage_op;
+        struct xen_sysctl_getcqminfo        getcqminfo;
         uint8_t                             pad[128];
     } u;
 };
-- 
1.7.9.5


* [PATCH v10 4/6] x86: enable CQM monitoring for each domain RMID
  2014-03-26  6:35 [PATCH v10 0/6] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
                   ` (2 preceding siblings ...)
  2014-03-26  6:35 ` [PATCH v10 3/6] x86: collect CQM information from all sockets Dongxiao Xu
@ 2014-03-26  6:35 ` Dongxiao Xu
  2014-03-26  6:35 ` [PATCH v10 5/6] xsm: add platform QoS related xsm policies Dongxiao Xu
  2014-03-26  6:35 ` [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for libxl/libxc Dongxiao Xu
  5 siblings, 0 replies; 28+ messages in thread
From: Dongxiao Xu @ 2014-03-26  6:35 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, JBeulich, dgdegra

If the CQM service is attached to a domain, its associated RMID is
programmed into hardware for monitoring when one of the domain's vcpus
is scheduled in. When the vcpu is scheduled out, RMID 0 (reserved for
the system) is programmed instead.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Jiongxi Li <jiongxi.li@intel.com>
---
 xen/arch/x86/domain.c           |    5 +++++
 xen/arch/x86/pqos/cqm.c         |   21 +++++++++++++++++++++
 xen/include/asm-x86/msr-index.h |    1 +
 xen/include/asm-x86/pqos.h      |    6 ++++++
 4 files changed, 33 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 3ea9402..e256cbb 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1380,6 +1380,8 @@ static void __context_switch(void)
     {
         memcpy(&p->arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES);
         vcpu_save_fpu(p);
+        if ( system_supports_cqm() && cqm->rmid_inuse )
+            cqm_assoc_rmid(0);
         p->arch.ctxt_switch_from(p);
     }
 
@@ -1404,6 +1406,9 @@ static void __context_switch(void)
         }
         vcpu_restore_fpu_eager(n);
         n->arch.ctxt_switch_to(n);
+
+        if ( system_supports_cqm() && n->domain->arch.pqos_cqm_rmid > 0 )
+            cqm_assoc_rmid(n->domain->arch.pqos_cqm_rmid);
     }
 
     gdt = !is_pv_32on64_vcpu(n) ? per_cpu(gdt_table, cpu) :
diff --git a/xen/arch/x86/pqos/cqm.c b/xen/arch/x86/pqos/cqm.c
index 4913975..7a936e0 100644
--- a/xen/arch/x86/pqos/cqm.c
+++ b/xen/arch/x86/pqos/cqm.c
@@ -24,6 +24,7 @@
 #include <asm/pqos.h>
 
 struct pqos_cqm *__read_mostly cqm = NULL;
+static DEFINE_PER_CPU(struct pqr_assoc, pqr_assoc);
 
 static int cqm_add_socket(int socket)
 {
@@ -260,6 +261,26 @@ void get_cqm_info(const cpumask_t *cpu_cqmdata_map)
     on_selected_cpus(cpu_cqmdata_map, read_cqm_data, NULL, 1);
 }
 
+void cqm_assoc_rmid(unsigned int rmid)
+{
+    uint64_t val;
+    uint64_t new_val;
+
+    if ( !this_cpu(pqr_assoc).initialized )
+    {
+        rdmsrl(MSR_IA32_PQR_ASSOC, this_cpu(pqr_assoc).val);
+        this_cpu(pqr_assoc).initialized = 1;
+    }
+    val = this_cpu(pqr_assoc).val;
+
+    new_val = (val & ~cqm->rmid_mask) | (rmid & cqm->rmid_mask);
+    if ( val != new_val )
+    {
+        wrmsrl(MSR_IA32_PQR_ASSOC, new_val);
+        this_cpu(pqr_assoc).val = new_val;
+    }
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 847aeb9..1fc90a6 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -485,5 +485,6 @@
 /* Platform QoS register */
 #define MSR_IA32_QOSEVTSEL             0x00000c8d
 #define MSR_IA32_QMC                   0x00000c8e
+#define MSR_IA32_PQR_ASSOC             0x00000c8f
 
 #endif /* __ASM_MSR_INDEX_H */
diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
index 55b5187..83c6ac6 100644
--- a/xen/include/asm-x86/pqos.h
+++ b/xen/include/asm-x86/pqos.h
@@ -47,6 +47,11 @@ struct pqos_cqm {
 };
 extern struct pqos_cqm *cqm;
 
+struct pqr_assoc {
+    uint64_t val;
+    bool_t initialized;
+};
+
 static inline bool_t system_supports_cqm(void)
 {
     return !!cqm;
@@ -58,5 +63,6 @@ void __init init_cqm(unsigned int rmid_max, unsigned long rmid_mask);
 int alloc_cqm_rmid(struct domain *d);
 void free_cqm_rmid(struct domain *d);
 void get_cqm_info(const cpumask_t *cpu_cqmdata_map);
+void cqm_assoc_rmid(unsigned int rmid);
 
 #endif
-- 
1.7.9.5


* [PATCH v10 5/6] xsm: add platform QoS related xsm policies
  2014-03-26  6:35 [PATCH v10 0/6] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
                   ` (3 preceding siblings ...)
  2014-03-26  6:35 ` [PATCH v10 4/6] x86: enable CQM monitoring for each domain RMID Dongxiao Xu
@ 2014-03-26  6:35 ` Dongxiao Xu
  2014-03-26  6:35 ` [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for libxl/libxc Dongxiao Xu
  5 siblings, 0 replies; 28+ messages in thread
From: Dongxiao Xu @ 2014-03-26  6:35 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, JBeulich, dgdegra

Add xsm policies for attach/detach pqos services and get CQM info
hypercalls.

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 tools/flask/policy/policy/modules/xen/xen.if |    2 +-
 tools/flask/policy/policy/modules/xen/xen.te |    5 ++++-
 xen/xsm/flask/hooks.c                        |    8 ++++++++
 xen/xsm/flask/policy/access_vectors          |   17 ++++++++++++++---
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
index dedc035..1f683af 100644
--- a/tools/flask/policy/policy/modules/xen/xen.if
+++ b/tools/flask/policy/policy/modules/xen/xen.if
@@ -49,7 +49,7 @@ define(`create_domain_common', `
 			getdomaininfo hypercall setvcpucontext setextvcpucontext
 			getscheduler getvcpuinfo getvcpuextstate getaddrsize
 			getaffinity setaffinity };
-	allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim  set_max_evtchn };
+	allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim set_max_evtchn pqos_op };
 	allow $1 $2:security check_context;
 	allow $1 $2:shadow enable;
 	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op };
diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index bb59fe8..115fcfe 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -64,6 +64,9 @@ allow dom0_t xen_t:xen {
 	getidle debug getcpuinfo heap pm_op mca_op lockprof cpupool_op tmem_op
 	tmem_control getscheduler setscheduler
 };
+allow dom0_t xen_t:xen2 {
+	pqos_op
+};
 allow dom0_t xen_t:mmu memorymap;
 
 # Allow dom0 to use these domctls on itself. For domctls acting on other
@@ -76,7 +79,7 @@ allow dom0_t dom0_t:domain {
 	getpodtarget setpodtarget set_misc_info set_virq_handler
 };
 allow dom0_t dom0_t:domain2 {
-	set_cpuid gettsc settsc setscheduler set_max_evtchn
+	set_cpuid gettsc settsc setscheduler set_max_evtchn pqos_op
 };
 allow dom0_t dom0_t:resource { add remove };
 
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index ba719de..ef557f0 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -735,6 +735,10 @@ static int flask_domctl(struct domain *d, int cmd)
     case XEN_DOMCTL_cacheflush:
         return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__CACHEFLUSH);
 
+    case XEN_DOMCTL_attach_pqos:
+    case XEN_DOMCTL_detach_pqos:
+        return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__PQOS_OP);
+
     default:
         printk("flask_domctl: Unknown op %d\n", cmd);
         return -EPERM;
@@ -790,6 +794,10 @@ static int flask_sysctl(int cmd)
     case XEN_SYSCTL_numainfo:
         return domain_has_xen(current->domain, XEN__PHYSINFO);
 
+    case XEN_SYSCTL_getcqminfo:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__PQOS_OP, NULL);
+
     default:
         printk("flask_sysctl: Unknown op %d\n", cmd);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index a0ed13d..6d2c1e0 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -3,9 +3,9 @@
 #
 # class class_name { permission_name ... }
 
-# Class xen consists of dom0-only operations dealing with the hypervisor itself.
-# Unless otherwise specified, the source is the domain executing the hypercall,
-# and the target is the xen initial sid (type xen_t).
+# Class xen and xen2 consists of dom0-only operations dealing with the
+# hypervisor itself. Unless otherwise specified, the source is the domain
+# executing the hypercall, and the target is the xen initial sid (type xen_t).
 class xen
 {
 # XENPF_settime
@@ -75,6 +75,14 @@ class xen
     setscheduler
 }
 
+# This is a continuation of class xen, since only 32 permissions can be
+# defined per class
+class xen2
+{
+# XEN_SYSCTL_getcqminfo
+    pqos_op
+}
+
 # Classes domain and domain2 consist of operations that a domain performs on
 # another domain or on itself.  Unless otherwise specified, the source is the
 # domain executing the hypercall, and the target is the domain being operated on
@@ -198,6 +206,9 @@ class domain2
     set_max_evtchn
 # XEN_DOMCTL_cacheflush
     cacheflush
+# XEN_DOMCTL_attach_pqos
+# XEN_DOMCTL_detach_pqos
+    pqos_op
 }
 
 # Similar to class domain, but primarily contains domctls related to HVM domains
-- 
1.7.9.5


* [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for libxl/libxc
  2014-03-26  6:35 [PATCH v10 0/6] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
                   ` (4 preceding siblings ...)
  2014-03-26  6:35 ` [PATCH v10 5/6] xsm: add platform QoS related xsm policies Dongxiao Xu
@ 2014-03-26  6:35 ` Dongxiao Xu
  2014-03-27 15:13   ` Ian Campbell
  5 siblings, 1 reply; 28+ messages in thread
From: Dongxiao Xu @ 2014-03-26  6:35 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, JBeulich, dgdegra

Introduce two new xl commands to attach/detach the CQM service for a guest:
$ xl pqos-attach cqm domid
$ xl pqos-detach cqm domid

Introduce one new xl command to retrieve guest CQM information:
$ xl pqos-list cqm

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Jiongxi Li <jiongxi.li@intel.com>
---
 docs/man/xl.pod.1           |   23 ++++++
 tools/libxc/xc_domain.c     |   41 ++++++++++
 tools/libxc/xenctrl.h       |   13 +++
 tools/libxl/Makefile        |    3 +-
 tools/libxl/libxl.h         |   12 +++
 tools/libxl/libxl_pqos.c    |  190 +++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_types.idl |    9 ++
 tools/libxl/xl.h            |    3 +
 tools/libxl/xl_cmdimpl.c    |  109 +++++++++++++++++++++++++
 tools/libxl/xl_cmdtable.c   |   15 ++++
 10 files changed, 417 insertions(+), 1 deletion(-)
 create mode 100644 tools/libxl/libxl_pqos.c

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index f7ceaa8..daa191a 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1340,6 +1340,29 @@ Load FLASK policy from the given policy file. The initial policy is provided to
 the hypervisor as a multiboot module; this command allows runtime updates to the
 policy. Loading new security policy will reset runtime changes to device labels.
 
+=head1 PLATFORM QOS
+
+Newer Intel processors may offer a monitoring capability in each logical
+processor to measure a specific quality-of-service metric, for example Cache
+QoS Monitoring, which reports L3 cache occupancy.
+
+=over 4
+
+=item B<pqos-attach> [I<qos-type>] [I<domain-id>]
+
+Attach a platform QoS service to a domain.
+The currently supported I<qos-type> is "cqm".
+
+=item B<pqos-detach> [I<qos-type>] [I<domain-id>]
+
+Detach a platform QoS service from a domain.
+The currently supported I<qos-type> is "cqm".
+
+=item B<pqos-list> [I<qos-type>]
+
+List platform QoS information for domains with an attached QoS service.
+The currently supported I<qos-type> is "cqm".
+
 =back
 
 =head1 TO BE DOCUMENTED
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 369c3f3..80f0458 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1814,6 +1814,47 @@ int xc_domain_set_max_evtchn(xc_interface *xch, uint32_t domid,
     return do_domctl(xch, &domctl);
 }
 
+int xc_domain_pqos_attach(xc_interface *xch, uint32_t domid, uint64_t flags)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_attach_pqos;
+    domctl.domain = (domid_t)domid;
+    domctl.u.qos_type.flags = flags;
+
+    return do_domctl(xch, &domctl);
+}
+
+int xc_domain_pqos_detach(xc_interface *xch, uint32_t domid, uint64_t flags)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_detach_pqos;
+    domctl.domain = (domid_t)domid;
+    domctl.u.qos_type.flags = flags;
+
+    return do_domctl(xch, &domctl);
+}
+
+int xc_domain_getcqminfo(xc_interface *xch, xc_cqminfo_t *info)
+{
+    int ret;
+    DECLARE_SYSCTL;
+
+    sysctl.cmd = XEN_SYSCTL_getcqminfo;
+    ret = xc_sysctl(xch, &sysctl);
+    if ( ret >= 0 )
+    {
+        info->socket_l3c_mfn = sysctl.u.getcqminfo.socket_l3c_mfn;
+        info->rmid_dom_mfn = sysctl.u.getcqminfo.rmid_dom_mfn;
+        info->nr_rmids = sysctl.u.getcqminfo.nr_rmids;
+        info->nr_sockets = sysctl.u.getcqminfo.nr_sockets;
+        info->l3c_total = sysctl.u.getcqminfo.l3c_total;
+    }
+
+    return ret;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index e3a32f2..f1e8b9b 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -2424,4 +2424,17 @@ int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
  */
 int xc_kexec_unload(xc_interface *xch, int type);
 
+struct xc_cqminfo
+{
+    uint64_aligned_t socket_l3c_mfn;
+    uint64_aligned_t rmid_dom_mfn;
+    uint32_t nr_rmids;
+    uint32_t nr_sockets;
+    uint32_t l3c_total;
+};
+typedef struct xc_cqminfo xc_cqminfo_t;
+
+int xc_domain_pqos_attach(xc_interface *xch, uint32_t domid, uint64_t flags);
+int xc_domain_pqos_detach(xc_interface *xch, uint32_t domid, uint64_t flags);
+int xc_domain_getcqminfo(xc_interface *xch, xc_cqminfo_t *info);
 #endif /* XENCTRL_H */
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 755b666..7b54720 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -76,7 +76,8 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_internal.o libxl_utils.o libxl_uuid.o \
 			libxl_json.o libxl_aoutils.o libxl_numa.o \
 			libxl_save_callout.o _libxl_save_msgs_callout.o \
-			libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
+			libxl_qmp.o libxl_event.o libxl_fork.o libxl_pqos.o \
+			$(LIBXL_OBJS-y)
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
 
 LIBXL_TESTS += timedereg
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index b2c3015..b330633 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -467,6 +467,13 @@ typedef uint8_t libxl_mac[6];
 #define LIBXL_MAC_FMTLEN ((2*6)+5) /* 6 hex bytes plus 5 colons */
 #define LIBXL_MAC_BYTES(mac) mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]
 
+/*
+ * LIBXL_HAVE_PQOS_CQM
+ *
+ * If this is defined, the CQM feature is supported.
+ */
+#define LIBXL_HAVE_PQOS_CQM 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
@@ -1151,6 +1158,11 @@ int libxl_flask_getenforce(libxl_ctx *ctx);
 int libxl_flask_setenforce(libxl_ctx *ctx, int mode);
 int libxl_flask_loadpolicy(libxl_ctx *ctx, void *policy, uint32_t size);
 
+int libxl_pqos_attach(libxl_ctx *ctx, uint32_t domid, libxl_pqos_type qos_type);
+int libxl_pqos_detach(libxl_ctx *ctx, uint32_t domid, libxl_pqos_type qos_type);
+libxl_cqminfo * libxl_getcqminfo(libxl_ctx *ctx, unsigned int *l3c_total,
+    unsigned int *nr_domains, unsigned int *nr_rmids, unsigned int *nr_sockets);
+
 /* misc */
 
 /* Each of these sets or clears the flag according to whether the
diff --git a/tools/libxl/libxl_pqos.c b/tools/libxl/libxl_pqos.c
new file mode 100644
index 0000000..d26aa4c
--- /dev/null
+++ b/tools/libxl/libxl_pqos.c
@@ -0,0 +1,190 @@
+/*
+ * Copyright (C) 2014      Intel Corporation
+ * Author Dongxiao Xu <dongxiao.xu@intel.com>
+ * Author Jiongxi Li <jiongxi.li@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only, with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+#include "libxl_internal.h"
+
+static const char * const msg[] = {
+    [EINVAL] = "invalid QoS resource type! Supported types: \"cqm\"",
+    [ENODEV] = "CQM is not supported in this system.",
+    [EEXIST] = "CQM is already attached to this domain.",
+    [ENOENT] = "CQM is not attached to this domain.",
+    [EUSERS] = "there is no free CQM RMID available.",
+    [ESRCH]  = "is this Domain ID valid?",
+};
+
+static void libxl_pqos_err_msg(libxl_ctx *ctx, int err)
+{
+    GC_INIT(ctx);
+
+    switch (err) {
+    case EINVAL:
+    case ENODEV:
+    case EEXIST:
+    case EUSERS:
+    case ESRCH:
+    case ENOENT:
+        LOGE(ERROR, "%s", msg[err]);
+        break;
+    default:
+        LOGE(ERROR, "errno: %d", err);
+    }
+
+    GC_FREE;
+}
+
+static int libxl_pqos_type2flags(libxl_pqos_type qos_type, uint64_t *flags)
+{
+    int rc = 0;
+
+    if (qos_type == LIBXL_PQOS_TYPE_CQM)
+        *flags |= XEN_DOMCTL_pqos_cqm;
+    else
+        rc = -1;
+
+    return rc;
+}
+
+int libxl_pqos_attach(libxl_ctx *ctx, uint32_t domid, libxl_pqos_type qos_type)
+{
+    int rc;
+    uint64_t flags = 0;
+
+    rc = libxl_pqos_type2flags(qos_type, &flags);
+    if (rc < 0) {
+        libxl_pqos_err_msg(ctx, EINVAL);
+        return ERROR_FAIL;
+    }
+
+    rc = xc_domain_pqos_attach(ctx->xch, domid, flags);
+    if (rc < 0) {
+        libxl_pqos_err_msg(ctx, errno);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
+int libxl_pqos_detach(libxl_ctx *ctx, uint32_t domid, libxl_pqos_type qos_type)
+{
+    int rc;
+    uint64_t flags = 0;
+
+    rc = libxl_pqos_type2flags(qos_type, &flags);
+    if (rc < 0) {
+        libxl_pqos_err_msg(ctx, EINVAL);
+        return ERROR_FAIL;
+    }
+
+    rc = xc_domain_pqos_detach(ctx->xch, domid, flags);
+    if (rc < 0) {
+        libxl_pqos_err_msg(ctx, errno);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
+libxl_cqminfo * libxl_getcqminfo(libxl_ctx *ctx, unsigned int *l3c_total,
+    unsigned int *nr_domains, unsigned int *nr_rmids, unsigned int *nr_sockets)
+{
+    libxl_cqminfo *xl_cqminfo;
+    unsigned long *socket_l3c, *l3c;
+    uint16_t *rmid_to_dom;
+    xc_cqminfo_t xcinfo;
+    int i, j, idx;
+    unsigned int nr_rmids_size, nr_sockets_size;
+    GC_INIT(ctx);
+
+    if (xc_domain_getcqminfo(ctx->xch, &xcinfo) < 0) {
+        LOGE(ERROR, "getting domain cqm info");
+        goto err;
+    }
+
+    rmid_to_dom = xc_map_foreign_range(ctx->xch, DOMID_XEN,
+                      XC_PAGE_SIZE, PROT_READ, xcinfo.rmid_dom_mfn);
+    if (rmid_to_dom == 0) {
+        LOGE(ERROR, "Failed to map cqm rmid_to_dom buffers");
+        goto err;
+    }
+
+    *l3c_total = xcinfo.l3c_total;
+    *nr_sockets = xcinfo.nr_sockets;
+    *nr_rmids = xcinfo.nr_rmids;
+    *nr_domains = 1024;
+
+    nr_sockets_size = xcinfo.nr_sockets * sizeof(unsigned long);
+    socket_l3c = xc_map_foreign_range(ctx->xch, DOMID_XEN,
+                     nr_sockets_size, PROT_READ, xcinfo.socket_l3c_mfn);
+    if (socket_l3c == 0) {
+        LOGE(ERROR, "Failed to map cqm rmid_to_dom buffers");
+        munmap(rmid_to_dom, XC_PAGE_SIZE);
+        goto err;
+    }
+
+    xl_cqminfo = calloc(*nr_domains * xcinfo.nr_sockets,
+                        sizeof(libxl_cqminfo));
+    if (!xl_cqminfo) {
+        LOGE(ERROR, "allocating domain cqm data structure");
+        munmap(socket_l3c, nr_sockets_size);
+        munmap(rmid_to_dom, XC_PAGE_SIZE);
+        goto err;
+    }
+
+    for (i = 0; i < xcinfo.nr_sockets; i++) {
+        nr_rmids_size = xcinfo.nr_rmids * sizeof(unsigned long);
+        l3c = xc_map_foreign_range(ctx->xch, DOMID_XEN,
+                  nr_rmids_size, PROT_READ, socket_l3c[i]);
+        if (l3c == 0) {
+            LOGE(ERROR, "Failed to map cqm rmid_to_dom buffers");
+            free(xl_cqminfo);
+            munmap(socket_l3c, nr_sockets_size);
+            munmap(rmid_to_dom, XC_PAGE_SIZE);
+            goto err;
+        }
+
+        for (j = 0; j < xcinfo.nr_rmids; j++) {
+            if (rmid_to_dom[j] >= *nr_domains)
+                continue;
+            idx = i + rmid_to_dom[j] * xcinfo.nr_sockets;
+            xl_cqminfo[idx].valid = 1;
+            xl_cqminfo[idx].l3c = l3c[j];
+        }
+        munmap(l3c, nr_rmids_size);
+    }
+
+    munmap(socket_l3c, nr_sockets_size);
+    munmap(rmid_to_dom, XC_PAGE_SIZE);
+    GC_FREE;
+
+    return xl_cqminfo;
+
+err:
+    *nr_domains = 0;
+    *nr_rmids = 0;
+    *nr_sockets = 0;
+    GC_FREE;
+
+    return NULL;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 612645c..73b9ab3 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -598,3 +598,12 @@ libxl_event = Struct("event",[
                                  ])),
            ("domain_create_console_available", Struct(None, [])),
            ]))])
+
+libxl_pqos_type = Enumeration("pqos_type", [
+    (1, "CQM"),
+    ])
+
+libxl_cqminfo = Struct("cqminfo", [
+    ("valid",         uint32),
+    ("l3c",           uint64),
+    ])
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 10a2e66..9b4f668 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -110,6 +110,9 @@ int main_loadpolicy(int argc, char **argv);
 int main_remus(int argc, char **argv);
 #endif
 int main_devd(int argc, char **argv);
+int main_pqosattach(int argc, char **argv);
+int main_pqosdetach(int argc, char **argv);
+int main_pqoslist(int argc, char **argv);
 
 void help(const char *command);
 
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 8389468..664a34d 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7381,6 +7381,115 @@ out:
     return ret;
 }
 
+int main_pqosattach(int argc, char **argv)
+{
+    uint32_t domid;
+    int opt, rc;
+    libxl_pqos_type qos_type;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-attach", 2) {
+        /* No options */
+    }
+
+    /* libxl_pqos_attach will handle the parameter check. */
+    libxl_pqos_type_from_string(argv[optind], &qos_type);
+    domid = find_domain(argv[optind + 1]);
+
+    rc = libxl_pqos_attach(ctx, domid, qos_type);
+
+    return rc;
+}
+
+int main_pqosdetach(int argc, char **argv)
+{
+    uint32_t domid;
+    int opt, rc;
+    libxl_pqos_type qos_type;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-detach", 2) {
+        /* No options */
+    }
+
+    /* libxl_pqos_detach will handle the parameter check. */
+    libxl_pqos_type_from_string(argv[optind], &qos_type);
+    domid = find_domain(argv[optind + 1]);
+
+    rc = libxl_pqos_detach(ctx, domid, qos_type);
+
+    return rc;
+}
+
+static void print_cqm_info(libxl_cqminfo *info, unsigned int l3c_total,
+    unsigned int nr_domains, unsigned int nr_rmids, unsigned int nr_sockets)
+{
+    unsigned int i, j;
+    char *domname;
+    int print_header;
+    int used_rmid = 0;
+
+    if (nr_rmids == 0) {
+        printf("System doesn't support CQM.\n");
+        return;
+    }
+
+    print_header = 1;
+    for (i = 0; i < nr_domains; i++) {
+        if (!info[i * nr_sockets].valid)
+            continue;
+        if (print_header) {
+            printf("Name                                        ID");
+            for (j = 0; j < nr_sockets; j++)
+                printf("\tSocketID\tL3C_Usage");
+            print_header = 0;
+        }
+
+        domname = libxl_domid_to_name(ctx, i);
+        printf("\n%-40s %5d", domname, i);
+        free(domname);
+        for (j = 0; j < nr_sockets; j++) {
+            printf("%10u %13lu KB     ", j, info[i * nr_sockets + j].l3c/1024);
+        }
+        used_rmid++;
+    }
+    if (!used_rmid)
+        printf("No RMID is assigned to domains.\n");
+    else
+        printf("\n");
+
+    printf("\nSocket L3 Cache Size: %d KB\n", l3c_total);
+    printf("RMID count %5d\tRMID available %5d\n",
+           nr_rmids, nr_rmids - used_rmid - 1);
+}
+
+int main_pqoslist(int argc, char **argv)
+{
+    int opt;
+    const char *qos_type = NULL;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-list", 1) {
+        /* No options */
+    }
+
+    qos_type = argv[optind];
+
+    if (!strcmp(qos_type, "cqm")) {
+        libxl_cqminfo *info;
+        unsigned int l3c_total, nr_domains, nr_rmids, nr_sockets;
+        info = libxl_getcqminfo(ctx, &l3c_total,
+                                &nr_domains, &nr_rmids, &nr_sockets);
+        if (!info)
+            return 1;
+        print_cqm_info(info, l3c_total, nr_domains, nr_rmids, nr_sockets);
+        free(info);
+    } else {
+        fprintf(stderr, "QoS resource type supported is: cqm.\n");
+        help("pqos-list");
+        return 2;
+    }
+
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index e8ab93a..0c45427 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -497,6 +497,21 @@ struct cmd_spec cmd_table[] = {
       "[options]",
       "-F                      Run in the foreground",
     },
+    { "pqos-attach",
+      &main_pqosattach, 0, 1,
+      "Allocate and map qos resources",
+      "<Resource> <Domain>",
+    },
+    { "pqos-detach",
+      &main_pqosdetach, 0, 1,
+      "Relinquish qos resources",
+      "<Resource> <Domain>",
+    },
+    { "pqos-list",
+      &main_pqoslist, 0, 0,
+      "List qos information for all domains",
+      "<Resource>",
+    },
 };
 
 int cmdtable_len = sizeof(cmd_table)/sizeof(struct cmd_spec);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for libxl/libxc
  2014-03-26  6:35 ` [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for libxl/libxc Dongxiao Xu
@ 2014-03-27 15:13   ` Ian Campbell
  2014-03-28  0:53     ` Xu, Dongxiao
  0 siblings, 1 reply; 28+ messages in thread
From: Ian Campbell @ 2014-03-27 15:13 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, stefano.stabellini, andrew.cooper3, Ian.Jackson, xen-devel,
	JBeulich, dgdegra

On Wed, 2014-03-26 at 14:35 +0800, Dongxiao Xu wrote:
[...]
> +int libxl_pqos_attach(libxl_ctx *ctx, uint32_t domid, libxl_pqos_type qos_type);
> +int libxl_pqos_detach(libxl_ctx *ctx, uint32_t domid, libxl_pqos_type qos_type);
> +libxl_cqminfo * libxl_getcqminfo(libxl_ctx *ctx, unsigned int *l3c_total,
> +    unsigned int *nr_domains, unsigned int *nr_rmids, unsigned int *nr_sockets);
[...]
> +libxl_cqminfo = Struct("cqminfo", [
> +    ("valid",         uint32),
> +    ("l3c",           uint64),
> +    ])

libxl_getcqminfo seems to have a strange mixture of returning an info
struct and returning things via pointers passed as integers. why is
that?

> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 8389468..664a34d 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -7381,6 +7381,115 @@ out:
>      return ret;
>  }
>  
> +int main_pqosattach(int argc, char **argv)
> +{
> +    uint32_t domid;
> +    int opt, rc;
> +    libxl_pqos_type qos_type;
> +
> +    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-attach", 2) {
> +        /* No options */
> +    }
> +
> +    /* libxl_pqos_attach will handle the parameter check. */
> +    libxl_pqos_type_from_string(argv[optind], &qos_type);

This can fail, I think you need to check the return value.

> +    domid = find_domain(argv[optind + 1]);
> +
> +    rc = libxl_pqos_attach(ctx, domid, qos_type);
> +
> +    return rc;
> +}
> +
> +int main_pqosdetach(int argc, char **argv)
> +{
> +    uint32_t domid;
> +    int opt, rc;
> +    libxl_pqos_type qos_type;
> +
> +    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-detach", 2) {
> +        /* No options */
> +    }
> +
> +    /* libxl_pqos_detach will handle the parameter check. */
> +    libxl_pqos_type_from_string(argv[optind], &qos_type);

Same again.

Ian.

* Re: [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for libxl/libxc
  2014-03-27 15:13   ` Ian Campbell
@ 2014-03-28  0:53     ` Xu, Dongxiao
  2014-03-28  9:43       ` Ian Campbell
  0 siblings, 1 reply; 28+ messages in thread
From: Xu, Dongxiao @ 2014-03-28  0:53 UTC (permalink / raw)
  To: Ian Campbell
  Cc: keir, stefano.stabellini, andrew.cooper3, Ian.Jackson, xen-devel,
	JBeulich, dgdegra

> -----Original Message-----
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> Sent: Thursday, March 27, 2014 11:14 PM
> To: Xu, Dongxiao
> Cc: xen-devel@lists.xen.org; keir@xen.org; JBeulich@suse.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> andrew.cooper3@citrix.com; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov
> Subject: Re: [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for
> libxl/libxc
> 
> On Wed, 2014-03-26 at 14:35 +0800, Dongxiao Xu wrote:
> [...]
> > +int libxl_pqos_attach(libxl_ctx *ctx, uint32_t domid, libxl_pqos_type
> qos_type);
> > +int libxl_pqos_detach(libxl_ctx *ctx, uint32_t domid, libxl_pqos_type
> qos_type);
> > +libxl_cqminfo * libxl_getcqminfo(libxl_ctx *ctx, unsigned int *l3c_total,
> > +    unsigned int *nr_domains, unsigned int *nr_rmids, unsigned int
> *nr_sockets);
> [...]
> > +libxl_cqminfo = Struct("cqminfo", [
> > +    ("valid",         uint32),
> > +    ("l3c",           uint64),
> > +    ])
> 
> libxl_getcqminfo seems to have a strange mixture of returning an info
> struct and returning things via pointers passed as integers. why is
> that?

The returned info is an array of unit data structures, each containing the L3 cache usage data for a certain RMID on a certain socket.
The integers returned through these pointers tell the caller how many such units were returned.

This follows an existing function:
libxl_dominfo * libxl_list_domain(libxl_ctx *ctx, int *nb_domain_out);
where the return value is an array of dominfo unit structures, and *nb_domain_out tells the caller how many units were returned.
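As a rough illustration of this "return an array, report the counts through out-parameters" convention, here is a self-contained sketch. The types and function below are mocked up for demonstration only (they are not the real libxl_cqminfo or libxl_getcqminfo):

```c
#include <stdlib.h>

/* Mocked-up unit structure standing in for libxl_cqminfo (hypothetical). */
typedef struct {
    unsigned int valid;
    unsigned long l3c;   /* L3 cache occupancy */
} mock_cqminfo;

/* Mock of the convention under discussion: the return value is a
 * malloc'd array of nr_domains * nr_sockets units, and the counts
 * reported via the out-parameters tell the caller how to index it. */
static mock_cqminfo *mock_getcqminfo(unsigned int *nr_domains,
                                     unsigned int *nr_sockets)
{
    *nr_domains = 2;
    *nr_sockets = 2;
    return calloc((size_t)*nr_domains * *nr_sockets, sizeof(mock_cqminfo));
}

/* The entry for domain d on socket s lives at index d * nr_sockets + s,
 * matching the info[i * nr_sockets + j] indexing in print_cqm_info(). */
static unsigned long l3c_for(const mock_cqminfo *info,
                             unsigned int nr_sockets,
                             unsigned int d, unsigned int s)
{
    return info[d * nr_sockets + s].l3c;
}
```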

> 
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 8389468..664a34d 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -7381,6 +7381,115 @@ out:
> >      return ret;
> >  }
> >
> > +int main_pqosattach(int argc, char **argv)
> > +{
> > +    uint32_t domid;
> > +    int opt, rc;
> > +    libxl_pqos_type qos_type;
> > +
> > +    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-attach", 2) {
> > +        /* No options */
> > +    }
> > +
> > +    /* libxl_pqos_attach will handle the parameter check. */
> > +    libxl_pqos_type_from_string(argv[optind], &qos_type);
> 
> This can fail, I think you need to check the return value.

Yes, I am aware of that.
The subsequent libxl_pqos_attach() call will check the parameter's correctness.
That is why I put the comment above it (/* libxl_pqos_attach will handle the parameter check. */).

Thanks,
Dongxiao

> 
> > +    domid = find_domain(argv[optind + 1]);
> > +
> > +    rc = libxl_pqos_attach(ctx, domid, qos_type);
> > +
> > +    return rc;
> > +}
> > +
> > +int main_pqosdetach(int argc, char **argv)
> > +{
> > +    uint32_t domid;
> > +    int opt, rc;
> > +    libxl_pqos_type qos_type;
> > +
> > +    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-detach", 2) {
> > +        /* No options */
> > +    }
> > +
> > +    /* libxl_pqos_detach will handle the parameter check. */
> > +    libxl_pqos_type_from_string(argv[optind], &qos_type);
> 
> Same again.
> 
> Ian.
> 

* Re: [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for libxl/libxc
  2014-03-28  0:53     ` Xu, Dongxiao
@ 2014-03-28  9:43       ` Ian Campbell
  2014-04-03 11:31         ` Xu, Dongxiao
  0 siblings, 1 reply; 28+ messages in thread
From: Ian Campbell @ 2014-03-28  9:43 UTC (permalink / raw)
  To: Xu, Dongxiao
  Cc: keir, stefano.stabellini, andrew.cooper3, Ian.Jackson, xen-devel,
	JBeulich, dgdegra

On Fri, 2014-03-28 at 00:53 +0000, Xu, Dongxiao wrote:
> > -----Original Message-----
> > From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> > Sent: Thursday, March 27, 2014 11:14 PM
> > To: Xu, Dongxiao
> > Cc: xen-devel@lists.xen.org; keir@xen.org; JBeulich@suse.com;
> > Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> > andrew.cooper3@citrix.com; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov
> > Subject: Re: [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for
> > libxl/libxc
> > 
> > On Wed, 2014-03-26 at 14:35 +0800, Dongxiao Xu wrote:
> > [...]
> > > +int libxl_pqos_attach(libxl_ctx *ctx, uint32_t domid, libxl_pqos_type
> > qos_type);
> > > +int libxl_pqos_detach(libxl_ctx *ctx, uint32_t domid, libxl_pqos_type
> > qos_type);
> > > +libxl_cqminfo * libxl_getcqminfo(libxl_ctx *ctx, unsigned int *l3c_total,
> > > +    unsigned int *nr_domains, unsigned int *nr_rmids, unsigned int
> > *nr_sockets);
> > [...]
> > > +libxl_cqminfo = Struct("cqminfo", [
> > > +    ("valid",         uint32),
> > > +    ("l3c",           uint64),
> > > +    ])
> > 
> > libxl_getcqminfo seems to have a strange mixture of returning an info
> > struct and returning things via pointers passed as integers. why is
> > that?
> 
> The returned info is a unit data structure containing the L3 cache usage data for certain RMID on certain socket.
> These pointers passed integers tell caller the number of such units are returned.
> 
> This follows one existing function:
> libxl_dominfo * libxl_list_domain(libxl_ctx *ctx, int *nb_domain_out);
> where the return value is the unit dominfo data structure, and *nb_domain_out tells caller how many data structure units are returned.

But libxl_getcqminfo has three such nr_* pointers as parameters; which
one is the length of the returned array? What are the other two? How
does one enumerate the things the other nr_* values refer to?

> > > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > > index 8389468..664a34d 100644
> > > --- a/tools/libxl/xl_cmdimpl.c
> > > +++ b/tools/libxl/xl_cmdimpl.c
> > > @@ -7381,6 +7381,115 @@ out:
> > >      return ret;
> > >  }
> > >
> > > +int main_pqosattach(int argc, char **argv)
> > > +{
> > > +    uint32_t domid;
> > > +    int opt, rc;
> > > +    libxl_pqos_type qos_type;
> > > +
> > > +    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-attach", 2) {
> > > +        /* No options */
> > > +    }
> > > +
> > > +    /* libxl_pqos_attach will handle the parameter check. */
> > > +    libxl_pqos_type_from_string(argv[optind], &qos_type);
> > 
> > This can fail, I think you need to check the return value.
> 
> Yes, I am aware of that.
> The following libxl_pqos_attach() function will check the parameter correctness.
> Actually I put a comment in above (/* libxl_pqos_attach will handle the parameter check. */).

If libxl_pqos_type_from_string fails then qos_type remains undefined, so
you can't rely on some later function checking for correctness, the
undefined value may happen to appear valid.
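A self-contained sketch of the hazard and the defensive pattern (with a mocked-up conversion function, not the real libxl one; the libxl convention of only writing the out-parameter on success is assumed):

```c
#include <string.h>

typedef enum { MOCK_PQOS_CQM = 7 } mock_pqos_type;

/* Mock from_string conversion: returns 0 on success, non-zero on
 * failure, and only writes *out on success.  On failure, *out keeps
 * whatever garbage the caller left in it. */
static int mock_pqos_type_from_string(const char *s, mock_pqos_type *out)
{
    if (s && !strcmp(s, "cqm")) {
        *out = MOCK_PQOS_CQM;
        return 0;
    }
    return -1;
}

/* Caller-side pattern: check the return code before using the value. */
static int parse_or_reject(const char *arg, mock_pqos_type *out)
{
    if (mock_pqos_type_from_string(arg, out)) {
        /* *out is undefined here; a later validity check on it could
         * pass by accident, which is exactly the hazard Ian describes. */
        return -1;
    }
    return 0;
}
```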

Ian.

* Re: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-26  6:35 ` [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature Dongxiao Xu
@ 2014-03-31 14:21   ` Jan Beulich
  2014-03-31 15:33     ` Xu, Dongxiao
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2014-03-31 14:21 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

>>> On 26.03.14 at 07:35, <dongxiao.xu@intel.com> wrote:
> Detect platform QoS feature status and enumerate the resource types,
> one of which is to monitor the L3 cache occupancy.
> 
> Also introduce a Xen grub command line parameter to control the
> QoS feature status.
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Is that really correct considering the re-work?

> +static int cqm_add_socket(int socket)

unsigned int.

> +{
> +    unsigned int i, order;
> +
> +    if ( cqm->l3c[socket] )
> +        return 0;
> +
> +    /* Allocate per-socket CQM LLC buffer */
> +    order = get_order_from_bytes((cqm->rmid_max + 1) * sizeof(unsigned long));
> +    cqm->l3c[socket] = alloc_xenheap_pages(order, 0);

This is not in an __init function, hence you should try hard to avoid
order-non-0 allocations (I think I said this in reply to an earlier revision
already). Nor is it really clear to me why you need Xen heap pages
here.

> +static int cpu_callback(
> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
> +{
> +    unsigned int cpu = (unsigned long)hcpu;
> +    int socket = cpu_to_socket(cpu);
> +
> +    if ( socket < 0 )
> +        return NOTIFY_DONE;
> +
> +    switch ( action )
> +    {
> +    case CPU_ONLINE:
> +        cqm_add_socket(socket);
> +        break;
> +    case CPU_DEAD:
> +        if ( !cpumask_weight(per_cpu(cpu_core_mask, cpu)) )
> +            cqm_remove_socket(socket);
> +        break;
> +    default:
> +        break;

Pointless default case.

> +void __init init_cqm(unsigned int rmid_max, unsigned long rmid_mask)
> +{
> +    unsigned int rmid, cpu, socket;
> +    unsigned int eax, edx;
> +    unsigned int i, order = 0;
> +
> +    if ( !rmid_max )
> +        return;
> +
> +    cqm = xzalloc(struct pqos_cqm);
> +    if ( !cqm )
> +        return;
> +
> +    cpuid_count(0xf, 1, &eax, &cqm->upscaling_factor, &cqm->rmid_max, &edx);
> +    if ( !(edx & QOS_MONITOR_EVTID_L3) )
> +        goto out;
> +
> +    cqm->rmid_mask = rmid_mask;
> +    cqm->rmid_inuse = 0;
> +    cqm->rmid_min = 1;
> +    cqm->rmid_max = min(rmid_max, cqm->rmid_max);
> +
> +    spin_lock_init(&cqm->cqm_lock);
> +
> +    /* According to Intel SDM, the possible maximum rmid number is 2^10 = 1024,
> +     * thus one page is enough to hold cqm->rmid_to_dom structure */
> +    cqm->rmid_to_dom = alloc_xenheap_page();

But please validate that this condition is being met (not necessarily here,
perhaps rather where rmid_max gets determined).

> +    if ( !cqm->rmid_to_dom )
> +        goto out;
> +    /* Reserve RMID 0 for all domains not being monitored */
> +    cqm->rmid_to_dom[0] = DOMID_XEN;
> +    for ( rmid = 1; rmid < PAGE_SIZE/sizeof(domid_t); rmid++ )
> +        cqm->rmid_to_dom[rmid] = DOMID_INVALID;
> +    /* Dom0 tool stack needs to know the RMID to DOMID mapping */
> +    share_xen_page_with_privileged_guests(
> +        virt_to_page((void *)cqm->rmid_to_dom), XENSHARE_readonly);

Do you really need this bogus cast?

> +
> +    /* Allocate the buffer that holds MFNs of per-socket CQM LLC */
> +    order = get_order_from_bytes(sizeof(unsigned long) * NR_CPUS);
> +    cqm->socket_l3c_mfn = alloc_xenheap_pages(order, 0);

You index this buffer by socket ID, yet socket ID values aren't
necessarily linear/contiguous, and hence I don't think there are
really any guarantees that the ID would be less than NR_CPUS. For
the n-th time: I don't think you'll get around doing this properly, and
I still think sufficient data should be retrievable from ACPI to
determine a proper upper bound here.

> +    if ( !cqm->socket_l3c_mfn )
> +        goto out;
> +    memset(cqm->socket_l3c_mfn, 0, PAGE_SIZE * (1 << order));

PAGE_SIZE << order

Jan

* Re: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-31 14:21   ` Jan Beulich
@ 2014-03-31 15:33     ` Xu, Dongxiao
  2014-03-31 15:42       ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Xu, Dongxiao @ 2014-03-31 15:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, March 31, 2014 10:22 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> keir@xen.org
> Subject: Re: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring
> feature
> 
> >>> On 26.03.14 at 07:35, <dongxiao.xu@intel.com> wrote:
> > Detect platform QoS feature status and enumerate the resource types,
> > one of which is to monitor the L3 cache occupancy.
> >
> > Also introduce a Xen grub command line parameter to control the
> > QoS feature status.
> >
> > Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Is that really correct considering the re-work?

Will remove this.

> 
> > +static int cqm_add_socket(int socket)
> 
> unsigned int.

This function returns a negative value in the error case.

> 
> > +{
> > +    unsigned int i, order;
> > +
> > +    if ( cqm->l3c[socket] )
> > +        return 0;
> > +
> > +    /* Allocate per-socket CQM LLC buffer */
> > +    order = get_order_from_bytes((cqm->rmid_max + 1) * sizeof(unsigned
> long));
> > +    cqm->l3c[socket] = alloc_xenheap_pages(order, 0);
> 
> This is not in an __init function, hence you should try hard to avoid
> order-non-0 allocations (I think I said this in reply to an earlier revision
> already). Nor is it really clear to me why you need Xen heap pages
> here.

Previously, in the v9 discussion, we (Andrew, you and me) agreed on an approach for allocating the per-socket CQM LLC buffer.
Here we need to allocate the CQM LLC buffer when a processor is newly added to an empty socket.
Yes, I agree that non-zero-order allocations should be avoided where possible, but the condition above only triggers when a physical processor is hot-added, which I think is rare. Besides, even if this allocation fails, it will not impact anything else (the cpu_callback() function always returns NOTIFY_DONE).

> 
> > +static int cpu_callback(
> > +    struct notifier_block *nfb, unsigned long action, void *hcpu)
> > +{
> > +    unsigned int cpu = (unsigned long)hcpu;
> > +    int socket = cpu_to_socket(cpu);
> > +
> > +    if ( socket < 0 )
> > +        return NOTIFY_DONE;
> > +
> > +    switch ( action )
> > +    {
> > +    case CPU_ONLINE:
> > +        cqm_add_socket(socket);
> > +        break;
> > +    case CPU_DEAD:
> > +        if ( !cpumask_weight(per_cpu(cpu_core_mask, cpu)) )
> > +            cqm_remove_socket(socket);
> > +        break;
> > +    default:
> > +        break;
> 
> Pointless default case.

I will remove this default case from my code.
I should mention, though, that such default cases appear in most of the cpu_callback() functions in the current Xen code base.

> 
> > +void __init init_cqm(unsigned int rmid_max, unsigned long rmid_mask)
> > +{
> > +    unsigned int rmid, cpu, socket;
> > +    unsigned int eax, edx;
> > +    unsigned int i, order = 0;
> > +
> > +    if ( !rmid_max )
> > +        return;
> > +
> > +    cqm = xzalloc(struct pqos_cqm);
> > +    if ( !cqm )
> > +        return;
> > +
> > +    cpuid_count(0xf, 1, &eax, &cqm->upscaling_factor, &cqm->rmid_max,
> &edx);
> > +    if ( !(edx & QOS_MONITOR_EVTID_L3) )
> > +        goto out;
> > +
> > +    cqm->rmid_mask = rmid_mask;
> > +    cqm->rmid_inuse = 0;
> > +    cqm->rmid_min = 1;
> > +    cqm->rmid_max = min(rmid_max, cqm->rmid_max);
> > +
> > +    spin_lock_init(&cqm->cqm_lock);
> > +
> > +    /* According to Intel SDM, the possible maximum rmid number is 2^10 =
> 1024,
> > +     * thus one page is enough to hold cqm->rmid_to_dom structure */
> > +    cqm->rmid_to_dom = alloc_xenheap_page();
> 
> But please validate that this condition is being met (not necessarily here,
> perhaps rather where rmid_max gets determined).

Okay, will add one ASSERT() to validate this condition.

> 
> > +    if ( !cqm->rmid_to_dom )
> > +        goto out;
> > +    /* Reserve RMID 0 for all domains not being monitored */
> > +    cqm->rmid_to_dom[0] = DOMID_XEN;
> > +    for ( rmid = 1; rmid < PAGE_SIZE/sizeof(domid_t); rmid++ )
> > +        cqm->rmid_to_dom[rmid] = DOMID_INVALID;
> > +    /* Dom0 tool stack needs to know the RMID to DOMID mapping */
> > +    share_xen_page_with_privileged_guests(
> > +        virt_to_page((void *)cqm->rmid_to_dom), XENSHARE_readonly);
> 
> Do you really need this bogus cast?

Will remove it.

> 
> > +
> > +    /* Allocate the buffer that holds MFNs of per-socket CQM LLC */
> > +    order = get_order_from_bytes(sizeof(unsigned long) * NR_CPUS);
> > +    cqm->socket_l3c_mfn = alloc_xenheap_pages(order, 0);
> 
> You index this buffer by socket ID, yet socket ID values aren't
> necessarily linear/contiguous, and hence I don't think there are
> really any guarantees that the ID would be less than NR_CPUS. For
> the n-th time: I don't think you'll get around doing this properly, and
> I still think sufficient data should be retrievable from ACPI to
> determine a proper upper bound here.

I think we have to assume the socket ID is less than NR_CPUS; otherwise much of the current Xen code is wrong, e.g. credit scheduler 2:
xen/common/sched_credit2.c: init_pcpu()
	rqi = cpu_to_socket(cpu);
	rqd=prv->rqd + rqi;
Here, prv->rqd is defined as:
	struct csched_runqueue_data rqd[NR_CPUS];

For getting socket info from ACPI, do you mean the ACPI MADT table?
It can enumerate the existing APIC structures, which relies on the CPU already being plugged into the socket.
To my understanding, it cannot enumerate empty CPU sockets.

Could you elaborate on how to get the total socket number from ACPI?

> 
> > +    if ( !cqm->socket_l3c_mfn )
> > +        goto out;
> > +    memset(cqm->socket_l3c_mfn, 0, PAGE_SIZE * (1 << order));
> 
> PAGE_SIZE << order

Okay.

Thanks,
Dongxiao

> 
> Jan

* Re: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-31 15:33     ` Xu, Dongxiao
@ 2014-03-31 15:42       ` Jan Beulich
  2014-04-02  2:17         ` Xu, Dongxiao
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2014-03-31 15:42 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

>>> On 31.03.14 at 17:33, <dongxiao.xu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 26.03.14 at 07:35, <dongxiao.xu@intel.com> wrote:
>> > +static int cqm_add_socket(int socket)
>> 
>> unsigned int.
> 
> This function returns negative return values in error case.

Oh, sorry, this wasn't precise enough - I meant this for the parameter.

>> > +    /* According to Intel SDM, the possible maximum rmid number is 2^10 = 1024,
>> > +     * thus one page is enough to hold cqm->rmid_to_dom structure */
>> > +    cqm->rmid_to_dom = alloc_xenheap_page();
>> 
>> But please validate that this condition is being met (not necessarily here,
>> perhaps rather where rmid_max gets determined).
> 
> Okay, will add one ASSERT() to validate this condition.

An ASSERT() is the wrong thing here - non-debug builds would then
happily continue. Just check the value to be in range, and stay away
from enabling cache QoS if it's not.
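A minimal self-contained sketch of that suggestion, using mocked-up constants (the real code would use Xen's PAGE_SIZE and domid_t): the fitness check is an ordinary runtime test that survives non-debug builds, rather than an ASSERT() that compiles away.

```c
#include <stdbool.h>

#define MOCK_PAGE_SIZE 4096u
typedef unsigned short mock_domid_t;   /* domid_t is a 16-bit value */

/* One page holds this many RMID -> domid entries. */
#define MOCK_RMIDS_PER_PAGE (MOCK_PAGE_SIZE / sizeof(mock_domid_t))

/* Explicit range check: the caller should refuse to enable cache QoS
 * when this returns false, instead of asserting. */
static bool rmid_max_fits_one_page(unsigned int rmid_max)
{
    /* RMIDs 0..rmid_max inclusive must all fit in the page. */
    return rmid_max + 1 <= MOCK_RMIDS_PER_PAGE;
}
```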

>> > +
>> > +    /* Allocate the buffer that holds MFNs of per-socket CQM LLC */
>> > +    order = get_order_from_bytes(sizeof(unsigned long) * NR_CPUS);
>> > +    cqm->socket_l3c_mfn = alloc_xenheap_pages(order, 0);
>> 
>> You index this buffer by socket ID, yet socket ID values aren't
>> necessarily linear/contiguous, and hence I don't think there are
>> really any guarantees that the ID would be less than NR_CPUS. For
>> the n-th time: I don't think you'll get around doing this properly, and
>> I still think sufficient data should be retrievable from ACPI to
>> determine a proper upper bound here.
> 
> I think we need to assume socket ID is less than NR_CPUS, otherwise many of 
> current Xen's code is wrong, say credit scheduler 2.
> xen/common/sched_credit2.c: init_pcpu()
> 	rqi = cpu_to_socket(cpu);
> 	rqd=prv->rqd + rqi;
> Here the priv->rqd is defined as:
> 	struct csched_runqueue_data rqd[NR_CPUS];

Bad precedents are never a reason to introduce more problems.

> For getting socket info from ACPI, do you mean ACPI MADT table?
> It can enumerate the existing APIC structures, which based on the fact that 
> the CPU is already plugged in the socket.
> Per my understanding, I don't think it can enumerate empty CPU sockets.
> 
> Can you help to elaborate more about how to get all socket number from ACPI?

Iirc disabled CPUs (i.e. hot-pluggable ones) are still listed in MADT,
including their LAPIC ID (which the socket number is being calculated
from).
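As a rough sketch of the calculation being referred to: the low bits of an APIC ID identify the SMT thread and core within a package, so shifting them away leaves the package (socket) number. The shift width below is an assumed example value; real code derives it from the CPUID topology leaves.

```c
/* Derive a socket (package) number from a local APIC ID, given the
 * number of APIC-ID bits used to index threads/cores within a package.
 * The width is an assumption for illustration; Xen computes it from
 * CPUID, not from a constant. */
static unsigned int socket_from_apic_id(unsigned int apic_id,
                                        unsigned int bits_per_package)
{
    return apic_id >> bits_per_package;
}
```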

Jan

* Re: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-31 15:42       ` Jan Beulich
@ 2014-04-02  2:17         ` Xu, Dongxiao
  2014-04-02  8:36           ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Xu, Dongxiao @ 2014-04-02  2:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, March 31, 2014 11:42 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> keir@xen.org
> Subject: RE: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring
> feature
> 
> >>> On 31.03.14 at 17:33, <dongxiao.xu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 26.03.14 at 07:35, <dongxiao.xu@intel.com> wrote:
> >> > +static int cqm_add_socket(int socket)
> >>
> >> unsigned int.
> >
> > This function returns negative return values in error case.
> 
> Oh, sorry, this wasn't precise enough - I meant this for the parameter.
> 
> >> > +    /* According to Intel SDM, the possible maximum rmid number is 2^10
> = 1024,
> >> > +     * thus one page is enough to hold cqm->rmid_to_dom structure */
> >> > +    cqm->rmid_to_dom = alloc_xenheap_page();
> >>
> >> But please validate that this condition is being met (not necessarily here,
> >> perhaps rather where rmid_max gets determined).
> >
> > Okay, will add one ASSERT() to validate this condition.
> 
> An ASSERT() is the wrong thing here - non-debug builds would then
> happily continue. Just check the value to be in range, and stay away
> from enabling cache QoS if it's not.
> 
> >> > +
> >> > +    /* Allocate the buffer that holds MFNs of per-socket CQM LLC */
> >> > +    order = get_order_from_bytes(sizeof(unsigned long) * NR_CPUS);
> >> > +    cqm->socket_l3c_mfn = alloc_xenheap_pages(order, 0);
> >>
> >> You index this buffer by socket ID, yet socket ID values aren't
> >> necessarily linear/contiguous, and hence I don't think there are
> >> really any guarantees that the ID would be less than NR_CPUS. For
> >> the n-th time: I don't think you'll get around doing this properly, and
> >> I still think sufficient data should be retrievable from ACPI to
> >> determine a proper upper bound here.
> >
> > I think we need to assume socket ID is less than NR_CPUS, otherwise many of
> > current Xen's code is wrong, say credit scheduler 2.
> > xen/common/sched_credit2.c: init_pcpu()
> > 	rqi = cpu_to_socket(cpu);
> > 	rqd=prv->rqd + rqi;
> > Here the priv->rqd is defined as:
> > 	struct csched_runqueue_data rqd[NR_CPUS];
> 
> Bad precedents are never a reason to introduce more problems.
> 
> > For getting socket info from ACPI, do you mean ACPI MADT table?
> > It can enumerate the existing APIC structures, which based on the fact that
> > the CPU is already plugged in the socket.
> > Per my understanding, I don't think it can enumerate empty CPU sockets.
> >
> > Can you help to elaborate more about how to get all socket number from ACPI?
> 
> Iirc disabled CPUs (i.e. hot-pluggable ones) are still listed in MADT,
> including their LAPIC ID (which the socket number is being calculated
> from).

I dumped the MADT table from one system; below is an excerpt from it.
If the CPU is present, the APIC ID is shown as a valid value; if the CPU is not present, the APIC ID is shown as 0xFF.
I know we can derive the socket info (by a certain calculation) if the CPU is present, but if it is not, could you explain how we can get the socket information?

...

[1E4h 0484  1]                Subtable Type : 00 <Processor Local APIC>
[1E5h 0485  1]                       Length : 08
[1E6h 0486  1]                 Processor ID : 3D
[1E7h 0487  1]                Local Apic ID : 3D
[1E8h 0488  4]        Flags (decoded below) : 00000001
                          Processor Enabled : 1

...

[1ECh 0492  1]                Subtable Type : 00 <Processor Local APIC>
[1EDh 0493  1]                       Length : 08
[1EEh 0494  1]                 Processor ID : FF
[1EFh 0495  1]                Local Apic ID : FF
[1F0h 0496  4]        Flags (decoded below) : 00000000
                          Processor Enabled : 0
...

Thanks,
Dongxiao

> 
> Jan

* Re: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-04-02  2:17         ` Xu, Dongxiao
@ 2014-04-02  8:36           ` Jan Beulich
  2014-04-02 12:40             ` Xu, Dongxiao
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2014-04-02  8:36 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

>>> On 02.04.14 at 04:17, <dongxiao.xu@intel.com> wrote:
>> Iirc disabled CPUs (i.e. hot-pluggable ones) are still listed in MADT,
>> including their LAPIC ID (which the socket number is being calculated
>> from).
> 
> I dumped one MADT table from the system, and following is one piece from it:
> If the CPU is there, then APIC ID is shown as a valid value, while CPU is 
> not there, then APIC ID is shown as 0xFF.
> I know we can get socket info if CPU is there (by certain calculation), but 
> if CPU is not there, could you help to explain how can we get the socket 
> information from it?

Now a fundamental question is whether this was on a system where
CPUs are actually hotpluggable - it's quite common for APIC IDs to be
set to invalid values when the firmware just allocates a large table
and only fills (and marks enabled) the slots for existing CPUs.

In any event with sparse APIC ID allocations like this

ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] enabled)
ACPI: LAPIC (acpi_id[0x12] lapic_id[0x22] enabled)
ACPI: LAPIC (acpi_id[0x14] lapic_id[0x24] enabled)
ACPI: LAPIC (acpi_id[0x16] lapic_id[0x26] enabled)

making assumptions about the socket ID range being a subset of
the NR_CPUS one is wrong - you'll either need some data structure
other than a flat array to deal with this, or at least dimension the
thing using MAX_APICS (but you'll then realize that you end up
with an even larger array, i.e. it'll become less acceptable). And
yes, other users making invalid assumptions on the socket ID
ranges should be fixed as well (i.e. likely you would want a cleanup
patch first, and then put your QoS stuff on top).
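One possible shape for "some data structure other than a flat array", sketched here as a self-contained illustration (names and the slot limit are invented for the example): a small map from possibly sparse socket IDs to dense array slots, so per-socket buffers can be dimensioned by the number of sockets actually present rather than by the largest socket ID.

```c
#define MAX_TRACKED_SOCKETS 8   /* example capacity, not a Xen constant */

struct socket_map {
    unsigned int ids[MAX_TRACKED_SOCKETS]; /* socket ID held by each slot */
    unsigned int nr;                       /* number of slots in use */
};

/* Return the dense slot index for socket_id, registering it on first
 * sight; returns -1 if the map is full. */
static int socket_slot(struct socket_map *m, unsigned int socket_id)
{
    for (unsigned int i = 0; i < m->nr; i++)
        if (m->ids[i] == socket_id)
            return (int)i;
    if (m->nr >= MAX_TRACKED_SOCKETS)
        return -1;
    m->ids[m->nr] = socket_id;
    return (int)m->nr++;
}
```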

Jan

* Re: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-04-02  8:36           ` Jan Beulich
@ 2014-04-02 12:40             ` Xu, Dongxiao
  2014-04-02 14:48               ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Xu, Dongxiao @ 2014-04-02 12:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, April 02, 2014 4:37 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> keir@xen.org
> Subject: RE: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring
> feature
> 
> >>> On 02.04.14 at 04:17, <dongxiao.xu@intel.com> wrote:
> >> Iirc disabled CPUs (i.e. hot-pluggable ones) are still listed in MADT,
> >> including their LAPIC ID (which the socket number is being calculated
> >> from).
> >
> > I dumped one MADT table from the system, and following is one piece from it:
> > If the CPU is there, then APIC ID is shown as a valid value, while CPU is
> > not there, then APIC ID is shown as 0xFF.
> > I know we can get socket info if CPU is there (by certain calculation), but
> > if CPU is not there, could you help to explain how can we get the socket
> > information from it?
> 
> Now a fundamental question is whether this was on a system where
> CPUs are actually hotpluggable - it's quite common for APIC IDs to be
> set to invalid values when the firmware just allocates a large table
> and only fills (and marks enabled) the slots for existing CPUs.
> 
> In any event with sparse APIC ID allocations like this
> 
> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
> ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
> ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
> ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] enabled)
> ACPI: LAPIC (acpi_id[0x12] lapic_id[0x22] enabled)
> ACPI: LAPIC (acpi_id[0x14] lapic_id[0x24] enabled)
> ACPI: LAPIC (acpi_id[0x16] lapic_id[0x26] enabled)
> 
> making assumptions about the socket ID range being a subset of
> the NR_CPUS one is wrong - you'll either need some data structure
> other than a flat array to deal with this, or at least dimension the
> thing using MAX_APICS (but you'll then realize that you end up
> with an even larger array, i.e. it'll become less acceptable). And
> yes, other users making invalid assumptions on the socket ID
> ranges should be fixed as well (i.e. likely you would want a cleanup
> patch first, and then put your QoS stuff on top).

From your explanation above, it seems we still cannot get the exact socket count from the MADT table; what we can get is the number of possible CPUs, which may itself be very large.
As you mentioned, NR_CPUS may not be able to cover the socket IDs, but it can surely cover the number of sockets, so the following allocation code is correct.
Do you agree? I think it doesn't differ much from deriving the upper bound from the ACPI MADT table.

> +    /* Allocate the buffer that holds MFNs of per-socket CQM LLC */
> +    order = get_order_from_bytes(sizeof(unsigned long) * NR_CPUS);
> +    cqm->socket_l3c_mfn = alloc_xenheap_pages(order, 0);

But for the socket_id issue, I will introduce another dimension in the CQM buffer data structure in my code, in case a socket ID turns out to be bigger than NR_CPUS.
	
As we implemented dynamic CQM buffer allocation in this v10 patch, hot-plugging a new CPU into a socket requires dynamically allocating a certain amount of contiguous memory for it, which might fail. But that case is very rare, and even an allocation failure will not impact other components.

Is it acceptable for you?

Thanks,
Dongxiao

> 
> Jan


* Re: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-04-02 12:40             ` Xu, Dongxiao
@ 2014-04-02 14:48               ` Jan Beulich
  2014-04-03  3:54                 ` Xu, Dongxiao
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2014-04-02 14:48 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

>>> On 02.04.14 at 14:40, <dongxiao.xu@intel.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Wednesday, April 02, 2014 4:37 PM
>> To: Xu, Dongxiao
>> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
>> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
>> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
>> keir@xen.org 
>> Subject: RE: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring
>> feature
>> 
>> >>> On 02.04.14 at 04:17, <dongxiao.xu@intel.com> wrote:
>> >> Iirc disabled CPUs (i.e. hot-pluggable ones) are still listed in MADT,
>> >> including their LAPIC ID (which the socket number is being calculated
>> >> from).
>> >
>> > I dumped one MADT table from the system, and following is one piece from 
> it:
>> > If the CPU is there, then APIC ID is shown as a valid value, while CPU is
>> > not there, then APIC ID is shown as 0xFF.
>> > I know we can get socket info if CPU is there (by certain calculation), but
>> > if CPU is not there, could you help to explain how can we get the socket
>> > information from it?
>> 
>> Now a fundamental question is whether this was on a system where
>> CPUs are actually hotpluggable - it's quite common for APIC IDs to be
>> set to invalid values when the firmware just allocates a large table
>> and only fills (and marks enabled) the slots for existing CPUs.
>> 
>> In any event with sparse APIC ID allocations like this
>> 
>> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
>> ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
>> ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
>> ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
>> ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] enabled)
>> ACPI: LAPIC (acpi_id[0x12] lapic_id[0x22] enabled)
>> ACPI: LAPIC (acpi_id[0x14] lapic_id[0x24] enabled)
>> ACPI: LAPIC (acpi_id[0x16] lapic_id[0x26] enabled)
>> 
>> making assumptions about the socket ID range being a subset of
>> the NR_CPUS one is wrong - you'll either need some data structure
>> other than a flat array to deal with this, or at least dimension the
>> thing using MAX_APICS (but you'll then realize that you end up
>> with an even larger array, i.e. it'll become less acceptable). And
>> yes, other users making invalid assumptions on the socket ID
>> ranges should be fixed as well (i.e. likely you would want a cleanup
>> patch first, and then put your QoS stuff on top).
> 
> From your above explanation, it seems that we still cannot get exact socket 
> number from the MADT table,

Why not? The socket IDs are derived from the LAPIC IDs, and as long
as you have the full set of the latter, you can also determine the full
set of the former.

> what we can get is the possible CPU numbers, 
> which may also be a very big value.
> As you mentioned, NR_CPUS may not able to cover the socket_id, but it can 
> surely cover the socket number, so the following allocation code is OK from 
> correctness point of view.

That depends on what you intend to use for indexing the array.

> Do you agree? I think it doesn't differ a lot compared with getting the 
> upper value from ACPI MADT table.
> 
>> +    /* Allocate the buffer that holds MFNs of per-socket CQM LLC */
>> +    order = get_order_from_bytes(sizeof(unsigned long) * NR_CPUS);
>> +    cqm->socket_l3c_mfn = alloc_xenheap_pages(order, 0);
> 
> But for the socket_id issue, I will introduce another dimension in the CQM 
> buffer data structure in my code in case socket_id may be bigger than 
> NR_CPUS.

Another dimension for something that can reasonably be sparse? I
was thinking of using e.g. a radix tree instead.

> As we implemented the dynamic CQM buffer memory allocation in this v10 
> patch, when hot plug a new CPU into a socket, we need to dynamically allocate 
> certain continuous memory for it, which might fail. But the case is very rare 
> and even the allocation failure will not impact other components.
> 
> Is it acceptable for you?

Not really, no. I hate to repeat myself: We should be able to get
away without multi-page runtime allocations.

Jan


* Re: [PATCH v10 2/6] x86: dynamically attach/detach CQM service for a guest
  2014-03-26  6:35 ` [PATCH v10 2/6] x86: dynamically attach/detach CQM service for a guest Dongxiao Xu
@ 2014-04-02 15:24   ` Jan Beulich
  2014-04-03  4:49     ` Xu, Dongxiao
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2014-04-02 15:24 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

>>> On 26.03.14 at 07:35, <dongxiao.xu@intel.com> wrote:
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -896,6 +896,14 @@ struct xen_domctl_cacheflush {
>  typedef struct xen_domctl_cacheflush xen_domctl_cacheflush_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_cacheflush_t);
>  
> +struct xen_domctl_qos_type {
> +#define _XEN_DOMCTL_pqos_cqm      0
> +#define XEN_DOMCTL_pqos_cqm       (1U<<_XEN_DOMCTL_pqos_cqm)
> +    uint64_t flags;

Can you remind me again why this is a flag rather than (for now)
a single enumerator (not necessarily implemented via enum, but
from the perspective of its meaning)?

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v10 3/6] x86: collect CQM information from all sockets
  2014-03-26  6:35 ` [PATCH v10 3/6] x86: collect CQM information from all sockets Dongxiao Xu
@ 2014-04-02 15:35   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2014-04-02 15:35 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

>>> On 26.03.14 at 07:35, <dongxiao.xu@intel.com> wrote:
> +static int select_socket_cpu(cpumask_t *cpu_bitmap)
> +{
> +    int i;

unsigned int

> +    unsigned int cpu;
> +    int socket, socket_curr = cpu_to_socket(smp_processor_id());
> +    cpumask_var_t sockets;
> +
> +    if ( !zalloc_cpumask_var(&sockets) )
> +        return -ENOMEM;
> +
> +    if ( socket_curr >= 0 )
> +        set_bit(socket_curr, sockets);
> +
> +    cpumask_clear(cpu_bitmap);
> +    for ( i = 0; i < NR_CPUS; i++ )
> +    {
> +        socket = cpu_to_socket(i);
> +        if ( socket < 0 || test_and_set_bit(socket, sockets) )
> +            continue;
> +        cpu = cpumask_any(per_cpu(cpu_core_mask, i));

Is cpu_to_socket() guaranteed to return negative values for all
offline CPUs at all times? If not, the per_cpu() access may end
up being invalid.

And of course the test_and_set_bit() above can corrupt memory
for sparse socket maps (remember in particular that
alloc_cpumask_var() allocates nr_cpumask_bits bits, not NR_CPUS
of them).

> +    case XEN_SYSCTL_getcqminfo:
> +    {
> +        struct cpuinfo_x86 *c = &boot_cpu_data;

This variable appears to be used just once, i.e. is rather pointless.

> +        cpumask_var_t cpu_cqmdata_map;
> +
> +        if ( !system_supports_cqm() )
> +        {
> +            ret = -ENODEV;
> +            break;
> +        }
> +
> +        if ( !zalloc_cpumask_var(&cpu_cqmdata_map) )
> +        {
> +            ret = -ENOMEM;
> +            break;
> +        }
> +
> +        ret = select_socket_cpu(cpu_cqmdata_map);
> +        if ( ret < 0 )
> +        {
> +            free_cpumask_var(cpu_cqmdata_map);
> +            break;
> +        }
> +
> +        get_cqm_info(cpu_cqmdata_map);
> +
> +        sysctl->u.getcqminfo.socket_l3c_mfn = virt_to_mfn(cqm->socket_l3c_mfn);
> +        sysctl->u.getcqminfo.rmid_dom_mfn = virt_to_mfn(cqm->rmid_to_dom);
> +        sysctl->u.getcqminfo.nr_rmids = cqm->rmid_max + 1;
> +        sysctl->u.getcqminfo.nr_sockets = cpumask_weight(cpu_cqmdata_map) + 1;
> +        sysctl->u.getcqminfo.l3c_total = c->x86_cache_size;

Is this really always the L3 size (i.e. zero if there are only L1 and L2
caches)? Or does system_supports_cqm() imply this?

> --- a/xen/include/asm-x86/pqos.h
> +++ b/xen/include/asm-x86/pqos.h
> @@ -17,6 +17,8 @@
>  #ifndef ASM_PQOS_H
>  #define ASM_PQOS_H
>  #include <xen/sched.h>
> +#include <xen/cpumask.h>
> +#include <public/domctl.h>

What is this second #include doing here?

Jan


* Re: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-04-02 14:48               ` Jan Beulich
@ 2014-04-03  3:54                 ` Xu, Dongxiao
  2014-04-03  8:18                   ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Xu, Dongxiao @ 2014-04-03  3:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, April 02, 2014 10:48 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> keir@xen.org
> Subject: RE: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring
> feature
> 
> >>> On 02.04.14 at 14:40, <dongxiao.xu@intel.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Wednesday, April 02, 2014 4:37 PM
> >> To: Xu, Dongxiao
> >> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> >> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> >> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> >> keir@xen.org
> >> Subject: RE: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring
> >> feature
> >>
> >> >>> On 02.04.14 at 04:17, <dongxiao.xu@intel.com> wrote:
> >> >> Iirc disabled CPUs (i.e. hot-pluggable ones) are still listed in MADT,
> >> >> including their LAPIC ID (which the socket number is being calculated
> >> >> from).
> >> >
> >> > I dumped one MADT table from the system, and following is one piece from
> > it:
> >> > If the CPU is there, then APIC ID is shown as a valid value, while CPU is
> >> > not there, then APIC ID is shown as 0xFF.
> >> > I know we can get socket info if CPU is there (by certain calculation), but
> >> > if CPU is not there, could you help to explain how can we get the socket
> >> > information from it?
> >>
> >> Now a fundamental question is whether this was on a system where
> >> CPUs are actually hotpluggable - it's quite common for APIC IDs to be
> >> set to invalid values when the firmware just allocates a large table
> >> and only fills (and marks enabled) the slots for existing CPUs.
> >>
> >> In any event with sparse APIC ID allocations like this
> >>
> >> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> >> ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
> >> ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
> >> ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
> >> ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] enabled)
> >> ACPI: LAPIC (acpi_id[0x12] lapic_id[0x22] enabled)
> >> ACPI: LAPIC (acpi_id[0x14] lapic_id[0x24] enabled)
> >> ACPI: LAPIC (acpi_id[0x16] lapic_id[0x26] enabled)
> >>
> >> making assumptions about the socket ID range being a subset of
> >> the NR_CPUS one is wrong - you'll either need some data structure
> >> other than a flat array to deal with this, or at least dimension the
> >> thing using MAX_APICS (but you'll then realize that you end up
> >> with an even larger array, i.e. it'll become less acceptable). And
> >> yes, other users making invalid assumptions on the socket ID
> >> ranges should be fixed as well (i.e. likely you would want a cleanup
> >> patch first, and then put your QoS stuff on top).
> >
> > From your above explanation, it seems that we still cannot get exact socket
> > number from the MADT table,
> 
> Why not? The socket IDs are derived from the LAPIC IDs, and as long
> as you have the full set of the latter, you can also determine the full
> set of the former.

I dumped the MADT table from a machine that supports CPU hotplug.
However, if I remove some CPUs from their sockets, those disabled CPUs' APIC IDs are shown as 0xFF.
Therefore it is difficult for me to get the full set of LAPIC IDs, and hence of socket IDs.

> 
> > what we can get is the possible CPU numbers,
> > which may also be a very big value.
> > As you mentioned, NR_CPUS may not able to cover the socket_id, but it can
> > surely cover the socket number, so the following allocation code is OK from
> > correctness point of view.
> 
> That depends on what you intend to use for indexing the array.

NR_CPUS only covers the number of sockets; I will not use it to index the array.

> 
> > Do you agree? I think it doesn't differ a lot compared with getting the
> > upper value from ACPI MADT table.
> >
> >> +    /* Allocate the buffer that holds MFNs of per-socket CQM LLC */
> >> +    order = get_order_from_bytes(sizeof(unsigned long) * NR_CPUS);
> >> +    cqm->socket_l3c_mfn = alloc_xenheap_pages(order, 0);
> >
> > But for the socket_id issue, I will introduce another dimension in the CQM
> > buffer data structure in my code in case socket_id may be bigger than
> > NR_CPUS.
> 
> Another dimension for something that can reasonably be sparse? I
> was thinking of using e.g. a radix tree instead.

Not a sparse dimension. I was thinking of adding a data structure, something like the following:
struct cqm_data {
	unsigned int socket_id;
	unsigned long l3c;
};
The socket_id field in this structure can be sparse, which avoids the issue of a socket_id possibly being bigger than NR_CPUS.

In the current implementation, the CQM buffer is shared between Xen and the Dom0 toolstack.
If we use a radix tree, it is implemented as a linked structure whose nodes are all allocated with xmalloc(), and it is difficult to share those nodes between Xen and the Dom0 toolstack. Besides, the current libxl/libxc has no such radix APIs.

> 
> > As we implemented the dynamic CQM buffer memory allocation in this v10
> > patch, when hot plug a new CPU into a socket, we need to dynamically allocate
> > certain continuous memory for it, which might fail. But the case is very rare
> > and even the allocation failure will not impact other components.
> >
> > Is it acceptable for you?
> 
> Not really, no. I hate to repeat myself: We should be able to get
> away without multi-page runtime allocations.

In the CQM case, the order of the per-socket LLC buffer is 1, i.e. 2 contiguous pages.
But if you are unhappy with that, I can instead allocate two separate pages with order=0.

Dongxiao

> 
> Jan


* Re: [PATCH v10 2/6] x86: dynamically attach/detach CQM service for a guest
  2014-04-02 15:24   ` Jan Beulich
@ 2014-04-03  4:49     ` Xu, Dongxiao
  0 siblings, 0 replies; 28+ messages in thread
From: Xu, Dongxiao @ 2014-04-03  4:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, April 02, 2014 11:24 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> keir@xen.org
> Subject: Re: [PATCH v10 2/6] x86: dynamically attach/detach CQM service for a
> guest
> 
> >>> On 26.03.14 at 07:35, <dongxiao.xu@intel.com> wrote:
> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -896,6 +896,14 @@ struct xen_domctl_cacheflush {
> >  typedef struct xen_domctl_cacheflush xen_domctl_cacheflush_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_domctl_cacheflush_t);
> >
> > +struct xen_domctl_qos_type {
> > +#define _XEN_DOMCTL_pqos_cqm      0
> > +#define XEN_DOMCTL_pqos_cqm       (1U<<_XEN_DOMCTL_pqos_cqm)
> > +    uint64_t flags;
> 
> Can you remind me again why this is a flag rather than (for now)
> a single enumerator (not necessarily implemented via enum, but
> from the perspective of its meaning)?

Users may need to attach multiple QoS types in one command, and a flags field covers that.
E.g., # xl pqos-attach domid cqm xxx yyy zzz
BTW, this needs a slight change on the libxl side.

Dongxiao


> 
> Jan


* Re: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-04-03  3:54                 ` Xu, Dongxiao
@ 2014-04-03  8:18                   ` Jan Beulich
  2014-04-03  8:27                     ` Xu, Dongxiao
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2014-04-03  8:18 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

>>> On 03.04.14 at 05:54, <dongxiao.xu@intel.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Wednesday, April 02, 2014 10:48 PM
>> To: Xu, Dongxiao
>> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
>> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
>> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
>> keir@xen.org 
>> Subject: RE: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring
>> feature
>> 
>> >>> On 02.04.14 at 14:40, <dongxiao.xu@intel.com> wrote:
>> >>  -----Original Message-----
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> Sent: Wednesday, April 02, 2014 4:37 PM
>> >> To: Xu, Dongxiao
>> >> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
>> >> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
>> >> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
>> >> keir@xen.org 
>> >> Subject: RE: [PATCH v10 1/6] x86: detect and initialize Cache QoS 
> Monitoring
>> >> feature
>> >>
>> >> >>> On 02.04.14 at 04:17, <dongxiao.xu@intel.com> wrote:
>> >> >> Iirc disabled CPUs (i.e. hot-pluggable ones) are still listed in MADT,
>> >> >> including their LAPIC ID (which the socket number is being calculated
>> >> >> from).
>> >> >
>> >> > I dumped one MADT table from the system, and following is one piece from
>> > it:
>> >> > If the CPU is there, then APIC ID is shown as a valid value, while CPU is
>> >> > not there, then APIC ID is shown as 0xFF.
>> >> > I know we can get socket info if CPU is there (by certain calculation), 
> but
>> >> > if CPU is not there, could you help to explain how can we get the socket
>> >> > information from it?
>> >>
>> >> Now a fundamental question is whether this was on a system where
>> >> CPUs are actually hotpluggable - it's quite common for APIC IDs to be
>> >> set to invalid values when the firmware just allocates a large table
>> >> and only fills (and marks enabled) the slots for existing CPUs.
>> >>
>> >> In any event with sparse APIC ID allocations like this
>> >>
>> >> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
>> >> ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
>> >> ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
>> >> ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
>> >> ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] enabled)
>> >> ACPI: LAPIC (acpi_id[0x12] lapic_id[0x22] enabled)
>> >> ACPI: LAPIC (acpi_id[0x14] lapic_id[0x24] enabled)
>> >> ACPI: LAPIC (acpi_id[0x16] lapic_id[0x26] enabled)
>> >>
>> >> making assumptions about the socket ID range being a subset of
>> >> the NR_CPUS one is wrong - you'll either need some data structure
>> >> other than a flat array to deal with this, or at least dimension the
>> >> thing using MAX_APICS (but you'll then realize that you end up
>> >> with an even larger array, i.e. it'll become less acceptable). And
>> >> yes, other users making invalid assumptions on the socket ID
>> >> ranges should be fixed as well (i.e. likely you would want a cleanup
>> >> patch first, and then put your QoS stuff on top).
>> >
>> > From your above explanation, it seems that we still cannot get exact socket
>> > number from the MADT table,
>> 
>> Why not? The socket IDs are derived from the LAPIC IDs, and as long
>> as you have the full set of the latter, you can also determine the full
>> set of the former.
> 
> I dumped the MADT table from the machine that supports CPU hotplug.
> However if I remove some CPUs from the sockets, those disabled CPU's APIC ID 
> is shown as 0xFF.
> Therefore to me it is difficult to get the full set of the LAPIC IDs, so as 
> the socket IDs.

Okay, that convinces me then that we need to deal with socket IDs
being known only at hotplug time.

> If use radix tree, it is implemented as a list, where all nodes are 
> allocated by xmalloc(). It is difficult to share these radix nodes between 
> Xen and Dom0 tool stack. Besides, current libxl/libxc doesn't have such radix 
> APIs.

But you could get away with a split approach: Provide the socket ID
list via ordinary hypercall means (if this isn't already derivable from
the topology sysctl anyway), and only share the per-socket data
page(s).

Jan


* Re: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-04-03  8:18                   ` Jan Beulich
@ 2014-04-03  8:27                     ` Xu, Dongxiao
  2014-04-03  8:46                       ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Xu, Dongxiao @ 2014-04-03  8:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, xen-devel, dgdegra

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org
> [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Jan Beulich
> Sent: Thursday, April 03, 2014 4:19 PM
> To: Xu, Dongxiao
> Cc: keir@xen.org; Ian.Campbell@citrix.com; andrew.cooper3@citrix.com;
> stefano.stabellini@eu.citrix.com; Ian.Jackson@eu.citrix.com;
> xen-devel@lists.xen.org; dgdegra@tycho.nsa.gov
> Subject: Re: [Xen-devel] [PATCH v10 1/6] x86: detect and initialize Cache QoS
> Monitoring feature
> 
> >>> On 03.04.14 at 05:54, <dongxiao.xu@intel.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Wednesday, April 02, 2014 10:48 PM
> >> To: Xu, Dongxiao
> >> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> >> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> >> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> >> keir@xen.org
> >> Subject: RE: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring
> >> feature
> >>
> >> >>> On 02.04.14 at 14:40, <dongxiao.xu@intel.com> wrote:
> >> >>  -----Original Message-----
> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> Sent: Wednesday, April 02, 2014 4:37 PM
> >> >> To: Xu, Dongxiao
> >> >> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> >> >> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> >> >> xen-devel@lists.xen.org; konrad.wilk@oracle.com;
> dgdegra@tycho.nsa.gov;
> >> >> keir@xen.org
> >> >> Subject: RE: [PATCH v10 1/6] x86: detect and initialize Cache QoS
> > Monitoring
> >> >> feature
> >> >>
> >> >> >>> On 02.04.14 at 04:17, <dongxiao.xu@intel.com> wrote:
> >> >> >> Iirc disabled CPUs (i.e. hot-pluggable ones) are still listed in MADT,
> >> >> >> including their LAPIC ID (which the socket number is being calculated
> >> >> >> from).
> >> >> >
> >> >> > I dumped one MADT table from the system, and following is one piece
> from
> >> > it:
> >> >> > If the CPU is there, then APIC ID is shown as a valid value, while CPU is
> >> >> > not there, then APIC ID is shown as 0xFF.
> >> >> > I know we can get socket info if CPU is there (by certain calculation),
> > but
> >> >> > if CPU is not there, could you help to explain how can we get the socket
> >> >> > information from it?
> >> >>
> >> >> Now a fundamental question is whether this was on a system where
> >> >> CPUs are actually hotpluggable - it's quite common for APIC IDs to be
> >> >> set to invalid values when the firmware just allocates a large table
> >> >> and only fills (and marks enabled) the slots for existing CPUs.
> >> >>
> >> >> In any event with sparse APIC ID allocations like this
> >> >>
> >> >> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> >> >> ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
> >> >> ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
> >> >> ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
> >> >> ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] enabled)
> >> >> ACPI: LAPIC (acpi_id[0x12] lapic_id[0x22] enabled)
> >> >> ACPI: LAPIC (acpi_id[0x14] lapic_id[0x24] enabled)
> >> >> ACPI: LAPIC (acpi_id[0x16] lapic_id[0x26] enabled)
> >> >>
> >> >> making assumptions about the socket ID range being a subset of
> >> >> the NR_CPUS one is wrong - you'll either need some data structure
> >> >> other than a flat array to deal with this, or at least dimension the
> >> >> thing using MAX_APICS (but you'll then realize that you end up
> >> >> with an even larger array, i.e. it'll become less acceptable). And
> >> >> yes, other users making invalid assumptions on the socket ID
> >> >> ranges should be fixed as well (i.e. likely you would want a cleanup
> >> >> patch first, and then put your QoS stuff on top).
> >> >
> >> > From your above explanation, it seems that we still cannot get exact socket
> >> > number from the MADT table,
> >>
> >> Why not? The socket IDs are derived from the LAPIC IDs, and as long
> >> as you have the full set of the latter, you can also determine the full
> >> set of the former.
> >
> > I dumped the MADT table from the machine that supports CPU hotplug.
> > However if I remove some CPUs from the sockets, those disabled CPU's APIC
> ID
> > is shown as 0xFF.
> > Therefore to me it is difficult to get the full set of the LAPIC IDs, so as
> > the socket IDs.
> 
> Okay, that convinces me then that we need to deal with socket IDs
> being known only at hotplug time.

Thanks for the confirmation.

> 
> > If use radix tree, it is implemented as a list, where all nodes are
> > allocated by xmalloc(). It is difficult to share these radix nodes between
> > Xen and Dom0 tool stack. Besides, current libxl/libxc doesn't have such radix
> > APIs.
> 
> But you could get away with a split approach: Provide the socket ID
> list via ordinary hypercall means (if this isn't already derivable from
> the topology sysctl anyway), and only share the per-socket data
> page(s).

sysctl->u.getcqminfo has a size limit of 128 bytes.
The per-socket MFN array _may_ exceed that limit...

We discussed this issue in previous threads; that is how this two-level data_mfn/data page sharing mechanism was proposed.

Thanks,
Dongxiao

> 
> Jan
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-04-03  8:27                     ` Xu, Dongxiao
@ 2014-04-03  8:46                       ` Jan Beulich
  2014-04-03 11:04                         ` Xu, Dongxiao
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2014-04-03  8:46 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, xen-devel, dgdegra

>>> On 03.04.14 at 10:27, <dongxiao.xu@intel.com> wrote:
>> From: xen-devel-bounces@lists.xen.org 
>> [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Jan Beulich
>> But you could get away with a split approach: Provide the socket ID
>> list via ordinary hypercall means (if this isn't already derivable from
>> the topology sysctl anyway), and only share the per-socket data
>> page(s).
> 
> sysctl->u.getcqminfo has size limitation of 128 bytes.
> The per-socket MFN array _may_ exceed this limitation...
> 
> We discussed this issue in previous threads, that's how this 2-level of 
> data_mfn/data page sharing mechanism is proposed.

That's unrelated - you can always introduce a handle pointing to the
array where the MFNs are to be stored. Whether that's preferable
over the shared page approach largely depends on the number of
entries in the page, and the performance needs. Both I think would
suggest to use the handle-to-array approach, and use shared pages
only for the actual data.

Jan


* Re: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-04-03  8:46                       ` Jan Beulich
@ 2014-04-03 11:04                         ` Xu, Dongxiao
  0 siblings, 0 replies; 28+ messages in thread
From: Xu, Dongxiao @ 2014-04-03 11:04 UTC (permalink / raw)
  To: Jan Beulich, andrew.cooper3
  Cc: keir, Ian.Campbell, stefano.stabellini, Ian.Jackson, xen-devel, dgdegra

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, April 03, 2014 4:47 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> xen-devel@lists.xen.org; dgdegra@tycho.nsa.gov; keir@xen.org
> Subject: RE: [Xen-devel] [PATCH v10 1/6] x86: detect and initialize Cache QoS
> Monitoring feature
> 
> >>> On 03.04.14 at 10:27, <dongxiao.xu@intel.com> wrote:
> >> From: xen-devel-bounces@lists.xen.org
> >> [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Jan Beulich
> >> But you could get away with a split approach: Provide the socket ID
> >> list via ordinary hypercall means (if this isn't already derivable from
> >> the topology sysctl anyway), and only share the per-socket data
> >> page(s).
> >
> > sysctl->u.getcqminfo has a size limitation of 128 bytes.
> > The per-socket MFN array _may_ exceed this limitation...
> >
> > We discussed this issue in previous threads; that is how this 2-level
> > data_mfn/data-page sharing mechanism was proposed.
> 
> That's unrelated - you can always introduce a handle pointing to the
> array where the MFNs are to be stored. Whether that's preferable
> over the shared page approach largely depends on the number of
> entries in the page, and the performance needs. Both I think would
> suggest to use the handle-to-array approach, and use shared pages
> only for the actual data.

I am okay with either solution, whether share or copy.

The 2-level sharing was previously proposed by Andrew, and I implemented it in v10.
Andrew, are you okay with this copy solution for the MFN page?

Thanks,
Dongxiao

> 
> Jan


* Re: [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for libxl/libxc
  2014-03-28  9:43       ` Ian Campbell
@ 2014-04-03 11:31         ` Xu, Dongxiao
  2014-04-03 12:52           ` Ian Campbell
  0 siblings, 1 reply; 28+ messages in thread
From: Xu, Dongxiao @ 2014-04-03 11:31 UTC (permalink / raw)
  To: Ian Campbell
  Cc: keir, stefano.stabellini, andrew.cooper3, Ian.Jackson, xen-devel,
	JBeulich, dgdegra

> -----Original Message-----
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> Sent: Friday, March 28, 2014 5:44 PM
> To: Xu, Dongxiao
> Cc: xen-devel@lists.xen.org; keir@xen.org; JBeulich@suse.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> andrew.cooper3@citrix.com; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov
> Subject: Re: [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for
> libxl/libxc
> 
> On Fri, 2014-03-28 at 00:53 +0000, Xu, Dongxiao wrote:
> > > -----Original Message-----
> > > From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> > > Sent: Thursday, March 27, 2014 11:14 PM
> > > To: Xu, Dongxiao
> > > Cc: xen-devel@lists.xen.org; keir@xen.org; JBeulich@suse.com;
> > > Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> > > andrew.cooper3@citrix.com; konrad.wilk@oracle.com;
> dgdegra@tycho.nsa.gov
> > > Subject: Re: [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for
> > > libxl/libxc
> > >
> > > On Wed, 2014-03-26 at 14:35 +0800, Dongxiao Xu wrote:
> > > [...]
> > > > +int libxl_pqos_attach(libxl_ctx *ctx, uint32_t domid, libxl_pqos_type
> > > qos_type);
> > > > +int libxl_pqos_detach(libxl_ctx *ctx, uint32_t domid, libxl_pqos_type
> > > qos_type);
> > > > +libxl_cqminfo * libxl_getcqminfo(libxl_ctx *ctx, unsigned int *l3c_total,
> > > > +    unsigned int *nr_domains, unsigned int *nr_rmids, unsigned int
> > > *nr_sockets);
> > > [...]
> > > > +libxl_cqminfo = Struct("cqminfo", [
> > > > +    ("valid",         uint32),
> > > > +    ("l3c",           uint64),
> > > > +    ])
> > >
> > > libxl_getcqminfo seems to have a strange mixture of returning an info
> > > struct and returning things via pointers passed as integers. why is
> > > that?
> >
> > The returned info is a unit data structure containing the L3 cache usage
> > data for a certain RMID on a certain socket.
> > The integers returned via pointers tell the caller how many such units
> > are returned.
> >
> > This follows one existing function:
> > libxl_dominfo * libxl_list_domain(libxl_ctx *ctx, int *nb_domain_out);
> > > where the return value is the unit dominfo data structure, and
> > > *nb_domain_out tells the caller how many data structure units are returned.
> 
> But libxl_getcqminfo has three such nr_* pointers as parameters, which
> one is the length of the returned array? What are the other two? How
> does one enumerate the things which the other nr_* refer to?

The size of the returned array is nr_domains * nr_sockets. nr_rmids reports how many RMIDs the system has.
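With that flat nr_domains * nr_sockets array, a caller's lookup might look
like the following sketch (hypothetical: the layout is assumed here to be
domain-major, which the patch itself does not document):

```c
#include <stdint.h>

/* The patch's per-(RMID, socket) unit, reproduced from the hunk above. */
typedef struct {
    uint32_t valid;
    uint64_t l3c; /* L3 cache occupancy */
} libxl_cqminfo;

/*
 * Hypothetical accessor: assumes the flat array is domain-major, i.e.
 * all sockets of domain 0 come first.  Which dimension varies fastest
 * is exactly the question raised later in this thread.
 */
static const libxl_cqminfo *cqm_entry(const libxl_cqminfo *info,
                                      unsigned int nr_sockets,
                                      unsigned int domid,
                                      unsigned int socket)
{
    return &info[domid * nr_sockets + socket];
}
```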

> 
> > > > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > > > index 8389468..664a34d 100644
> > > > --- a/tools/libxl/xl_cmdimpl.c
> > > > +++ b/tools/libxl/xl_cmdimpl.c
> > > > @@ -7381,6 +7381,115 @@ out:
> > > >      return ret;
> > > >  }
> > > >
> > > > +int main_pqosattach(int argc, char **argv)
> > > > +{
> > > > +    uint32_t domid;
> > > > +    int opt, rc;
> > > > +    libxl_pqos_type qos_type;
> > > > +
> > > > +    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-attach", 2) {
> > > > +        /* No options */
> > > > +    }
> > > > +
> > > > +    /* libxl_pqos_attach will handle the parameter check. */
> > > > +    libxl_pqos_type_from_string(argv[optind], &qos_type);
> > >
> > > This can fail, I think you need to check the return value.
> >
> > Yes, I am aware of that.
> > The following libxl_pqos_attach() function will check the parameter
> correctness.
> > Actually I put a comment in above (/* libxl_pqos_attach will handle the
> parameter check. */).
> 
> If libxl_pqos_type_from_string fails then qos_type remains undefined, so
> you can't rely on some later function checking for correctness, the
> undefined value may happen to appear valid.

Okay, I will add a check in the function.
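The fix could be as small as checking the parse result before qos_type is
used (a sketch with a minimal stand-in for the patch's parser; the real
function's exact return convention is assumed to be 0 on success, nonzero
on failure):

```c
#include <stdio.h>
#include <string.h>

typedef enum { LIBXL_PQOS_TYPE_CQM } libxl_pqos_type;

/*
 * Minimal stand-in for the patch's parser (assumed behaviour:
 * 0 on success, nonzero for an unrecognised type string).
 */
static int libxl_pqos_type_from_string(const char *s, libxl_pqos_type *out)
{
    if (!strcmp(s, "cqm")) {
        *out = LIBXL_PQOS_TYPE_CQM;
        return 0;
    }
    return -1;
}

/*
 * The promised check: fail early on a bad string, so an undefined
 * qos_type value never reaches libxl_pqos_attach().
 */
static int parse_qos_type(const char *arg, libxl_pqos_type *qos_type)
{
    if (libxl_pqos_type_from_string(arg, qos_type)) {
        fprintf(stderr, "unknown QoS type \"%s\"\n", arg);
        return 1;
    }
    return 0;
}
```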

Thanks,
Dongxiao

> 
> Ian.


* Re: [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for libxl/libxc
  2014-04-03 11:31         ` Xu, Dongxiao
@ 2014-04-03 12:52           ` Ian Campbell
  0 siblings, 0 replies; 28+ messages in thread
From: Ian Campbell @ 2014-04-03 12:52 UTC (permalink / raw)
  To: Xu, Dongxiao
  Cc: keir, stefano.stabellini, andrew.cooper3, Ian.Jackson, xen-devel,
	JBeulich, dgdegra

On Thu, 2014-04-03 at 11:31 +0000, Xu, Dongxiao wrote:
> > -----Original Message-----
> > From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> > Sent: Friday, March 28, 2014 5:44 PM
> > To: Xu, Dongxiao
> > Cc: xen-devel@lists.xen.org; keir@xen.org; JBeulich@suse.com;
> > Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> > andrew.cooper3@citrix.com; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov
> > Subject: Re: [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for
> > libxl/libxc
> > 
> > On Fri, 2014-03-28 at 00:53 +0000, Xu, Dongxiao wrote:
> > > > -----Original Message-----
> > > > From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> > > > Sent: Thursday, March 27, 2014 11:14 PM
> > > > To: Xu, Dongxiao
> > > > Cc: xen-devel@lists.xen.org; keir@xen.org; JBeulich@suse.com;
> > > > Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> > > > andrew.cooper3@citrix.com; konrad.wilk@oracle.com;
> > dgdegra@tycho.nsa.gov
> > > > Subject: Re: [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for
> > > > libxl/libxc
> > > >
> > > > On Wed, 2014-03-26 at 14:35 +0800, Dongxiao Xu wrote:
> > > > [...]
> > > > > +int libxl_pqos_attach(libxl_ctx *ctx, uint32_t domid, libxl_pqos_type
> > > > qos_type);
> > > > > +int libxl_pqos_detach(libxl_ctx *ctx, uint32_t domid, libxl_pqos_type
> > > > qos_type);
> > > > > +libxl_cqminfo * libxl_getcqminfo(libxl_ctx *ctx, unsigned int *l3c_total,
> > > > > +    unsigned int *nr_domains, unsigned int *nr_rmids, unsigned int
> > > > *nr_sockets);
> > > > [...]
> > > > > +libxl_cqminfo = Struct("cqminfo", [
> > > > > +    ("valid",         uint32),
> > > > > +    ("l3c",           uint64),
> > > > > +    ])
> > > >
> > > > libxl_getcqminfo seems to have a strange mixture of returning an info
> > > > struct and returning things via pointers passed as integers. why is
> > > > that?
> > >
> > > The returned info is a unit data structure containing the L3 cache usage
> > > data for a certain RMID on a certain socket.
> > > The integers returned via pointers tell the caller how many such units
> > > are returned.
> > >
> > > This follows one existing function:
> > > libxl_dominfo * libxl_list_domain(libxl_ctx *ctx, int *nb_domain_out);
> > > > where the return value is the unit dominfo data structure, and
> > > > *nb_domain_out tells the caller how many data structure units are returned.
> > 
> > But libxl_getcqminfo has three such nr_* pointers as parameters, which
> > one is the length of the returned array? What are the other two? How
> > does one enumerate the things which the other nr_* refer to?
> 
> The size of the returned array is nr_domains * nr_sockets. nr_rmids
> reports how many RMIDs the system has.

How do I find the info for socket i and domain j then?

Please can you add a doc comment to this function to explain.

Is domains in this context a Xen domain or some other type of hardware
related domain?

Or maybe the result of this function should be something more structured
like:

	struct cqpcpuinfo {
		uint??_t socket, domain;
		uint32_t valid; /* why not a bool? */
		uint32_t l3c;
	};
	struct cqminfo {
		int nr_rmids;
		int nr_sockets;
		int nr_domains;
		cqpcpuinfo *info; /* indexed by ... */
	};
?

Or even a three level thing with top-level, sockets and domains (or
domains then sockets):
	struct cqminfo {
		...
		int nr_sockets
		struct cqsocketinfo {
			int nr_domains
			...
		}
	}

Also, can valid be a bool? And what is "l3c"? I guess it has something to
do with the L3 cache, but please can you name it something more
descriptive.
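The nested variant could be fleshed out roughly like this (hypothetical
names, folding in the "bool" and "descriptive name" remarks; none of these
identifiers come from the posted patch):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical flesh-out of the nested layout sketched above:
 * per-domain occupancy grouped under each socket.
 */
typedef struct {
    uint32_t domid;
    bool valid;
    uint64_t l3_occupancy; /* more descriptive than "l3c" */
} cqm_dominfo;

typedef struct {
    unsigned int socket_id;
    unsigned int nr_domains;
    cqm_dominfo *domains;  /* indexed 0 .. nr_domains - 1 */
} cqm_socketinfo;

typedef struct {
    unsigned int nr_rmids;
    unsigned int nr_sockets;
    cqm_socketinfo *sockets; /* indexed 0 .. nr_sockets - 1 */
} cqm_info;
```

A caller would then walk sockets and their domain entries explicitly,
instead of computing flat-array indices.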


* Re: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
@ 2014-03-31 15:44 Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2014-03-31 15:44 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

>>> On 31.03.14 at 17:33, <dongxiao.xu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 26.03.14 at 07:35, <dongxiao.xu@intel.com> wrote:
>> > +    /* According to Intel SDM, the possible maximum rmid number is 2^10 = 1024,
>> > +     * thus one page is enough to hold cqm->rmid_to_dom structure */
>> > +    cqm->rmid_to_dom = alloc_xenheap_page();
>> 
>> But please validate that this condition is being met (not necessarily here,
>> perhaps rather where rmid_max gets determined).
> 
> Okay, will add one ASSERT() to validate this condition.

And btw, please fix your multi-line comment style.
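For reference, Xen's preferred multi-line comment style puts the opening
and closing markers on their own lines, with each intermediate line
starting with ` * `; the quoted comment would become something like:

```c
/*
 * According to the Intel SDM, the possible maximum RMID number is
 * 2^10 = 1024, thus one page is enough to hold the cqm->rmid_to_dom
 * structure.
 */
```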

Jan


end of thread, other threads:[~2014-04-03 12:52 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-26  6:35 [PATCH v10 0/6] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
2014-03-26  6:35 ` [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature Dongxiao Xu
2014-03-31 14:21   ` Jan Beulich
2014-03-31 15:33     ` Xu, Dongxiao
2014-03-31 15:42       ` Jan Beulich
2014-04-02  2:17         ` Xu, Dongxiao
2014-04-02  8:36           ` Jan Beulich
2014-04-02 12:40             ` Xu, Dongxiao
2014-04-02 14:48               ` Jan Beulich
2014-04-03  3:54                 ` Xu, Dongxiao
2014-04-03  8:18                   ` Jan Beulich
2014-04-03  8:27                     ` Xu, Dongxiao
2014-04-03  8:46                       ` Jan Beulich
2014-04-03 11:04                         ` Xu, Dongxiao
2014-03-26  6:35 ` [PATCH v10 2/6] x86: dynamically attach/detach CQM service for a guest Dongxiao Xu
2014-04-02 15:24   ` Jan Beulich
2014-04-03  4:49     ` Xu, Dongxiao
2014-03-26  6:35 ` [PATCH v10 3/6] x86: collect CQM information from all sockets Dongxiao Xu
2014-04-02 15:35   ` Jan Beulich
2014-03-26  6:35 ` [PATCH v10 4/6] x86: enable CQM monitoring for each domain RMID Dongxiao Xu
2014-03-26  6:35 ` [PATCH v10 5/6] xsm: add platform QoS related xsm policies Dongxiao Xu
2014-03-26  6:35 ` [PATCH v10 6/6] tools: enable Cache QoS Monitoring feature for libxl/libxc Dongxiao Xu
2014-03-27 15:13   ` Ian Campbell
2014-03-28  0:53     ` Xu, Dongxiao
2014-03-28  9:43       ` Ian Campbell
2014-04-03 11:31         ` Xu, Dongxiao
2014-04-03 12:52           ` Ian Campbell
2014-03-31 15:44 [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature Jan Beulich
