All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v11 0/9] enable Cache QoS Monitoring (CQM) feature
@ 2014-06-20 14:31 Dongxiao Xu
  2014-06-20 14:31 ` [PATCH v11 1/9] x86: add generic MSR access hypercall Dongxiao Xu
                   ` (8 more replies)
  0 siblings, 9 replies; 38+ messages in thread
From: Dongxiao Xu @ 2014-06-20 14:31 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Changes from v10:
 - Re-design and re-implement the whole logic. In this version,
   hypervisor provides basic mechanisms (like access MSRs) while all
   policies are put in user space.
   patch 1-3 provide a generic MSR hypercall for toolstack to access
   patch 4-9 implement the cache QoS monitoring feature

Changes from v9:
 - Revise the readonly mapping mechanism to share data between Xen and
   userspace. We create L3C buffer for each socket, share both buffer
   address MFNs and buffer MFNs to userspace.
 - Split the pqos.c into pqos.c and cqm.c for better code structure.
 - Show the total L3 cache size when issueing xl pqos-list cqm command.
 - Abstract a libxl_getcqminfo() function to fetch cqm data from Xen.
 - Several coding style fixes.

Changes from v8:
 - Address comments from Ian Campbell, including:
   * Modify the return handling for xc_sysctl();
   * Add man page items for platform QoS related commands.
   * Fix typo in commit message.

Changes from v7:
 - Address comments from Andrew Cooper, including:
   * Check CQM capability before allocating cpumask memory.
   * Move one function declaration into the correct patch.

Changes from v6:
 - Address comments from Jan Beulich, including:
   * Remove the unnecessary CPUID feature check.
   * Remove the unnecessary socket_cpu_map.
   * Spin_lock related changes, avoid spin_lock_irqsave().
   * Use readonly mapping to pass cqm data between Xen/Userspace,
     to avoid data copying.
   * Optimize RDMSR/WRMSR logic to avoid unnecessary calls.
   * Misc fixes including __read_mostly prefix, return value, etc.

Changes from v5:
 - Address comments from Dario Faggioli, including:
   * Define a new libxl_cqminfo structure to avoid reference of xc
     structure in libxl functions.
   * Use LOGE() instead of the LIBXL__LOG() functions.

Changes from v4:
 - When comparing xl cqm parameter, use strcmp instead of strncmp,
   otherwise, "xl pqos-attach cqmabcd domid" will be considered as
   a valid command line.
 - Address comments from Andrew Cooper, including:
   * Adjust the pqos parameter parsing function.
   * Modify the pqos related documentation.
   * Add a check for opt_cqm_max_rmid in initialization code.
   * Do not IPI CPU that is in same socket with current CPU.
 - Address comments from Dario Faggioli, including:
   * Fix an typo in export symbols.
   * Return correct libxl error code for qos related functions.
   * Abstract the error printing logic into a function.
 - Address comment from Daniel De Graaf, including:
   * Add return value in for pqos related check.
 - Address comments from Konrad Rzeszutek Wilk, including:
   * Modify the GPLv2 related file header, remove the address.

Changes from v3:
 - Use structure to better organize CQM related global variables.
 - Address comments from Andrew Cooper, including:
   * Remove the domain creation flag for CQM RMID allocation.
   * Adjust the boot parameter format, use custom_param().
   * Add documentation for the new added boot parameter.
   * Change QoS type flag to be uint64_t.
   * Initialize the per socket cpu bitmap in system boot time.
   * Remove get_cqm_avail() function.
   * Misc of format changes.
 - Address comment from Daniel De Graaf, including:
   * Use avc_current_has_perm() for XEN2__PQOS_OP that belongs to SECCLASS_XEN2.

Changes from v2:
 - Address comments from Andrew Cooper, including:
   * Merging tools stack changes into one patch.
   * Reduce the IPI number to one per socket.
   * Change structures for CQM data exchange between tools and Xen.
   * Misc of format/variable/function name changes.
 - Address comments from Konrad Rzeszutek Wilk, including:
   * Simplify the error printing logic.
   * Add xsm check for the new added hypercalls.

Changes from v1:
 - Address comments from Andrew Cooper, including:
   * Change function names, e.g., alloc_cqm_rmid(), system_supports_cqm(), etc.
   * Change some structure element order to save packing cost.
   * Correct some function's return value.
   * Some programming styles change.
   * ...

Future generations of Intel Xeon processor may offer monitoring capability in
each logical processor to measure specific quality-of-service metric,
for example, the Cache QoS Monitoring to get L3 cache occupancy.
Detailed information please refer to Intel SDM chapter 17.14.

Cache QoS Monitoring provides a layer of abstraction between applications and
logical processors through the use of Resource Monitoring IDs (RMIDs).
In Xen design, each guest in the system can be assigned an RMID independently,
while RMID=0 is reserved for monitoring domains that doesn't enable CQM service.
When any of the domain's vcpu is scheduled on a logical processor, the domain's
RMID will be activated by programming the value into one specific MSR, and when
the vcpu is scheduled out, a RMID=0 will be programmed into that MSR.
The Cache QoS Hardware tracks cache utilization of memory accesses according to
the RMIDs and reports monitored data via a counter register. With this solution,
we can get the knowledge how much L3 cache is used by a certain guest.

To attach QoS monitoring service to a certain guest:
xl pqos-monitor-attach domid

To detached CQM service from a guest:
xl pqos-monitor-detach domid

To get the L3 cache usage:
$ xl pqos-monitor-show cqm <domid>

The below data is just an example showing how the CQM related data is exposed to
end user.

[root@localhost]# xl pqos-monitor-show cqm
Total RMID: 55
Per-Socket L3 Cache Size: 35840 KB
Name                                        ID  SocketID        L3C_Usage       SocketID        L3C_Usage
Domain-0                                     0         0         31920 KB              1         28728 KB
ExampleHVMDomain                             1         0           504 KB              1          6160 KB

Dongxiao Xu (9):
  x86: add generic MSR access hypercall
  xsm: add MSR operation related xsm policy
  tools: provide interface for generic MSR access
  x86: detect and initialize Platform QoS Monitoring feature
  x86: dynamically attach/detach QoS monitoring service for a guest
  x86: collect global QoS monitoring information
  x86: enable QoS monitoring for each domain RMID
  xsm: add platform QoS related xsm policies
  tools: CMDs and APIs for Platform QoS Monitoring

 docs/man/xl.pod.1                            |  21 +++
 docs/misc/xen-command-line.markdown          |   7 +
 tools/flask/policy/policy/modules/xen/xen.if |   2 +-
 tools/flask/policy/policy/modules/xen/xen.te |   6 +-
 tools/libxc/Makefile                         |   2 +
 tools/libxc/xc_msr_x86.c                     |  53 +++++++
 tools/libxc/xc_msr_x86.h                     |  36 +++++
 tools/libxc/xc_pqos.c                        | 217 +++++++++++++++++++++++++++
 tools/libxc/xenctrl.h                        |  24 +++
 tools/libxl/Makefile                         |   2 +-
 tools/libxl/libxl.h                          |  21 +++
 tools/libxl/libxl_pqos.c                     | 171 +++++++++++++++++++++
 tools/libxl/libxl_types.idl                  |   4 +
 tools/libxl/xl.h                             |   5 +
 tools/libxl/xl_cmdimpl.c                     | 130 ++++++++++++++++
 tools/libxl/xl_cmdtable.c                    |  17 +++
 xen/arch/x86/Makefile                        |   1 +
 xen/arch/x86/domain.c                        |   8 +
 xen/arch/x86/domctl.c                        |  29 ++++
 xen/arch/x86/pqos.c                          | 202 +++++++++++++++++++++++++
 xen/arch/x86/setup.c                         |   3 +
 xen/arch/x86/sysctl.c                        | 106 +++++++++++++
 xen/include/asm-x86/cpufeature.h             |   1 +
 xen/include/asm-x86/domain.h                 |   2 +
 xen/include/asm-x86/msr-index.h              |   3 +
 xen/include/asm-x86/pqos.h                   |  63 ++++++++
 xen/include/public/domctl.h                  |  12 ++
 xen/include/public/sysctl.h                  |  39 +++++
 xen/xsm/flask/hooks.c                        |   8 +
 xen/xsm/flask/policy/access_vectors          |  18 ++-
 xen/xsm/flask/policy/security_classes        |   1 +
 31 files changed, 1208 insertions(+), 6 deletions(-)
 create mode 100644 tools/libxc/xc_msr_x86.c
 create mode 100644 tools/libxc/xc_msr_x86.h
 create mode 100644 tools/libxc/xc_pqos.c
 create mode 100644 tools/libxl/libxl_pqos.c
 create mode 100644 xen/arch/x86/pqos.c
 create mode 100644 xen/include/asm-x86/pqos.h

-- 
1.8.1.5

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v11 1/9] x86: add generic MSR access hypercall
  2014-06-20 14:31 [PATCH v11 0/9] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
@ 2014-06-20 14:31 ` Dongxiao Xu
  2014-06-20 14:57   ` Andrew Cooper
  2014-06-20 15:00   ` Jan Beulich
  2014-06-20 14:31 ` [PATCH v11 2/9] xsm: add MSR operation related xsm policy Dongxiao Xu
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 38+ messages in thread
From: Dongxiao Xu @ 2014-06-20 14:31 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Add a generic MSR access hypercall for tool stack or other components to
read or write certain system MSRs.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/sysctl.c       | 54 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h | 25 +++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index 15d4b91..49f95e4 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -66,10 +66,27 @@ void arch_do_physinfo(xen_sysctl_physinfo_t *pi)
         pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm_directio;
 }
 
+static void msr_access_helper(void *data)
+{
+    struct msr_access_info *info = data;
+    xen_sysctl_msr_data_t *msr_data;
+    unsigned int i;
+
+    for ( i = 0; i < info->nr_ops; i++ )
+    {
+        msr_data = &info->msr_data[i];
+        if ( msr_data->cmd == XEN_SYSCTL_MSR_read )
+            rdmsrl(msr_data->msr, msr_data->val);
+        else if ( msr_data->cmd == XEN_SYSCTL_MSR_write )
+            wrmsrl(msr_data->msr, msr_data->val);
+    }
+}
+
 long arch_do_sysctl(
     struct xen_sysctl *sysctl, XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
     long ret = 0;
+    bool_t copyback = 0;
 
     switch ( sysctl->cmd )
     {
@@ -101,11 +118,48 @@ long arch_do_sysctl(
     }
     break;
 
+    case XEN_SYSCTL_msr_op:
+    {
+        unsigned int cpu = sysctl->u.msr_op.cpu;
+        struct msr_access_info info;
+
+        info.nr_ops = sysctl->u.msr_op.nr_ops;
+        info.msr_data = xzalloc_array(xen_sysctl_msr_data_t, info.nr_ops);
+        if ( !info.msr_data )
+            return -ENOMEM;
+
+        if ( copy_from_guest(info.msr_data, sysctl->u.msr_op.msr_data,
+                             info.nr_ops) )
+        {
+            xfree(info.msr_data);
+            return -EFAULT;
+        }
+
+        if ( cpu == smp_processor_id() )
+            msr_access_helper(&info);
+        else
+            on_selected_cpus(cpumask_of(cpu), msr_access_helper, &info, 1);
+
+        if ( copy_to_guest(sysctl->u.msr_op.msr_data, info.msr_data,
+                           info.nr_ops ) )
+        {
+            xfree(info.msr_data);
+            return -EFAULT;
+        }
+
+        xfree(info.msr_data);
+        copyback = 1;
+    }
+    break;
+
     default:
         ret = -ENOSYS;
         break;
     }
 
+    if ( copyback && __copy_to_guest(u_sysctl, sysctl, 1) )
+        ret = -EFAULT;
+
     return ret;
 }
 
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 3588698..fd042bb 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -636,6 +636,29 @@ struct xen_sysctl_coverage_op {
 typedef struct xen_sysctl_coverage_op xen_sysctl_coverage_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_coverage_op_t);
 
+#define XEN_SYSCTL_MSR_read                0
+#define XEN_SYSCTL_MSR_write               1
+
+struct xen_sysctl_msr_data {
+    uint32_t cmd;       /* XEN_SYSCTL_MSR_* */
+    uint32_t msr;
+    uint64_t val;
+};
+typedef struct xen_sysctl_msr_data xen_sysctl_msr_data_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_msr_data_t);
+struct xen_sysctl_msr_op {
+    uint32_t nr_ops;
+    uint32_t cpu;
+    XEN_GUEST_HANDLE_64(xen_sysctl_msr_data_t) msr_data;
+};
+typedef struct xen_sysctl_msr_op xen_sysctl_msr_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_msr_op_t);
+
+struct msr_access_info {
+    uint32_t nr_ops;
+    xen_sysctl_msr_data_t *msr_data;
+};
+
 
 struct xen_sysctl {
     uint32_t cmd;
@@ -658,6 +681,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_cpupool_op                    18
 #define XEN_SYSCTL_scheduler_op                  19
 #define XEN_SYSCTL_coverage_op                   20
+#define XEN_SYSCTL_msr_op                        21
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -679,6 +703,7 @@ struct xen_sysctl {
         struct xen_sysctl_cpupool_op        cpupool_op;
         struct xen_sysctl_scheduler_op      scheduler_op;
         struct xen_sysctl_coverage_op       coverage_op;
+        struct xen_sysctl_msr_op            msr_op;
         uint8_t                             pad[128];
     } u;
 };
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 2/9] xsm: add MSR operation related xsm policy
  2014-06-20 14:31 [PATCH v11 0/9] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
  2014-06-20 14:31 ` [PATCH v11 1/9] x86: add generic MSR access hypercall Dongxiao Xu
@ 2014-06-20 14:31 ` Dongxiao Xu
  2014-06-24 19:20   ` Daniel De Graaf
  2014-06-20 14:31 ` [PATCH v11 3/9] tools: provide interface for generic MSR access Dongxiao Xu
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Dongxiao Xu @ 2014-06-20 14:31 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Add xsm policies for MSR access related hypercall.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 tools/flask/policy/policy/modules/xen/xen.te |  3 +++
 xen/xsm/flask/hooks.c                        |  4 ++++
 xen/xsm/flask/policy/access_vectors          | 14 +++++++++++---
 xen/xsm/flask/policy/security_classes        |  1 +
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index bb59fe8..0e63e76 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -64,6 +64,9 @@ allow dom0_t xen_t:xen {
 	getidle debug getcpuinfo heap pm_op mca_op lockprof cpupool_op tmem_op
 	tmem_control getscheduler setscheduler
 };
+allow dom0_t xen_t:xen2 {
+    msr_op
+};
 allow dom0_t xen_t:mmu memorymap;
 
 # Allow dom0 to use these domctls on itself. For domctls acting on other
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index c008398..277a5de 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -770,6 +770,10 @@ static int flask_sysctl(int cmd)
     case XEN_SYSCTL_numainfo:
         return domain_has_xen(current->domain, XEN__PHYSINFO);
 
+    case XEN_SYSCTL_msr_op:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__MSR_OP, NULL);
+
     default:
         printk("flask_sysctl: Unknown op %d\n", cmd);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 32371a9..82b5484 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -3,9 +3,9 @@
 #
 # class class_name { permission_name ... }
 
-# Class xen consists of dom0-only operations dealing with the hypervisor itself.
-# Unless otherwise specified, the source is the domain executing the hypercall,
-# and the target is the xen initial sid (type xen_t).
+# Class xen and xen2 consists of dom0-only operations dealing with the
+# hypervisor itself. Unless otherwise specified, the source is the domain
+# executing the hypercall, and the target is the xen initial sid (type xen_t).
 class xen
 {
 # XENPF_settime
@@ -75,6 +75,14 @@ class xen
     setscheduler
 }
 
+# This is a continuation of class xen, since only 32 permissions can be
+# defined per class
+class xen2
+{
+# XEN_SYSCTL_msr_op
+    msr_op
+}
+
 # Classes domain and domain2 consist of operations that a domain performs on
 # another domain or on itself.  Unless otherwise specified, the source is the
 # domain executing the hypercall, and the target is the domain being operated on
diff --git a/xen/xsm/flask/policy/security_classes b/xen/xsm/flask/policy/security_classes
index ef134a7..ca191db 100644
--- a/xen/xsm/flask/policy/security_classes
+++ b/xen/xsm/flask/policy/security_classes
@@ -8,6 +8,7 @@
 # for userspace object managers
 
 class xen
+class xen2
 class domain
 class domain2
 class hvm
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 3/9] tools: provide interface for generic MSR access
  2014-06-20 14:31 [PATCH v11 0/9] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
  2014-06-20 14:31 ` [PATCH v11 1/9] x86: add generic MSR access hypercall Dongxiao Xu
  2014-06-20 14:31 ` [PATCH v11 2/9] xsm: add MSR operation related xsm policy Dongxiao Xu
@ 2014-06-20 14:31 ` Dongxiao Xu
  2014-06-20 17:34   ` Konrad Rzeszutek Wilk
  2014-06-27 13:05   ` Ian Campbell
  2014-06-20 14:31 ` [PATCH v11 4/9] x86: detect and initialize Platform QoS Monitoring feature Dongxiao Xu
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 38+ messages in thread
From: Dongxiao Xu @ 2014-06-20 14:31 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Xen added a new sysctl hypercall for generic MSR access, and this is the
tool side change to wrapper the hypercall into xc APIs.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 tools/libxc/Makefile     |  1 +
 tools/libxc/xc_msr_x86.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h    |  6 ++++++
 3 files changed, 60 insertions(+)
 create mode 100644 tools/libxc/xc_msr_x86.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index a74b19e..fec4aca 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -35,6 +35,7 @@ CTRL_SRCS-y       += xc_kexec.c
 CTRL_SRCS-y       += xtl_core.c
 CTRL_SRCS-y       += xtl_logger_stdio.c
 CTRL_SRCS-$(CONFIG_X86) += xc_pagetab.c
+CTRL_SRCS-$(CONFIG_X86) += xc_msr_x86.c
 CTRL_SRCS-$(CONFIG_Linux) += xc_linux.c xc_linux_osdep.c
 CTRL_SRCS-$(CONFIG_SunOS) += xc_solaris.c
 CTRL_SRCS-$(CONFIG_NetBSD) += xc_netbsd.c
diff --git a/tools/libxc/xc_msr_x86.c b/tools/libxc/xc_msr_x86.c
new file mode 100644
index 0000000..46e77dd
--- /dev/null
+++ b/tools/libxc/xc_msr_x86.c
@@ -0,0 +1,53 @@
+/*
+ * xc_msr_x86.c
+ *
+ * Generic MSR access API
+ *
+ * Copyright (C) 2014      Intel Corporation
+ * Author Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "xc_private.h"
+
+int xc_msr_op(xc_interface *xch, uint32_t cpu,
+              uint32_t nr_ops, xc_msr_data_t *msr_data)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(msr_data, nr_ops * sizeof(*msr_data),
+                             XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
+
+    if ( xc_hypercall_bounce_pre(xch, msr_data) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_msr_op;
+    sysctl.u.msr_op.nr_ops = nr_ops;
+    sysctl.u.msr_op.cpu = cpu;
+    set_xen_guest_handle(sysctl.u.msr_op.msr_data, msr_data);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, msr_data);
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 02129f7..50625ca 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -2424,4 +2424,10 @@ int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
  */
 int xc_kexec_unload(xc_interface *xch, int type);
 
+#if defined(__i386__) || defined(__x86_64__)
+typedef xen_sysctl_msr_data_t xc_msr_data_t;
+int xc_msr_op(xc_interface *xch, uint32_t cpu,
+    uint32_t nr_ops, xc_msr_data_t *msr_data);
+#endif
+
 #endif /* XENCTRL_H */
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 4/9] x86: detect and initialize Platform QoS Monitoring feature
  2014-06-20 14:31 [PATCH v11 0/9] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
                   ` (2 preceding siblings ...)
  2014-06-20 14:31 ` [PATCH v11 3/9] tools: provide interface for generic MSR access Dongxiao Xu
@ 2014-06-20 14:31 ` Dongxiao Xu
  2014-06-20 15:04   ` Jan Beulich
  2014-06-20 17:35   ` Konrad Rzeszutek Wilk
  2014-06-20 14:31 ` [PATCH v11 5/9] x86: dynamically attach/detach QoS monitoring service for a guest Dongxiao Xu
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 38+ messages in thread
From: Dongxiao Xu @ 2014-06-20 14:31 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Detect platform QoS feature status and enumerate the resource types,
one of which is to monitor the L3 cache occupancy.

Also introduce a Xen grub command line parameter to control the
QoS feature status.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 docs/misc/xen-command-line.markdown |   7 +++
 xen/arch/x86/Makefile               |   1 +
 xen/arch/x86/pqos.c                 | 122 ++++++++++++++++++++++++++++++++++++
 xen/arch/x86/setup.c                |   3 +
 xen/include/asm-x86/cpufeature.h    |   1 +
 xen/include/asm-x86/pqos.h          |  48 ++++++++++++++
 6 files changed, 182 insertions(+)
 create mode 100644 xen/arch/x86/pqos.c
 create mode 100644 xen/include/asm-x86/pqos.h

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index a7ac53d..542eee5 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -840,6 +840,13 @@ This option can be specified more than once (up to 8 times at present).
 ### ple\_window
 > `= <integer>`
 
+### pqos (Intel)
+> `= List of ( <boolean> | pqos_monitor:<boolean> | rmid_max:<integer> )`
+
+> Default: `pqos=1,pqos_monitor:1,rmid_max:255`
+
+Configure platform QoS services.
+
 ### reboot
 > `= t[riple] | k[bd] | n[o] [, [w]arm | [c]old]`
 
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index d502bdf..54962e0 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -58,6 +58,7 @@ obj-y += crash.o
 obj-y += tboot.o
 obj-y += hpet.o
 obj-y += xstate.o
+obj-y += pqos.o
 
 obj-$(crash_debug) += gdbstub.o
 
diff --git a/xen/arch/x86/pqos.c b/xen/arch/x86/pqos.c
new file mode 100644
index 0000000..1c431c7
--- /dev/null
+++ b/xen/arch/x86/pqos.c
@@ -0,0 +1,122 @@
+/*
+ * pqos.c: Platform QoS related service for guest.
+ *
+ * Copyright (c) 2014, Intel Corporation
+ * Author: Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include <xen/init.h>
+#include <xen/cpu.h>
+#include <asm/pqos.h>
+
+struct pqos_monitor *__read_mostly pqosm = NULL;
+static bool_t __initdata opt_pqos = 1;
+static bool_t __initdata opt_pqos_monitor = 1;
+static unsigned int __initdata opt_rmid_max = 255;
+
+static void __init parse_pqos_param(char *s)
+{
+    char *ss, *val_str;
+    int val;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        val = parse_bool(s);
+        if ( val >= 0 )
+            opt_pqos = val;
+        else
+        {
+            val = !!strncmp(s, "no-", 3);
+            if ( !val )
+                s += 3;
+
+            val_str = strchr(s, ':');
+            if ( val_str )
+                *val_str++ = '\0';
+
+            if ( val_str && !strcmp(s, "pqos_monitor") &&
+                 (val = parse_bool(val_str)) >= 0 )
+                opt_pqos_monitor = val;
+            else if ( val_str && !strcmp(s, "rmid_max") )
+                opt_rmid_max = simple_strtoul(val_str, NULL, 0);
+        }
+
+        s = ss + 1;
+    } while ( ss );
+}
+
+custom_param("pqos", parse_pqos_param);
+static void __init init_pqos_monitor(unsigned int rmid_max)
+{
+    unsigned int eax, ebx, ecx, edx;
+    unsigned int rmid;
+
+    if ( !boot_cpu_has(X86_FEATURE_QOSM) )
+        return;
+
+    cpuid_count(0xf, 0, &eax, &ebx, &ecx, &edx);
+    if ( !edx )
+        return;
+
+    pqosm = xzalloc(struct pqos_monitor);
+    if ( !pqosm )
+        return;
+
+    pqosm->qm_features = edx;
+    pqosm->rmid_mask = ~(~0ull << get_count_order(ebx));
+    pqosm->rmid_inuse = 0;
+    pqosm->rmid_min = 1;
+    pqosm->rmid_max = min(rmid_max, ebx);
+
+    spin_lock_init(&pqosm->pqosm_lock);
+
+    if ( pqosm->qm_features & QOS_MONITOR_TYPE_L3 ) {
+        cpuid_count(0xf, 1, &eax, &ebx, &ecx, &edx);
+        pqosm->l3m.upscaling_factor = ebx;
+        pqosm->l3m.rmid_max = ecx;
+        pqosm->l3m.l3_features = edx;
+    }
+
+    pqosm->rmid_max = min(rmid_max, pqosm->l3m.rmid_max);
+    pqosm->rmid_to_dom = xmalloc_array(domid_t, pqosm->rmid_max + 1);
+    if ( !pqosm->rmid_to_dom ) {
+        xfree(pqosm);
+        return;
+    }
+    /* Reserve RMID 0 for all domains not being monitored */
+    pqosm->rmid_to_dom[0] = DOMID_XEN;
+    for ( rmid = pqosm->rmid_min; rmid <= pqosm->rmid_max; rmid++ )
+        pqosm->rmid_to_dom[rmid] = DOMID_INVALID;
+
+    printk(XENLOG_INFO "Platform QoS Monitoring Enabled.\n");
+}
+
+void __init init_platform_qos(void)
+{
+    if ( !opt_pqos )
+        return;
+
+    if ( opt_pqos_monitor && opt_rmid_max )
+        init_pqos_monitor(opt_rmid_max);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 508649d..e263e5f 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -48,6 +48,7 @@
 #include <asm/setup.h>
 #include <xen/cpu.h>
 #include <asm/nmi.h>
+#include <asm/pqos.h>
 
 /* opt_nosmp: If true, secondary processors are ignored. */
 static bool_t __initdata opt_nosmp;
@@ -1431,6 +1432,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     domain_unpause_by_systemcontroller(dom0);
 
+    init_platform_qos();
+
     reset_stack_and_jump(init_done);
 }
 
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 8014241..eb1dd27 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -148,6 +148,7 @@
 #define X86_FEATURE_ERMS	(7*32+ 9) /* Enhanced REP MOVSB/STOSB */
 #define X86_FEATURE_INVPCID	(7*32+10) /* Invalidate Process Context ID */
 #define X86_FEATURE_RTM 	(7*32+11) /* Restricted Transactional Memory */
+#define X86_FEATURE_QOSM	(7*32+12) /* Platform QoS monitoring capability */
 #define X86_FEATURE_NO_FPU_SEL 	(7*32+13) /* FPU CS/DS stored as zero */
 #define X86_FEATURE_MPX		(7*32+14) /* Memory Protection Extensions */
 #define X86_FEATURE_RDSEED	(7*32+18) /* RDSEED instruction */
diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
new file mode 100644
index 0000000..410fa3d
--- /dev/null
+++ b/xen/include/asm-x86/pqos.h
@@ -0,0 +1,48 @@
+/*
+ * pqos.h: Platform QoS related service for guest.
+ *
+ * Copyright (c) 2014, Intel Corporation
+ * Author: Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#ifndef ASM_PQOS_H
+#define ASM_PQOS_H
+
+#include <public/xen.h>
+#include <xen/spinlock.h>
+
+/* QoS Resource Type Enumeration */
+#define QOS_MONITOR_TYPE_L3            0x2
+
+/* L3 Monitoring Features */
+#define L3_FEATURE_OCCUPANCY           0x1
+
+struct pqos_monitor_l3 {
+    unsigned int l3_features;
+    unsigned int upscaling_factor;
+    unsigned int rmid_max;
+};
+
+struct pqos_monitor {
+    unsigned long rmid_mask;
+    unsigned int rmid_min;
+    unsigned int rmid_max;
+    unsigned int rmid_inuse;
+    unsigned int qm_features;
+    domid_t *rmid_to_dom;
+    spinlock_t pqosm_lock;
+    struct pqos_monitor_l3 l3m;
+};
+extern struct pqos_monitor *pqosm;
+
+void __init init_platform_qos(void);
+
+#endif
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 5/9] x86: dynamically attach/detach QoS monitoring service for a guest
  2014-06-20 14:31 [PATCH v11 0/9] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
                   ` (3 preceding siblings ...)
  2014-06-20 14:31 ` [PATCH v11 4/9] x86: detect and initialize Platform QoS Monitoring feature Dongxiao Xu
@ 2014-06-20 14:31 ` Dongxiao Xu
  2014-06-20 15:08   ` Jan Beulich
  2014-06-20 14:31 ` [PATCH v11 6/9] x86: collect global QoS monitoring information Dongxiao Xu
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Dongxiao Xu @ 2014-06-20 14:31 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Add hypervisor side support for dynamically attach and detach QoS
monitoring services for a certain guest.

When attach Qos monitoring service for a guest, system will allocate an
RMID for it. When detach or guest is shutdown, the RMID will be recycled
for future use.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/domain.c        |  3 +++
 xen/arch/x86/domctl.c        | 29 ++++++++++++++++++++++
 xen/arch/x86/pqos.c          | 59 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/domain.h |  2 ++
 xen/include/asm-x86/pqos.h   |  9 +++++++
 xen/include/public/domctl.h  | 12 +++++++++
 6 files changed, 114 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index e896210..f8e0e33 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -60,6 +60,7 @@
 #include <xen/numa.h>
 #include <xen/iommu.h>
 #include <compat/vcpu.h>
+#include <asm/pqos.h>
 
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
 DEFINE_PER_CPU(unsigned long, cr4);
@@ -636,6 +637,8 @@ void arch_domain_destroy(struct domain *d)
 
     free_xenheap_page(d->shared_info);
     cleanup_domain_irq_mapping(d);
+
+    pqos_monitor_free_rmid(d);
 }
 
 unsigned long pv_guest_cr4_fixup(const struct vcpu *v, unsigned long guest_cr4)
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index f8b0a79..9047dc6 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -35,6 +35,7 @@
 #include <asm/mem_sharing.h>
 #include <asm/xstate.h>
 #include <asm/debugger.h>
+#include <asm/pqos.h>
 
 static int gdbsx_guest_mem_io(
     domid_t domid, struct xen_domctl_gdbsx_memio *iop)
@@ -1342,6 +1343,34 @@ long arch_do_domctl(
     }
     break;
 
+    case XEN_DOMCTL_pqos_monitor_op:
+        if ( !pqos_monitor_enabled() )
+        {
+            ret = -ENODEV;
+            break;
+        }
+
+        switch ( domctl->u.pqos_monitor_op.cmd )
+        {
+        case XEN_DOMCTL_PQOS_MONITOR_OP_ATTACH:
+            ret = pqos_monitor_alloc_rmid(d);
+            break;
+        case XEN_DOMCTL_PQOS_MONITOR_OP_DETACH:
+            if ( d->arch.pqos_rmid > 0 )
+                pqos_monitor_free_rmid(d);
+            else
+                ret = -ENOENT;
+            break;
+        case XEN_DOMCTL_PQOS_MONITOR_OP_QUERY_RMID:
+            domctl->u.pqos_monitor_op.data = d->arch.pqos_rmid;
+            copyback = 1;
+            break;
+        default:
+            ret = -ENOSYS;
+            break;
+        };
+        break;
+
     default:
         ret = iommu_do_domctl(domctl, d, u_domctl);
         break;
diff --git a/xen/arch/x86/pqos.c b/xen/arch/x86/pqos.c
index 1c431c7..c08cb99 100644
--- a/xen/arch/x86/pqos.c
+++ b/xen/arch/x86/pqos.c
@@ -111,6 +111,65 @@ void __init init_platform_qos(void)
         init_pqos_monitor(opt_rmid_max);
 }
 
+int pqos_monitor_alloc_rmid(struct domain *d)
+{
+    int rc = 0;
+    unsigned int rmid;
+
+    ASSERT(pqos_monitor_enabled());
+
+    spin_lock(&pqosm->pqosm_lock);
+
+    if ( d->arch.pqos_rmid > 0 )
+    {
+        rc = -EEXIST;
+        goto out;
+    }
+
+    for ( rmid = pqosm->rmid_min; rmid <= pqosm->rmid_max; rmid++ )
+    {
+        if ( pqosm->rmid_to_dom[rmid] != DOMID_INVALID)
+            continue;
+
+        pqosm->rmid_to_dom[rmid] = d->domain_id;
+        break;
+    }
+
+    /* No RMID available, assign RMID=0 by default */
+    if ( rmid > pqosm->rmid_max )
+    {
+        rmid = 0;
+        rc = -EUSERS;
+    }
+    else
+        pqosm->rmid_inuse++;
+
+    d->arch.pqos_rmid = rmid;
+
+out:
+    spin_unlock(&pqosm->pqosm_lock);
+
+    return rc;
+}
+
+void pqos_monitor_free_rmid(struct domain *d)
+{
+    unsigned int rmid;
+
+    spin_lock(&pqosm->pqosm_lock);
+    rmid = d->arch.pqos_rmid;
+    /* We do not free system reserved "RMID=0" */
+    if ( rmid == 0 )
+        goto out;
+
+    pqosm->rmid_to_dom[rmid] = DOMID_INVALID;
+    d->arch.pqos_rmid = 0;
+    pqosm->rmid_inuse--;
+
+out:
+    spin_unlock(&pqosm->pqosm_lock);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index abf55fb..cd53108 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -313,6 +313,8 @@ struct arch_domain
     /* Shared page for notifying that explicit PIRQ EOI is required. */
     unsigned long *pirq_eoi_map;
     unsigned long pirq_eoi_map_mfn;
+
+    unsigned int pqos_rmid; /* QoS monitoring RMID assigned to the domain */
 } __cacheline_aligned;
 
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
index 410fa3d..3a3f063 100644
--- a/xen/include/asm-x86/pqos.h
+++ b/xen/include/asm-x86/pqos.h
@@ -15,6 +15,7 @@
  */
 #ifndef ASM_PQOS_H
 #define ASM_PQOS_H
+#include <xen/sched.h>
 
 #include <public/xen.h>
 #include <xen/spinlock.h>
@@ -43,6 +44,14 @@ struct pqos_monitor {
 };
 extern struct pqos_monitor *pqosm;
 
+static inline bool_t pqos_monitor_enabled(void)
+{
+    return !!pqosm;
+}
+
 void __init init_platform_qos(void);
 
+int pqos_monitor_alloc_rmid(struct domain *d);
+void pqos_monitor_free_rmid(struct domain *d);
+
 #endif
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 565fa4c..32c0cb2 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -895,6 +895,16 @@ struct xen_domctl_cacheflush {
 typedef struct xen_domctl_cacheflush xen_domctl_cacheflush_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_cacheflush_t);
 
+struct xen_domctl_pqos_monitor_op {
+#define XEN_DOMCTL_PQOS_MONITOR_OP_DETACH         0
+#define XEN_DOMCTL_PQOS_MONITOR_OP_ATTACH         1
+#define XEN_DOMCTL_PQOS_MONITOR_OP_QUERY_RMID     2
+    uint32_t cmd;
+    uint32_t data;
+};
+typedef struct xen_domctl_pqos_op xen_domctl_pqos_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_pqos_op_t);
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -965,6 +975,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_getnodeaffinity               69
 #define XEN_DOMCTL_set_max_evtchn                70
 #define XEN_DOMCTL_cacheflush                    71
+#define XEN_DOMCTL_pqos_monitor_op               72
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1024,6 +1035,7 @@ struct xen_domctl {
         struct xen_domctl_cacheflush        cacheflush;
         struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu;
         struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
+        struct xen_domctl_pqos_monitor_op   pqos_monitor_op;
         uint8_t                             pad[128];
     } u;
 };
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 6/9] x86: collect global QoS monitoring information
  2014-06-20 14:31 [PATCH v11 0/9] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
                   ` (4 preceding siblings ...)
  2014-06-20 14:31 ` [PATCH v11 5/9] x86: dynamically attach/detach QoS monitoring service for a guest Dongxiao Xu
@ 2014-06-20 14:31 ` Dongxiao Xu
  2014-06-20 15:16   ` Jan Beulich
  2014-06-20 14:31 ` [PATCH v11 7/9] x86: enable QoS monitoring for each domain RMID Dongxiao Xu
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Dongxiao Xu @ 2014-06-20 14:31 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

This implementation tries to put all policies into user space, thus some
global QoS monitoring information needs to be exposed, such as the total
RMID count, L3 upscaling factor, etc.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/sysctl.c       | 52 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h | 14 ++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index 49f95e4..ea9b2e4 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -28,6 +28,7 @@
 #include <xen/nodemask.h>
 #include <xen/cpu.h>
 #include <xsm/xsm.h>
+#include <asm/pqos.h>
 
 #define get_xen_guest_handle(val, hnd)  do { val = (hnd).p; } while (0)
 
@@ -152,6 +153,57 @@ long arch_do_sysctl(
     }
     break;
 
+    case XEN_SYSCTL_pqos_monitor_op:
+        if ( !pqos_monitor_enabled() )
+            return -ENODEV;
+
+        switch ( sysctl->u.pqos_monitor_op.cmd )
+        {
+        case XEN_SYSCTL_PQOS_MONITOR_cqm_enabled:
+            sysctl->u.pqos_monitor_op.data =
+                (pqosm->qm_features & QOS_MONITOR_TYPE_L3) &&
+                (pqosm->l3m.l3_features & L3_FEATURE_OCCUPANCY);
+            break;
+        case XEN_SYSCTL_PQOS_MONITOR_get_total_rmid:
+            sysctl->u.pqos_monitor_op.data =
+                pqosm->rmid_max - pqosm->rmid_min + 1;
+            break;
+        case XEN_SYSCTL_PQOS_MONITOR_get_l3_upscaling_factor:
+            sysctl->u.pqos_monitor_op.data = pqosm->l3m.upscaling_factor;
+            break;
+        case XEN_SYSCTL_PQOS_MONITOR_get_l3_cache_size:
+            sysctl->u.pqos_monitor_op.data = boot_cpu_data.x86_cache_size;
+            break;
+        case XEN_SYSCTL_PQOS_MONITOR_get_socket_cpu:
+        {
+            unsigned int i, cpu;
+            int socket = sysctl->u.pqos_monitor_op.data;
+
+            for ( i = 0; i < NR_CPUS; i++ )
+            {
+                if ( cpu_to_socket(i) < 0 || cpu_to_socket(i) != socket )
+                    continue;
+                cpu = cpumask_any(per_cpu(cpu_core_mask, i));
+                if ( cpu < nr_cpu_ids )
+                {
+                    sysctl->u.pqos_monitor_op.data = cpu;
+                    break;
+                }
+            }
+
+            if ( i == NR_CPUS )
+                ret = -EFAULT;
+        }
+        break;
+
+        default:
+            sysctl->u.pqos_monitor_op.data = 0;
+            ret = -ENOSYS;
+            break;
+        }
+        copyback = 1;
+    break;
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index fd042bb..a1c3ae5 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -659,6 +659,18 @@ struct msr_access_info {
     xen_sysctl_msr_data_t *msr_data;
 };
 
+#define XEN_SYSCTL_PQOS_MONITOR_get_total_rmid            0
+#define XEN_SYSCTL_PQOS_MONITOR_get_l3_upscaling_factor   1
+#define XEN_SYSCTL_PQOS_MONITOR_get_l3_cache_size         2
+#define XEN_SYSCTL_PQOS_MONITOR_get_socket_cpu            3
+#define XEN_SYSCTL_PQOS_MONITOR_cqm_enabled               4
+struct xen_sysctl_pqos_monitor_op {
+    uint32_t cmd;
+    uint64_t data;
+};
+typedef struct xen_sysctl_pqos_monitor_op xen_sysctl_pqos_monitor_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pqos_monitor_op_t);
+
 
 struct xen_sysctl {
     uint32_t cmd;
@@ -682,6 +694,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_scheduler_op                  19
 #define XEN_SYSCTL_coverage_op                   20
 #define XEN_SYSCTL_msr_op                        21
+#define XEN_SYSCTL_pqos_monitor_op               22
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -704,6 +717,7 @@ struct xen_sysctl {
         struct xen_sysctl_scheduler_op      scheduler_op;
         struct xen_sysctl_coverage_op       coverage_op;
         struct xen_sysctl_msr_op            msr_op;
+        struct xen_sysctl_pqos_monitor_op   pqos_monitor_op;
         uint8_t                             pad[128];
     } u;
 };
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 7/9] x86: enable QoS monitoring for each domain RMID
  2014-06-20 14:31 [PATCH v11 0/9] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
                   ` (5 preceding siblings ...)
  2014-06-20 14:31 ` [PATCH v11 6/9] x86: collect global QoS monitoring information Dongxiao Xu
@ 2014-06-20 14:31 ` Dongxiao Xu
  2014-06-20 15:20   ` Jan Beulich
  2014-06-20 14:31 ` [PATCH v11 8/9] xsm: add platform QoS related xsm policies Dongxiao Xu
  2014-06-20 14:31 ` [PATCH v11 9/9] tools: CMDs and APIs for Platform QoS Monitoring Dongxiao Xu
  8 siblings, 1 reply; 38+ messages in thread
From: Dongxiao Xu @ 2014-06-20 14:31 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

If the QoS monitoring service is attached to a domain, its related RMID
will be set to hardware for monitoring when the domain's vcpu is
scheduled in. When the domain's vcpu is scheduled out, RMID 0
(system reserved) will be set for monitoring.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/domain.c           |  5 +++++
 xen/arch/x86/pqos.c             | 21 +++++++++++++++++++++
 xen/include/asm-x86/msr-index.h |  3 +++
 xen/include/asm-x86/pqos.h      |  6 ++++++
 4 files changed, 35 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f8e0e33..bb9a4d2 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1398,6 +1398,8 @@ static void __context_switch(void)
     {
         memcpy(&p->arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES);
         vcpu_save_fpu(p);
+        if ( pqos_monitor_enabled() && pqosm->rmid_inuse )
+            pqos_monitor_assoc_rmid(0);
         p->arch.ctxt_switch_from(p);
     }
 
@@ -1422,6 +1424,9 @@ static void __context_switch(void)
         }
         vcpu_restore_fpu_eager(n);
         n->arch.ctxt_switch_to(n);
+
+        if ( pqos_monitor_enabled() && n->domain->arch.pqos_rmid > 0 )
+            pqos_monitor_assoc_rmid(n->domain->arch.pqos_rmid);
     }
 
     gdt = !is_pv_32on64_vcpu(n) ? per_cpu(gdt_table, cpu) :
diff --git a/xen/arch/x86/pqos.c b/xen/arch/x86/pqos.c
index c08cb99..c9d7499 100644
--- a/xen/arch/x86/pqos.c
+++ b/xen/arch/x86/pqos.c
@@ -21,6 +21,7 @@ struct pqos_monitor *__read_mostly pqosm = NULL;
 static bool_t __initdata opt_pqos = 1;
 static bool_t __initdata opt_pqos_monitor = 1;
 static unsigned int __initdata opt_rmid_max = 255;
+static DEFINE_PER_CPU(struct pqr_assoc, pqr_assoc);
 
 static void __init parse_pqos_param(char *s)
 {
@@ -170,6 +171,26 @@ out:
     spin_unlock(&pqosm->pqosm_lock);
 }
 
+void pqos_monitor_assoc_rmid(unsigned int rmid)
+{
+    uint64_t val;
+    uint64_t new_val;
+
+    if ( !this_cpu(pqr_assoc).initialized )
+    {
+        rdmsrl(MSR_IA32_PQR_ASSOC, this_cpu(pqr_assoc).val);
+        this_cpu(pqr_assoc).initialized = 1;
+    }
+    val = this_cpu(pqr_assoc).val;
+
+    new_val = (val & ~pqosm->rmid_mask) | (rmid & pqosm->rmid_mask);
+    if ( val != new_val )
+    {
+        wrmsrl(MSR_IA32_PQR_ASSOC, new_val);
+        this_cpu(pqr_assoc).val = new_val;
+    }
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 70a8201..f894dfa 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -489,4 +489,7 @@
 /* Geode defined MSRs */
 #define MSR_GEODE_BUSCONT_CONF0		0x00001900
 
+/* Platform QoS MSRs */
+#define MSR_IA32_PQR_ASSOC             0x00000c8f
+
 #endif /* __ASM_MSR_INDEX_H */
diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
index 3a3f063..19f9bb3 100644
--- a/xen/include/asm-x86/pqos.h
+++ b/xen/include/asm-x86/pqos.h
@@ -44,6 +44,11 @@ struct pqos_monitor {
 };
 extern struct pqos_monitor *pqosm;
 
+struct pqr_assoc {
+    uint64_t val;
+    bool_t initialized;
+};
+
 static inline bool_t pqos_monitor_enabled(void)
 {
     return !!pqosm;
@@ -53,5 +58,6 @@ void __init init_platform_qos(void);
 
 int pqos_monitor_alloc_rmid(struct domain *d);
 void pqos_monitor_free_rmid(struct domain *d);
+void pqos_monitor_assoc_rmid(unsigned int rmid);
 
 #endif
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 8/9] xsm: add platform QoS related xsm policies
  2014-06-20 14:31 [PATCH v11 0/9] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
                   ` (6 preceding siblings ...)
  2014-06-20 14:31 ` [PATCH v11 7/9] x86: enable QoS monitoring for each domain RMID Dongxiao Xu
@ 2014-06-20 14:31 ` Dongxiao Xu
  2014-06-24 19:24   ` Daniel De Graaf
  2014-06-20 14:31 ` [PATCH v11 9/9] tools: CMDs and APIs for Platform QoS Monitoring Dongxiao Xu
  8 siblings, 1 reply; 38+ messages in thread
From: Dongxiao Xu @ 2014-06-20 14:31 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Add xsm policies for QoS monitoring related hypercalls.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 tools/flask/policy/policy/modules/xen/xen.if | 2 +-
 tools/flask/policy/policy/modules/xen/xen.te | 3 ++-
 xen/xsm/flask/hooks.c                        | 4 ++++
 xen/xsm/flask/policy/access_vectors          | 4 ++++
 4 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
index dedc035..5e7042c 100644
--- a/tools/flask/policy/policy/modules/xen/xen.if
+++ b/tools/flask/policy/policy/modules/xen/xen.if
@@ -49,7 +49,7 @@ define(`create_domain_common', `
 			getdomaininfo hypercall setvcpucontext setextvcpucontext
 			getscheduler getvcpuinfo getvcpuextstate getaddrsize
 			getaffinity setaffinity };
-	allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim  set_max_evtchn };
+	allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim set_max_evtchn pqos_monitor_op };
 	allow $1 $2:security check_context;
 	allow $1 $2:shadow enable;
 	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op };
diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 0e63e76..41d1995 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -66,6 +66,7 @@ allow dom0_t xen_t:xen {
 };
 allow dom0_t xen_t:xen2 {
     msr_op
+    pqos_monitor_op
 };
 allow dom0_t xen_t:mmu memorymap;
 
@@ -79,7 +80,7 @@ allow dom0_t dom0_t:domain {
 	getpodtarget setpodtarget set_misc_info set_virq_handler
 };
 allow dom0_t dom0_t:domain2 {
-	set_cpuid gettsc settsc setscheduler set_max_evtchn
+	set_cpuid gettsc settsc setscheduler set_max_evtchn pqos_monitor_op
 };
 allow dom0_t dom0_t:resource { add remove };
 
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 277a5de..97dc1a3 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -773,6 +773,10 @@ static int flask_sysctl(int cmd)
     case XEN_SYSCTL_msr_op:
         return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
                                     XEN2__MSR_OP, NULL);
+                                    
+    case XEN_SYSCTL_pqos_monitor_op:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__PQOS_MONITOR_OP, NULL);
 
     default:
         printk("flask_sysctl: Unknown op %d\n", cmd);
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 82b5484..d573eea 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -81,6 +81,8 @@ class xen2
 {
 # XEN_SYSCTL_msr_op
     msr_op
+# XEN_SYSCTL_pqos_monitor_op
+    pqos_monitor_op
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
@@ -208,6 +210,8 @@ class domain2
     cacheflush
 # Creation of the hardware domain when it is not dom0
     create_hardware_domain
+# XEN_DOMCTL_pqos_monitor_op
+    pqos_monitor_op
 }
 
 # Similar to class domain, but primarily contains domctls related to HVM domains
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 9/9] tools: CMDs and APIs for Platform QoS Monitoring
  2014-06-20 14:31 [PATCH v11 0/9] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
                   ` (7 preceding siblings ...)
  2014-06-20 14:31 ` [PATCH v11 8/9] xsm: add platform QoS related xsm policies Dongxiao Xu
@ 2014-06-20 14:31 ` Dongxiao Xu
  2014-06-23 15:22   ` Ian Jackson
  8 siblings, 1 reply; 38+ messages in thread
From: Dongxiao Xu @ 2014-06-20 14:31 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Introduced some new xl commands to handle the platform QoS monitoring
related services.

The following two commands is to attach/detach the QoS monitoring
service to/from a certain domain.

$ xl pqos-monitor-attach domid
$ xl pqos-monitor-detach domid

This command is to display the QoS monitoring information, such as CQM,
to show the L3 cache occupancy.

$ xl pqos-monitor-show cqm <domid>

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 docs/man/xl.pod.1           |  21 +++++
 tools/libxc/Makefile        |   1 +
 tools/libxc/xc_msr_x86.h    |  36 ++++++++
 tools/libxc/xc_pqos.c       | 217 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h       |  18 ++++
 tools/libxl/Makefile        |   2 +-
 tools/libxl/libxl.h         |  21 +++++
 tools/libxl/libxl_pqos.c    | 171 ++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_types.idl |   4 +
 tools/libxl/xl.h            |   5 +
 tools/libxl/xl_cmdimpl.c    | 130 ++++++++++++++++++++++++++
 tools/libxl/xl_cmdtable.c   |  17 ++++
 12 files changed, 642 insertions(+), 1 deletion(-)
 create mode 100644 tools/libxc/xc_msr_x86.h
 create mode 100644 tools/libxc/xc_pqos.c
 create mode 100644 tools/libxl/libxl_pqos.c

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 30bd4bf..de7f0a8 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1366,6 +1366,27 @@ Load FLASK policy from the given policy file. The initial policy is provided to
 the hypervisor as a multiboot module; this command allows runtime updates to the
 policy. Loading new security policy will reset runtime changes to device labels.
 
+=head1 PLATFORM QOS MONITORING
+
+New Intel processor may offer monitoring capability in each logical processor to
+measure specific quality-of-service metric.
+
+=over 4
+
+=item B<pqos-monitor-attach> [I<domain-id>]
+
+attach: Attach the platform QoS monitoring service to a domain.
+
+=item B<pqos-monitor-detach> [I<domain-id>]
+
+detach: Detach the platform QoS monitoring service from a domain.
+
+=item B<pqos-monitor-show> [I<qos-monitor-type>] [I<domain-id>]
+
+Show monitoring data for a certain domain or all domains. Current supported
+QoS monitor types are:
+ - "cqm": cache QoS monitoring, showing the L3 cache occupancy.
+
 =back
 
 =head1 TO BE DOCUMENTED
diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index fec4aca..75b84b4 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -36,6 +36,7 @@ CTRL_SRCS-y       += xtl_core.c
 CTRL_SRCS-y       += xtl_logger_stdio.c
 CTRL_SRCS-$(CONFIG_X86) += xc_pagetab.c
 CTRL_SRCS-$(CONFIG_X86) += xc_msr_x86.c
+CTRL_SRCS-$(CONFIG_X86) += xc_pqos.c
 CTRL_SRCS-$(CONFIG_Linux) += xc_linux.c xc_linux_osdep.c
 CTRL_SRCS-$(CONFIG_SunOS) += xc_solaris.c
 CTRL_SRCS-$(CONFIG_NetBSD) += xc_netbsd.c
diff --git a/tools/libxc/xc_msr_x86.h b/tools/libxc/xc_msr_x86.h
new file mode 100644
index 0000000..1e0ee99
--- /dev/null
+++ b/tools/libxc/xc_msr_x86.h
@@ -0,0 +1,36 @@
+/*
+ * xc_msr_x86.h
+ *
+ * MSR definition macros
+ *
+ * Copyright (C) 2014      Intel Corporation
+ * Author Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#ifndef XC_MSR_X86_H
+#define XC_MSR_X86_H
+
+#define MSR_IA32_QOSEVTSEL      0x00000c8d
+#define MSR_IA32_QMC            0x00000c8e
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xc_pqos.c b/tools/libxc/xc_pqos.c
new file mode 100644
index 0000000..6090b3c
--- /dev/null
+++ b/tools/libxc/xc_pqos.c
@@ -0,0 +1,217 @@
+/*
+ * xc_pqos.c
+ *
+ * platform QoS related API functions.
+ *
+ * Copyright (C) 2014      Intel Corporation
+ * Author Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "xc_private.h"
+#include "xc_msr_x86.h"
+
+#define IA32_QM_CTR_ERROR_MASK         (0x3ull << 62)
+
+#define EVTID_L3_OCCUPANCY             0x1
+
+int xc_pqos_monitor_attach(xc_interface *xch, uint32_t domid)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_pqos_monitor_op;
+    domctl.domain = (domid_t)domid;
+    domctl.u.pqos_monitor_op.cmd = XEN_DOMCTL_PQOS_MONITOR_OP_ATTACH;
+
+    return do_domctl(xch, &domctl);
+}
+
+int xc_pqos_monitor_detach(xc_interface *xch, uint32_t domid)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_pqos_monitor_op;
+    domctl.domain = (domid_t)domid;
+    domctl.u.pqos_monitor_op.cmd = XEN_DOMCTL_PQOS_MONITOR_OP_DETACH;
+
+    return do_domctl(xch, &domctl);
+}
+
+int xc_pqos_monitor_get_domain_rmid(xc_interface *xch, uint32_t domid,
+                                    uint32_t *rmid)
+{
+    int rc;
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_pqos_monitor_op;
+    domctl.domain = (domid_t)domid;
+    domctl.u.pqos_monitor_op.cmd = XEN_DOMCTL_PQOS_MONITOR_OP_QUERY_RMID;
+
+    rc = do_domctl(xch, &domctl);
+
+    *rmid = domctl.u.pqos_monitor_op.data;
+
+    return rc;
+}
+
+int xc_pqos_monitor_get_total_rmid(xc_interface *xch, uint32_t *total_rmid)
+{
+    static int val = 0;
+    int rc;
+    DECLARE_SYSCTL;
+
+    if ( val )
+    {
+        *total_rmid = val;
+        return 0;
+    }
+
+    sysctl.cmd = XEN_SYSCTL_pqos_monitor_op;
+    sysctl.u.pqos_monitor_op.cmd = XEN_SYSCTL_PQOS_MONITOR_get_total_rmid;
+
+    rc = xc_sysctl(xch, &sysctl);
+    if ( !rc )
+        val = *total_rmid = sysctl.u.pqos_monitor_op.data;
+
+    return rc;
+}
+
+int xc_pqos_monitor_get_l3_upscaling_factor(xc_interface *xch,
+                                            uint32_t *upscaling_factor)
+{
+    static int val = 0;
+    int rc;
+    DECLARE_SYSCTL;
+
+    if ( val )
+    {
+        *upscaling_factor = val;
+        return 0;
+    }
+
+    sysctl.cmd = XEN_SYSCTL_pqos_monitor_op;
+    sysctl.u.pqos_monitor_op.cmd =
+        XEN_SYSCTL_PQOS_MONITOR_get_l3_upscaling_factor;
+
+    rc = xc_sysctl(xch, &sysctl);
+    if ( !rc )
+        val = *upscaling_factor = sysctl.u.pqos_monitor_op.data;
+
+    return rc;
+}
+
+int xc_pqos_monitor_get_l3_cache_size(xc_interface *xch,
+                                      uint32_t *l3_cache_size)
+{
+    static int val = 0;
+    int rc;
+    DECLARE_SYSCTL;
+
+    if ( val )
+    {
+        *l3_cache_size = val;
+        return 0;
+    }
+
+    sysctl.cmd = XEN_SYSCTL_pqos_monitor_op;
+    sysctl.u.pqos_monitor_op.cmd =
+        XEN_SYSCTL_PQOS_MONITOR_get_l3_cache_size;
+
+    rc = xc_sysctl(xch, &sysctl);
+    if ( !rc )
+        val = *l3_cache_size= sysctl.u.pqos_monitor_op.data;
+
+    return rc;
+}
+
+int xc_pqos_monitor_select_socket_cpu(xc_interface *xch, uint32_t socket)
+{
+    int rc;
+    DECLARE_SYSCTL;
+
+    sysctl.cmd = XEN_SYSCTL_pqos_monitor_op;
+    sysctl.u.pqos_monitor_op.cmd = XEN_SYSCTL_PQOS_MONITOR_get_socket_cpu;
+    sysctl.u.pqos_monitor_op.data = socket;
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( !rc )
+        return sysctl.u.pqos_monitor_op.data;
+
+    return -1;
+}
+
+int xc_pqos_monitor_get_data(xc_interface *xch, uint32_t rmid,
+    uint32_t cpu, xc_pqos_monitor_type type, uint64_t *monitor_data)
+{
+    xc_msr_data_t msr_data[2];
+    uint32_t evtid;
+    int rc;
+
+    switch ( type )
+    {
+    case XC_PQOS_MONITOR_L3_OCCUPANCY:
+        evtid = EVTID_L3_OCCUPANCY;
+        break;
+    default:
+        return -1;
+    }
+
+    msr_data[0].cmd = XEN_SYSCTL_MSR_write;
+    msr_data[0].msr = MSR_IA32_QOSEVTSEL;
+    msr_data[0].val = (uint64_t)rmid << 32 | evtid;
+    msr_data[1].cmd = XEN_SYSCTL_MSR_read;
+    msr_data[1].msr = MSR_IA32_QMC;
+    msr_data[1].val = 0;
+
+    rc = xc_msr_op(xch, cpu, 2, msr_data);
+    if ( rc )
+        return rc;
+
+    if ( msr_data[1].val & IA32_QM_CTR_ERROR_MASK )
+        return -1;
+
+    *monitor_data = msr_data[1].val;
+
+    return 0;
+}
+
+int xc_pqos_monitor_cqm_enabled(xc_interface *xch)
+{
+    static int val = -1;
+    int rc;
+    DECLARE_SYSCTL;
+
+    if ( val >= 0 )
+        return val;
+
+    sysctl.cmd = XEN_SYSCTL_pqos_monitor_op;
+    sysctl.u.pqos_monitor_op.cmd = XEN_SYSCTL_PQOS_MONITOR_cqm_enabled;
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( !rc )
+    {
+        val = sysctl.u.pqos_monitor_op.data;
+        return val;
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 50625ca..3146a19 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -2428,6 +2428,24 @@ int xc_kexec_unload(xc_interface *xch, int type);
 typedef xen_sysctl_msr_data_t xc_msr_data_t;
 int xc_msr_op(xc_interface *xch, uint32_t cpu,
     uint32_t nr_ops, xc_msr_data_t *msr_data);
+
+enum xc_pqos_monitor_type {
+    XC_PQOS_MONITOR_L3_OCCUPANCY,
+};
+typedef enum xc_pqos_monitor_type xc_pqos_monitor_type;
+int xc_pqos_monitor_attach(xc_interface *xch, uint32_t domid);
+int xc_pqos_monitor_detach(xc_interface *xch, uint32_t domid);
+int xc_pqos_monitor_get_domain_rmid(xc_interface *xch, uint32_t domid,
+    uint32_t *rmid);
+int xc_pqos_monitor_get_total_rmid(xc_interface *xch, uint32_t *total_rmid);
+int xc_pqos_monitor_get_l3_upscaling_factor(xc_interface *xch,
+    uint32_t *upscaling_factor);
+int xc_pqos_monitor_get_l3_cache_size(xc_interface *xch,
+    uint32_t *l3_cache_size);
+int xc_pqos_monitor_select_socket_cpu(xc_interface *xch, uint32_t socket);
+int xc_pqos_monitor_get_data(xc_interface *xch, uint32_t rmid,
+    uint32_t cpu, uint32_t pqos_monitor_type, uint64_t *monitor_data);
+int xc_pqos_monitor_cqm_enabled(xc_interface *xch);
 #endif
 
 #endif /* XENCTRL_H */
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 4cfa275..af6effe 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -43,7 +43,7 @@ LIBXL_OBJS-y += libxl_blktap2.o
 else
 LIBXL_OBJS-y += libxl_noblktap2.o
 endif
-LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
+LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_pqos.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
 ifeq ($(CONFIG_NetBSD),y)
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 80947c3..a8c80a4 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -494,6 +494,15 @@ typedef uint8_t libxl_mac[6];
 #define LIBXL_MAC_FMTLEN ((2*6)+5) /* 6 hex bytes plus 5 colons */
 #define LIBXL_MAC_BYTES(mac) mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]
 
+#if defined(__i386__) || defined(__x86_64__)
+/*
+ * LIBXL_HAVE_PQOS_MONITOR_CQM
+ *
+ * If this is defined, the cache QoS monitoring feature is suported.
+ */
+#define LIBXL_HAVE_PQOS_MONITOR_CQM 1
+#endif
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
@@ -1178,6 +1187,18 @@ int libxl_flask_getenforce(libxl_ctx *ctx);
 int libxl_flask_setenforce(libxl_ctx *ctx, int mode);
 int libxl_flask_loadpolicy(libxl_ctx *ctx, void *policy, uint32_t size);
 
+#if defined(__i386__) || defined(__x86_64__)
+int libxl_pqos_monitor_attach(libxl_ctx *ctx, uint32_t domid);
+int libxl_pqos_monitor_detach(libxl_ctx *ctx, uint32_t domid);
+int libxl_pqos_monitor_domain_attached(libxl_ctx *ctx, uint32_t domid);
+int libxl_pqos_monitor_cqm_enabled(libxl_ctx *ctx);
+int libxl_pqos_monitor_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid);
+int libxl_pqos_monitor_get_l3_cache_size(libxl_ctx *ctx,
+    uint32_t *l3_cache_size);
+int libxl_pqos_monitor_get_domain_cqm(libxl_ctx *ctx, uint32_t domid,
+    uint32_t socketid, uint64_t *l3_cache_occupancy);
+#endif
+
 /* misc */
 
 /* Each of these sets or clears the flag according to whether the
diff --git a/tools/libxl/libxl_pqos.c b/tools/libxl/libxl_pqos.c
new file mode 100644
index 0000000..117c5e5
--- /dev/null
+++ b/tools/libxl/libxl_pqos.c
@@ -0,0 +1,171 @@
+/*
+ * Copyright (C) 2014      Intel Corporation
+ * Author Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+#include "libxl_internal.h"
+
+
+#define IA32_QM_CTR_ERROR_MASK         (0x3ul << 62)
+
+static const char * const msg[] = {
+    [ENOSYS] = "unsupported operation",
+    [ENODEV] = "Platform QoS Monitoring is not supported in this system",
+    [EEXIST] = "Platform QoS Monitoring is already attached to this domain",
+    [ENOENT] = "Platform QoS Monitoring is not attached to this domain",
+    [EUSERS] = "there is no free RMID available",
+    [ESRCH]  = "is this Domain ID valid?",
+    [EFAULT] = "cannot find a valid CPU belonging to this socket",
+};
+
+static void libxl_pqos_monitor_err_msg(libxl_ctx *ctx, int err)
+{
+    GC_INIT(ctx);
+
+    switch (err) {
+    case EINVAL:
+    case ENODEV:
+    case EEXIST:
+    case EUSERS:
+    case ESRCH:
+    case ENOENT:
+        LOGE(ERROR, "%s", msg[err]);
+        break;
+    default:
+        LOGE(ERROR, "errno: %d", err);
+    }
+
+    GC_FREE;
+}
+
+int libxl_pqos_monitor_attach(libxl_ctx *ctx, uint32_t domid)
+{
+    int rc;
+
+    rc = xc_pqos_monitor_attach(ctx->xch, domid);
+    if (rc < 0) {
+        libxl_pqos_monitor_err_msg(ctx, errno);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
+int libxl_pqos_monitor_detach(libxl_ctx *ctx, uint32_t domid)
+{
+    int rc;
+
+    rc = xc_pqos_monitor_detach(ctx->xch, domid);
+    if (rc < 0) {
+        libxl_pqos_monitor_err_msg(ctx, errno);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
+int libxl_pqos_monitor_domain_attached(libxl_ctx *ctx, uint32_t domid)
+{
+    int rc;
+    uint32_t rmid;
+
+    rc = xc_pqos_monitor_get_domain_rmid(ctx->xch, domid, &rmid);
+    if (rc < 0)
+        return 0;
+
+    return !!rmid;
+}
+
+int libxl_pqos_monitor_cqm_enabled(libxl_ctx *ctx)
+{
+    return xc_pqos_monitor_cqm_enabled(ctx->xch);
+}
+
+int libxl_pqos_monitor_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid)
+{
+    int rc;
+
+    rc = xc_pqos_monitor_get_total_rmid(ctx->xch, total_rmid);
+    if (rc < 0) {
+        libxl_pqos_monitor_err_msg(ctx, errno);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
+int libxl_pqos_monitor_get_l3_cache_size(libxl_ctx *ctx,
+                                         uint32_t *l3_cache_size)
+{
+    int rc;
+
+    rc = xc_pqos_monitor_get_l3_cache_size(ctx->xch, l3_cache_size);
+    if (rc < 0) {
+        libxl_pqos_monitor_err_msg(ctx, errno);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
+int libxl_pqos_monitor_get_domain_cqm(libxl_ctx *ctx, uint32_t domid,
+    uint32_t socketid, uint64_t *l3_cache_occupancy)
+{
+    GC_INIT(ctx);
+
+    unsigned int rmid;
+    uint32_t upscaling_factor;
+    uint64_t monitor_data;
+    int cpu, rc;
+    xc_pqos_monitor_type type;
+
+    rc = xc_pqos_monitor_get_domain_rmid(ctx->xch, domid, &rmid);
+    if (rc < 0 || rmid == 0) {
+        LOGE(ERROR, "fail to get the domain rmid, "
+            "or domain is not attached with platform QoS monitoring service");
+        return -1;
+    }
+
+    cpu = xc_pqos_monitor_select_socket_cpu(ctx->xch, socketid);
+    if (cpu < 0) {
+        LOGE(ERROR, "fail to select a cpu from socket %d", socketid);
+        return -1;
+    }
+
+    type = XC_PQOS_MONITOR_L3_OCCUPANCY;
+    rc = xc_pqos_monitor_get_data(ctx->xch, rmid, cpu, type, &monitor_data);
+    if (rc < 0) {
+        LOGE(ERROR, "failed to get monitoring data");
+        return -1;
+    }
+
+    rc = xc_pqos_monitor_get_l3_upscaling_factor(ctx->xch, &upscaling_factor);
+    if (rc < 0) {
+        LOGE(ERROR, "failed to get L3 upscaling factor");
+        return -1;
+    }
+
+    *l3_cache_occupancy = upscaling_factor * monitor_data;
+
+    GC_FREE;
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 52f1aa9..506987d 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -600,3 +600,7 @@ libxl_event = Struct("event",[
                                  ])),
            ("domain_create_console_available", None),
            ]))])
+
+libxl_pqos_monitor_type = Enumeration("pqos_monitor_type", [
+    (1, "CQM"),
+    ])
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 10a2e66..438d020 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -110,6 +110,11 @@ int main_loadpolicy(int argc, char **argv);
 int main_remus(int argc, char **argv);
 #endif
 int main_devd(int argc, char **argv);
+#if defined(__i386__) || defined(__x86_64__)
+int main_pqos_monitor_attach(int argc, char **argv);
+int main_pqos_monitor_detach(int argc, char **argv);
+int main_pqos_monitor_show(int argc, char **argv);
+#endif
 
 void help(const char *command);
 
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 5195914..4c42031 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7386,6 +7386,136 @@ out:
     return ret;
 }
 
+static int pqos_monitor_show_cqm(uint32_t first_domain, uint32_t nr_domains,
+                                 int ignore_domain_err)
+{
+    uint32_t domid, socketid, nr_sockets, total_rmid, l3_cache_size;
+    uint64_t l3_cache_occupancy;
+    libxl_physinfo info;
+    char *domain_name;
+    int rc, print_header;
+
+    if (!libxl_pqos_monitor_cqm_enabled(ctx)) {
+        printf("CQM is not supported in the system\n");
+        return -1;
+    }
+
+    rc = libxl_get_physinfo(ctx, &info);
+    if (rc < 0) {
+        printf("failed to get system socket number\n");
+        return -1;
+    }
+    nr_sockets = info.nr_cpus / info.threads_per_core / info.cores_per_socket;
+
+    rc = libxl_pqos_monitor_get_total_rmid(ctx, &total_rmid);
+    if (rc < 0) {
+        printf("failed to get system total rmid number\n");
+        return -1;
+    }
+
+    rc = libxl_pqos_monitor_get_l3_cache_size(ctx, &l3_cache_size);
+    if (rc < 0) {
+        printf("failed to get system l3 cache size\n");
+        return -1;
+    }
+
+    printf("Total RMID: %d\n", total_rmid);
+    printf("Per-Socket L3 Cache Size: %d KB\n", l3_cache_size);
+
+    print_header = 1;
+    for (domid = first_domain; domid < first_domain + nr_domains; domid++) {
+        if (!libxl_pqos_monitor_domain_attached(ctx, domid)) {
+            if (!ignore_domain_err)
+                printf("No CQM info for Domain %d.\n", domid);
+            continue;
+        }
+        if (print_header) {
+            printf("Name                                        ID");
+            for (socketid = 0; socketid < nr_sockets; socketid++)
+                printf("\tSocketID\tL3C_Usage");
+            printf("\n");
+            print_header = 0;
+        }
+        domain_name = libxl_domid_to_name(ctx, domid);
+        printf("%-40s %5d", domain_name, domid);
+        free(domain_name);
+        for (socketid = 0; socketid < nr_sockets; socketid++) {
+            rc = libxl_pqos_monitor_get_domain_cqm(ctx, domid, socketid,
+                                                   &l3_cache_occupancy);
+            printf("%10u %13lu KB     ", socketid, l3_cache_occupancy/1024);
+        }
+        printf("\n");
+    }
+
+    return 0;
+}
+
+int main_pqos_monitor_attach(int argc, char **argv)
+{
+    uint32_t domid;
+    int opt, ret = 0;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-monitor-attach", 1) {
+        /* No options */
+    }
+
+    domid = find_domain(argv[optind]);
+    ret = libxl_pqos_monitor_attach(ctx, domid);
+
+    return ret;
+}
+
+int main_pqos_monitor_detach(int argc, char **argv)
+{
+    uint32_t domid;
+    int opt, ret = 0;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-monitor-detach", 1) {
+        /* No options */
+    }
+
+    domid = find_domain(argv[optind]);
+    ret = libxl_pqos_monitor_detach(ctx, domid);
+
+    return ret;
+}
+
+int main_pqos_monitor_show(int argc, char **argv)
+{
+    int opt, ret = 0;
+    uint32_t first_domain, nr_domains;
+    libxl_pqos_monitor_type type;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-monitor-show", 1) {
+        /* No options */
+    }
+
+    libxl_pqos_monitor_type_from_string(argv[optind], &type);
+
+    if (optind + 1 >= argc) {
+        first_domain = 0;
+        nr_domains = 1024;
+    } else if (optind + 1 == argc - 1){
+        first_domain = find_domain(argv[optind + 1]);
+        nr_domains = 1;
+    } else {
+        help("pqos-monitor-show");
+        return 2;
+    }
+
+    switch (type) {
+    case LIBXL_PQOS_MONITOR_TYPE_CQM:
+        ret = pqos_monitor_show_cqm(first_domain, nr_domains,
+                                    nr_domains == 1024);
+        break;
+    default:
+        help("pqos-monitor-show");
+        return 2;
+    }
+
+    return ret;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 4279b9f..be2a4a7 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -500,6 +500,23 @@ struct cmd_spec cmd_table[] = {
       "[options]",
       "-F                      Run in the foreground",
     },
+    { "pqos-monitor-attach",
+      &main_pqos_monitor_attach, 0, 1,
+      "Attach platform QoS monitoring to a domain",
+      "<Domain>",
+    },
+    { "pqos-monitor-detach",
+      &main_pqos_monitor_detach, 0, 1,
+      "Detach platform QoS monitoring from a domain",
+      "<Domain>",
+    },
+    { "pqos-monitor-show",
+      &main_pqos_monitor_show, 0, 1,
+      "Show platform QoS monitoring information, current supported QoS monitor "
+      "types are:\n"
+      "\"cqm\": cache QoS monitoring, showing the L3 cache occupancy",
+      "<QoS-Monitor-Type> <Domain>",
+    },
 };
 
 int cmdtable_len = sizeof(cmd_table)/sizeof(struct cmd_spec);
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 1/9] x86: add generic MSR access hypercall
  2014-06-20 14:31 ` [PATCH v11 1/9] x86: add generic MSR access hypercall Dongxiao Xu
@ 2014-06-20 14:57   ` Andrew Cooper
  2014-06-23  6:34     ` Xu, Dongxiao
  2014-06-20 15:00   ` Jan Beulich
  1 sibling, 1 reply; 38+ messages in thread
From: Andrew Cooper @ 2014-06-20 14:57 UTC (permalink / raw)
  To: Dongxiao Xu, xen-devel
  Cc: keir, Ian.Campbell, George.Dunlap, stefano.stabellini,
	Ian.Jackson, JBeulich, dgdegra

On 20/06/14 15:31, Dongxiao Xu wrote:
> Add a generic MSR access hypercall for tool stack or other components to
> read or write certain system MSRs.
>
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> ---
>  xen/arch/x86/sysctl.c       | 54 +++++++++++++++++++++++++++++++++++++++++++++
>  xen/include/public/sysctl.h | 25 +++++++++++++++++++++
>  2 files changed, 79 insertions(+)
>
> diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
> index 15d4b91..49f95e4 100644
> --- a/xen/arch/x86/sysctl.c
> +++ b/xen/arch/x86/sysctl.c
> @@ -66,10 +66,27 @@ void arch_do_physinfo(xen_sysctl_physinfo_t *pi)
>          pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm_directio;
>  }
>  
> +static void msr_access_helper(void *data)
> +{
> +    struct msr_access_info *info = data;
> +    xen_sysctl_msr_data_t *msr_data;
> +    unsigned int i;
> +
> +    for ( i = 0; i < info->nr_ops; i++ )
> +    {
> +        msr_data = &info->msr_data[i];
> +        if ( msr_data->cmd == XEN_SYSCTL_MSR_read )
> +            rdmsrl(msr_data->msr, msr_data->val);
> +        else if ( msr_data->cmd == XEN_SYSCTL_MSR_write )
> +            wrmsrl(msr_data->msr, msr_data->val);

These must be the _safe() variants, otherwise it is trivial for a
privileged domain to cause Xen to die with #GP faults.

> +    }
> +}
> +
>  long arch_do_sysctl(
>      struct xen_sysctl *sysctl, XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>  {
>      long ret = 0;
> +    bool_t copyback = 0;
>  
>      switch ( sysctl->cmd )
>      {
> @@ -101,11 +118,48 @@ long arch_do_sysctl(
>      }
>      break;
>  
> +    case XEN_SYSCTL_msr_op:
> +    {
> +        unsigned int cpu = sysctl->u.msr_op.cpu;
> +        struct msr_access_info info;
> +
> +        info.nr_ops = sysctl->u.msr_op.nr_ops;
> +        info.msr_data = xzalloc_array(xen_sysctl_msr_data_t, info.nr_ops);

This memory allocation can be avoided.  Do a copy_from_user() on the
msr_op structure and pass the guest handle and nr into msr_access_helper().

It can then do copy_to/from_user on the individual entries.

> +        if ( !info.msr_data )
> +            return -ENOMEM;
> +
> +        if ( copy_from_guest(info.msr_data, sysctl->u.msr_op.msr_data,
> +                             info.nr_ops) )
> +        {
> +            xfree(info.msr_data);
> +            return -EFAULT;
> +        }
> +
> +        if ( cpu == smp_processor_id() )
> +            msr_access_helper(&info);
> +        else
> +            on_selected_cpus(cpumask_of(cpu), msr_access_helper, &info, 1);
> +
> +        if ( copy_to_guest(sysctl->u.msr_op.msr_data, info.msr_data,
> +                           info.nr_ops ) )
> +        {
> +            xfree(info.msr_data);
> +            return -EFAULT;
> +        }
> +
> +        xfree(info.msr_data);
> +        copyback = 1;
> +    }
> +    break;
> +
>      default:
>          ret = -ENOSYS;
>          break;
>      }
>  
> +    if ( copyback && __copy_to_guest(u_sysctl, sysctl, 1) )
> +        ret = -EFAULT;
> +
>      return ret;
>  }
>  
> diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
> index 3588698..fd042bb 100644
> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -636,6 +636,29 @@ struct xen_sysctl_coverage_op {
>  typedef struct xen_sysctl_coverage_op xen_sysctl_coverage_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_coverage_op_t);
>  
> +#define XEN_SYSCTL_MSR_read                0
> +#define XEN_SYSCTL_MSR_write               1
> +
> +struct xen_sysctl_msr_data {
> +    uint32_t cmd;       /* XEN_SYSCTL_MSR_* */
> +    uint32_t msr;
> +    uint64_t val;
> +};
> +typedef struct xen_sysctl_msr_data xen_sysctl_msr_data_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_msr_data_t);
> +struct xen_sysctl_msr_op {
> +    uint32_t nr_ops;
> +    uint32_t cpu;
> +    XEN_GUEST_HANDLE_64(xen_sysctl_msr_data_t) msr_data;
> +};
> +typedef struct xen_sysctl_msr_op xen_sysctl_msr_op_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_msr_op_t);
> +
> +struct msr_access_info {
> +    uint32_t nr_ops;
> +    xen_sysctl_msr_data_t *msr_data;
> +};

This is private to Xen.  It should not be here in a public header.

~Andrew

> +
>  
>  struct xen_sysctl {
>      uint32_t cmd;
> @@ -658,6 +681,7 @@ struct xen_sysctl {
>  #define XEN_SYSCTL_cpupool_op                    18
>  #define XEN_SYSCTL_scheduler_op                  19
>  #define XEN_SYSCTL_coverage_op                   20
> +#define XEN_SYSCTL_msr_op                        21
>      uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
>      union {
>          struct xen_sysctl_readconsole       readconsole;
> @@ -679,6 +703,7 @@ struct xen_sysctl {
>          struct xen_sysctl_cpupool_op        cpupool_op;
>          struct xen_sysctl_scheduler_op      scheduler_op;
>          struct xen_sysctl_coverage_op       coverage_op;
> +        struct xen_sysctl_msr_op            msr_op;
>          uint8_t                             pad[128];
>      } u;
>  };

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 1/9] x86: add generic MSR access hypercall
  2014-06-20 14:31 ` [PATCH v11 1/9] x86: add generic MSR access hypercall Dongxiao Xu
  2014-06-20 14:57   ` Andrew Cooper
@ 2014-06-20 15:00   ` Jan Beulich
  2014-06-23  6:27     ` Xu, Dongxiao
  1 sibling, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2014-06-20 15:00 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

>>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
> Add a generic MSR access hypercall for tool stack or other components to
> read or write certain system MSRs.

If you want this to be usable by components other than the tool stack
then this can't be a sysctl (would e.g. need to be a platform-op then).

> --- a/xen/arch/x86/sysctl.c
> +++ b/xen/arch/x86/sysctl.c
> @@ -66,10 +66,27 @@ void arch_do_physinfo(xen_sysctl_physinfo_t *pi)
>          pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm_directio;
>  }
>  
> +static void msr_access_helper(void *data)
> +{
> +    struct msr_access_info *info = data;
> +    xen_sysctl_msr_data_t *msr_data;
> +    unsigned int i;
> +
> +    for ( i = 0; i < info->nr_ops; i++ )
> +    {
> +        msr_data = &info->msr_data[i];
> +        if ( msr_data->cmd == XEN_SYSCTL_MSR_read )
> +            rdmsrl(msr_data->msr, msr_data->val);
> +        else if ( msr_data->cmd == XEN_SYSCTL_MSR_write )
> +            wrmsrl(msr_data->msr, msr_data->val);
> +    }

Missing preemption check.

> @@ -101,11 +118,48 @@ long arch_do_sysctl(
>      }
>      break;
>  
> +    case XEN_SYSCTL_msr_op:
> +    {
> +        unsigned int cpu = sysctl->u.msr_op.cpu;
> +        struct msr_access_info info;
> +
> +        info.nr_ops = sysctl->u.msr_op.nr_ops;
> +        info.msr_data = xzalloc_array(xen_sysctl_msr_data_t, info.nr_ops);
> +        if ( !info.msr_data )
> +            return -ENOMEM;
> +
> +        if ( copy_from_guest(info.msr_data, sysctl->u.msr_op.msr_data,
> +                             info.nr_ops) )
> +        {
> +            xfree(info.msr_data);
> +            return -EFAULT;
> +        }
> +
> +        if ( cpu == smp_processor_id() )
> +            msr_access_helper(&info);
> +        else
> +            on_selected_cpus(cpumask_of(cpu), msr_access_helper, &info, 1);
> +
> +        if ( copy_to_guest(sysctl->u.msr_op.msr_data, info.msr_data,
> +                           info.nr_ops ) )
> +        {
> +            xfree(info.msr_data);
> +            return -EFAULT;
> +        }
> +
> +        xfree(info.msr_data);
> +        copyback = 1;

I don't see what you're meaning to copy back here.

Anyway, together with the comments on the interface itself (further
down) I think you'd be much better off basing this on
continue_hypercall_on_cpu().

> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -636,6 +636,29 @@ struct xen_sysctl_coverage_op {
>  typedef struct xen_sysctl_coverage_op xen_sysctl_coverage_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_coverage_op_t);
>  
> +#define XEN_SYSCTL_MSR_read                0
> +#define XEN_SYSCTL_MSR_write               1
> +
> +struct xen_sysctl_msr_data {
> +    uint32_t cmd;       /* XEN_SYSCTL_MSR_* */
> +    uint32_t msr;
> +    uint64_t val;
> +};
> +typedef struct xen_sysctl_msr_data xen_sysctl_msr_data_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_msr_data_t);
> +struct xen_sysctl_msr_op {
> +    uint32_t nr_ops;

Jusr "nr" would suffice.

> +    uint32_t cpu;

This is rather limiting - you should be able to specify a CPU for
each operation. That way you'll have another 32-bit field available
for flags: You indicated earlier on that you'd want certain
operations (like a write followed by a read) pairable without
potentially getting preempted in between. The field then freed up
here should then be reserved for use as flags field too (i.e. you'd
need to check that it's zero).

> +    XEN_GUEST_HANDLE_64(xen_sysctl_msr_data_t) msr_data;
> +};
> +typedef struct xen_sysctl_msr_op xen_sysctl_msr_op_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_msr_op_t);

I think there shouldn't be any mention of "MSR" in other than the
specific sub-op names. As already said earlier, I see this also as
a vehicle to potentially to port accesses (namely when two of them
need to be done in close succession, or one needs to be carried
out on a particular CPU).

> +struct msr_access_info {
> +    uint32_t nr_ops;
> +    xen_sysctl_msr_data_t *msr_data;
> +};

This can only be misplaced - there are no pointers in public headers.

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 4/9] x86: detect and initialize Platform QoS Monitoring feature
  2014-06-20 14:31 ` [PATCH v11 4/9] x86: detect and initialize Platform QoS Monitoring feature Dongxiao Xu
@ 2014-06-20 15:04   ` Jan Beulich
  2014-06-23  6:38     ` Xu, Dongxiao
  2014-06-20 17:35   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2014-06-20 15:04 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

>>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
> Detect platform QoS feature status and enumerate the resource types,
> one of which is to monitor the L3 cache occupancy.
> 
> Also introduce a Xen grub command line parameter to control the
> QoS feature status.

"grub"?

> +static bool_t __initdata opt_pqos = 1;
> +static bool_t __initdata opt_pqos_monitor = 1;

Hmm, did we really settle on enabling these by default?

> +void __init init_platform_qos(void)
> +{
> +    if ( !opt_pqos )
> +        return;
> +
> +    if ( opt_pqos_monitor && opt_rmid_max )

Kind of pointless to split the two if()-s.

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 5/9] x86: dynamically attach/detach QoS monitoring service for a guest
  2014-06-20 14:31 ` [PATCH v11 5/9] x86: dynamically attach/detach QoS monitoring service for a guest Dongxiao Xu
@ 2014-06-20 15:08   ` Jan Beulich
  2014-06-23  6:43     ` Xu, Dongxiao
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2014-06-20 15:08 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

>>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
> +int pqos_monitor_alloc_rmid(struct domain *d)
> +{
> +    int rc = 0;
> +    unsigned int rmid;
> +
> +    ASSERT(pqos_monitor_enabled());
> +
> +    spin_lock(&pqosm->pqosm_lock);
> +
> +    if ( d->arch.pqos_rmid > 0 )
> +    {
> +        rc = -EEXIST;
> +        goto out;
> +    }

Does this check really need to be inside the locked region? You're
already protected against races by the domctl lock.

> +void pqos_monitor_free_rmid(struct domain *d)
> +{
> +    unsigned int rmid;
> +
> +    spin_lock(&pqosm->pqosm_lock);
> +    rmid = d->arch.pqos_rmid;
> +    /* We do not free system reserved "RMID=0" */
> +    if ( rmid == 0 )
> +        goto out;

Similarly here.

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 6/9] x86: collect global QoS monitoring information
  2014-06-20 14:31 ` [PATCH v11 6/9] x86: collect global QoS monitoring information Dongxiao Xu
@ 2014-06-20 15:16   ` Jan Beulich
  2014-06-23  6:55     ` Xu, Dongxiao
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2014-06-20 15:16 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

>>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
> This implementation tries to put all policies into user space, thus some
> global QoS monitoring information needs to be exposed, such as the total
> RMID count, L3 upscaling factor, etc.
> 
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> ---
>  xen/arch/x86/sysctl.c       | 52 
> +++++++++++++++++++++++++++++++++++++++++++++
>  xen/include/public/sysctl.h | 14 ++++++++++++
>  2 files changed, 66 insertions(+)
> 
> diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
> index 49f95e4..ea9b2e4 100644
> --- a/xen/arch/x86/sysctl.c
> +++ b/xen/arch/x86/sysctl.c
> @@ -28,6 +28,7 @@
>  #include <xen/nodemask.h>
>  #include <xen/cpu.h>
>  #include <xsm/xsm.h>
> +#include <asm/pqos.h>
>  
>  #define get_xen_guest_handle(val, hnd)  do { val = (hnd).p; } while (0)
>  
> @@ -152,6 +153,57 @@ long arch_do_sysctl(
>      }
>      break;
>  
> +    case XEN_SYSCTL_pqos_monitor_op:
> +        if ( !pqos_monitor_enabled() )
> +            return -ENODEV;
> +
> +        switch ( sysctl->u.pqos_monitor_op.cmd )
> +        {
> +        case XEN_SYSCTL_PQOS_MONITOR_cqm_enabled:
> +            sysctl->u.pqos_monitor_op.data =
> +                (pqosm->qm_features & QOS_MONITOR_TYPE_L3) &&
> +                (pqosm->l3m.l3_features & L3_FEATURE_OCCUPANCY);
> +            break;
> +        case XEN_SYSCTL_PQOS_MONITOR_get_total_rmid:
> +            sysctl->u.pqos_monitor_op.data =
> +                pqosm->rmid_max - pqosm->rmid_min + 1;
> +            break;
> +        case XEN_SYSCTL_PQOS_MONITOR_get_l3_upscaling_factor:
> +            sysctl->u.pqos_monitor_op.data = pqosm->l3m.upscaling_factor;
> +            break;
> +        case XEN_SYSCTL_PQOS_MONITOR_get_l3_cache_size:
> +            sysctl->u.pqos_monitor_op.data = boot_cpu_data.x86_cache_size;
> +            break;
> +        case XEN_SYSCTL_PQOS_MONITOR_get_socket_cpu:
> +        {
> +            unsigned int i, cpu;
> +            int socket = sysctl->u.pqos_monitor_op.data;

unsigned int

> +
> +            for ( i = 0; i < NR_CPUS; i++ )

Why NR_CPUS when a few lines down you show that you know of
nr_cpu_ids?

> +            {
> +                if ( cpu_to_socket(i) < 0 || cpu_to_socket(i) != socket )
> +                    continue;
> +                cpu = cpumask_any(per_cpu(cpu_core_mask, i));
> +                if ( cpu < nr_cpu_ids )
> +                {
> +                    sysctl->u.pqos_monitor_op.data = cpu;
> +                    break;
> +                }
> +            }
> +
> +            if ( i == NR_CPUS )
> +                ret = -EFAULT;

-EFAULT?

> +        }
> +        break;
> +
> +        default:
> +            sysctl->u.pqos_monitor_op.data = 0;
> +            ret = -ENOSYS;
> +            break;
> +        }
> +        copyback = 1;
> +    break;

Indentation.

> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -659,6 +659,18 @@ struct msr_access_info {
>      xen_sysctl_msr_data_t *msr_data;
>  };
>  
> +#define XEN_SYSCTL_PQOS_MONITOR_get_total_rmid            0
> +#define XEN_SYSCTL_PQOS_MONITOR_get_l3_upscaling_factor   1
> +#define XEN_SYSCTL_PQOS_MONITOR_get_l3_cache_size         2
> +#define XEN_SYSCTL_PQOS_MONITOR_get_socket_cpu            3
> +#define XEN_SYSCTL_PQOS_MONITOR_cqm_enabled               4
> +struct xen_sysctl_pqos_monitor_op {
> +    uint32_t cmd;
> +    uint64_t data;

Missing explicit padding between the two fields, or need to use
uint64_aligned_t.

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 7/9] x86: enable QoS monitoring for each domain RMID
  2014-06-20 14:31 ` [PATCH v11 7/9] x86: enable QoS monitoring for each domain RMID Dongxiao Xu
@ 2014-06-20 15:20   ` Jan Beulich
  2014-06-23  6:55     ` Xu, Dongxiao
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2014-06-20 15:20 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

>>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
> --- a/xen/include/asm-x86/pqos.h
> +++ b/xen/include/asm-x86/pqos.h
> @@ -44,6 +44,11 @@ struct pqos_monitor {
>  };
>  extern struct pqos_monitor *pqosm;
>  
> +struct pqr_assoc {
> +    uint64_t val;
> +    bool_t initialized;
> +};

This is being used in only one source file and hence should be moved
there.

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 3/9] tools: provide interface for generic MSR access
  2014-06-20 14:31 ` [PATCH v11 3/9] tools: provide interface for generic MSR access Dongxiao Xu
@ 2014-06-20 17:34   ` Konrad Rzeszutek Wilk
  2014-06-23  7:13     ` Jan Beulich
  2014-06-27 13:05   ` Ian Campbell
  1 sibling, 1 reply; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-06-20 17:34 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, JBeulich, dgdegra

On Fri, Jun 20, 2014 at 10:31:44PM +0800, Dongxiao Xu wrote:
> Xen added a new sysctl hypercall for generic MSR access, and this is the
> tool side change to wrapper the hypercall into xc APIs.
> 

I really like the idea that Jan surfaced. That is use an
cpu bitmap for the operation (see xc_tbuf.c as an example)
and my recent postings http://osdir.com/ml/general/2014-06/msg24386.html.

That way you can define exactly which CPUs you want to do it
at and you can do one nice hypercall that will do it all at
once for you.

> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> ---
>  tools/libxc/Makefile     |  1 +
>  tools/libxc/xc_msr_x86.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++
>  tools/libxc/xenctrl.h    |  6 ++++++
>  3 files changed, 60 insertions(+)
>  create mode 100644 tools/libxc/xc_msr_x86.c
> 
> diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
> index a74b19e..fec4aca 100644
> --- a/tools/libxc/Makefile
> +++ b/tools/libxc/Makefile
> @@ -35,6 +35,7 @@ CTRL_SRCS-y       += xc_kexec.c
>  CTRL_SRCS-y       += xtl_core.c
>  CTRL_SRCS-y       += xtl_logger_stdio.c
>  CTRL_SRCS-$(CONFIG_X86) += xc_pagetab.c
> +CTRL_SRCS-$(CONFIG_X86) += xc_msr_x86.c
>  CTRL_SRCS-$(CONFIG_Linux) += xc_linux.c xc_linux_osdep.c
>  CTRL_SRCS-$(CONFIG_SunOS) += xc_solaris.c
>  CTRL_SRCS-$(CONFIG_NetBSD) += xc_netbsd.c
> diff --git a/tools/libxc/xc_msr_x86.c b/tools/libxc/xc_msr_x86.c
> new file mode 100644
> index 0000000..46e77dd
> --- /dev/null
> +++ b/tools/libxc/xc_msr_x86.c
> @@ -0,0 +1,53 @@
> +/*
> + * xc_msr_x86.c
> + *
> + * Generic MSR access API
> + *
> + * Copyright (C) 2014      Intel Corporation
> + * Author Dongxiao Xu <dongxiao.xu@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU Lesser General Public License as published
> + * by the Free Software Foundation; version 2.1 only. with the special
> + * exception on linking described in file LICENSE.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU Lesser General Public License for more details.
> + */
> +
> +#include "xc_private.h"
> +
> +int xc_msr_op(xc_interface *xch, uint32_t cpu,
> +              uint32_t nr_ops, xc_msr_data_t *msr_data)
> +{
> +    int rc;
> +    DECLARE_SYSCTL;
> +    DECLARE_HYPERCALL_BOUNCE(msr_data, nr_ops * sizeof(*msr_data),
> +                             XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> +
> +    if ( xc_hypercall_bounce_pre(xch, msr_data) )
> +        return -1;
> +
> +    sysctl.cmd = XEN_SYSCTL_msr_op;
> +    sysctl.u.msr_op.nr_ops = nr_ops;
> +    sysctl.u.msr_op.cpu = cpu;
> +    set_xen_guest_handle(sysctl.u.msr_op.msr_data, msr_data);
> +
> +    rc = do_sysctl(xch, &sysctl);
> +
> +    xc_hypercall_bounce_post(xch, msr_data);
> +
> +    return rc;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index 02129f7..50625ca 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -2424,4 +2424,10 @@ int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
>   */
>  int xc_kexec_unload(xc_interface *xch, int type);
>  
> +#if defined(__i386__) || defined(__x86_64__)
> +typedef xen_sysctl_msr_data_t xc_msr_data_t;
> +int xc_msr_op(xc_interface *xch, uint32_t cpu,
> +    uint32_t nr_ops, xc_msr_data_t *msr_data);
> +#endif
> +
>  #endif /* XENCTRL_H */
> -- 
> 1.8.1.5
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 4/9] x86: detect and initialize Platform QoS Monitoring feature
  2014-06-20 14:31 ` [PATCH v11 4/9] x86: detect and initialize Platform QoS Monitoring feature Dongxiao Xu
  2014-06-20 15:04   ` Jan Beulich
@ 2014-06-20 17:35   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-06-20 17:35 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, JBeulich, dgdegra

On Fri, Jun 20, 2014 at 10:31:45PM +0800, Dongxiao Xu wrote:
> Detect platform QoS feature status and enumerate the resource types,
> one of which is to monitor the L3 cache occupancy.
> 
> Also introduce a Xen grub command line parameter to control the
> QoS feature status.
> 
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> ---
>  docs/misc/xen-command-line.markdown |   7 +++
>  xen/arch/x86/Makefile               |   1 +
>  xen/arch/x86/pqos.c                 | 122 ++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/setup.c                |   3 +
>  xen/include/asm-x86/cpufeature.h    |   1 +
>  xen/include/asm-x86/pqos.h          |  48 ++++++++++++++
>  6 files changed, 182 insertions(+)
>  create mode 100644 xen/arch/x86/pqos.c
>  create mode 100644 xen/include/asm-x86/pqos.h
> 
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index a7ac53d..542eee5 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -840,6 +840,13 @@ This option can be specified more than once (up to 8 times at present).
>  ### ple\_window
>  > `= <integer>`
>  
> +### pqos (Intel)
> +> `= List of ( <boolean> | pqos_monitor:<boolean> | rmid_max:<integer> )`
> +
> +> Default: `pqos=1,pqos_monitor:1,rmid_max:255`
> +
> +Configure platform QoS services.

Could you elaborate a bit more about them please?

and perhaps also specify what kind of CPUs can use this?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 1/9] x86: add generic MSR access hypercall
  2014-06-20 15:00   ` Jan Beulich
@ 2014-06-23  6:27     ` Xu, Dongxiao
  2014-06-23  6:45       ` Jan Beulich
  0 siblings, 1 reply; 38+ messages in thread
From: Xu, Dongxiao @ 2014-06-23  6:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, June 20, 2014 11:01 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> George.Dunlap@eu.citrix.com; Ian.Jackson@eu.citrix.com;
> stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org
> Subject: Re: [PATCH v11 1/9] x86: add generic MSR access hypercall
> 
> >>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
> > Add a generic MSR access hypercall for tool stack or other components to
> > read or write certain system MSRs.
> 
> If you want this to be usable by components other than the tool stack
> then this can't be a sysctl (would e.g. need to be a platform-op then).

Per my understanding, the platform-op related hypercalls are used by Dom0 kernel.
While in this CQM scenario, we want libvirt/libxl/libxc to call such hypercall to access the MSR, however I didn't see any usage of platform_op in libxl/libxc side.

Could you explain a bit more for it?

> 
> > --- a/xen/arch/x86/sysctl.c
> > +++ b/xen/arch/x86/sysctl.c
> > @@ -66,10 +66,27 @@ void arch_do_physinfo(xen_sysctl_physinfo_t *pi)
> >          pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm_directio;
> >  }
> >
> > +static void msr_access_helper(void *data)
> > +{
> > +    struct msr_access_info *info = data;
> > +    xen_sysctl_msr_data_t *msr_data;
> > +    unsigned int i;
> > +
> > +    for ( i = 0; i < info->nr_ops; i++ )
> > +    {
> > +        msr_data = &info->msr_data[i];
> > +        if ( msr_data->cmd == XEN_SYSCTL_MSR_read )
> > +            rdmsrl(msr_data->msr, msr_data->val);
> > +        else if ( msr_data->cmd == XEN_SYSCTL_MSR_write )
> > +            wrmsrl(msr_data->msr, msr_data->val);
> > +    }
> 
> Missing preemption check.

Do you mean hypercall_preemption_check() here? Or the preemption between the MSR accesses?

> 
> > @@ -101,11 +118,48 @@ long arch_do_sysctl(
> >      }
> >      break;
> >
> > +    case XEN_SYSCTL_msr_op:
> > +    {
> > +        unsigned int cpu = sysctl->u.msr_op.cpu;
> > +        struct msr_access_info info;
> > +
> > +        info.nr_ops = sysctl->u.msr_op.nr_ops;
> > +        info.msr_data = xzalloc_array(xen_sysctl_msr_data_t, info.nr_ops);
> > +        if ( !info.msr_data )
> > +            return -ENOMEM;
> > +
> > +        if ( copy_from_guest(info.msr_data, sysctl->u.msr_op.msr_data,
> > +                             info.nr_ops) )
> > +        {
> > +            xfree(info.msr_data);
> > +            return -EFAULT;
> > +        }
> > +
> > +        if ( cpu == smp_processor_id() )
> > +            msr_access_helper(&info);
> > +        else
> > +            on_selected_cpus(cpumask_of(cpu), msr_access_helper, &info,
> 1);
> > +
> > +        if ( copy_to_guest(sysctl->u.msr_op.msr_data, info.msr_data,
> > +                           info.nr_ops ) )
> > +        {
> > +            xfree(info.msr_data);
> > +            return -EFAULT;
> > +        }
> > +
> > +        xfree(info.msr_data);
> > +        copyback = 1;
> 
> I don't see what you're meaning to copy back here.

Yes, thanks! This needs to be removed, at least for the MSR access since we already explicitly copy back the data to guest.

> 
> Anyway, together with the comments on the interface itself (further
> down) I think you'd be much better off basing this on
> continue_hypercall_on_cpu().

continue_hypercall_on_cpu() function schedules a tasklet to run the specific function on certain CPU.
However in our CQM case, the libxl/libxc functions want to get the result immediately when the function returns, so that's why I didn't use continue_hypercall_on_cpu().

> 
> > --- a/xen/include/public/sysctl.h
> > +++ b/xen/include/public/sysctl.h
> > @@ -636,6 +636,29 @@ struct xen_sysctl_coverage_op {
> >  typedef struct xen_sysctl_coverage_op xen_sysctl_coverage_op_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_coverage_op_t);
> >
> > +#define XEN_SYSCTL_MSR_read                0
> > +#define XEN_SYSCTL_MSR_write               1
> > +
> > +struct xen_sysctl_msr_data {
> > +    uint32_t cmd;       /* XEN_SYSCTL_MSR_* */
> > +    uint32_t msr;
> > +    uint64_t val;
> > +};
> > +typedef struct xen_sysctl_msr_data xen_sysctl_msr_data_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_msr_data_t);
> > +struct xen_sysctl_msr_op {
> > +    uint32_t nr_ops;
> 
> Jusr "nr" would suffice.

Okay.

> 
> > +    uint32_t cpu;
> 
> This is rather limiting - you should be able to specify a CPU for
> each operation. That way you'll have another 32-bit field available
> for flags: You indicated earlier on that you'd want certain
> operations (like a write followed by a read) pairable without
> potentially getting preempted in between. The field then freed up
> here should then be reserved for use as flags field too (i.e. you'd
> need to check that it's zero).

Do you mean the following case?

Op1: CPU0, write MSR 0x100. Flag: 1 (indicating paring)
Op2: CPU1, read MSR 0x200. Flag: 0.
Op3: CPU0, read MSR 0x101. Flag: 1 (paring the above op1)

To avoid the preempt between certain operations, my previous solution is to pack all the operations towards a certain CPU via a single IPI. When the target CPU receives the IPI, it will execute the operations one by one (suppose during the execution for loop, no preemption will happen).
Now if we specify a CPU for each operation, can we pack the operations according to the target CPU, and finally issue one IPI per-target-CPU to avoid preemption? That will get rid of such flag usage.

> 
> > +    XEN_GUEST_HANDLE_64(xen_sysctl_msr_data_t) msr_data;
> > +};
> > +typedef struct xen_sysctl_msr_op xen_sysctl_msr_op_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_msr_op_t);
> 
> I think there shouldn't be any mention of "MSR" in other than the
> specific sub-op names. As already said earlier, I see this also as
> a vehicle to potentially to port accesses (namely when two of them
> need to be done in close succession, or one needs to be carried
> out on a particular CPU).

That's okay.

> 
> > +struct msr_access_info {
> > +    uint32_t nr_ops;
> > +    xen_sysctl_msr_data_t *msr_data;
> > +};
> 
> This can only be misplaced - there are no pointers in public headers.

Okay, will move it in other places.

Thanks,
Dongxiao

> 
> Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 1/9] x86: add generic MSR access hypercall
  2014-06-20 14:57   ` Andrew Cooper
@ 2014-06-23  6:34     ` Xu, Dongxiao
  0 siblings, 0 replies; 38+ messages in thread
From: Xu, Dongxiao @ 2014-06-23  6:34 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: keir, Ian.Campbell, George.Dunlap, stefano.stabellini,
	Ian.Jackson, JBeulich, dgdegra

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, June 20, 2014 10:57 PM
> To: Xu, Dongxiao; xen-devel@lists.xen.org
> Cc: keir@xen.org; JBeulich@suse.com; Ian.Jackson@eu.citrix.com;
> Ian.Campbell@citrix.com; stefano.stabellini@eu.citrix.com;
> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> George.Dunlap@eu.citrix.com
> Subject: Re: [PATCH v11 1/9] x86: add generic MSR access hypercall
> 
> On 20/06/14 15:31, Dongxiao Xu wrote:
> > Add a generic MSR access hypercall for tool stack or other components to
> > read or write certain system MSRs.
> >
> > Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> > ---
> >  xen/arch/x86/sysctl.c       | 54
> +++++++++++++++++++++++++++++++++++++++++++++
> >  xen/include/public/sysctl.h | 25 +++++++++++++++++++++
> >  2 files changed, 79 insertions(+)
> >
> > diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
> > index 15d4b91..49f95e4 100644
> > --- a/xen/arch/x86/sysctl.c
> > +++ b/xen/arch/x86/sysctl.c
> > @@ -66,10 +66,27 @@ void arch_do_physinfo(xen_sysctl_physinfo_t *pi)
> >          pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm_directio;
> >  }
> >
> > +static void msr_access_helper(void *data)
> > +{
> > +    struct msr_access_info *info = data;
> > +    xen_sysctl_msr_data_t *msr_data;
> > +    unsigned int i;
> > +
> > +    for ( i = 0; i < info->nr_ops; i++ )
> > +    {
> > +        msr_data = &info->msr_data[i];
> > +        if ( msr_data->cmd == XEN_SYSCTL_MSR_read )
> > +            rdmsrl(msr_data->msr, msr_data->val);
> > +        else if ( msr_data->cmd == XEN_SYSCTL_MSR_write )
> > +            wrmsrl(msr_data->msr, msr_data->val);
> 
> These must be the _safe() variants, otherwise it is trivial for a
> privileged domain to cause Xen to die with #GP faults.

Okay, thanks!

> 
> > +    }
> > +}
> > +
> >  long arch_do_sysctl(
> >      struct xen_sysctl *sysctl, XEN_GUEST_HANDLE_PARAM(xen_sysctl_t)
> u_sysctl)
> >  {
> >      long ret = 0;
> > +    bool_t copyback = 0;
> >
> >      switch ( sysctl->cmd )
> >      {
> > @@ -101,11 +118,48 @@ long arch_do_sysctl(
> >      }
> >      break;
> >
> > +    case XEN_SYSCTL_msr_op:
> > +    {
> > +        unsigned int cpu = sysctl->u.msr_op.cpu;
> > +        struct msr_access_info info;
> > +
> > +        info.nr_ops = sysctl->u.msr_op.nr_ops;
> > +        info.msr_data = xzalloc_array(xen_sysctl_msr_data_t, info.nr_ops);
> 
> This memory allocation can be avoided.  Do a copy_from_user() on the
> msr_op structure and pass the guest handle and nr into msr_access_helper().
> 
> It can then do copy_to/from_user on the individual entries.

This implementation pack all operations together towards one CPU. Jan is suggesting if we can specify one CPU for each operation.
If that's the case, I will think about whether we can eliminate the memory allocation here.

> 
> > +        if ( !info.msr_data )
> > +            return -ENOMEM;
> > +
> > +        if ( copy_from_guest(info.msr_data, sysctl->u.msr_op.msr_data,
> > +                             info.nr_ops) )
> > +        {
> > +            xfree(info.msr_data);
> > +            return -EFAULT;
> > +        }
> > +
> > +        if ( cpu == smp_processor_id() )
> > +            msr_access_helper(&info);
> > +        else
> > +            on_selected_cpus(cpumask_of(cpu), msr_access_helper, &info,
> 1);
> > +
> > +        if ( copy_to_guest(sysctl->u.msr_op.msr_data, info.msr_data,
> > +                           info.nr_ops ) )
> > +        {
> > +            xfree(info.msr_data);
> > +            return -EFAULT;
> > +        }
> > +
> > +        xfree(info.msr_data);
> > +        copyback = 1;
> > +    }
> > +    break;
> > +
> >      default:
> >          ret = -ENOSYS;
> >          break;
> >      }
> >
> > +    if ( copyback && __copy_to_guest(u_sysctl, sysctl, 1) )
> > +        ret = -EFAULT;
> > +
> >      return ret;
> >  }
> >
> > diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
> > index 3588698..fd042bb 100644
> > --- a/xen/include/public/sysctl.h
> > +++ b/xen/include/public/sysctl.h
> > @@ -636,6 +636,29 @@ struct xen_sysctl_coverage_op {
> >  typedef struct xen_sysctl_coverage_op xen_sysctl_coverage_op_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_coverage_op_t);
> >
> > +#define XEN_SYSCTL_MSR_read                0
> > +#define XEN_SYSCTL_MSR_write               1
> > +
> > +struct xen_sysctl_msr_data {
> > +    uint32_t cmd;       /* XEN_SYSCTL_MSR_* */
> > +    uint32_t msr;
> > +    uint64_t val;
> > +};
> > +typedef struct xen_sysctl_msr_data xen_sysctl_msr_data_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_msr_data_t);
> > +struct xen_sysctl_msr_op {
> > +    uint32_t nr_ops;
> > +    uint32_t cpu;
> > +    XEN_GUEST_HANDLE_64(xen_sysctl_msr_data_t) msr_data;
> > +};
> > +typedef struct xen_sysctl_msr_op xen_sysctl_msr_op_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_msr_op_t);
> > +
> > +struct msr_access_info {
> > +    uint32_t nr_ops;
> > +    xen_sysctl_msr_data_t *msr_data;
> > +};
> 
> This is private to Xen.  It should not be here in a public header.

Okay, thanks!

Dongxiao

> 
> ~Andrew
> 
> > +
> >
> >  struct xen_sysctl {
> >      uint32_t cmd;
> > @@ -658,6 +681,7 @@ struct xen_sysctl {
> >  #define XEN_SYSCTL_cpupool_op                    18
> >  #define XEN_SYSCTL_scheduler_op                  19
> >  #define XEN_SYSCTL_coverage_op                   20
> > +#define XEN_SYSCTL_msr_op                        21
> >      uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
> >      union {
> >          struct xen_sysctl_readconsole       readconsole;
> > @@ -679,6 +703,7 @@ struct xen_sysctl {
> >          struct xen_sysctl_cpupool_op        cpupool_op;
> >          struct xen_sysctl_scheduler_op      scheduler_op;
> >          struct xen_sysctl_coverage_op       coverage_op;
> > +        struct xen_sysctl_msr_op            msr_op;
> >          uint8_t                             pad[128];
> >      } u;
> >  };

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 4/9] x86: detect and initialize Platform QoS Monitoring feature
  2014-06-20 15:04   ` Jan Beulich
@ 2014-06-23  6:38     ` Xu, Dongxiao
  2014-06-23  6:50       ` Jan Beulich
  0 siblings, 1 reply; 38+ messages in thread
From: Xu, Dongxiao @ 2014-06-23  6:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, June 20, 2014 11:04 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> George.Dunlap@eu.citrix.com; Ian.Jackson@eu.citrix.com;
> stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org
> Subject: Re: [PATCH v11 4/9] x86: detect and initialize Platform QoS Monitoring
> feature
> 
> >>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
> > Detect platform QoS feature status and enumerate the resource types,
> > one of which is to monitor the L3 cache occupancy.
> >
> > Also introduce a Xen grub command line parameter to control the
> > QoS feature status.
> 
> "grub"?

Sorry, what's your point here? Not quite understand it...

> 
> > +static bool_t __initdata opt_pqos = 1;
> > +static bool_t __initdata opt_pqos_monitor = 1;
> 
> Hmm, did we really settle on enabling these by default?

Okay, I can disable it by default.

> 
> > +void __init init_platform_qos(void)
> > +{
> > +    if ( !opt_pqos )
> > +        return;
> > +
> > +    if ( opt_pqos_monitor && opt_rmid_max )
> 
> Kind of pointless to split the two if()-s.

The split is reserved for the following QoS features, such as CQE, etc.

Thanks,
Dongxiao

> 
> Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 5/9] x86: dynamically attach/detach QoS monitoring service for a guest
  2014-06-20 15:08   ` Jan Beulich
@ 2014-06-23  6:43     ` Xu, Dongxiao
  0 siblings, 0 replies; 38+ messages in thread
From: Xu, Dongxiao @ 2014-06-23  6:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, June 20, 2014 11:08 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> George.Dunlap@eu.citrix.com; Ian.Jackson@eu.citrix.com;
> stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org
> Subject: Re: [PATCH v11 5/9] x86: dynamically attach/detach QoS monitoring
> service for a guest
> 
> >>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
> > +int pqos_monitor_alloc_rmid(struct domain *d)
> > +{
> > +    int rc = 0;
> > +    unsigned int rmid;
> > +
> > +    ASSERT(pqos_monitor_enabled());
> > +
> > +    spin_lock(&pqosm->pqosm_lock);
> > +
> > +    if ( d->arch.pqos_rmid > 0 )
> > +    {
> > +        rc = -EEXIST;
> > +        goto out;
> > +    }
> 
> Does this check really need to be inside the locked region? You're
> already protected against races by the domctl lock.

It seems no need any more for these qos monitor related lock. Will remove it.

Thanks,
Dongxiao

> 
> > +void pqos_monitor_free_rmid(struct domain *d)
> > +{
> > +    unsigned int rmid;
> > +
> > +    spin_lock(&pqosm->pqosm_lock);
> > +    rmid = d->arch.pqos_rmid;
> > +    /* We do not free system reserved "RMID=0" */
> > +    if ( rmid == 0 )
> > +        goto out;
> 
> Similarly here.
> 
> Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 1/9] x86: add generic MSR access hypercall
  2014-06-23  6:27     ` Xu, Dongxiao
@ 2014-06-23  6:45       ` Jan Beulich
  2014-06-23  7:29         ` Xu, Dongxiao
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2014-06-23  6:45 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

>>> On 23.06.14 at 08:27, <dongxiao.xu@intel.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, June 20, 2014 11:01 PM
>> To: Xu, Dongxiao
>> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
>> George.Dunlap@eu.citrix.com; Ian.Jackson@eu.citrix.com;
>> stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
>> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org 
>> Subject: Re: [PATCH v11 1/9] x86: add generic MSR access hypercall
>> 
>> >>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
>> > Add a generic MSR access hypercall for tool stack or other components to
>> > read or write certain system MSRs.
>> 
>> If you want this to be usable by components other than the tool stack
>> then this can't be a sysctl (would e.g. need to be a platform-op then).
> 
> Per my understanding, the platform-op related hypercalls are used by Dom0 
> kernel.
> While in this CQM scenario, we want libvirt/libxl/libxc to call such 
> hypercall to access the MSR, however I didn't see any usage of platform_op in 
> libxl/libxc side.
> 
> Could you explain a bit more for it?

Platform ops are for use by Dom0, yes, but note the explicit omission
of "kernel" in my reply. The fact that libxc currently doesn't use any
doesn't mean it mustn't do so. The only question here that matter is
what audience we see for the new functionality: If this is to be a
purely tools interface, then a sysctl is fine. If the kernel is intended to
also be able to use it (as I suggested), something other than a sysctl
needs to be used (and I was merely _hinting_ at platform-op).

>> > --- a/xen/arch/x86/sysctl.c
>> > +++ b/xen/arch/x86/sysctl.c
>> > @@ -66,10 +66,27 @@ void arch_do_physinfo(xen_sysctl_physinfo_t *pi)
>> >          pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm_directio;
>> >  }
>> >
>> > +static void msr_access_helper(void *data)
>> > +{
>> > +    struct msr_access_info *info = data;
>> > +    xen_sysctl_msr_data_t *msr_data;
>> > +    unsigned int i;
>> > +
>> > +    for ( i = 0; i < info->nr_ops; i++ )
>> > +    {
>> > +        msr_data = &info->msr_data[i];
>> > +        if ( msr_data->cmd == XEN_SYSCTL_MSR_read )
>> > +            rdmsrl(msr_data->msr, msr_data->val);
>> > +        else if ( msr_data->cmd == XEN_SYSCTL_MSR_write )
>> > +            wrmsrl(msr_data->msr, msr_data->val);
>> > +    }
>> 
>> Missing preemption check.
> 
> Do you mean hypercall_preemption_check() here? Or the preemption between the 
> MSR accesses?

Not sure what distinction you make between the two (or what the
second question is supposed to refer to). The list of operations can
be arbitrarily long, and individual operations may also take arbitrary
time (especially if this later gets extended to I/O port accesses,
which may incur SMM invocation). Hence preemption checks between
operations are needed (except in well-defined cases where the
caller may request such check to be skipped).

>> Anyway, together with the comments on the interface itself (further
>> down) I think you'd be much better off basing this on
>> continue_hypercall_on_cpu().
> 
> continue_hypercall_on_cpu() function schedules a tasklet to run the specific 
> function on certain CPU.
> However in our CQM case, the libxl/libxc functions want to get the result 
> immediately when the function returns, so that's why I didn't use 
> continue_hypercall_on_cpu().

The use of a tasklet doesn't mean the result would only be available in
a deferred manner: The tasklet runs _before_ control transfers back
to the caller.

>> > --- a/xen/include/public/sysctl.h
>> > +++ b/xen/include/public/sysctl.h
>> > @@ -636,6 +636,29 @@ struct xen_sysctl_coverage_op {
>> >  typedef struct xen_sysctl_coverage_op xen_sysctl_coverage_op_t;
>> >  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_coverage_op_t);
>> >
>> > +#define XEN_SYSCTL_MSR_read                0
>> > +#define XEN_SYSCTL_MSR_write               1
>> > +
>> > +struct xen_sysctl_msr_data {
>> > +    uint32_t cmd;       /* XEN_SYSCTL_MSR_* */
>> > +    uint32_t msr;
>> > +    uint64_t val;
>> > +};
>> > +typedef struct xen_sysctl_msr_data xen_sysctl_msr_data_t;
>> > +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_msr_data_t);
>> > +struct xen_sysctl_msr_op {
>> > +    uint32_t nr_ops;
>> 
>> Jusr "nr" would suffice.
> 
> Okay.
> 
>> 
>> > +    uint32_t cpu;
>> 
>> This is rather limiting - you should be able to specify a CPU for
>> each operation. That way you'll have another 32-bit field available
>> for flags: You indicated earlier on that you'd want certain
>> operations (like a write followed by a read) pairable without
>> potentially getting preempted in between. The field then freed up
>> here should then be reserved for use as flags field too (i.e. you'd
>> need to check that it's zero).
> 
> Do you mean the following case?
> 
> Op1: CPU0, write MSR 0x100. Flag: 1 (indicating paring)
> Op2: CPU1, read MSR 0x200. Flag: 0.
> Op3: CPU0, read MSR 0x101. Flag: 1 (paring the above op1)

No - CPU changes alway imply preemption between individual
sub-ops.

> To avoid the preempt between certain operations, my previous solution is to 
> pack all the operations towards a certain CPU via a single IPI. When the 
> target CPU receives the IPI, it will execute the operations one by one 
> (suppose during the execution for loop, no preemption will happen).
> Now if we specify a CPU for each operation, can we pack the operations 
> according to the target CPU, and finally issue one IPI per-target-CPU to avoid 
> preemption? That will get rid of such flag usage.

No, I don't think re-ordering operations can be permitted here.

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 4/9] x86: detect and initialize Platform QoS Monitoring feature
  2014-06-23  6:38     ` Xu, Dongxiao
@ 2014-06-23  6:50       ` Jan Beulich
  2014-06-23  7:30         ` Xu, Dongxiao
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2014-06-23  6:50 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

>>> On 23.06.14 at 08:38, <dongxiao.xu@intel.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, June 20, 2014 11:04 PM
>> To: Xu, Dongxiao
>> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
>> George.Dunlap@eu.citrix.com; Ian.Jackson@eu.citrix.com;
>> stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
>> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org 
>> Subject: Re: [PATCH v11 4/9] x86: detect and initialize Platform QoS 
> Monitoring
>> feature
>> 
>> >>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
>> > Detect platform QoS feature status and enumerate the resource types,
>> > one of which is to monitor the L3 cache occupancy.
>> >
>> > Also introduce a Xen grub command line parameter to control the
>> > QoS feature status.
>> 
>> "grub"?
> 
> Sorry, what's your point here? Not quite understand it...

I'm asking what the mentioning of "grub" here means. Xen does have
command line parameters, but they can equally well be used when
booting without GrUB, e.g. directly from UEFI. IOW saying "grub" in
a statement like this is at best confusing, unless talk is of an option
that's only affecting booting via GrUB (e.g. something affecting the
handling of multiboot, albeit even in that case nothing dictates that
only GrUB can implement this protocol).

>> > +void __init init_platform_qos(void)
>> > +{
>> > +    if ( !opt_pqos )
>> > +        return;
>> > +
>> > +    if ( opt_pqos_monitor && opt_rmid_max )
>> 
>> Kind of pointless to split the two if()-s.
> 
> The split is reserved for the following QoS features, such as CQE, etc.

Since that isn't part of this series, I'd suggest avoiding such odd
looking constructs (unless you already have a follow-up series in the
works, and a reasonably certain it'll make 4.5). Future patches can
always split the one if() into multiple.

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 6/9] x86: collect global QoS monitoring information
  2014-06-20 15:16   ` Jan Beulich
@ 2014-06-23  6:55     ` Xu, Dongxiao
  2014-06-23  7:06       ` Jan Beulich
  0 siblings, 1 reply; 38+ messages in thread
From: Xu, Dongxiao @ 2014-06-23  6:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, June 20, 2014 11:16 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> George.Dunlap@eu.citrix.com; Ian.Jackson@eu.citrix.com;
> stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org
> Subject: Re: [PATCH v11 6/9] x86: collect global QoS monitoring information
> 
> >>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
> > This implementation tries to put all policies into user space, thus some
> > global QoS monitoring information needs to be exposed, such as the total
> > RMID count, L3 upscaling factor, etc.
> >
> > Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> > ---
> >  xen/arch/x86/sysctl.c       | 52
> > +++++++++++++++++++++++++++++++++++++++++++++
> >  xen/include/public/sysctl.h | 14 ++++++++++++
> >  2 files changed, 66 insertions(+)
> >
> > diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
> > index 49f95e4..ea9b2e4 100644
> > --- a/xen/arch/x86/sysctl.c
> > +++ b/xen/arch/x86/sysctl.c
> > @@ -28,6 +28,7 @@
> >  #include <xen/nodemask.h>
> >  #include <xen/cpu.h>
> >  #include <xsm/xsm.h>
> > +#include <asm/pqos.h>
> >
> >  #define get_xen_guest_handle(val, hnd)  do { val = (hnd).p; } while (0)
> >
> > @@ -152,6 +153,57 @@ long arch_do_sysctl(
> >      }
> >      break;
> >
> > +    case XEN_SYSCTL_pqos_monitor_op:
> > +        if ( !pqos_monitor_enabled() )
> > +            return -ENODEV;
> > +
> > +        switch ( sysctl->u.pqos_monitor_op.cmd )
> > +        {
> > +        case XEN_SYSCTL_PQOS_MONITOR_cqm_enabled:
> > +            sysctl->u.pqos_monitor_op.data =
> > +                (pqosm->qm_features & QOS_MONITOR_TYPE_L3) &&
> > +                (pqosm->l3m.l3_features & L3_FEATURE_OCCUPANCY);
> > +            break;
> > +        case XEN_SYSCTL_PQOS_MONITOR_get_total_rmid:
> > +            sysctl->u.pqos_monitor_op.data =
> > +                pqosm->rmid_max - pqosm->rmid_min + 1;
> > +            break;
> > +        case XEN_SYSCTL_PQOS_MONITOR_get_l3_upscaling_factor:
> > +            sysctl->u.pqos_monitor_op.data =
> pqosm->l3m.upscaling_factor;
> > +            break;
> > +        case XEN_SYSCTL_PQOS_MONITOR_get_l3_cache_size:
> > +            sysctl->u.pqos_monitor_op.data =
> boot_cpu_data.x86_cache_size;
> > +            break;
> > +        case XEN_SYSCTL_PQOS_MONITOR_get_socket_cpu:
> > +        {
> > +            unsigned int i, cpu;
> > +            int socket = sysctl->u.pqos_monitor_op.data;
> 
> unsigned int

Okay.

> 
> > +
> > +            for ( i = 0; i < NR_CPUS; i++ )
> 
> Why NR_CPUS when a few lines down you show that you know of
> nr_cpu_ids?

Okay.

> 
> > +            {
> > +                if ( cpu_to_socket(i) < 0 || cpu_to_socket(i) != socket )
> > +                    continue;
> > +                cpu = cpumask_any(per_cpu(cpu_core_mask, i));
> > +                if ( cpu < nr_cpu_ids )
> > +                {
> > +                    sysctl->u.pqos_monitor_op.data = cpu;
> > +                    break;
> > +                }
> > +            }
> > +
> > +            if ( i == NR_CPUS )
> > +                ret = -EFAULT;
> 
> -EFAULT?

Here the intension is to tell we failed to select a CPU in this socket.
What errno do you prefer more?

> 
> > +        }
> > +        break;
> > +
> > +        default:
> > +            sysctl->u.pqos_monitor_op.data = 0;
> > +            ret = -ENOSYS;
> > +            break;
> > +        }
> > +        copyback = 1;
> > +    break;
> 
> Indentation.

Okay.

> 
> > --- a/xen/include/public/sysctl.h
> > +++ b/xen/include/public/sysctl.h
> > @@ -659,6 +659,18 @@ struct msr_access_info {
> >      xen_sysctl_msr_data_t *msr_data;
> >  };
> >
> > +#define XEN_SYSCTL_PQOS_MONITOR_get_total_rmid            0
> > +#define XEN_SYSCTL_PQOS_MONITOR_get_l3_upscaling_factor   1
> > +#define XEN_SYSCTL_PQOS_MONITOR_get_l3_cache_size         2
> > +#define XEN_SYSCTL_PQOS_MONITOR_get_socket_cpu            3
> > +#define XEN_SYSCTL_PQOS_MONITOR_cqm_enabled               4
> > +struct xen_sysctl_pqos_monitor_op {
> > +    uint32_t cmd;
> > +    uint64_t data;
> 
> Missing explicit padding between the two fields, or need to use
> uint64_aligned_t.

Okay.

Thanks,
Dongxiao

> 
> Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 7/9] x86: enable QoS monitoring for each domain RMID
  2014-06-20 15:20   ` Jan Beulich
@ 2014-06-23  6:55     ` Xu, Dongxiao
  0 siblings, 0 replies; 38+ messages in thread
From: Xu, Dongxiao @ 2014-06-23  6:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, June 20, 2014 11:20 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> George.Dunlap@eu.citrix.com; Ian.Jackson@eu.citrix.com;
> stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org
> Subject: Re: [PATCH v11 7/9] x86: enable QoS monitoring for each domain RMID
> 
> >>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
> > --- a/xen/include/asm-x86/pqos.h
> > +++ b/xen/include/asm-x86/pqos.h
> > @@ -44,6 +44,11 @@ struct pqos_monitor {
> >  };
> >  extern struct pqos_monitor *pqosm;
> >
> > +struct pqr_assoc {
> > +    uint64_t val;
> > +    bool_t initialized;
> > +};
> 
> This is being used in only one source file and hence should be moved
> there.

Okay, thanks!

Dongxiao

> 
> Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 6/9] x86: collect global QoS monitoring information
  2014-06-23  6:55     ` Xu, Dongxiao
@ 2014-06-23  7:06       ` Jan Beulich
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Beulich @ 2014-06-23  7:06 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

>>> On 23.06.14 at 08:55, <dongxiao.xu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
>> > +            {
>> > +                if ( cpu_to_socket(i) < 0 || cpu_to_socket(i) != socket )
>> > +                    continue;
>> > +                cpu = cpumask_any(per_cpu(cpu_core_mask, i));
>> > +                if ( cpu < nr_cpu_ids )
>> > +                {
>> > +                    sysctl->u.pqos_monitor_op.data = cpu;
>> > +                    break;
>> > +                }
>> > +            }
>> > +
>> > +            if ( i == NR_CPUS )
>> > +                ret = -EFAULT;
>> 
>> -EFAULT?
> 
> Here the intension is to tell we failed to select a CPU in this socket.
> What errno do you prefer more?

I suppose you know what -EFAULT means, so such a question should
not even need discussing. But to answer your question anyway, in
halfway comparable situations we tend to return -ESRCH, but -ENODEV
might also be suitable (-EINVAL, while generally also possible, is already
being used for way too many other cases, so I generally prefer less
ambiguous error codes). I hope you get the point: Which of possibly
multiple applicable error code you use is largely up to you, all I'm
asking for is that you don't use inapplicable ones.

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 3/9] tools: provide interface for generic MSR access
  2014-06-20 17:34   ` Konrad Rzeszutek Wilk
@ 2014-06-23  7:13     ` Jan Beulich
  2014-06-23 13:29       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2014-06-23  7:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, Dongxiao Xu, dgdegra

>>> On 20.06.14 at 19:34, <konrad.wilk@oracle.com> wrote:
> On Fri, Jun 20, 2014 at 10:31:44PM +0800, Dongxiao Xu wrote:
>> Xen added a new sysctl hypercall for generic MSR access, and this is the
>> tool side change to wrapper the hypercall into xc APIs.
> 
> I really like the idea that Jan surfaced. That is use an
> cpu bitmap for the operation (see xc_tbuf.c as an example)
> and my recent postings http://osdir.com/ml/general/2014-06/msg24386.html.
> 
> That way you can define exactly which CPUs you want to do it
> at and you can do one nice hypercall that will do it all at
> once for you.

CPU bitmap? Me? I don't think I ever suggested anything like that,
not the least because it would become problematic where to store
the data (would need to be either [potentially very] sparse arrays,
or need a complicated [or at least ugly] mechanism for each CPU to
find its slot).

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 1/9] x86: add generic MSR access hypercall
  2014-06-23  6:45       ` Jan Beulich
@ 2014-06-23  7:29         ` Xu, Dongxiao
  2014-06-23  7:42           ` Jan Beulich
  2014-06-23 13:32           ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 38+ messages in thread
From: Xu, Dongxiao @ 2014-06-23  7:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, June 23, 2014 2:45 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> George.Dunlap@eu.citrix.com; Ian.Jackson@eu.citrix.com;
> stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org
> Subject: RE: [PATCH v11 1/9] x86: add generic MSR access hypercall
> 
> >>> On 23.06.14 at 08:27, <dongxiao.xu@intel.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Friday, June 20, 2014 11:01 PM
> >> To: Xu, Dongxiao
> >> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> >> George.Dunlap@eu.citrix.com; Ian.Jackson@eu.citrix.com;
> >> stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
> >> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org
> >> Subject: Re: [PATCH v11 1/9] x86: add generic MSR access hypercall
> >>
> >> >>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
> >> > Add a generic MSR access hypercall for tool stack or other components to
> >> > read or write certain system MSRs.
> >>
> >> If you want this to be usable by components other than the tool stack
> >> then this can't be a sysctl (would e.g. need to be a platform-op then).
> >
> > Per my understanding, the platform-op related hypercalls are used by Dom0
> > kernel.
> > While in this CQM scenario, we want libvirt/libxl/libxc to call such
> > hypercall to access the MSR, however I didn't see any usage of platform_op in
> > libxl/libxc side.
> >
> > Could you explain a bit more for it?
> 
> Platform ops are for use by Dom0, yes, but note the explicit omission
> of "kernel" in my reply. The fact that libxc currently doesn't use any
> doesn't mean it mustn't do so. The only question here that matter is
> what audience we see for the new functionality: If this is to be a
> purely tools interface, then a sysctl is fine. If the kernel is intended to
> also be able to use it (as I suggested), something other than a sysctl
> needs to be used (and I was merely _hinting_ at platform-op).

I got such concept in the file descriptor, and that's why I was thinking of platform-op is for dom0-kernel only...
Looking through all the current hypercalls, sysctl and platform-op are the most appropriate ones.

According to your word, if that's is not the limitation for Dom0 kernel only, I am okay to change it to platform-op wise.
BTW, do we need to modify the file header for platform_hypercall.c/h?

/******************************************************************************
 * platform_hypercall.c
 *
 * Hardware platform operations. Intended for use by domain-0 kernel.
 *
 * Copyright (c) 2002-2006, K Fraser
 */

> 
> >> > --- a/xen/arch/x86/sysctl.c
> >> > +++ b/xen/arch/x86/sysctl.c
> >> > @@ -66,10 +66,27 @@ void arch_do_physinfo(xen_sysctl_physinfo_t *pi)
> >> >          pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm_directio;
> >> >  }
> >> >
> >> > +static void msr_access_helper(void *data)
> >> > +{
> >> > +    struct msr_access_info *info = data;
> >> > +    xen_sysctl_msr_data_t *msr_data;
> >> > +    unsigned int i;
> >> > +
> >> > +    for ( i = 0; i < info->nr_ops; i++ )
> >> > +    {
> >> > +        msr_data = &info->msr_data[i];
> >> > +        if ( msr_data->cmd == XEN_SYSCTL_MSR_read )
> >> > +            rdmsrl(msr_data->msr, msr_data->val);
> >> > +        else if ( msr_data->cmd == XEN_SYSCTL_MSR_write )
> >> > +            wrmsrl(msr_data->msr, msr_data->val);
> >> > +    }
> >>
> >> Missing preemption check.
> >
> > Do you mean hypercall_preemption_check() here? Or the preemption
> between the
> > MSR accesses?
> 
> Not sure what distinction you make between the two (or what the
> second question is supposed to refer to). The list of operations can
> be arbitrarily long, and individual operations may also take arbitrary
> time (especially if this later gets extended to I/O port accesses,
> which may incur SMM invocation). Hence preemption checks between
> operations are needed (except in well-defined cases where the
> caller may request such check to be skipped).

Okay.

> 
> >> Anyway, together with the comments on the interface itself (further
> >> down) I think you'd be much better off basing this on
> >> continue_hypercall_on_cpu().
> >
> > continue_hypercall_on_cpu() function schedules a tasklet to run the specific
> > function on certain CPU.
> > However in our CQM case, the libxl/libxc functions want to get the result
> > immediately when the function returns, so that's why I didn't use
> > continue_hypercall_on_cpu().
> 
> The use of a tasklet doesn't mean the result would only be available in
> a deferred manner: The tasklet runs _before_ control transfers back
> to the caller.

Okay.

> 
> >> > --- a/xen/include/public/sysctl.h
> >> > +++ b/xen/include/public/sysctl.h
> >> > @@ -636,6 +636,29 @@ struct xen_sysctl_coverage_op {
> >> >  typedef struct xen_sysctl_coverage_op xen_sysctl_coverage_op_t;
> >> >  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_coverage_op_t);
> >> >
> >> > +#define XEN_SYSCTL_MSR_read                0
> >> > +#define XEN_SYSCTL_MSR_write               1
> >> > +
> >> > +struct xen_sysctl_msr_data {
> >> > +    uint32_t cmd;       /* XEN_SYSCTL_MSR_* */
> >> > +    uint32_t msr;
> >> > +    uint64_t val;
> >> > +};
> >> > +typedef struct xen_sysctl_msr_data xen_sysctl_msr_data_t;
> >> > +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_msr_data_t);
> >> > +struct xen_sysctl_msr_op {
> >> > +    uint32_t nr_ops;
> >>
> >> Jusr "nr" would suffice.
> >
> > Okay.
> >
> >>
> >> > +    uint32_t cpu;
> >>
> >> This is rather limiting - you should be able to specify a CPU for
> >> each operation. That way you'll have another 32-bit field available
> >> for flags: You indicated earlier on that you'd want certain
> >> operations (like a write followed by a read) pairable without
> >> potentially getting preempted in between. The field then freed up
> >> here should then be reserved for use as flags field too (i.e. you'd
> >> need to check that it's zero).
> >
> > Do you mean the following case?
> >
> > Op1: CPU0, write MSR 0x100. Flag: 1 (indicating paring)
> > Op2: CPU1, read MSR 0x200. Flag: 0.
> > Op3: CPU0, read MSR 0x101. Flag: 1 (paring the above op1)
> 
> No - CPU changes alway imply preemption between individual
> sub-ops.

Okay. Let's change the example a bit.

Op1: CPU0, write MSR 0x100.
Op2: CPU0, read MSR 0x101.
Op3: CPU1, read MSR 0x200.

Can we just follow the above example, if the two operations are "continuous" towards a target CPU, then we will pack them together via a single IPI?
Or do we really need the flag to indicate no-preemption there?

Thanks,
Dongxiao

> 
> > To avoid the preempt between certain operations, my previous solution is to
> > pack all the operations towards a certain CPU via a single IPI. When the
> > target CPU receives the IPI, it will execute the operations one by one
> > (suppose during the execution for loop, no preemption will happen).
> > Now if we specify a CPU for each operation, can we pack the operations
> > according to the target CPU, and finally issue one IPI per-target-CPU to avoid
> > preemption? That will get rid of such flag usage.
> 
> No, I don't think re-ordering operations can be permitted here.
> 
> Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 4/9] x86: detect and initialize Platform QoS Monitoring feature
  2014-06-23  6:50       ` Jan Beulich
@ 2014-06-23  7:30         ` Xu, Dongxiao
  0 siblings, 0 replies; 38+ messages in thread
From: Xu, Dongxiao @ 2014-06-23  7:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, June 23, 2014 2:50 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> George.Dunlap@eu.citrix.com; Ian.Jackson@eu.citrix.com;
> stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org
> Subject: RE: [PATCH v11 4/9] x86: detect and initialize Platform QoS Monitoring
> feature
> 
> >>> On 23.06.14 at 08:38, <dongxiao.xu@intel.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Friday, June 20, 2014 11:04 PM
> >> To: Xu, Dongxiao
> >> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> >> George.Dunlap@eu.citrix.com; Ian.Jackson@eu.citrix.com;
> >> stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
> >> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org
> >> Subject: Re: [PATCH v11 4/9] x86: detect and initialize Platform QoS
> > Monitoring
> >> feature
> >>
> >> >>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
> >> > Detect platform QoS feature status and enumerate the resource types,
> >> > one of which is to monitor the L3 cache occupancy.
> >> >
> >> > Also introduce a Xen grub command line parameter to control the
> >> > QoS feature status.
> >>
> >> "grub"?
> >
> > Sorry, what's your point here? Not quite understand it...
> 
> I'm asking what the mentioning of "grub" here means. Xen does have
> command line parameters, but they can equally well be used when
> booting without GrUB, e.g. directly from UEFI. IOW saying "grub" in
> a statement like this is at best confusing, unless talk is of an option
> that's only affecting booting via GrUB (e.g. something affecting the
> handling of multiboot, albeit even in that case nothing dictates that
> only GrUB can implement this protocol).

Okay, got your point.

> 
> >> > +void __init init_platform_qos(void)
> >> > +{
> >> > +    if ( !opt_pqos )
> >> > +        return;
> >> > +
> >> > +    if ( opt_pqos_monitor && opt_rmid_max )
> >>
> >> Kind of pointless to split the two if()-s.
> >
> > The split is reserved for the following QoS features, such as CQE, etc.
> 
> Since that isn't part of this series, I'd suggest avoiding such odd
> looking constructs (unless you already have a follow-up series in the
> works, and a reasonably certain it'll make 4.5). Future patches can
> always split the one if() into multiple.

Okay.

Thanks,
Dongxiao

> 
> Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 1/9] x86: add generic MSR access hypercall
  2014-06-23  7:29         ` Xu, Dongxiao
@ 2014-06-23  7:42           ` Jan Beulich
  2014-06-23 13:32           ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 38+ messages in thread
From: Jan Beulich @ 2014-06-23  7:42 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, dgdegra

>>> On 23.06.14 at 09:29, <dongxiao.xu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Platform ops are for use by Dom0, yes, but note the explicit omission
>> of "kernel" in my reply. The fact that libxc currently doesn't use any
>> doesn't mean it mustn't do so. The only question here that matter is
>> what audience we see for the new functionality: If this is to be a
>> purely tools interface, then a sysctl is fine. If the kernel is intended to
>> also be able to use it (as I suggested), something other than a sysctl
>> needs to be used (and I was merely _hinting_ at platform-op).
> 
> I got such concept in the file descriptor, and that's why I was thinking of 
> platform-op is for dom0-kernel only...
> Looking through all the current hypercalls, sysctl and platform-op are the 
> most appropriate ones.
> 
> According to your word, if that's is not the limitation for Dom0 kernel 
> only, I am okay to change it to platform-op wise.

The most obvious example where functionality shouldn't be limited to
the kernel would be the microcode update operation: Just like for
kexec there's really nothing requiring the user mode tool to use the
kernel as a relay for data it wants the hypervisor to process - it
could (and imo should) talk to the hypervisor directly, thus at once
ending the discussion about the need for a runtime kernel driver in
the pv-ops case under Xen.

> BTW, do we need to modify the file header for platform_hypercall.c/h?
> 
> /******************************************************************************
>  * platform_hypercall.c
>  *
>  * Hardware platform operations. Intended for use by domain-0 kernel.

We might, but I don't view this as strictly necessary. But if you do,
the public header would be even more relevant to adjust.

>> > Do you mean the following case?
>> >
>> > Op1: CPU0, write MSR 0x100. Flag: 1 (indicating paring)
>> > Op2: CPU1, read MSR 0x200. Flag: 0.
>> > Op3: CPU0, read MSR 0x101. Flag: 1 (paring the above op1)
>> 
>> No - CPU changes alway imply preemption between individual
>> sub-ops.
> 
> Okay. Let's change the example a bit.
> 
> Op1: CPU0, write MSR 0x100.
> Op2: CPU0, read MSR 0x101.
> Op3: CPU1, read MSR 0x200.
> 
> Can we just follow the above example, if the two operations are "continuous" 
> towards a target CPU, then we will pack them together via a single IPI?
> Or do we really need the flag to indicate no-preemption there?

Without explicit request for no preemption, a preemption check
should imo be put between any two sub-ops. And, as said before,
we should be rather rigid when it comes to noticing abuse of the
flag (i.e. a first implementation could not permit the flag to be set
in consecutive array elements).

Jan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 3/9] tools: provide interface for generic MSR access
  2014-06-23  7:13     ` Jan Beulich
@ 2014-06-23 13:29       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-06-23 13:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, Dongxiao Xu, dgdegra

On Mon, Jun 23, 2014 at 08:13:10AM +0100, Jan Beulich wrote:
> >>> On 20.06.14 at 19:34, <konrad.wilk@oracle.com> wrote:
> > On Fri, Jun 20, 2014 at 10:31:44PM +0800, Dongxiao Xu wrote:
> >> Xen added a new sysctl hypercall for generic MSR access, and this is the
> >> tool side change to wrapper the hypercall into xc APIs.
> > 
> > I really like the idea that Jan surfaced. That is use an
> > cpu bitmap for the operation (see xc_tbuf.c as an example)
> > and my recent postings http://osdir.com/ml/general/2014-06/msg24386.html.
> > 
> > That way you can define exactly which CPUs you want to do it
> > at and you can do one nice hypercall that will do it all at
> > once for you.
> 
> CPU bitmap? Me? I don't think I ever suggested anything like that,
> not the least because it would become problematic where to store
> the data (would need to be either [potentially very] sparse arrays,
> or need a complicated [or at least ugly] mechanism for each CPU to
> find its slot).

I am sorry for the assumption. In that case lets drop that idea.

> 
> Jan
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 1/9] x86: add generic MSR access hypercall
  2014-06-23  7:29         ` Xu, Dongxiao
  2014-06-23  7:42           ` Jan Beulich
@ 2014-06-23 13:32           ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-06-23 13:32 UTC (permalink / raw)
  To: Xu, Dongxiao
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, xen-devel, Jan Beulich, dgdegra

On Mon, Jun 23, 2014 at 07:29:41AM +0000, Xu, Dongxiao wrote:
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Monday, June 23, 2014 2:45 PM
> > To: Xu, Dongxiao
> > Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> > George.Dunlap@eu.citrix.com; Ian.Jackson@eu.citrix.com;
> > stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
> > konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org
> > Subject: RE: [PATCH v11 1/9] x86: add generic MSR access hypercall
> > 
> > >>> On 23.06.14 at 08:27, <dongxiao.xu@intel.com> wrote:
> > >>  -----Original Message-----
> > >> From: Jan Beulich [mailto:JBeulich@suse.com]
> > >> Sent: Friday, June 20, 2014 11:01 PM
> > >> To: Xu, Dongxiao
> > >> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> > >> George.Dunlap@eu.citrix.com; Ian.Jackson@eu.citrix.com;
> > >> stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
> > >> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org
> > >> Subject: Re: [PATCH v11 1/9] x86: add generic MSR access hypercall
> > >>
> > >> >>> On 20.06.14 at 16:31, <dongxiao.xu@intel.com> wrote:
> > >> > Add a generic MSR access hypercall for tool stack or other components to
> > >> > read or write certain system MSRs.
> > >>
> > >> If you want this to be usable by components other than the tool stack
> > >> then this can't be a sysctl (would e.g. need to be a platform-op then).
> > >
> > > Per my understanding, the platform-op related hypercalls are used by Dom0
> > > kernel.
> > > While in this CQM scenario, we want libvirt/libxl/libxc to call such
> > > hypercall to access the MSR, however I didn't see any usage of platform_op in
> > > libxl/libxc side.
> > >
> > > Could you explain a bit more for it?
> > 
> > Platform ops are for use by Dom0, yes, but note the explicit omission
> > of "kernel" in my reply. The fact that libxc currently doesn't use any
> > doesn't mean it mustn't do so. The only question here that matter is
> > what audience we see for the new functionality: If this is to be a
> > purely tools interface, then a sysctl is fine. If the kernel is intended to
> > also be able to use it (as I suggested), something other than a sysctl
> > needs to be used (and I was merely _hinting_ at platform-op).
> 
> I got such concept in the file descriptor, and that's why I was thinking of platform-op is for dom0-kernel only...
> Looking through all the current hypercalls, sysctl and platform-op are the most appropriate ones.

Keep in mind that the sysctl one has a version field that is tied in with
the toolstack. So if you want this hypercall to work across different
versions (like the kernel has to), then you do _not_ want to use
sysctl.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 9/9] tools: CMDs and APIs for Platform QoS Monitoring
  2014-06-20 14:31 ` [PATCH v11 9/9] tools: CMDs and APIs for Platform QoS Monitoring Dongxiao Xu
@ 2014-06-23 15:22   ` Ian Jackson
  2014-06-27  7:15     ` Xu, Dongxiao
  0 siblings, 1 reply; 38+ messages in thread
From: Ian Jackson @ 2014-06-23 15:22 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, xen-devel, JBeulich, dgdegra

Dongxiao Xu writes ("[PATCH v11 9/9] tools: CMDs and APIs for Platform QoS Monitoring"):
> Introduced some new xl commands to handle the platform QoS monitoring
> related services.
> 
> The following two commands is to attach/detach the QoS monitoring
> service to/from a certain domain.

Thanks for this.  I'm no expert on the underlying layers.  But:

> diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
> index 30bd4bf..de7f0a8 100644
> --- a/docs/man/xl.pod.1
> +++ b/docs/man/xl.pod.1
> @@ -1366,6 +1366,27 @@ Load FLASK policy from the given policy file. The initial policy is provided to
>  the hypervisor as a multiboot module; this command allows runtime updates to the
>  policy. Loading new security policy will reset runtime changes to device labels.
>  
> +=head1 PLATFORM QOS MONITORING
> +
> +New Intel processor may offer monitoring capability in each logical
> +processor to measure specific quality-of-service metric.

This should not refer to a specific CPU vendor.

> +=over 4
> +
> +=item B<pqos-monitor-attach> [I<domain-id>]
> +
> +attach: Attach the platform QoS monitoring service to a domain.

The document should explain what effect this has and why it has to be
done separately.

> +=item B<pqos-monitor-detach> [I<domain-id>]
> +
> +detach: Detach the platform QoS monitoring service from a domain.
> +
> +=item B<pqos-monitor-show> [I<qos-monitor-type>] [I<domain-id>]
> +
> +Show monitoring data for a certain domain or all domains. Current supported
> +QoS monitor types are:
> + - "cqm": cache QoS monitoring, showing the L3 cache occupancy.

The documentation here is a bit sparse.  What is the output format ?
Does this program run continuously or is it one shot ?

I think "cqm" could reasonably be replaced by "cache", since the Q
(from QoS) and M (from Monitoring) are already in the rest of the
command.

> +int xc_pqos_monitor_attach(xc_interface *xch, uint32_t domid);
> +int xc_pqos_monitor_detach(xc_interface *xch, uint32_t domid);
> +int xc_pqos_monitor_get_domain_rmid(xc_interface *xch, uint32_t domid,
> +    uint32_t *rmid);
> +int xc_pqos_monitor_get_total_rmid(xc_interface *xch, uint32_t *total_rmid);
> +int xc_pqos_monitor_get_l3_upscaling_factor(xc_interface *xch,
> +    uint32_t *upscaling_factor);
> +int xc_pqos_monitor_get_l3_cache_size(xc_interface *xch,
> +    uint32_t *l3_cache_size);
> +int xc_pqos_monitor_select_socket_cpu(xc_interface *xch, uint32_t socket);
> +int xc_pqos_monitor_get_data(xc_interface *xch, uint32_t rmid,
> +    uint32_t cpu, uint32_t pqos_monitor_type, uint64_t *monitor_data);
> +int xc_pqos_monitor_cqm_enabled(xc_interface *xch);

I'm not the expert on libxc, but there seems be a lot of very
formulaic functions here which don't provide a great deal of (or, any)
functionality.  Perhaps it would be better to have libxl make the
sysctl directly ?

> +#if defined(__i386__) || defined(__x86_64__)
> +int libxl_pqos_monitor_attach(libxl_ctx *ctx, uint32_t domid);
> +int libxl_pqos_monitor_detach(libxl_ctx *ctx, uint32_t domid);
> +int libxl_pqos_monitor_domain_attached(libxl_ctx *ctx, uint32_t domid);
> +int libxl_pqos_monitor_cqm_enabled(libxl_ctx *ctx);

It's not usual in libxl to arch-#ifdef these kind of functions.

> +int libxl_pqos_monitor_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid);
> +int libxl_pqos_monitor_get_l3_cache_size(libxl_ctx *ctx,
> +    uint32_t *l3_cache_size);
> +int libxl_pqos_monitor_get_domain_cqm(libxl_ctx *ctx, uint32_t domid,
> +    uint32_t socketid, uint64_t *l3_cache_occupancy);

Why not one function to return the whole lot in a struct in the IDL ?

> +static const char * const msg[] = {
> +    [ENOSYS] = "unsupported operation",
> +    [ENODEV] = "Platform QoS Monitoring is not supported in this system",
> +    [EEXIST] = "Platform QoS Monitoring is already attached to this domain",
> +    [ENOENT] = "Platform QoS Monitoring is not attached to this domain",
> +    [EUSERS] = "there is no free RMID available",
> +    [ESRCH]  = "is this Domain ID valid?",
> +    [EFAULT] = "cannot find a valid CPU belonging to this socket",

I'm afraid this error handling approach is entirely wrong.

libxl functions should return libxl error codes.  You shouldn't be
using errno.

You should invent new libxl error codes as you need them.

I don't understand what ENOSYS might mean here.  If the operation is
not supported then it is unsupported.  What is the distinction between
that and ENODEV ?  One (or both) of them should probably be ERROR_NI.

For the others I think you may need to add new error codes to libxl in
libxl_types.idl.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 2/9] xsm: add MSR operation related xsm policy
  2014-06-20 14:31 ` [PATCH v11 2/9] xsm: add MSR operation related xsm policy Dongxiao Xu
@ 2014-06-24 19:20   ` Daniel De Graaf
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel De Graaf @ 2014-06-24 19:20 UTC (permalink / raw)
  To: Dongxiao Xu, xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich

On 06/20/2014 10:31 AM, Dongxiao Xu wrote:
> Add xsm policies for MSR access related hypercall.
>
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>

As an overall permissions check, this is workable, and is at least
suitable for an initial implementation.  I believe a more fine-grained
control over MSR operations will be needed in the long term so that a
disaggregated environment can take advantage of a feature that requires
these MSR operations.  Currently, allowing a domain the XEN2__MSR_OP
permission means that the domain is completely trusted by the
hypervisor, since the ability to write to arbitrary MSRs can be used to
gain execution in the hypervisor's context.  The most obvious method to
accomplish this would be to change IA32_SYSENTER_EIP while a PV domain
is running.

If a white-list of "safe" MSRs is maintained in the hypervisor while
processing the msr_op, then this concern goes away, but the name of the
permission should be changed to reflect the contents of the whitelist
(so, for this patchset, cqm_op or cqm_msr_op may be a better name).

If a generic MSR read/write operation is actually desired, the security
server should check each read/write with an MSR__READ/MSR__WRITE
permissions using a label for the individual MSR. The security server
would maintain lables for MSRs like it currently labels I/O ports and
PCI devices.  This is a significantly more complex change, but provides
maximum flexibility for handling access to arbitrary MSRs.

-- 
Daniel De Graaf
National Security Agency

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 8/9] xsm: add platform QoS related xsm policies
  2014-06-20 14:31 ` [PATCH v11 8/9] xsm: add platform QoS related xsm policies Dongxiao Xu
@ 2014-06-24 19:24   ` Daniel De Graaf
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel De Graaf @ 2014-06-24 19:24 UTC (permalink / raw)
  To: Dongxiao Xu, xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich

On 06/20/2014 10:31 AM, Dongxiao Xu wrote:
> Add xsm policies for QoS monitoring related hypercalls.
>
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>

The XEN_DOMCTL_pqos_monitor_op case was not actually handled in this
patch.

> ---
>   tools/flask/policy/policy/modules/xen/xen.if | 2 +-
>   tools/flask/policy/policy/modules/xen/xen.te | 3 ++-
>   xen/xsm/flask/hooks.c                        | 4 ++++
>   xen/xsm/flask/policy/access_vectors          | 4 ++++
>   4 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
> index dedc035..5e7042c 100644
> --- a/tools/flask/policy/policy/modules/xen/xen.if
> +++ b/tools/flask/policy/policy/modules/xen/xen.if
> @@ -49,7 +49,7 @@ define(`create_domain_common', `
>   			getdomaininfo hypercall setvcpucontext setextvcpucontext
>   			getscheduler getvcpuinfo getvcpuextstate getaddrsize
>   			getaffinity setaffinity };
> -	allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim  set_max_evtchn };
> +	allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim set_max_evtchn pqos_monitor_op };
>   	allow $1 $2:security check_context;
>   	allow $1 $2:shadow enable;
>   	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op };
> diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
> index 0e63e76..41d1995 100644
> --- a/tools/flask/policy/policy/modules/xen/xen.te
> +++ b/tools/flask/policy/policy/modules/xen/xen.te
> @@ -66,6 +66,7 @@ allow dom0_t xen_t:xen {
>   };
>   allow dom0_t xen_t:xen2 {
>       msr_op
> +    pqos_monitor_op
>   };
>   allow dom0_t xen_t:mmu memorymap;
>
> @@ -79,7 +80,7 @@ allow dom0_t dom0_t:domain {
>   	getpodtarget setpodtarget set_misc_info set_virq_handler
>   };
>   allow dom0_t dom0_t:domain2 {
> -	set_cpuid gettsc settsc setscheduler set_max_evtchn
> +	set_cpuid gettsc settsc setscheduler set_max_evtchn pqos_monitor_op
>   };
>   allow dom0_t dom0_t:resource { add remove };
>
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index 277a5de..97dc1a3 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -773,6 +773,10 @@ static int flask_sysctl(int cmd)
>       case XEN_SYSCTL_msr_op:
>           return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
>                                       XEN2__MSR_OP, NULL);
> +
> +    case XEN_SYSCTL_pqos_monitor_op:
> +        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
> +                                    XEN2__PQOS_MONITOR_OP, NULL);
>
>       default:
>           printk("flask_sysctl: Unknown op %d\n", cmd);
> diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
> index 82b5484..d573eea 100644
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -81,6 +81,8 @@ class xen2
>   {
>   # XEN_SYSCTL_msr_op
>       msr_op
> +# XEN_SYSCTL_pqos_monitor_op
> +    pqos_monitor_op
>   }
>
>   # Classes domain and domain2 consist of operations that a domain performs on
> @@ -208,6 +210,8 @@ class domain2
>       cacheflush
>   # Creation of the hardware domain when it is not dom0
>       create_hardware_domain
> +# XEN_DOMCTL_pqos_monitor_op
> +    pqos_monitor_op
>   }
>
>   # Similar to class domain, but primarily contains domctls related to HVM domains
>


-- 
Daniel De Graaf
National Security Agency

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 9/9] tools: CMDs and APIs for Platform QoS Monitoring
  2014-06-23 15:22   ` Ian Jackson
@ 2014-06-27  7:15     ` Xu, Dongxiao
  0 siblings, 0 replies; 38+ messages in thread
From: Xu, Dongxiao @ 2014-06-27  7:15 UTC (permalink / raw)
  To: Ian Jackson
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, xen-devel, JBeulich, dgdegra

> -----Original Message-----
> From: Ian Jackson [mailto:Ian.Jackson@eu.citrix.com]
> Sent: Monday, June 23, 2014 11:23 PM
> To: Xu, Dongxiao
> Cc: xen-devel@lists.xen.org; keir@xen.org; JBeulich@suse.com;
> Ian.Campbell@citrix.com; stefano.stabellini@eu.citrix.com;
> andrew.cooper3@citrix.com; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> George.Dunlap@eu.citrix.com
> Subject: Re: [PATCH v11 9/9] tools: CMDs and APIs for Platform QoS Monitoring
> 
> Dongxiao Xu writes ("[PATCH v11 9/9] tools: CMDs and APIs for Platform QoS
> Monitoring"):
> > Introduced some new xl commands to handle the platform QoS monitoring
> > related services.
> >
> > The following two commands is to attach/detach the QoS monitoring
> > service to/from a certain domain.
> 
> Thanks for this.  I'm no expert on the underlying layers.  But:

Ian, thanks for your review, and following are some feedback.

> 
> > diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
> > index 30bd4bf..de7f0a8 100644
> > --- a/docs/man/xl.pod.1
> > +++ b/docs/man/xl.pod.1
> > @@ -1366,6 +1366,27 @@ Load FLASK policy from the given policy file. The
> initial policy is provided to
> >  the hypervisor as a multiboot module; this command allows runtime updates
> to the
> >  policy. Loading new security policy will reset runtime changes to device labels.
> >
> > +=head1 PLATFORM QOS MONITORING
> > +
> > +New Intel processor may offer monitoring capability in each logical
> > +processor to measure specific quality-of-service metric.
> 
> This should not refer to a specific CPU vendor.

Okay, will remove it the specific CPU vendor.

> 
> > +=over 4
> > +
> > +=item B<pqos-monitor-attach> [I<domain-id>]
> > +
> > +attach: Attach the platform QoS monitoring service to a domain.
> 
> The document should explain what effect this has and why it has to be
> done separately.

Okay, will state the reason in the patch.

> 
> > +=item B<pqos-monitor-detach> [I<domain-id>]
> > +
> > +detach: Detach the platform QoS monitoring service from a domain.
> > +
> > +=item B<pqos-monitor-show> [I<qos-monitor-type>] [I<domain-id>]
> > +
> > +Show monitoring data for a certain domain or all domains. Current supported
> > +QoS monitor types are:
> > + - "cqm": cache QoS monitoring, showing the L3 cache occupancy.
> 
> The documentation here is a bit sparse.  What is the output format ?

I checked its output after installed the xl, and the format looks okay, see below:

PLATFORM QOS MONITORING
       New Intel processor may offer monitoring capability in each logical processor to measure specific quality-of-service metric.

       pqos-monitor-attach [domain-id]
           Attach the platform QoS monitoring service to a domain.

       pqos-monitor-detach [domain-id]
           Detach the platform QoS monitoring service from a domain.

       pqos-monitor-show [qos-monitor-type] [domain-id]
           Show monitoring data for a certain domain or all domains. Current supported QoS monitor types are:
            - "cqm": cache QoS monitoring, showing the L3 cache occupancy.

> Does this program run continuously or is it one shot ?

The command is one-shot.

> 
> I think "cqm" could reasonably be replaced by "cache", since the Q
> (from QoS) and M (from Monitoring) are already in the rest of the
> command.

Would "cache_occupancy" be better?

> 
> > +int xc_pqos_monitor_attach(xc_interface *xch, uint32_t domid);
> > +int xc_pqos_monitor_detach(xc_interface *xch, uint32_t domid);
> > +int xc_pqos_monitor_get_domain_rmid(xc_interface *xch, uint32_t domid,
> > +    uint32_t *rmid);
> > +int xc_pqos_monitor_get_total_rmid(xc_interface *xch, uint32_t
> *total_rmid);
> > +int xc_pqos_monitor_get_l3_upscaling_factor(xc_interface *xch,
> > +    uint32_t *upscaling_factor);
> > +int xc_pqos_monitor_get_l3_cache_size(xc_interface *xch,
> > +    uint32_t *l3_cache_size);
> > +int xc_pqos_monitor_select_socket_cpu(xc_interface *xch, uint32_t socket);
> > +int xc_pqos_monitor_get_data(xc_interface *xch, uint32_t rmid,
> > +    uint32_t cpu, uint32_t pqos_monitor_type, uint64_t *monitor_data);
> > +int xc_pqos_monitor_cqm_enabled(xc_interface *xch);
> 
> I'm not the expert on libxc, but there seems be a lot of very
> formulaic functions here which don't provide a great deal of (or, any)
> functionality.  Perhaps it would be better to have libxl make the
> sysctl directly ?

There is always a libxc function to wrapper the hypercalls in current code.
This piece of code follow the current way...

> 
> > +#if defined(__i386__) || defined(__x86_64__)
> > +int libxl_pqos_monitor_attach(libxl_ctx *ctx, uint32_t domid);
> > +int libxl_pqos_monitor_detach(libxl_ctx *ctx, uint32_t domid);
> > +int libxl_pqos_monitor_domain_attached(libxl_ctx *ctx, uint32_t domid);
> > +int libxl_pqos_monitor_cqm_enabled(libxl_ctx *ctx);
> 
> It's not usual in libxl to arch-#ifdef these kind of functions.

Will remove the ifdef.

> 
> > +int libxl_pqos_monitor_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid);
> > +int libxl_pqos_monitor_get_l3_cache_size(libxl_ctx *ctx,
> > +    uint32_t *l3_cache_size);
> > +int libxl_pqos_monitor_get_domain_cqm(libxl_ctx *ctx, uint32_t domid,
> > +    uint32_t socketid, uint64_t *l3_cache_occupancy);
> 
> Why not one function to return the whole lot in a struct in the IDL ?

libxl_pqos_monitor_get_total_rmid() and libxl_pqos_monitor_get_l3_cache_size() may be called only once, while libxl_pqos_monitor_get_domain_cqm() will be called periodically.
This is why I split them...
> 
> > +static const char * const msg[] = {
> > +    [ENOSYS] = "unsupported operation",
> > +    [ENODEV] = "Platform QoS Monitoring is not supported in this system",
> > +    [EEXIST] = "Platform QoS Monitoring is already attached to this domain",
> > +    [ENOENT] = "Platform QoS Monitoring is not attached to this domain",
> > +    [EUSERS] = "there is no free RMID available",
> > +    [ESRCH]  = "is this Domain ID valid?",
> > +    [EFAULT] = "cannot find a valid CPU belonging to this socket",
> 
> I'm afraid this error handling approach is entirely wrong.
> 
> libxl functions should return libxl error codes.  You shouldn't be
> using errno.
> 
> You should invent new libxl error codes as you need them.

The errno is just to check the execution of inner libxc functions, and it will not act as the return value for libxl functions.
Currently the libxl_pqos_monitor_xxx functions return 0 on success, and " ERROR_FAIL " on failure.

> 
> I don't understand what ENOSYS might mean here.  If the operation is
> not supported then it is unsupported.  What is the distinction between
> that and ENODEV ?  One (or both) of them should probably be ERROR_NI.

Here the ENOSYS means platform QoS monitoring is enabled, however the tool stack is trying to get some value that system is not supported. You can refer to the 0006 patch.
Such errnos are just to enlight the error logging information in libxl functions, not the final return value. :)

Thanks,
Dongxiao

> 
> For the others I think you may need to add new error codes to libxl in
> libxl_types.idl.
> 
> Thanks,
> Ian.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 3/9] tools: provide interface for generic MSR access
  2014-06-20 14:31 ` [PATCH v11 3/9] tools: provide interface for generic MSR access Dongxiao Xu
  2014-06-20 17:34   ` Konrad Rzeszutek Wilk
@ 2014-06-27 13:05   ` Ian Campbell
  1 sibling, 0 replies; 38+ messages in thread
From: Ian Campbell @ 2014-06-27 13:05 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, stefano.stabellini, George.Dunlap, andrew.cooper3,
	Ian.Jackson, xen-devel, JBeulich, dgdegra

On Fri, 2014-06-20 at 22:31 +0800, Dongxiao Xu wrote:
> Xen added a new sysctl hypercall for generic MSR access, and this is the
> tool side change to wrapper the hypercall into xc APIs.
> 
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>

This looks like a plausible wrapper around a hypercall. So if the
hypervisor guys are happy with the underlying interface:

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2014-06-27 13:05 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-20 14:31 [PATCH v11 0/9] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
2014-06-20 14:31 ` [PATCH v11 1/9] x86: add generic MSR access hypercall Dongxiao Xu
2014-06-20 14:57   ` Andrew Cooper
2014-06-23  6:34     ` Xu, Dongxiao
2014-06-20 15:00   ` Jan Beulich
2014-06-23  6:27     ` Xu, Dongxiao
2014-06-23  6:45       ` Jan Beulich
2014-06-23  7:29         ` Xu, Dongxiao
2014-06-23  7:42           ` Jan Beulich
2014-06-23 13:32           ` Konrad Rzeszutek Wilk
2014-06-20 14:31 ` [PATCH v11 2/9] xsm: add MSR operation related xsm policy Dongxiao Xu
2014-06-24 19:20   ` Daniel De Graaf
2014-06-20 14:31 ` [PATCH v11 3/9] tools: provide interface for generic MSR access Dongxiao Xu
2014-06-20 17:34   ` Konrad Rzeszutek Wilk
2014-06-23  7:13     ` Jan Beulich
2014-06-23 13:29       ` Konrad Rzeszutek Wilk
2014-06-27 13:05   ` Ian Campbell
2014-06-20 14:31 ` [PATCH v11 4/9] x86: detect and initialize Platform QoS Monitoring feature Dongxiao Xu
2014-06-20 15:04   ` Jan Beulich
2014-06-23  6:38     ` Xu, Dongxiao
2014-06-23  6:50       ` Jan Beulich
2014-06-23  7:30         ` Xu, Dongxiao
2014-06-20 17:35   ` Konrad Rzeszutek Wilk
2014-06-20 14:31 ` [PATCH v11 5/9] x86: dynamically attach/detach QoS monitoring service for a guest Dongxiao Xu
2014-06-20 15:08   ` Jan Beulich
2014-06-23  6:43     ` Xu, Dongxiao
2014-06-20 14:31 ` [PATCH v11 6/9] x86: collect global QoS monitoring information Dongxiao Xu
2014-06-20 15:16   ` Jan Beulich
2014-06-23  6:55     ` Xu, Dongxiao
2014-06-23  7:06       ` Jan Beulich
2014-06-20 14:31 ` [PATCH v11 7/9] x86: enable QoS monitoring for each domain RMID Dongxiao Xu
2014-06-20 15:20   ` Jan Beulich
2014-06-23  6:55     ` Xu, Dongxiao
2014-06-20 14:31 ` [PATCH v11 8/9] xsm: add platform QoS related xsm policies Dongxiao Xu
2014-06-24 19:24   ` Daniel De Graaf
2014-06-20 14:31 ` [PATCH v11 9/9] tools: CMDs and APIs for Platform QoS Monitoring Dongxiao Xu
2014-06-23 15:22   ` Ian Jackson
2014-06-27  7:15     ` Xu, Dongxiao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.