All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs
@ 2015-01-28  8:04 Chao Peng
  2015-01-28  8:04 ` [PATCH v8 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op Chao Peng
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Chao Peng @ 2015-01-28  8:04 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, will.auld, JBeulich, keir

Changes from v7:
* Make obfuscating more complex as Jan suggested.
* Minor adjustment for commit message.

Changes from v6:
* Obfuscate the read value of MSR_IA32_TSC by adding a booting random;
* Minor coding style/comments adjustment;

Changes from v5:
* Remove common IRQ disable flag but instead disable IRQ when other MSR
  read followed by MSR_IA32_TSC read;
* Add comments for special handle for MSR_IA32_TSC;

Changes from v4:
* Make the counter read and timestamp read atomic by disable IRQ;
* Treat MSR_IA32_TSC as a special case and return NOW() for read path;
* Add MBM description in xl command line.

Changes from v3:
* Get timestamp information from host along with the monitoring counter;
  This is required for counter overlow detection.
* Address comments from Wei on the last patch.

Changes from v2:
* Remove the usage of "static" to cache data in xc;
  NOTE: Other places that already existed before are not touched due to
        the needs for API change. Will fix in separate patch if desirable.
* Coding style;

Changes from v1:
* Move event type check from xc to xl;
* Add retry capability for MBM sampling;
* Fix Coding style/docs;

Intel Memory Bandwidth Monitoring(MBM) is a new hardware feature
which builds on the CMT infrastructure to allow monitoring of system
memory bandwidth. Event codes are provided to monitor both "total"
and "local" bandwidth, meaning bandwidth over QPI and other external
links can be monitored.

For XEN, MBM is used to monitor memory bandwidth for VMs. Due to its
dependency on CMT, the software also makes use of most of CMT codes.
Actually, besides introducing two additional events and some cpuid
feature bits, there are no extra changes compared to cache occupancy
monitoring in CMT. Due to this, CMT should be enabled first to use
this feature.

For interface changes, the patch serial introduces a new command
"XEN_SYSCTL_PSR_CMT_get_l3_event_mask" which exposes MBM feature
capability to user space and modified "resource_op" to support reading
host system time together with the monitored counter.

On the tool stack side, two additional options introduced for
"xl psr-cmt-show":
total_mem_bandwidth:     Show total memory bandwidth
local_mem_bandwidth:     Show local memory bandwidth

The usage flow keeps the same with CMT.

Chao Peng (5):
  x86: allow reading MSR_IA32_TSC with XENPF_resource_op
  tools: add routine to get CMT L3 event mask
  tools: correct coding style for psr
  tools: code refactoring for MBM
  tools, docs: add total/local memory bandwith monitoring

 docs/man/xl.pod.1                   |   9 +++
 docs/misc/xen-command-line.markdown |   3 +
 tools/libxc/include/xenctrl.h       |  14 ++--
 tools/libxc/xc_msr_x86.h            |   1 +
 tools/libxc/xc_psr.c                |  60 +++++++++++---
 tools/libxl/libxl.h                 |  20 ++++-
 tools/libxl/libxl_psr.c             | 153 +++++++++++++++++++++++++++++++++---
 tools/libxl/libxl_types.idl         |   2 +
 tools/libxl/xl_cmdimpl.c            |  72 ++++++++++++-----
 tools/libxl/xl_cmdtable.c           |   4 +-
 xen/arch/x86/platform_hypercall.c   |  38 ++++++++-
 xen/common/random.c                 |   9 +++
 xen/include/public/platform.h       |  10 +++
 xen/include/xen/random.h            |   3 +
 14 files changed, 342 insertions(+), 56 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v8 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op
  2015-01-28  8:04 [PATCH v8 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Chao Peng
@ 2015-01-28  8:04 ` Chao Peng
  2015-01-28 15:42   ` Jan Beulich
  2015-01-28  8:04 ` [PATCH v8 2/5] tools: add routine to get CMT L3 event mask Chao Peng
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Chao Peng @ 2015-01-28  8:04 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, will.auld, JBeulich, keir

Memory bandwidth monitoring requires system time information returned
along with the monitoring counter to verify the correctness of the
counter value and to calculate the time elapsed between two samplings.

Add MSR_IA32_TSC to the read path and it returns scaled system time(ns)
instead of raw timestamp to elimanate the needs to convert. The return
time is obfuscated with booting random to eliminate the potential abuse
of it. RESOURCE_ACCESS_MAX_ENTRIES is also increased to 3 so MSR_IA32_TSC
can be used together with an MSR write/read operation pair.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
Changes in v8:
1. Make obfuscating more complex as Jan suggested.
2. Minor adjustment for commit message.
Changes in v7:
1. Obfuscate the read value of MSR_IA32_TSC by adding a booting random.
2. Minor coding style/comments adjustment.
Changes in v6:
1. Latch read and IRQ disable for MSR_IA32_TSC.
2. Add comments.
---
 xen/arch/x86/platform_hypercall.c | 37 ++++++++++++++++++++++++++++++++++---
 xen/common/random.c               |  9 +++++++++
 xen/include/public/platform.h     | 10 ++++++++++
 xen/include/xen/random.h          |  3 +++
 4 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 32f39b2..37ee18a 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -61,7 +61,7 @@ long cpu_down_helper(void *data);
 long core_parking_helper(void *data);
 uint32_t get_cur_idle_nums(void);
 
-#define RESOURCE_ACCESS_MAX_ENTRIES 2
+#define RESOURCE_ACCESS_MAX_ENTRIES 3
 struct xen_resource_access {
     unsigned int nr_done;
     unsigned int nr_entries;
@@ -75,6 +75,7 @@ static bool_t allow_access_msr(unsigned int msr)
     /* MSR for CMT, refer to chapter 17.14 of Intel SDM. */
     case MSR_IA32_CMT_EVTSEL:
     case MSR_IA32_CMT_CTR:
+    case MSR_IA32_TSC:
         return 1;
     }
 
@@ -124,6 +125,7 @@ static void resource_access(void *info)
 {
     struct xen_resource_access *ra = info;
     unsigned int i;
+    u64 tsc = 0;
 
     for ( i = 0; i < ra->nr_done; i++ )
     {
@@ -133,10 +135,40 @@ static void resource_access(void *info)
         switch ( entry->u.cmd )
         {
         case XEN_RESOURCE_OP_MSR_READ:
-            ret = rdmsr_safe(entry->idx, entry->val);
+            if ( unlikely(entry->idx == MSR_IA32_TSC) )
+            {
+                /* Return obfuscated scaled time instead of raw timestamp */
+                entry->val = get_s_time_fixed(tsc)
+                             + SECONDS(boot_random) - boot_random;
+                ret = 0;
+            }
+            else
+            {
+                unsigned long irqflags;
+                /*
+                 * If next entry is MSR_IA32_TSC read, then the actual rdtscll
+                 * is performed together with current entry, with IRQ disabled.
+                 */
+                bool_t read_tsc = (i < ra->nr_done - 1 &&
+                                   unlikely(entry[1].idx == MSR_IA32_TSC));
+
+                if ( unlikely(read_tsc) )
+                    local_irq_save(irqflags);
+
+                ret = rdmsr_safe(entry->idx, entry->val);
+
+                if ( unlikely(read_tsc) )
+                {
+                    rdtscll(tsc);
+                    local_irq_restore(irqflags);
+                }
+            }
             break;
         case XEN_RESOURCE_OP_MSR_WRITE:
-            ret = wrmsr_safe(entry->idx, entry->val);
+            if ( unlikely(entry->idx == MSR_IA32_TSC) )
+                ret = -EPERM;
+            else
+                ret = wrmsr_safe(entry->idx, entry->val);
             break;
         default:
             BUG();
diff --git a/xen/common/random.c b/xen/common/random.c
index 4a28a24..cb9187c 100644
--- a/xen/common/random.c
+++ b/xen/common/random.c
@@ -1,9 +1,11 @@
+#include <xen/init.h>
 #include <xen/percpu.h>
 #include <xen/random.h>
 #include <xen/time.h>
 #include <asm/random.h>
 
 static DEFINE_PER_CPU(unsigned int, seed);
+unsigned int __read_mostly boot_random;
 
 unsigned int get_random(void)
 {
@@ -27,3 +29,10 @@ unsigned int get_random(void)
 
     return val;
 }
+
+static int __init init_boot_random(void)
+{
+    boot_random = get_random();
+    return 0;
+}
+__initcall(init_boot_random);
diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
index 5c57615..3e340b4 100644
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -540,6 +540,16 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
 #define XEN_RESOURCE_OP_MSR_READ  0
 #define XEN_RESOURCE_OP_MSR_WRITE 1
 
+/*
+ * Specially handled MSRs:
+ * - MSR_IA32_TSC
+ * READ: Returns the scaled system time(ns) instead of raw timestamp. In
+ *       multiple entry case, if other MSR read is followed by a MSR_IA32_TSC
+ *       read, then both reads are guaranteed to be performed atomically (with
+ *       IRQ disabled). The return time indicates the point of reading that MSR.
+ * WRITE: Not supported.
+ */
+
 struct xenpf_resource_entry {
     union {
         uint32_t cmd;   /* IN: XEN_RESOURCE_OP_* */
diff --git a/xen/include/xen/random.h b/xen/include/xen/random.h
index 7c43d87..b950f03 100644
--- a/xen/include/xen/random.h
+++ b/xen/include/xen/random.h
@@ -3,4 +3,7 @@
 
 unsigned int get_random(void);
 
+/* The value keeps unchange once initialized for each booting */
+extern unsigned int boot_random;
+
 #endif /* __XEN_RANDOM_H__ */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 2/5] tools: add routine to get CMT L3 event mask
  2015-01-28  8:04 [PATCH v8 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Chao Peng
  2015-01-28  8:04 ` [PATCH v8 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op Chao Peng
@ 2015-01-28  8:04 ` Chao Peng
  2015-01-28 13:40   ` Ian Campbell
  2015-01-28  8:04 ` [PATCH v8 3/5] tools: correct coding style for psr Chao Peng
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Chao Peng @ 2015-01-28  8:04 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, will.auld, JBeulich, keir

This is the tools side wrapper for XEN_SYSCTL_PSR_CMT_get_l3_event_mask
of XEN_SYSCTL_psr_cmt_op.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 tools/libxc/include/xenctrl.h |  1 +
 tools/libxc/xc_psr.c          | 17 +++++++++++++++++
 tools/libxl/libxl.h           |  1 +
 tools/libxl/libxl_psr.c       | 15 +++++++++++++++
 4 files changed, 34 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 0ad8b8d..96b357c 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2697,6 +2697,7 @@ int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid,
 int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid);
 int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch,
     uint32_t *upscaling_factor);
+int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask);
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
     uint32_t *l3_cache_size);
 int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid,
diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c
index 872e6dc..ac19fe4 100644
--- a/tools/libxc/xc_psr.c
+++ b/tools/libxc/xc_psr.c
@@ -112,6 +112,23 @@ int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch,
     return rc;
 }
 
+int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask)
+{
+    int rc;
+    DECLARE_SYSCTL;
+
+    sysctl.cmd = XEN_SYSCTL_psr_cmt_op;
+    sysctl.u.psr_cmt_op.cmd =
+        XEN_SYSCTL_PSR_CMT_get_l3_event_mask;
+    sysctl.u.psr_cmt_op.flags = 0;
+
+    rc = xc_sysctl(xch, &sysctl);
+    if ( !rc )
+        *event_mask = sysctl.u.psr_cmt_op.u.data;
+
+    return rc;
+}
+
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
                                       uint32_t *l3_cache_size)
 {
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 0a123f1..c9a64f9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1456,6 +1456,7 @@ int libxl_psr_cmt_enabled(libxl_ctx *ctx);
 int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid);
 int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, uint32_t socketid,
     uint32_t *l3_cache_size);
+int libxl_psr_cmt_get_l3_event_mask(libxl_ctx *ctx, uint32_t *event_mask);
 int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid,
     uint32_t socketid, uint32_t *l3_cache_occupancy);
 #endif
diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c
index 0437465..07f2aee 100644
--- a/tools/libxl/libxl_psr.c
+++ b/tools/libxl/libxl_psr.c
@@ -160,6 +160,21 @@ out:
     return rc;
 }
 
+int libxl_psr_cmt_get_l3_event_mask(libxl_ctx *ctx, uint32_t *event_mask)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = xc_psr_cmt_get_l3_event_mask(ctx->xch, event_mask);
+    if (rc < 0) {
+        libxl__psr_cmt_log_err_msg(gc, errno);
+        rc = ERROR_FAIL;
+    }
+
+    GC_FREE;
+    return rc;
+}
+
 int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid,
     uint32_t socketid, uint32_t *l3_cache_occupancy)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 3/5] tools: correct coding style for psr
  2015-01-28  8:04 [PATCH v8 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Chao Peng
  2015-01-28  8:04 ` [PATCH v8 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op Chao Peng
  2015-01-28  8:04 ` [PATCH v8 2/5] tools: add routine to get CMT L3 event mask Chao Peng
@ 2015-01-28  8:04 ` Chao Peng
  2015-01-28  8:04 ` [PATCH v8 4/5] tools: code refactoring for MBM Chao Peng
  2015-01-28  8:04 ` [PATCH v8 5/5] tools, docs: add total/local memory bandwith monitoring Chao Peng
  4 siblings, 0 replies; 14+ messages in thread
From: Chao Peng @ 2015-01-28  8:04 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, will.auld, JBeulich, keir

- space: remove space after '(' or before ')' in 'if' condition;
- indention: align function definition/call arguments;

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/include/xenctrl.h | 10 +++++-----
 tools/libxc/xc_psr.c          | 10 +++++-----
 tools/libxl/libxl.h           | 11 +++++++----
 tools/libxl/libxl_psr.c       | 11 +++++++----
 tools/libxl/xl_cmdimpl.c      | 11 ++++++-----
 5 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 96b357c..8a0eab6 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2693,15 +2693,15 @@ typedef enum xc_psr_cmt_type xc_psr_cmt_type;
 int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid);
 int xc_psr_cmt_detach(xc_interface *xch, uint32_t domid);
 int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid,
-    uint32_t *rmid);
+                               uint32_t *rmid);
 int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid);
 int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch,
-    uint32_t *upscaling_factor);
+                                       uint32_t *upscaling_factor);
 int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask);
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
-    uint32_t *l3_cache_size);
-int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid,
-    uint32_t cpu, uint32_t psr_cmt_type, uint64_t *monitor_data);
+                                 uint32_t *l3_cache_size);
+int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
+                        uint32_t psr_cmt_type, uint64_t *monitor_data);
 int xc_psr_cmt_enabled(xc_interface *xch);
 #endif
 
diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c
index ac19fe4..a9881a4 100644
--- a/tools/libxc/xc_psr.c
+++ b/tools/libxc/xc_psr.c
@@ -47,7 +47,7 @@ int xc_psr_cmt_detach(xc_interface *xch, uint32_t domid)
 }
 
 int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid,
-                                    uint32_t *rmid)
+                               uint32_t *rmid)
 {
     int rc;
     DECLARE_DOMCTL;
@@ -88,7 +88,7 @@ int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid)
 }
 
 int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch,
-                                            uint32_t *upscaling_factor)
+                                       uint32_t *upscaling_factor)
 {
     static int val = 0;
     int rc;
@@ -130,7 +130,7 @@ int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask)
 }
 
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
-                                      uint32_t *l3_cache_size)
+                                 uint32_t *l3_cache_size)
 {
     static int val = 0;
     int rc;
@@ -155,8 +155,8 @@ int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
     return rc;
 }
 
-int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid,
-    uint32_t cpu, xc_psr_cmt_type type, uint64_t *monitor_data)
+int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
+                        xc_psr_cmt_type type, uint64_t *monitor_data)
 {
     xc_resource_op_t op;
     xc_resource_entry_t entries[2];
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index c9a64f9..596d2a0 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1454,11 +1454,14 @@ int libxl_psr_cmt_detach(libxl_ctx *ctx, uint32_t domid);
 int libxl_psr_cmt_domain_attached(libxl_ctx *ctx, uint32_t domid);
 int libxl_psr_cmt_enabled(libxl_ctx *ctx);
 int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid);
-int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, uint32_t socketid,
-    uint32_t *l3_cache_size);
+int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx,
+                                    uint32_t socketid,
+                                    uint32_t *l3_cache_size);
 int libxl_psr_cmt_get_l3_event_mask(libxl_ctx *ctx, uint32_t *event_mask);
-int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid,
-    uint32_t socketid, uint32_t *l3_cache_occupancy);
+int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
+                                      uint32_t domid,
+                                      uint32_t socketid,
+                                      uint32_t *l3_cache_occupancy);
 #endif
 
 /* misc */
diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c
index 07f2aee..84819e6 100644
--- a/tools/libxl/libxl_psr.c
+++ b/tools/libxl/libxl_psr.c
@@ -135,8 +135,9 @@ int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid)
     return rc;
 }
 
-int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, uint32_t socketid,
-                                         uint32_t *l3_cache_size)
+int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx,
+                                    uint32_t socketid,
+                                    uint32_t *l3_cache_size)
 {
     GC_INIT(ctx);
 
@@ -175,8 +176,10 @@ int libxl_psr_cmt_get_l3_event_mask(libxl_ctx *ctx, uint32_t *event_mask)
     return rc;
 }
 
-int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid,
-    uint32_t socketid, uint32_t *l3_cache_occupancy)
+int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
+                                      uint32_t domid,
+                                      uint32_t socketid,
+                                      uint32_t *l3_cache_occupancy)
 {
     GC_INIT(ctx);
 
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 0b02a6c..8fce979 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7808,7 +7808,7 @@ out:
 
 #ifdef LIBXL_HAVE_PSR_CMT
 static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo,
-                                                    uint32_t nr_sockets)
+                                                 uint32_t nr_sockets)
 {
     char *domain_name;
     uint32_t socketid;
@@ -7822,8 +7822,8 @@ static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo,
     free(domain_name);
 
     for (socketid = 0; socketid < nr_sockets; socketid++) {
-        if ( !libxl_psr_cmt_get_cache_occupancy(ctx, dominfo->domid,
-                 socketid, &l3_cache_occupancy) )
+        if (!libxl_psr_cmt_get_cache_occupancy(ctx, dominfo->domid, socketid,
+                                               &l3_cache_occupancy))
             printf("%13u KB", l3_cache_occupancy);
     }
 
@@ -7871,8 +7871,9 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid)
     for (socketid = 0; socketid < nr_sockets; socketid++) {
         rc = libxl_psr_cmt_get_l3_cache_size(ctx, socketid, &l3_cache_size);
         if (rc < 0) {
-            fprintf(stderr, "Failed to get system l3 cache size for socket:%d\n",
-                            socketid);
+            fprintf(stderr,
+                    "Failed to get system l3 cache size for socket:%d\n",
+                    socketid);
             return -1;
         }
         printf("%13u KB", l3_cache_size);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 4/5] tools: code refactoring for MBM
  2015-01-28  8:04 [PATCH v8 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Chao Peng
                   ` (2 preceding siblings ...)
  2015-01-28  8:04 ` [PATCH v8 3/5] tools: correct coding style for psr Chao Peng
@ 2015-01-28  8:04 ` Chao Peng
  2015-01-28  8:04 ` [PATCH v8 5/5] tools, docs: add total/local memory bandwith monitoring Chao Peng
  4 siblings, 0 replies; 14+ messages in thread
From: Chao Peng @ 2015-01-28  8:04 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, will.auld, JBeulich, keir

Make some internal routines common so that total/local memory bandwidth
monitoring in the next patch can make use of them.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_psr.c  | 44 +++++++++++++++++++++++++--------------
 tools/libxl/xl_cmdimpl.c | 54 ++++++++++++++++++++++++++++--------------------
 2 files changed, 61 insertions(+), 37 deletions(-)

diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c
index 84819e6..c88c421 100644
--- a/tools/libxl/libxl_psr.c
+++ b/tools/libxl/libxl_psr.c
@@ -176,20 +176,16 @@ int libxl_psr_cmt_get_l3_event_mask(libxl_ctx *ctx, uint32_t *event_mask)
     return rc;
 }
 
-int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
-                                      uint32_t domid,
-                                      uint32_t socketid,
-                                      uint32_t *l3_cache_occupancy)
+static int libxl__psr_cmt_get_l3_monitoring_data(libxl__gc *gc,
+                                                 uint32_t domid,
+                                                 xc_psr_cmt_type type,
+                                                 uint32_t socketid,
+                                                 uint64_t *data)
 {
-    GC_INIT(ctx);
-
     unsigned int rmid;
-    uint32_t upscaling_factor;
-    uint64_t monitor_data;
     int cpu, rc;
-    xc_psr_cmt_type type;
 
-    rc = xc_psr_cmt_get_domain_rmid(ctx->xch, domid, &rmid);
+    rc = xc_psr_cmt_get_domain_rmid(CTX->xch, domid, &rmid);
     if (rc < 0 || rmid == 0) {
         LOGE(ERROR, "fail to get the domain rmid, "
             "or domain is not attached with platform QoS monitoring service");
@@ -204,14 +200,32 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
         goto out;
     }
 
-    type = XC_PSR_CMT_L3_OCCUPANCY;
-    rc = xc_psr_cmt_get_data(ctx->xch, rmid, cpu, type, &monitor_data);
+    rc = xc_psr_cmt_get_data(CTX->xch, rmid, cpu, type, data);
     if (rc < 0) {
         LOGE(ERROR, "failed to get monitoring data");
         rc = ERROR_FAIL;
-        goto out;
     }
 
+out:
+    return rc;
+}
+
+int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
+                                      uint32_t domid,
+                                      uint32_t socketid,
+                                      uint32_t *l3_cache_occupancy)
+{
+    GC_INIT(ctx);
+    uint64_t data;
+    uint32_t upscaling_factor;
+    int rc;
+
+    rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid,
+                                               XC_PSR_CMT_L3_OCCUPANCY,
+                                               socketid, &data);
+    if (rc < 0)
+            goto out;
+
     rc = xc_psr_cmt_get_l3_upscaling_factor(ctx->xch, &upscaling_factor);
     if (rc < 0) {
         LOGE(ERROR, "failed to get L3 upscaling factor");
@@ -219,8 +233,8 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
         goto out;
     }
 
-    *l3_cache_occupancy = upscaling_factor * monitor_data / 1024;
-    rc = 0;
+    *l3_cache_occupancy = upscaling_factor * data / 1024;
+
 out:
     GC_FREE;
     return rc;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 8fce979..1827c63 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7807,12 +7807,13 @@ out:
 }
 
 #ifdef LIBXL_HAVE_PSR_CMT
-static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo,
-                                                 uint32_t nr_sockets)
+static void psr_cmt_print_domain_l3_info(libxl_dominfo *dominfo,
+                                         libxl_psr_cmt_type type,
+                                         uint32_t nr_sockets)
 {
     char *domain_name;
     uint32_t socketid;
-    uint32_t l3_cache_occupancy;
+    uint32_t data;
 
     if (!libxl_psr_cmt_domain_attached(ctx, dominfo->domid))
         return;
@@ -7822,15 +7823,21 @@ static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo,
     free(domain_name);
 
     for (socketid = 0; socketid < nr_sockets; socketid++) {
-        if (!libxl_psr_cmt_get_cache_occupancy(ctx, dominfo->domid, socketid,
-                                               &l3_cache_occupancy))
-            printf("%13u KB", l3_cache_occupancy);
+        switch (type) {
+        case LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY:
+            if (!libxl_psr_cmt_get_cache_occupancy(ctx, dominfo->domid,
+                                                   socketid, &data))
+                printf("%13u KB", data);
+            break;
+        default:
+            return;
+        }
     }
 
     printf("\n");
 }
 
-static int psr_cmt_show_cache_occupancy(uint32_t domid)
+static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid)
 {
     uint32_t i, socketid, nr_sockets, total_rmid;
     uint32_t l3_cache_size;
@@ -7866,19 +7873,22 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid)
         printf("%14s %d", "Socket", socketid);
     printf("\n");
 
-    /* Total L3 cache size */
-    printf("%-46s", "Total L3 Cache Size");
-    for (socketid = 0; socketid < nr_sockets; socketid++) {
-        rc = libxl_psr_cmt_get_l3_cache_size(ctx, socketid, &l3_cache_size);
-        if (rc < 0) {
-            fprintf(stderr,
-                    "Failed to get system l3 cache size for socket:%d\n",
-                    socketid);
-            return -1;
-        }
-        printf("%13u KB", l3_cache_size);
+    if (type == LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY) {
+            /* Total L3 cache size */
+            printf("%-46s", "Total L3 Cache Size");
+            for (socketid = 0; socketid < nr_sockets; socketid++) {
+                rc = libxl_psr_cmt_get_l3_cache_size(ctx, socketid,
+                                                     &l3_cache_size);
+                if (rc < 0) {
+                    fprintf(stderr,
+                            "Failed to get system l3 cache size for socket:%d\n",
+                            socketid);
+                    return -1;
+                }
+                printf("%13u KB", l3_cache_size);
+            }
+            printf("\n");
     }
-    printf("\n");
 
     /* Each domain */
     if (domid != INVALID_DOMID) {
@@ -7887,7 +7897,7 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid)
             fprintf(stderr, "Failed to get domain info for %d\n", domid);
             return -1;
         }
-        psr_cmt_print_domain_cache_occupancy(&dominfo, nr_sockets);
+        psr_cmt_print_domain_l3_info(&dominfo, type, nr_sockets);
     }
     else
     {
@@ -7897,7 +7907,7 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid)
             return -1;
         }
         for (i = 0; i < nr_domains; i++)
-            psr_cmt_print_domain_cache_occupancy(list + i, nr_sockets);
+            psr_cmt_print_domain_l3_info(list + i, type, nr_sockets);
         libxl_dominfo_list_free(list, nr_domains);
     }
     return 0;
@@ -7956,7 +7966,7 @@ int main_psr_cmt_show(int argc, char **argv)
 
     switch (type) {
     case LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY:
-        ret = psr_cmt_show_cache_occupancy(domid);
+        ret = psr_cmt_show_l3_info(type, domid);
         break;
     default:
         help("psr-cmt-show");
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 5/5] tools, docs: add total/local memory bandwith monitoring
  2015-01-28  8:04 [PATCH v8 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Chao Peng
                   ` (3 preceding siblings ...)
  2015-01-28  8:04 ` [PATCH v8 4/5] tools: code refactoring for MBM Chao Peng
@ 2015-01-28  8:04 ` Chao Peng
  2015-01-28 14:04   ` Ian Campbell
  4 siblings, 1 reply; 14+ messages in thread
From: Chao Peng @ 2015-01-28  8:04 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, will.auld, JBeulich, keir

Add Memory Bandwidth Monitoring(MBM) for VMs. Two types of monitoring
are supported: total and local memory bandwidth monitoring. To use it,
CMT should be enabled in hypervisor.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
Changes in v6:
1. Remove DISABLE_IRQ flag as hypervisor disable IRQ for MSR_IA32_TSC
   implicitly.
Changes in v5:
1. Add MBM description in xen command line.
2. Use the tsc from hypervisor directly which is already ns.
3. Call resource_op with DISABLE_IRQ flag.
Changes in v4:
1. Get timestamp from hypervisor and use that for bandwidth calculation.
2. Minor document and coding style fix.
---
 docs/man/xl.pod.1                   |   9 ++++
 docs/misc/xen-command-line.markdown |   3 ++
 tools/libxc/include/xenctrl.h       |   5 +-
 tools/libxc/xc_msr_x86.h            |   1 +
 tools/libxc/xc_psr.c                |  35 +++++++++++--
 tools/libxl/libxl.h                 |   8 +++
 tools/libxl/libxl_psr.c             | 101 ++++++++++++++++++++++++++++++++++--
 tools/libxl/libxl_types.idl         |   2 +
 tools/libxl/xl_cmdimpl.c            |  21 +++++++-
 tools/libxl/xl_cmdtable.c           |   4 +-
 10 files changed, 178 insertions(+), 11 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 6b89ba8..50759c0 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1461,6 +1461,13 @@ is domain level. To monitor a specific domain, just attach the domain id with
 the monitoring service. When the domain doesn't need to be monitored any more,
 detach the domain id from the monitoring service.
 
+Intel Broadwell and later server platforms also offer total/local memory
+bandwidth monitoring. Xen supports per-domain monitoring for these two
+additional monitoring types. Both memory bandwidth monitoring and L3 cache
+occupancy monitoring share the same set of underlying monitoring service. Once
+a domain is attached to the monitoring service, monitoring data can be showed
+for any of these monitoring types.
+
 =over 4
 
 =item B<psr-cmt-attach> [I<domain-id>]
@@ -1476,6 +1483,8 @@ detach: Detach the platform shared resource monitoring service from a domain.
 Show monitoring data for a certain domain or all domains. Current supported
 monitor types are:
  - "cache-occupancy": showing the L3 cache occupancy.
+ - "total-mem-bandwidth": showing the total memory bandwidth.
+ - "local-mem-bandwidth": showing the local memory bandwidth.
 
 =back
 
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index a061aa4..0491dbb 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1097,6 +1097,9 @@ The following resources are available:
   L3 cache occupancy.
   * `cmt` instructs Xen to enable/disable Cache Monitoring Technology.
   * `rmid_max` indicates the max value for rmid.
+* Memory Bandwidth Monitoring (Broadwell and later). Information regarding the
+  total/local memory bandwidth. Follow the same options with Cache Monitoring
+  Technology.
 
 ### reboot
 > `= t[riple] | k[bd] | a[cpi] | p[ci] | n[o] [, [w]arm | [c]old]`
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 8a0eab6..8f964ad 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2688,6 +2688,8 @@ int xc_resource_op(xc_interface *xch, uint32_t nr_ops, xc_resource_op_t *ops);
 #if defined(__i386__) || defined(__x86_64__)
 enum xc_psr_cmt_type {
     XC_PSR_CMT_L3_OCCUPANCY,
+    XC_PSR_CMT_TOTAL_MEM_BANDWIDTH,
+    XC_PSR_CMT_LOCAL_MEM_BANDWIDTH,
 };
 typedef enum xc_psr_cmt_type xc_psr_cmt_type;
 int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid);
@@ -2701,7 +2703,8 @@ int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask);
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
                                  uint32_t *l3_cache_size);
 int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
-                        uint32_t psr_cmt_type, uint64_t *monitor_data);
+                        uint32_t psr_cmt_type, uint64_t *monitor_data,
+                        uint64_t *tsc);
 int xc_psr_cmt_enabled(xc_interface *xch);
 #endif
 
diff --git a/tools/libxc/xc_msr_x86.h b/tools/libxc/xc_msr_x86.h
index 7c3e1a3..7f100e7 100644
--- a/tools/libxc/xc_msr_x86.h
+++ b/tools/libxc/xc_msr_x86.h
@@ -20,6 +20,7 @@
 #ifndef XC_MSR_X86_H
 #define XC_MSR_X86_H
 
+#define MSR_IA32_TSC            0x00000010
 #define MSR_IA32_CMT_EVTSEL     0x00000c8d
 #define MSR_IA32_CMT_CTR        0x00000c8e
 
diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c
index a9881a4..bf9ec72 100644
--- a/tools/libxc/xc_psr.c
+++ b/tools/libxc/xc_psr.c
@@ -23,6 +23,8 @@
 #define IA32_CMT_CTR_ERROR_MASK         (0x3ull << 62)
 
 #define EVTID_L3_OCCUPANCY             0x1
+#define EVTID_TOTAL_MEM_BANDWIDTH      0x2
+#define EVTID_LOCAL_MEM_BANDWIDTH      0x3
 
 int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid)
 {
@@ -156,11 +158,12 @@ int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
 }
 
 int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
-                        xc_psr_cmt_type type, uint64_t *monitor_data)
+                        xc_psr_cmt_type type, uint64_t *monitor_data,
+                        uint64_t *tsc)
 {
     xc_resource_op_t op;
-    xc_resource_entry_t entries[2];
-    uint32_t evtid;
+    xc_resource_entry_t entries[3];
+    uint32_t evtid, nr_entries;
     int rc;
 
     switch ( type )
@@ -168,6 +171,12 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
     case XC_PSR_CMT_L3_OCCUPANCY:
         evtid = EVTID_L3_OCCUPANCY;
         break;
+    case XC_PSR_CMT_TOTAL_MEM_BANDWIDTH:
+        evtid = EVTID_TOTAL_MEM_BANDWIDTH;
+        break;
+    case XC_PSR_CMT_LOCAL_MEM_BANDWIDTH:
+        evtid = EVTID_LOCAL_MEM_BANDWIDTH;
+        break;
     default:
         return -1;
     }
@@ -182,19 +191,35 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
     entries[1].val = 0;
     entries[1].rsvd = 0;
 
+    if ( type == XC_PSR_CMT_L3_OCCUPANCY )
+        nr_entries = 2;
+    else
+    {
+        entries[2].u.cmd = XEN_RESOURCE_OP_MSR_READ;
+        entries[2].idx = MSR_IA32_TSC;
+        entries[2].val = 0;
+        entries[2].rsvd = 0;
+
+        nr_entries = 3;
+    }
+
     op.cpu = cpu;
-    op.nr_entries = 2;
+    op.nr_entries = nr_entries;
     op.entries = entries;
 
     rc = xc_resource_op(xch, 1, &op);
     if ( rc < 0 )
         return rc;
 
-    if ( op.result !=2 || entries[1].val & IA32_CMT_CTR_ERROR_MASK )
+    if ( op.result != nr_entries || entries[1].val & IA32_CMT_CTR_ERROR_MASK )
         return -1;
 
     *monitor_data = entries[1].val;
 
+    if ( type == XC_PSR_CMT_TOTAL_MEM_BANDWIDTH ||
+         type == XC_PSR_CMT_LOCAL_MEM_BANDWIDTH )
+        *tsc = entries[2].val;
+
     return 0;
 }
 
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 596d2a0..347ef52 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1462,6 +1462,14 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
                                       uint32_t domid,
                                       uint32_t socketid,
                                       uint32_t *l3_cache_occupancy);
+int libxl_psr_cmt_get_total_mem_bandwidth(libxl_ctx *ctx,
+                                          uint32_t domid,
+                                          uint32_t socketid,
+                                          uint32_t *bandwidth);
+int libxl_psr_cmt_get_local_mem_bandwidth(libxl_ctx *ctx,
+                                          uint32_t domid,
+                                          uint32_t socketid,
+                                          uint32_t *bandwidth);
 #endif
 
 /* misc */
diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c
index c88c421..4c8bcc8 100644
--- a/tools/libxl/libxl_psr.c
+++ b/tools/libxl/libxl_psr.c
@@ -18,6 +18,7 @@
 
 
 #define IA32_QM_CTR_ERROR_MASK         (0x3ul << 62)
+#define MBM_SAMPLE_RETRY_MAX 4
 
 static void libxl__psr_cmt_log_err_msg(libxl__gc *gc, int err)
 {
@@ -180,7 +181,8 @@ static int libxl__psr_cmt_get_l3_monitoring_data(libxl__gc *gc,
                                                  uint32_t domid,
                                                  xc_psr_cmt_type type,
                                                  uint32_t socketid,
-                                                 uint64_t *data)
+                                                 uint64_t *data,
+                                                 uint64_t *tsc)
 {
     unsigned int rmid;
     int cpu, rc;
@@ -200,7 +202,7 @@ static int libxl__psr_cmt_get_l3_monitoring_data(libxl__gc *gc,
         goto out;
     }
 
-    rc = xc_psr_cmt_get_data(CTX->xch, rmid, cpu, type, data);
+    rc = xc_psr_cmt_get_data(CTX->xch, rmid, cpu, type, data, tsc);
     if (rc < 0) {
         LOGE(ERROR, "failed to get monitoring data");
         rc = ERROR_FAIL;
@@ -222,7 +224,7 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
 
     rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid,
                                                XC_PSR_CMT_L3_OCCUPANCY,
-                                               socketid, &data);
+                                               socketid, &data, NULL);
     if (rc < 0)
             goto out;
 
@@ -240,6 +242,99 @@ out:
     return rc;
 }
 
+static int libxl__psr_cmt_get_mem_bandwidth(libxl__gc *gc,
+                                            uint32_t domid,
+                                            xc_psr_cmt_type type,
+                                            uint32_t socketid,
+                                            uint32_t *bandwidth)
+{
+    uint64_t sample1, sample2;
+    uint64_t tsc1, tsc2;
+    uint32_t upscaling_factor;
+    int retry_attempts = 0;
+    int rc;
+
+    while (1) {
+        rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid, type, socketid,
+                                                   &sample1, &tsc1);
+        if (rc < 0) {
+            rc = ERROR_FAIL;
+            goto out;
+        }
+
+        usleep(10000);
+
+        rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid, type, socketid,
+                                                   &sample2, &tsc2);
+        if (rc < 0) {
+           rc = ERROR_FAIL;
+           goto out;
+        }
+        if (tsc2 <= tsc1) {
+            rc = ERROR_FAIL;
+            goto out;
+        }
+        /*
+         * Hardware guarantees at most 1 overflow can happen if the duration
+         * between two samples is less than 1 second. Note that tsc returned
+         * from hypervisor is already-scaled time(ns).
+         */
+        if (tsc2 - tsc1 < 1000000000 && sample2 >= sample1)
+            break;
+
+        if (retry_attempts < MBM_SAMPLE_RETRY_MAX) {
+            retry_attempts++;
+        } else {
+            LOGE(ERROR, "event counter overflowed");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+
+    }
+
+    rc = xc_psr_cmt_get_l3_upscaling_factor(CTX->xch, &upscaling_factor);
+    if (rc < 0) {
+        LOGE(ERROR, "failed to get L3 upscaling factor");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    *bandwidth = (sample2 - sample1) * 1000000000 / (tsc2 - tsc1)
+                 *  upscaling_factor / 1024;
+out:
+    return rc;
+}
+
+int libxl_psr_cmt_get_total_mem_bandwidth(libxl_ctx *ctx,
+                                          uint32_t domid,
+                                          uint32_t socketid,
+                                          uint32_t *bandwidth)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = libxl__psr_cmt_get_mem_bandwidth(gc, domid,
+                                          XC_PSR_CMT_TOTAL_MEM_BANDWIDTH,
+                                          socketid, bandwidth);
+    GC_FREE;
+    return rc;
+}
+
+int libxl_psr_cmt_get_local_mem_bandwidth(libxl_ctx *ctx,
+                                          uint32_t domid,
+                                          uint32_t socketid,
+                                          uint32_t *bandwidth)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = libxl__psr_cmt_get_mem_bandwidth(gc, domid,
+                                          XC_PSR_CMT_LOCAL_MEM_BANDWIDTH,
+                                          socketid, bandwidth);
+    GC_FREE;
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 1214d2e..46d160e 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -694,4 +694,6 @@ libxl_event = Struct("event",[
 
 libxl_psr_cmt_type = Enumeration("psr_cmt_type", [
     (1, "CACHE_OCCUPANCY"),
+    (2, "TOTAL_MEM_BANDWIDTH"),
+    (3, "LOCAL_MEM_BANDWIDTH"),
     ])
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 1827c63..6d5c7af 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7829,6 +7829,16 @@ static void psr_cmt_print_domain_l3_info(libxl_dominfo *dominfo,
                                                    socketid, &data))
                 printf("%13u KB", data);
             break;
+        case LIBXL_PSR_CMT_TYPE_TOTAL_MEM_BANDWIDTH:
+            if (!libxl_psr_cmt_get_total_mem_bandwidth(ctx, dominfo->domid,
+                                                       socketid, &data))
+                printf("%11u KB/s", data);
+            break;
+        case LIBXL_PSR_CMT_TYPE_LOCAL_MEM_BANDWIDTH:
+            if (!libxl_psr_cmt_get_local_mem_bandwidth(ctx, dominfo->domid,
+                                                       socketid, &data))
+                printf("%11u KB/s", data);
+            break;
         default:
             return;
         }
@@ -7840,7 +7850,7 @@ static void psr_cmt_print_domain_l3_info(libxl_dominfo *dominfo,
 static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid)
 {
     uint32_t i, socketid, nr_sockets, total_rmid;
-    uint32_t l3_cache_size;
+    uint32_t l3_cache_size, l3_event_mask;
     libxl_physinfo info;
     int rc, nr_domains;
 
@@ -7849,6 +7859,13 @@ static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid)
         return -1;
     }
 
+    rc = libxl_psr_cmt_get_l3_event_mask(ctx, &l3_event_mask);
+    if (rc < 0 || !(l3_event_mask & (1 << (type - 1)))) {
+        fprintf(stderr, "Monitor type '%s' is not supported in the system\n",
+                libxl_psr_cmt_type_to_string(type));
+        return -1;
+    }
+
     libxl_physinfo_init(&info);
     rc = libxl_get_physinfo(ctx, &info);
     if (rc < 0) {
@@ -7966,6 +7983,8 @@ int main_psr_cmt_show(int argc, char **argv)
 
     switch (type) {
     case LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY:
+    case LIBXL_PSR_CMT_TYPE_TOTAL_MEM_BANDWIDTH:
+    case LIBXL_PSR_CMT_TYPE_LOCAL_MEM_BANDWIDTH:
         ret = psr_cmt_show_l3_info(type, domid);
         break;
     default:
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 4b30d3d..2d8f272 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -538,7 +538,9 @@ struct cmd_spec cmd_table[] = {
       "Show Cache Monitoring Technology information",
       "<PSR-CMT-Type> <Domain>",
       "Available monitor types:\n"
-      "\"cache_occupancy\":         Show L3 cache occupancy\n",
+      "\"cache_occupancy\":         Show L3 cache occupancy\n"
+      "\"total_mem_bandwidth\":     Show total memory bandwidth\n"
+      "\"local_mem_bandwidth\":     Show local memory bandwidth\n",
     },
 #endif
 };
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 2/5] tools: add routine to get CMT L3 event mask
  2015-01-28  8:04 ` [PATCH v8 2/5] tools: add routine to get CMT L3 event mask Chao Peng
@ 2015-01-28 13:40   ` Ian Campbell
  2015-01-29  8:32     ` Chao Peng
  0 siblings, 1 reply; 14+ messages in thread
From: Ian Campbell @ 2015-01-28 13:40 UTC (permalink / raw)
  To: Chao Peng
  Cc: keir, stefano.stabellini, andrew.cooper3, Ian.Jackson, xen-devel,
	will.auld, JBeulich, wei.liu2

On Wed, 2015-01-28 at 16:04 +0800, Chao Peng wrote:
> This is the tools side wrapper for XEN_SYSCTL_PSR_CMT_get_l3_event_mask
> of XEN_SYSCTL_psr_cmt_op.
> 
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> ---
>  tools/libxc/include/xenctrl.h |  1 +
>  tools/libxc/xc_psr.c          | 17 +++++++++++++++++
>  tools/libxl/libxl.h           |  1 +

This needs a LIBXL_HAVE #define in libxl.h to advertise the new
functionality.

> +int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask);

What are the possible values of the event mask? Is there a guarantee
from somewhere that 32-bits is always sufficient?

How can the libxl caller decode the meaning of the bits since they are
not defined in the libxl.h header?

Perhaps a struct of booleans would be a better interface at the libxl
level?

Or perhaps the interface should be more along the lines of "is psr
feature X available", like libxl_psr_..._feature_enabled(ctx,
SOME_SYMBOL)?

Or perhaps the function to actual access the info should have an
ERROR_PSR_FUNCTION NOT_SUPPORTED return?

Ian.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 5/5] tools, docs: add total/local memory bandwith monitoring
  2015-01-28  8:04 ` [PATCH v8 5/5] tools, docs: add total/local memory bandwith monitoring Chao Peng
@ 2015-01-28 14:04   ` Ian Campbell
  2015-01-28 14:10     ` Ian Jackson
  2015-01-29  8:17     ` Chao Peng
  0 siblings, 2 replies; 14+ messages in thread
From: Ian Campbell @ 2015-01-28 14:04 UTC (permalink / raw)
  To: Chao Peng
  Cc: keir, stefano.stabellini, andrew.cooper3, Ian.Jackson, xen-devel,
	will.auld, JBeulich, wei.liu2

On Wed, 2015-01-28 at 16:04 +0800, Chao Peng wrote:
> @@ -1476,6 +1483,8 @@ detach: Detach the platform shared resource monitoring service from a domain.
>  Show monitoring data for a certain domain or all domains. Current supported
>  monitor types are:
>   - "cache-occupancy": showing the L3 cache occupancy.
> + - "total-mem-bandwidth": showing the total memory bandwidth.
> + - "local-mem-bandwidth": showing the local memory bandwidth.

Some mention of the units might be useful here (and I suspect elsewhere
in the interface too).

> @@ -156,11 +158,12 @@ int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
>  }
>  
>  int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
> -                        xc_psr_cmt_type type, uint64_t *monitor_data)
> +                        xc_psr_cmt_type type, uint64_t *monitor_data,
> +                        uint64_t *tsc)
>  {
>      xc_resource_op_t op;
> -    xc_resource_entry_t entries[2];
> -    uint32_t evtid;
> +    xc_resource_entry_t entries[3];
> +    uint32_t evtid, nr_entries;
>      int rc;
>  
>      switch ( type )
> @@ -168,6 +171,12 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
>      case XC_PSR_CMT_L3_OCCUPANCY:
>          evtid = EVTID_L3_OCCUPANCY;
>          break;
> +    case XC_PSR_CMT_TOTAL_MEM_BANDWIDTH:
> +        evtid = EVTID_TOTAL_MEM_BANDWIDTH;
> +        break;
> +    case XC_PSR_CMT_LOCAL_MEM_BANDWIDTH:
> +        evtid = EVTID_LOCAL_MEM_BANDWIDTH;
> +        break;
>      default:
>          return -1;
>      }
> @@ -182,19 +191,35 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
>      entries[1].val = 0;
>      entries[1].rsvd = 0;
>  
> +    if ( type == XC_PSR_CMT_L3_OCCUPANCY )

Please use a switch.

Is there no reason to want the TSC for L3_OCCUPANCY? Perhaps a simpler
interface would be to always request the TSC if the tsc argument is
non-null?

> +        nr_entries = 2;
> +    else
> +    {
> +        entries[2].u.cmd = XEN_RESOURCE_OP_MSR_READ;
> +        entries[2].idx = MSR_IA32_TSC;
> +        entries[2].val = 0;
> +        entries[2].rsvd = 0;
> +
> +        nr_entries = 3;
> +    }
> +
>      op.cpu = cpu;
> -    op.nr_entries = 2;
> +    op.nr_entries = nr_entries;
>      op.entries = entries;
>  
>      rc = xc_resource_op(xch, 1, &op);
>      if ( rc < 0 )
>          return rc;
>  
> -    if ( op.result !=2 || entries[1].val & IA32_CMT_CTR_ERROR_MASK )
> +    if ( op.result != nr_entries || entries[1].val & IA32_CMT_CTR_ERROR_MASK )
>          return -1;
>  
>      *monitor_data = entries[1].val;
>  
> +    if ( type == XC_PSR_CMT_TOTAL_MEM_BANDWIDTH ||
> +         type == XC_PSR_CMT_LOCAL_MEM_BANDWIDTH )
> +        *tsc = entries[2].val;

Are there going to be many more of these hardcoded array indexes as new
features are added?

Perhaps it should be 
	entries[nr].foo = whatever;
	entries[nr].bar = another;
	nr++

And for the TSC one, tsc_entry = &entries[nr], with tsc_entry being NULL
otherwise so you can check it.



> +
>      return 0;
>  }
>  
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 596d2a0..347ef52 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -1462,6 +1462,14 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
>                                        uint32_t domid,
>                                        uint32_t socketid,
>                                        uint32_t *l3_cache_occupancy);
> +int libxl_psr_cmt_get_total_mem_bandwidth(libxl_ctx *ctx,
> +                                          uint32_t domid,
> +                                          uint32_t socketid,
> +                                          uint32_t *bandwidth);
> +int libxl_psr_cmt_get_local_mem_bandwidth(libxl_ctx *ctx,
> +                                          uint32_t domid,
> +                                          uint32_t socketid,
> +                                          uint32_t *bandwidth);

Needs #define LIBXL_HAVE_FOO (one for both sets of bandwidth should be
OK).

What are the units of bandwidth? Is a 32-bit number sufficient to cover
future improvements to the hardware?
>  
> +static int libxl__psr_cmt_get_mem_bandwidth(libxl__gc *gc,
> +                                            uint32_t domid,
> +                                            xc_psr_cmt_type type,
> +                                            uint32_t socketid,
> +                                            uint32_t *bandwidth)
> +{
> +    uint64_t sample1, sample2;
> +    uint64_t tsc1, tsc2;
> +    uint32_t upscaling_factor;
> +    int retry_attempts = 0;
> +    int rc;
> +
> +    while (1) {
> +        rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid, type, socketid,
> +                                                   &sample1, &tsc1);
> +        if (rc < 0) {
> +            rc = ERROR_FAIL;
> +            goto out;
> +        }
> +
> +        usleep(10000);

I'm afraid that sleeping in libxl in this way isn't acceptable,
especially for such a long duration.

Any function which does this certainly meets the definition of "slow" in
libxl_internal.h and therefore really ought to be using the ao
machinery.

If that isn't possible/desirable for some reason then it needs to be
made very clear to the caller, through at a minimum some very extensive
comments in the interface header what the constraints and implications
of using this function are.

Another alternative would be to expose the instantaneous values to the
libxl user and let it sleep as it wishes and calculate the bandwidth
itself.

> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 1214d2e..46d160e 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -694,4 +694,6 @@ libxl_event = Struct("event",[
>  
>  libxl_psr_cmt_type = Enumeration("psr_cmt_type", [
>      (1, "CACHE_OCCUPANCY"),
> +    (2, "TOTAL_MEM_BANDWIDTH"),
> +    (3, "LOCAL_MEM_BANDWIDTH"),

I'm really starting to wonder if wedging all of these logically
unrelated statistics into an umbrella psr interface at the libxl level
makes sense.

But now I notice that this type isn't even used by libxl, it seems to be
xl internal. If so then please can you arrange to move it to a new
xl_types.idl and add these new ones there.

I think we can get away with deprecating the libxl variant pretty
aggressively since it's not currently all that useful.

>      ])
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 1827c63..6d5c7af 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -7829,6 +7829,16 @@ static void psr_cmt_print_domain_l3_info(libxl_dominfo *dominfo,

I wondered this during the refactoring in an earlier patch, but why is
"l3 info" a suitable name for this function, seems that only one of the
three things is related to l3 caches.

Does it really make sense to wedge these things (cache use and
bandwidth) into a common function?

What are the upcoming future PSR events and how will they fit in? (this
is less critical at the xl level, but it feeds into my thinking about
libxl interfaces too)

>                                                     socketid, &data))
>                  printf("%13u KB", data);
>              break;
> +        case LIBXL_PSR_CMT_TYPE_TOTAL_MEM_BANDWIDTH:
> +            if (!libxl_psr_cmt_get_total_mem_bandwidth(ctx, dominfo->domid,
> +                                                       socketid, &data))
> +                printf("%11u KB/s", data);
> +            break;
> +        case LIBXL_PSR_CMT_TYPE_LOCAL_MEM_BANDWIDTH:
> +            if (!libxl_psr_cmt_get_local_mem_bandwidth(ctx, dominfo->domid,
> +                                                       socketid, &data))
> +                printf("%11u KB/s", data);
> +            break;
>          default:
>              return;
>          }
> @@ -7840,7 +7850,7 @@ static void psr_cmt_print_domain_l3_info(libxl_dominfo *dominfo,
>  static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid)
>  {
>      uint32_t i, socketid, nr_sockets, total_rmid;
> -    uint32_t l3_cache_size;
> +    uint32_t l3_cache_size, l3_event_mask;
>      libxl_physinfo info;
>      int rc, nr_domains;
>  
> @@ -7849,6 +7859,13 @@ static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid)
>          return -1;
>      }
>  
> +    rc = libxl_psr_cmt_get_l3_event_mask(ctx, &l3_event_mask);
> +    if (rc < 0 || !(l3_event_mask & (1 << (type - 1)))) {

I'm not mad keen on this. I'd prefer either 3 predicate functions, one
for each mask type or for a single one which takes a LIBXL_PSR_CMT_TYPE
option.

Is this concept of the PSR l3 even mask already exposed to users of
libxl anywhere before this series?

> diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
> index 4b30d3d..2d8f272 100644
> --- a/tools/libxl/xl_cmdtable.c
> +++ b/tools/libxl/xl_cmdtable.c
> @@ -538,7 +538,9 @@ struct cmd_spec cmd_table[] = {
>        "Show Cache Monitoring Technology information",
>        "<PSR-CMT-Type> <Domain>",
>        "Available monitor types:\n"
> -      "\"cache_occupancy\":         Show L3 cache occupancy\n",
> +      "\"cache_occupancy\":         Show L3 cache occupancy\n"
> +      "\"total_mem_bandwidth\":     Show total memory bandwidth\n"
> +      "\"local_mem_bandwidth\":     Show local memory bandwidth\n",
>      },
>  #endif
>  };

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 5/5] tools, docs: add total/local memory bandwith monitoring
  2015-01-28 14:04   ` Ian Campbell
@ 2015-01-28 14:10     ` Ian Jackson
  2015-01-28 14:12       ` Ian Campbell
  2015-01-29  8:17     ` Chao Peng
  1 sibling, 1 reply; 14+ messages in thread
From: Ian Jackson @ 2015-01-28 14:10 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, keir, stefano.stabellini, andrew.cooper3, xen-devel,
	will.auld, JBeulich, Chao Peng

Ian Campbell writes ("Re: [PATCH v8 5/5] tools, docs: add total/local memory bandwith monitoring"):
...
> Another alternative would be to expose the instantaneous values to the
> libxl user and let it sleep as it wishes and calculate the bandwidth
> itself.

I think that would be better.

Suppose the libxl caller wants to record the mean bandwidth over a
longer period: they shouldn't have to keep calling this function and
adding up the results (and if they did their algorithm would be racy).

Ian.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 5/5] tools, docs: add total/local memory bandwith monitoring
  2015-01-28 14:10     ` Ian Jackson
@ 2015-01-28 14:12       ` Ian Campbell
  0 siblings, 0 replies; 14+ messages in thread
From: Ian Campbell @ 2015-01-28 14:12 UTC (permalink / raw)
  To: Ian Jackson
  Cc: wei.liu2, keir, stefano.stabellini, andrew.cooper3, xen-devel,
	will.auld, JBeulich, Chao Peng

On Wed, 2015-01-28 at 14:10 +0000, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH v8 5/5] tools, docs: add total/local memory bandwith monitoring"):
> ...
> > Another alternative would be to expose the instantaneous values to the
> > libxl user and let it sleep as it wishes and calculate the bandwidth
> > itself.
> 
> I think that would be better.
> 
> Suppose the libxl caller wants to record the mean bandwidth over a
> longer period: they shouldn't have to keep calling this function and
> adding up the results (and if they did their algorithm would be racy).

True.

>From the code it seems like there is some h/w constraint on roll over of
the underlying counter, which might complicate this somewhat. At the
very least the libxl interface docs would need to be quite extensive.

Ian.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op
  2015-01-28  8:04 ` [PATCH v8 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op Chao Peng
@ 2015-01-28 15:42   ` Jan Beulich
  2015-01-29  6:35     ` Chao Peng
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2015-01-28 15:42 UTC (permalink / raw)
  To: Chao Peng
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, xen-devel, will.auld, keir

>>> On 28.01.15 at 09:04, <chao.p.peng@linux.intel.com> wrote:
> +            else
> +            {
> +                unsigned long irqflags;

Some gcc versions can't figure out that this variable doesn't get used
uninitialized, and hence warn about it. I fixed this while committing, at
once changing the name to the more conventional "flags". Please be
more careful with such conditional uses in the future.

Jan

> +                /*
> +                 * If next entry is MSR_IA32_TSC read, then the actual rdtscll
> +                 * is performed together with current entry, with IRQ disabled.
> +                 */
> +                bool_t read_tsc = (i < ra->nr_done - 1 &&
> +                                   unlikely(entry[1].idx == MSR_IA32_TSC));
> +
> +                if ( unlikely(read_tsc) )
> +                    local_irq_save(irqflags);
> +
> +                ret = rdmsr_safe(entry->idx, entry->val);
> +
> +                if ( unlikely(read_tsc) )
> +                {
> +                    rdtscll(tsc);
> +                    local_irq_restore(irqflags);
> +                }
> +            }
>              break;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op
  2015-01-28 15:42   ` Jan Beulich
@ 2015-01-29  6:35     ` Chao Peng
  0 siblings, 0 replies; 14+ messages in thread
From: Chao Peng @ 2015-01-29  6:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, xen-devel, will.auld, keir

On Wed, Jan 28, 2015 at 03:42:34PM +0000, Jan Beulich wrote:
> >>> On 28.01.15 at 09:04, <chao.p.peng@linux.intel.com> wrote:
> > +            else
> > +            {
> > +                unsigned long irqflags;
> 
> Some gcc versions can't figure out that this variable doesn't get used
> uninitialized, and hence warn about it. I fixed this while committing, at
> once changing the name to the more conventional "flags". Please be
> more careful with such conditional uses in the future.

Appreciate! I had it initialized in v5 but missed it in a latter rebase.

Chao

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 5/5] tools, docs: add total/local memory bandwith monitoring
  2015-01-28 14:04   ` Ian Campbell
  2015-01-28 14:10     ` Ian Jackson
@ 2015-01-29  8:17     ` Chao Peng
  1 sibling, 0 replies; 14+ messages in thread
From: Chao Peng @ 2015-01-29  8:17 UTC (permalink / raw)
  To: Ian Campbell
  Cc: keir, stefano.stabellini, andrew.cooper3, Ian.Jackson, xen-devel,
	will.auld, JBeulich, wei.liu2


Thanks Ian for your words. It makes things much clear.

On Wed, Jan 28, 2015 at 02:04:59PM +0000, Ian Campbell wrote:
> On Wed, 2015-01-28 at 16:04 +0800, Chao Peng wrote:
> > @@ -1476,6 +1483,8 @@ detach: Detach the platform shared resource monitoring service from a domain.
> >  Show monitoring data for a certain domain or all domains. Current supported
> >  monitor types are:
> >   - "cache-occupancy": showing the L3 cache occupancy.
> > + - "total-mem-bandwidth": showing the total memory bandwidth.
> > + - "local-mem-bandwidth": showing the local memory bandwidth.
> 
> Some mention of the units might be useful here (and I suspect elsewhere
> in the interface too).

No problem.

> 
> > @@ -156,11 +158,12 @@ int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
> >  }
> >  
> >  int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
> > -                        xc_psr_cmt_type type, uint64_t *monitor_data)
> > +                        xc_psr_cmt_type type, uint64_t *monitor_data,
> > +                        uint64_t *tsc)
> >  {
> >      xc_resource_op_t op;
> > -    xc_resource_entry_t entries[2];
> > -    uint32_t evtid;
> > +    xc_resource_entry_t entries[3];
> > +    uint32_t evtid, nr_entries;
> >      int rc;
> >  
> >      switch ( type )
> > @@ -168,6 +171,12 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
> >      case XC_PSR_CMT_L3_OCCUPANCY:
> >          evtid = EVTID_L3_OCCUPANCY;
> >          break;
> > +    case XC_PSR_CMT_TOTAL_MEM_BANDWIDTH:
> > +        evtid = EVTID_TOTAL_MEM_BANDWIDTH;
> > +        break;
> > +    case XC_PSR_CMT_LOCAL_MEM_BANDWIDTH:
> > +        evtid = EVTID_LOCAL_MEM_BANDWIDTH;
> > +        break;
> >      default:
> >          return -1;
> >      }
> > @@ -182,19 +191,35 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
> >      entries[1].val = 0;
> >      entries[1].rsvd = 0;
> >  
> > +    if ( type == XC_PSR_CMT_L3_OCCUPANCY )
> 
> Please use a switch.
> 
> Is there no reason to want the TSC for L3_OCCUPANCY? Perhaps a simpler
> interface would be to always request the TSC if the tsc argument is
> non-null?

Only the value of memory bandwidth but not the cache occupancy is
time-related at present. Agreed with you to change the condition to "tsc
== NULL".

> 
> > +        nr_entries = 2;
> > +    else
> > +    {
> > +        entries[2].u.cmd = XEN_RESOURCE_OP_MSR_READ;
> > +        entries[2].idx = MSR_IA32_TSC;
> > +        entries[2].val = 0;
> > +        entries[2].rsvd = 0;
> > +
> > +        nr_entries = 3;
> > +    }
> > +
> >      op.cpu = cpu;
> > -    op.nr_entries = 2;
> > +    op.nr_entries = nr_entries;
> >      op.entries = entries;
> >  
> >      rc = xc_resource_op(xch, 1, &op);
> >      if ( rc < 0 )
> >          return rc;
> >  
> > -    if ( op.result !=2 || entries[1].val & IA32_CMT_CTR_ERROR_MASK )
> > +    if ( op.result != nr_entries || entries[1].val & IA32_CMT_CTR_ERROR_MASK )
> >          return -1;
> >  
> >      *monitor_data = entries[1].val;
> >  
> > +    if ( type == XC_PSR_CMT_TOTAL_MEM_BANDWIDTH ||
> > +         type == XC_PSR_CMT_LOCAL_MEM_BANDWIDTH )
> > +        *tsc = entries[2].val;
> 
> Are there going to be many more of these hardcoded array indexes as new
> features are added?
> 
> Perhaps it should be 
> 	entries[nr].foo = whatever;
> 	entries[nr].bar = another;
> 	nr++
> 
> And for the TSC one, tsc_entry = &entries[nr], with tsc_entry being NULL
> otherwise so you can check it.
> 

Sure, I will follow this pattern.

> 
> 
> > +
> >      return 0;
> >  }
> >  
> > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> > index 596d2a0..347ef52 100644
> > --- a/tools/libxl/libxl.h
> > +++ b/tools/libxl/libxl.h
> > @@ -1462,6 +1462,14 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
> >                                        uint32_t domid,
> >                                        uint32_t socketid,
> >                                        uint32_t *l3_cache_occupancy);
> > +int libxl_psr_cmt_get_total_mem_bandwidth(libxl_ctx *ctx,
> > +                                          uint32_t domid,
> > +                                          uint32_t socketid,
> > +                                          uint32_t *bandwidth);
> > +int libxl_psr_cmt_get_local_mem_bandwidth(libxl_ctx *ctx,
> > +                                          uint32_t domid,
> > +                                          uint32_t socketid,
> > +                                          uint32_t *bandwidth);
> 
> Needs #define LIBXL_HAVE_FOO (one for both sets of bandwidth should be
> OK).

OK.

> 
> What are the units of bandwidth? Is a 32-bit number sufficient to cover
> future improvements to the hardware?

The unit is KB/s. Can change it to 64 bit. Thanks.

> >  
> > +static int libxl__psr_cmt_get_mem_bandwidth(libxl__gc *gc,
> > +                                            uint32_t domid,
> > +                                            xc_psr_cmt_type type,
> > +                                            uint32_t socketid,
> > +                                            uint32_t *bandwidth)
> > +{
> > +    uint64_t sample1, sample2;
> > +    uint64_t tsc1, tsc2;
> > +    uint32_t upscaling_factor;
> > +    int retry_attempts = 0;
> > +    int rc;
> > +
> > +    while (1) {
> > +        rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid, type, socketid,
> > +                                                   &sample1, &tsc1);
> > +        if (rc < 0) {
> > +            rc = ERROR_FAIL;
> > +            goto out;
> > +        }
> > +
> > +        usleep(10000);
> 
> I'm afraid that sleeping in libxl in this way isn't acceptable,
> especially for such a long duration.
> 
> Any function which does this certainly meets the definition of "slow" in
> libxl_internal.h and therefore really ought to be using the ao
> machinery.
> 
> If that isn't possible/desirable for some reason then it needs to be
> made very clear to the caller, through at a minimum some very extensive
> comments in the interface header what the constraints and implications
> of using this function are.
> 
> Another alternative would be to expose the instantaneous values to the
> libxl user and let it sleep as it wishes and calculate the bandwidth
> itself.

I'd like to adopt this suggestion. Move sleep to xl and return sampling
value here. This is acutally good for future enhancement (As Ian Jackson
said in another reply, It will allow us to record avarage bandwidth over
a longer time once hardware supports that).

> 
> > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > index 1214d2e..46d160e 100644
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -694,4 +694,6 @@ libxl_event = Struct("event",[
> >  
> >  libxl_psr_cmt_type = Enumeration("psr_cmt_type", [
> >      (1, "CACHE_OCCUPANCY"),
> > +    (2, "TOTAL_MEM_BANDWIDTH"),
> > +    (3, "LOCAL_MEM_BANDWIDTH"),
> 
> I'm really starting to wonder if wedging all of these logically
> unrelated statistics into an umbrella psr interface at the libxl level
> makes sense.
> 
> But now I notice that this type isn't even used by libxl, it seems to be
> xl internal. If so then please can you arrange to move it to a new
> xl_types.idl and add these new ones there.

As will explain below: I'm likely to use it for
libxl_psr_cmt_type_supported(libxl_psr_cmt_type type), I'd keep it here.

> 
> I think we can get away with deprecating the libxl variant pretty
> aggressively since it's not currently all that useful.
> 
> >      ])
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 1827c63..6d5c7af 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -7829,6 +7829,16 @@ static void psr_cmt_print_domain_l3_info(libxl_dominfo *dominfo,
> 
> I wondered this during the refactoring in an earlier patch, but why is
> "l3 info" a suitable name for this function, seems that only one of the
> three things is related to l3 caches.
> 
> Does it really make sense to wedge these things (cache use and
> bandwidth) into a common function?

Generally speaking, the platform shared resource monitoring are designed
to allow monitoring of occupancy/bandwidth from one level of the cache
hierarchy to the next. What we called CACHE_OCCUPANY here is actually
for L3 occupancy. And MEM_BANDWIDTH is actually the bandwidth from L3 to
its next, which is typically the system memory. In that case, it
represents the memory bandwidth. 

> 
> What are the upcoming future PSR events and how will they fit in? (this
> is less critical at the xl level, but it feeds into my thinking about
> libxl interfaces too)

I can't see any upcoming event in the near future, but in the long
time, we may have L2 occupancy/bandwidth similarly.

> 
> >                                                     socketid, &data))
> >                  printf("%13u KB", data);
> >              break;
> > +        case LIBXL_PSR_CMT_TYPE_TOTAL_MEM_BANDWIDTH:
> > +            if (!libxl_psr_cmt_get_total_mem_bandwidth(ctx, dominfo->domid,
> > +                                                       socketid, &data))
> > +                printf("%11u KB/s", data);
> > +            break;
> > +        case LIBXL_PSR_CMT_TYPE_LOCAL_MEM_BANDWIDTH:
> > +            if (!libxl_psr_cmt_get_local_mem_bandwidth(ctx, dominfo->domid,
> > +                                                       socketid, &data))
> > +                printf("%11u KB/s", data);
> > +            break;
> >          default:
> >              return;
> >          }
> > @@ -7840,7 +7850,7 @@ static void psr_cmt_print_domain_l3_info(libxl_dominfo *dominfo,
> >  static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid)
> >  {
> >      uint32_t i, socketid, nr_sockets, total_rmid;
> > -    uint32_t l3_cache_size;
> > +    uint32_t l3_cache_size, l3_event_mask;
> >      libxl_physinfo info;
> >      int rc, nr_domains;
> >  
> > @@ -7849,6 +7859,13 @@ static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid)
> >          return -1;
> >      }
> >  
> > +    rc = libxl_psr_cmt_get_l3_event_mask(ctx, &l3_event_mask);
> > +    if (rc < 0 || !(l3_event_mask & (1 << (type - 1)))) {
> 
> I'm not mad keen on this. I'd prefer either 3 predicate functions, one
> for each mask type or for a single one which takes a LIBXL_PSR_CMT_TYPE
> option.

In the v2, the following definition is used actually:
int libxl_psr_cmt_type_supported(libxl_ctx *ctx, libxl_psr_cmt_type type)

But I changed it to what it's now, for performance consideration:
Basically because xl may call this function several times if it wants to
display all the available events to user. This will costs several
hypercalls once we are not allowed to cache the result with static in
libxc.

As now the xl code does not really support available events display, so I
feel comfortable to use it again.

> 
> Is this concept of the PSR l3 even mask already exposed to users of
> libxl anywhere before this series?

This is the first time we expose it.

Chao

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 2/5] tools: add routine to get CMT L3 event mask
  2015-01-28 13:40   ` Ian Campbell
@ 2015-01-29  8:32     ` Chao Peng
  0 siblings, 0 replies; 14+ messages in thread
From: Chao Peng @ 2015-01-29  8:32 UTC (permalink / raw)
  To: Ian Campbell
  Cc: keir, stefano.stabellini, andrew.cooper3, Ian.Jackson, xen-devel,
	will.auld, JBeulich, wei.liu2

On Wed, Jan 28, 2015 at 01:40:25PM +0000, Ian Campbell wrote:
> On Wed, 2015-01-28 at 16:04 +0800, Chao Peng wrote:
> > This is the tools side wrapper for XEN_SYSCTL_PSR_CMT_get_l3_event_mask
> > of XEN_SYSCTL_psr_cmt_op.
> > 
> > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> > ---
> >  tools/libxc/include/xenctrl.h |  1 +
> >  tools/libxc/xc_psr.c          | 17 +++++++++++++++++
> >  tools/libxl/libxl.h           |  1 +
> 
> This needs a LIBXL_HAVE #define in libxl.h to advertise the new
> functionality.
OK.
> 
> > +int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask);
> 
> What are the possible values of the event mask? Is there a guarantee
> from somewhere that 32-bits is always sufficient?

It comes directly from cpuid which is 32 bit now and look like not
possible to extend it to 64 bit. Only bits[0:2] is used at present.

> 
> How can the libxl caller decode the meaning of the bits since they are
> not defined in the libxl.h header?
> 
> Perhaps a struct of booleans would be a better interface at the libxl
> level?
> 
> Or perhaps the interface should be more along the lines of "is psr
> feature X available", like libxl_psr_..._feature_enabled(ctx,
> SOME_SYMBOL)?
> 
> Or perhaps the function to actual access the info should have an
> ERROR_PSR_FUNCTION NOT_SUPPORTED return?

As answered in another reply, I'd use it as:
int libxl_psr_cmt_type_supported(libxl_ctx *ctx, libxl_psr_cmt_type type);

Thanks,
Chao

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-01-29  8:32 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-28  8:04 [PATCH v8 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Chao Peng
2015-01-28  8:04 ` [PATCH v8 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op Chao Peng
2015-01-28 15:42   ` Jan Beulich
2015-01-29  6:35     ` Chao Peng
2015-01-28  8:04 ` [PATCH v8 2/5] tools: add routine to get CMT L3 event mask Chao Peng
2015-01-28 13:40   ` Ian Campbell
2015-01-29  8:32     ` Chao Peng
2015-01-28  8:04 ` [PATCH v8 3/5] tools: correct coding style for psr Chao Peng
2015-01-28  8:04 ` [PATCH v8 4/5] tools: code refactoring for MBM Chao Peng
2015-01-28  8:04 ` [PATCH v8 5/5] tools, docs: add total/local memory bandwith monitoring Chao Peng
2015-01-28 14:04   ` Ian Campbell
2015-01-28 14:10     ` Ian Jackson
2015-01-28 14:12       ` Ian Campbell
2015-01-29  8:17     ` Chao Peng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.