* [PATCH V4 0/5] x86/mbm : Intel Memory Bandwidth monitoring support
@ 2016-02-11  1:07 Vikas Shivappa
  2016-02-11  1:07 ` [PATCH 1/5] x86,perf/cqm : Fix cqm handling of grouping events into a cache_group Vikas Shivappa
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Vikas Shivappa @ 2016-02-11  1:07 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: x86, linux-kernel, hpa, tglx, vikas.shivappa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin

The V4 version of MBM is almost a complete rewrite of the prior
versions. It tries to address all of Thomas' earlier comments.

The patch series has one preparatory patch for cqm and then 4 MBM
patches. *Patches apply on 4.5-rc1*.

Memory bandwidth monitoring (MBM) provides the OS/VMM a way to monitor
bandwidth from one level of cache to another. The current patches
support L3 external bandwidth monitoring. Both 'local bandwidth' and
'total bandwidth' monitoring are supported per socket. Local bandwidth
measures the amount of data sent through the memory controller on the
socket and total bandwidth measures the total system bandwidth.

Tasks are associated with a Resource Monitoring ID (RMID), just like in
cqm, and the OS uses an MSR write to indicate the RMID of the task
during scheduling.
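
For reference, a minimal sketch of the scheduling-time association described
above (the MSR layout is per the SDM; the helper name is made up here for
illustration, the real hook lives in the cqm driver):

#define MSR_IA32_PQR_ASSOC	0x0c8f

/*
 * Illustration only: on sched_in the incoming task's RMID is written to
 * the low bits of IA32_PQR_ASSOC (the high word carries the CAT CLOSID).
 * From then on the hardware tags this logical CPU's memory traffic with
 * that RMID, which is what the MBM/CQM counters are keyed on.
 */
static inline void __assoc_rmid(u32 rmid, u32 closid)
{
	wrmsr(MSR_IA32_PQR_ASSOC, rmid, closid);
}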


[PATCH 1/5] x86,perf/cqm : Fix cqm handling of grouping events into a
[PATCH 2/5] x86/mbm: Intel Memory B/W Monitoring enumeration and init
[PATCH 3/5] x86/mbm: Memory bandwidth monitoring event management
[PATCH 4/5] x86/mbm: RMID Recycling MBM changes
[PATCH 5/5] x86/mbm: Add support for MBM counter overflow handling

* [PATCH 1/5] x86,perf/cqm : Fix cqm handling of grouping events into a cache_group
  2016-02-11  1:07 [PATCH V4 0/5] x86/mbm : Intel Memory Bandwidth monitoring support Vikas Shivappa
@ 2016-02-11  1:07 ` Vikas Shivappa
  2016-02-11  1:07 ` [PATCH 2/5] x86/mbm: Intel Memory B/W Monitoring enumeration and init Vikas Shivappa
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Vikas Shivappa @ 2016-02-11  1:07 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: x86, linux-kernel, hpa, tglx, vikas.shivappa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin

Currently cqm (cache quality of service monitoring) groups all events
belonging to the same PID so that they share one RMID. However, it does
not count all of these different events; hence we end up with a count of
zero for all events other than the group leader. The patch addresses the
issue by keeping a flag in perf_event.hw, which also holds the other cqm
related fields. The flag is updated at event creation and during grouping.
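
For context, a hedged user-space sketch of the scenario above (the PMU type
and event encoding are normally read from
/sys/bus/event_source/devices/intel_cqm/ at run time; the values named below
are illustrative):

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <string.h>
#include <unistd.h>

/* Illustration only: open one intel_cqm event on a task. */
static int open_cqm_event(pid_t pid, __u32 pmu_type, __u64 config)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size   = sizeof(attr);
	attr.type   = pmu_type;		/* .../intel_cqm/type */
	attr.config = config;		/* e.g. 0x01 == llc_occupancy */

	return syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);
}

Two such events opened on the same pid land in one cache_group and share an
RMID; before this patch a read() on any fd other than the group leader's
always returned zero.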

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_cqm.c | 13 ++++++++++---
 include/linux/perf_event.h                 |  1 +
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
index a316ca9..e6be335 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -281,9 +281,13 @@ static bool __match_event(struct perf_event *a, struct perf_event *b)
 
 	/*
 	 * Events that target same task are placed into the same cache group.
+	 * Mark it as a multi event group, so that we update ->count
+	 * for every event rather than just the group leader later.
 	 */
-	if (a->hw.target == b->hw.target)
+	if (a->hw.target == b->hw.target) {
+		b->hw.is_group_event = true;
 		return true;
+	}
 
 	/*
 	 * Are we an inherited event?
@@ -849,6 +853,7 @@ static void intel_cqm_setup_event(struct perf_event *event,
 	bool conflict = false;
 	u32 rmid;
 
+	event->hw.is_group_event = false;
 	list_for_each_entry(iter, &cache_groups, hw.cqm_groups_entry) {
 		rmid = iter->hw.cqm_rmid;
 
@@ -940,7 +945,9 @@ static u64 intel_cqm_event_count(struct perf_event *event)
 		return __perf_event_count(event);
 
 	/*
-	 * Only the group leader gets to report values. This stops us
+	 * Only the group leader gets to report values except in the case
+	 * of multiple events in the same group, where we still need to
+	 * read the other events. This stops us
 	 * reporting duplicate values to userspace, and gives us a clear
 	 * rule for which task gets to report the values.
 	 *
@@ -948,7 +955,7 @@ static u64 intel_cqm_event_count(struct perf_event *event)
 	 * specific packages - we forfeit that ability when we create
 	 * task events.
 	 */
-	if (!cqm_group_leader(event))
+	if (!cqm_group_leader(event) && !event->hw.is_group_event)
 		return 0;
 
 	/*
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f9828a4..f11c732 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -121,6 +121,7 @@ struct hw_perf_event {
 		struct { /* intel_cqm */
 			int			cqm_state;
 			u32			cqm_rmid;
+			bool			is_group_event;
 			struct list_head	cqm_events_entry;
 			struct list_head	cqm_groups_entry;
 			struct list_head	cqm_group_entry;
-- 
1.9.1

* [PATCH 2/5] x86/mbm: Intel Memory B/W Monitoring enumeration and init
  2016-02-11  1:07 [PATCH V4 0/5] x86/mbm : Intel Memory Bandwidth monitoring support Vikas Shivappa
  2016-02-11  1:07 ` [PATCH 1/5] x86,perf/cqm : Fix cqm handling of grouping events into a cache_group Vikas Shivappa
@ 2016-02-11  1:07 ` Vikas Shivappa
  2016-02-24 10:22   ` Thomas Gleixner
  2016-02-11  1:07 ` [PATCH 3/5] x86/mbm: Memory bandwidth monitoring event management Vikas Shivappa
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Vikas Shivappa @ 2016-02-11  1:07 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: x86, linux-kernel, hpa, tglx, vikas.shivappa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin

The MBM init patch enumerates Intel Memory Bandwidth Monitoring (MBM) and
initializes the perf events and data structures for monitoring memory
bandwidth. It is based on the original patch series by Kanaka Juvva.

Memory bandwidth monitoring (MBM) provides the OS/VMM a way to monitor
bandwidth from one level of cache to another. The current patches
support L3 external bandwidth monitoring. It supports both 'local
bandwidth' and 'total bandwidth' monitoring for the socket. Local
bandwidth measures the amount of data sent through the memory controller
on the socket and total b/w measures the total system bandwidth.

Extending cache quality of service monitoring (CQM), we add four more
events to the perf infrastructure:

intel_cqm_llc/local_bytes - bytes sent through the local socket memory controller
intel_cqm_llc/total_bytes - total L3 external bytes sent
intel_cqm_llc/local_bw    - current local bandwidth
intel_cqm_llc/total_bw    - current total bandwidth

Tasks are associated with a Resource Monitoring ID (RMID), just like in
cqm, and the OS uses an MSR write to indicate the RMID of the task
during scheduling.
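
As a reference for the enumeration side, a minimal user-space sketch of the
CPUID leaf this patch keys off (leaf 0xF, sub-leaf 1). The struct and helper
below are illustrative, not from the patch; the kernel equivalent is the
get_cpu_cap() hunk in the diff:

#include <cpuid.h>	/* __cpuid_count(), gcc/clang builtin wrapper */

struct mbm_caps {
	unsigned int max_rmid;	/* highest RMID supported (ECX) */
	unsigned int scale;	/* counter upscaling factor in bytes (EBX) */
	int llc_occup, mbm_total, mbm_local;
};

static void read_qos_monitoring_caps(struct mbm_caps *c)
{
	unsigned int eax, ebx, ecx, edx;

	__cpuid_count(0x0f, 1, eax, ebx, ecx, edx);
	c->max_rmid  = ecx;
	c->scale     = ebx;
	c->llc_occup = !!(edx & (1 << 0));	/* X86_FEATURE_CQM_OCCUP_LLC */
	c->mbm_total = !!(edx & (1 << 1));	/* X86_FEATURE_CQM_MBM_TOTAL */
	c->mbm_local = !!(edx & (1 << 2));	/* X86_FEATURE_CQM_MBM_LOCAL */
}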

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/cpufeature.h          |   2 +
 arch/x86/kernel/cpu/common.c               |   4 +-
 arch/x86/kernel/cpu/perf_event_intel_cqm.c | 172 +++++++++++++++++++++++++++--
 3 files changed, 169 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 7ad8c94..fd80ef6 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -241,6 +241,8 @@
 
 /* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (edx), word 12 */
 #define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring if 1 */
+#define X86_FEATURE_CQM_MBM_TOTAL (12*32+ 1) /* LLC Total MBM monitoring  */
+#define X86_FEATURE_CQM_MBM_LOCAL (12*32+ 2) /* LLC Local MBM monitoring  */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (ebx), word 13 */
 #define X86_FEATURE_CLZERO	(13*32+0) /* CLZERO instruction */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 37830de..f770221 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -635,7 +635,9 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 			cpuid_count(0x0000000F, 1, &eax, &ebx, &ecx, &edx);
 			c->x86_capability[CPUID_F_1_EDX] = edx;
 
-			if (cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC)) {
+			if (cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC) ||
+			    cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL) ||
+			    cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)) {
 				c->x86_cache_max_rmid = ecx;
 				c->x86_cache_occ_scale = ebx;
 			}
diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
index e6be335..e45f5aa 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -15,6 +15,8 @@
 
 static u32 cqm_max_rmid = -1;
 static unsigned int cqm_l3_scale; /* supposedly cacheline size */
+static bool cqm_enabled, mbm_enabled;
+static u16  mbm_socket_max;
 
 /**
  * struct intel_pqr_state - State cache for the PQR MSR
@@ -42,6 +44,30 @@ struct intel_pqr_state {
  * interrupts disabled, which is sufficient for the protection.
  */
 static DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
+/**
+ * struct sample - mbm event's (local or total) data
+ * @interval_start: Time this interval began
+ * @interval_bytes: #bytes in this interval
+ * @total_bytes:    #bytes since we began monitoring
+ * @prev_msr:       previous value of MSR
+ * @bandwidth:      bytes/sec in previous completed interval
+ */
+struct sample {
+	ktime_t interval_start;
+	u64	interval_bytes;
+	u64	total_bytes;
+	u64	prev_msr;
+	u64	bandwidth;
+};
+
+/*
+ * samples profiled for total memory bandwidth type events
+ */
+static struct sample *mbm_total;
+/*
+ * samples profiled for local memory bandwidth type events
+ */
+static struct sample *mbm_local;
 
 /*
  * Protects cache_cgroups and cqm_rmid_free_lru and cqm_rmid_limbo_lru.
@@ -1152,6 +1178,28 @@ EVENT_ATTR_STR(llc_occupancy.unit, intel_cqm_llc_unit, "Bytes");
 EVENT_ATTR_STR(llc_occupancy.scale, intel_cqm_llc_scale, NULL);
 EVENT_ATTR_STR(llc_occupancy.snapshot, intel_cqm_llc_snapshot, "1");
 
+EVENT_ATTR_STR(total_bytes, intel_cqm_total_bytes, "event=0x02");
+EVENT_ATTR_STR(total_bytes.per-pkg, intel_cqm_total_bytes_pkg, "1");
+EVENT_ATTR_STR(total_bytes.unit, intel_cqm_total_bytes_unit, "MB");
+EVENT_ATTR_STR(total_bytes.scale, intel_cqm_total_bytes_scale, "1e-6");
+
+EVENT_ATTR_STR(local_bytes, intel_cqm_local_bytes, "event=0x03");
+EVENT_ATTR_STR(local_bytes.per-pkg, intel_cqm_local_bytes_pkg, "1");
+EVENT_ATTR_STR(local_bytes.unit, intel_cqm_local_bytes_unit, "MB");
+EVENT_ATTR_STR(local_bytes.scale, intel_cqm_local_bytes_scale, "1e-6");
+
+EVENT_ATTR_STR(total_bw, intel_cqm_total_bw, "event=0x04");
+EVENT_ATTR_STR(total_bw.per-pkg, intel_cqm_total_bw_pkg, "1");
+EVENT_ATTR_STR(total_bw.unit, intel_cqm_total_bw_unit, "MB/sec");
+EVENT_ATTR_STR(total_bw.scale, intel_cqm_total_bw_scale, "1e-6");
+EVENT_ATTR_STR(total_bw.snapshot, intel_cqm_total_bw_snapshot, "1");
+
+EVENT_ATTR_STR(local_bw, intel_cqm_local_bw, "event=0x05");
+EVENT_ATTR_STR(local_bw.per-pkg, intel_cqm_local_bw_pkg, "1");
+EVENT_ATTR_STR(local_bw.unit, intel_cqm_local_bw_unit, "MB/sec");
+EVENT_ATTR_STR(local_bw.scale, intel_cqm_local_bw_scale, "1e-6");
+EVENT_ATTR_STR(local_bw.snapshot, intel_cqm_local_bw_snapshot, "1");
+
 static struct attribute *intel_cqm_events_attr[] = {
 	EVENT_PTR(intel_cqm_llc),
 	EVENT_PTR(intel_cqm_llc_pkg),
@@ -1161,9 +1209,58 @@ static struct attribute *intel_cqm_events_attr[] = {
 	NULL,
 };
 
+static struct attribute *intel_mbm_events_attr[] = {
+	EVENT_PTR(intel_cqm_total_bytes),
+	EVENT_PTR(intel_cqm_local_bytes),
+	EVENT_PTR(intel_cqm_total_bw),
+	EVENT_PTR(intel_cqm_local_bw),
+	EVENT_PTR(intel_cqm_total_bytes_pkg),
+	EVENT_PTR(intel_cqm_local_bytes_pkg),
+	EVENT_PTR(intel_cqm_total_bw_pkg),
+	EVENT_PTR(intel_cqm_local_bw_pkg),
+	EVENT_PTR(intel_cqm_total_bytes_unit),
+	EVENT_PTR(intel_cqm_local_bytes_unit),
+	EVENT_PTR(intel_cqm_total_bw_unit),
+	EVENT_PTR(intel_cqm_local_bw_unit),
+	EVENT_PTR(intel_cqm_total_bytes_scale),
+	EVENT_PTR(intel_cqm_local_bytes_scale),
+	EVENT_PTR(intel_cqm_total_bw_scale),
+	EVENT_PTR(intel_cqm_local_bw_scale),
+	EVENT_PTR(intel_cqm_total_bw_snapshot),
+	EVENT_PTR(intel_cqm_local_bw_snapshot),
+	NULL,
+};
+
+static struct attribute *intel_cmt_mbm_events_attr[] = {
+	EVENT_PTR(intel_cqm_llc),
+	EVENT_PTR(intel_cqm_total_bytes),
+	EVENT_PTR(intel_cqm_local_bytes),
+	EVENT_PTR(intel_cqm_total_bw),
+	EVENT_PTR(intel_cqm_local_bw),
+	EVENT_PTR(intel_cqm_llc_pkg),
+	EVENT_PTR(intel_cqm_total_bytes_pkg),
+	EVENT_PTR(intel_cqm_local_bytes_pkg),
+	EVENT_PTR(intel_cqm_total_bw_pkg),
+	EVENT_PTR(intel_cqm_local_bw_pkg),
+	EVENT_PTR(intel_cqm_llc_unit),
+	EVENT_PTR(intel_cqm_total_bytes_unit),
+	EVENT_PTR(intel_cqm_local_bytes_unit),
+	EVENT_PTR(intel_cqm_total_bw_unit),
+	EVENT_PTR(intel_cqm_local_bw_unit),
+	EVENT_PTR(intel_cqm_llc_scale),
+	EVENT_PTR(intel_cqm_total_bytes_scale),
+	EVENT_PTR(intel_cqm_local_bytes_scale),
+	EVENT_PTR(intel_cqm_total_bw_scale),
+	EVENT_PTR(intel_cqm_local_bw_scale),
+	EVENT_PTR(intel_cqm_llc_snapshot),
+	EVENT_PTR(intel_cqm_total_bw_snapshot),
+	EVENT_PTR(intel_cqm_local_bw_snapshot),
+	NULL,
+};
+
 static struct attribute_group intel_cqm_events_group = {
 	.name = "events",
-	.attrs = intel_cqm_events_attr,
+	.attrs = NULL,
 };
 
 PMU_FORMAT_ATTR(event, "config:0-7");
@@ -1320,12 +1417,51 @@ static const struct x86_cpu_id intel_cqm_match[] = {
 	{}
 };
 
+static const struct x86_cpu_id intel_mbm_local_match[] = {
+	{ .vendor = X86_VENDOR_INTEL, .feature = X86_FEATURE_CQM_MBM_LOCAL },
+	{}
+};
+
+static const struct x86_cpu_id intel_mbm_total_match[] = {
+	{ .vendor = X86_VENDOR_INTEL, .feature = X86_FEATURE_CQM_MBM_TOTAL },
+	{}
+};
+
+static int intel_mbm_init(void)
+{
+	int ret = 0, array_size, maxid = cqm_max_rmid + 1;
+
+	mbm_socket_max = cpumask_weight(&cqm_cpumask);
+	array_size = sizeof(struct sample) * maxid * mbm_socket_max;
+	mbm_local = kmalloc(array_size, GFP_KERNEL);
+	if (!mbm_local) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	mbm_total = kmalloc(array_size, GFP_KERNEL);
+	if (!mbm_total) {
+		kfree(mbm_local);
+		ret = -ENOMEM;
+	}
+out:
+
+	return ret;
+}
+
 static int __init intel_cqm_init(void)
 {
-	char *str, scale[20];
+	char *str = NULL, scale[20];
 	int i, cpu, ret;
 
-	if (!x86_match_cpu(intel_cqm_match))
+	if (x86_match_cpu(intel_cqm_match))
+		cqm_enabled = true;
+
+	if (x86_match_cpu(intel_mbm_local_match) &&
+	     x86_match_cpu(intel_mbm_total_match))
+		mbm_enabled = true;
+
+	if (!cqm_enabled && !mbm_enabled)
 		return -ENODEV;
 
 	cqm_l3_scale = boot_cpu_data.x86_cache_occ_scale;
@@ -1384,14 +1520,34 @@ static int __init intel_cqm_init(void)
 
 	__perf_cpu_notifier(intel_cqm_cpu_notifier);
 
-	ret = perf_pmu_register(&intel_cqm_pmu, "intel_cqm", -1);
-	if (ret)
-		pr_err("Intel CQM perf registration failed: %d\n", ret);
-	else
-		pr_info("Intel CQM monitoring enabled\n");
+	if (mbm_enabled)
+		ret = intel_mbm_init();
+	if (ret && !cqm_enabled)
+		goto out;
+
+	if (cqm_enabled && mbm_enabled)
+		intel_cqm_events_group.attrs = intel_cmt_mbm_events_attr;
+	else if (!cqm_enabled && mbm_enabled)
+		intel_cqm_events_group.attrs = intel_mbm_events_attr;
+	else if (cqm_enabled && !mbm_enabled)
+		intel_cqm_events_group.attrs = intel_cqm_events_attr;
 
+	ret = perf_pmu_register(&intel_cqm_pmu, "intel_cqm", -1);
+	if (ret) {
+		pr_err("Intel CQM MBM perf registration failed: %d\n", ret);
+	} else {
+		if (cqm_enabled)
+			pr_info("Intel CQM monitoring enabled\n");
+		if (mbm_enabled)
+			pr_info("Intel MBM enabled\n");
+	}
 out:
 	cpu_notifier_register_done();
+	if (ret) {
+		mbm_enabled = false;
+		cqm_enabled = false;
+		kfree(str);
+	}
 
 	return ret;
 }
-- 
1.9.1

* [PATCH 3/5] x86/mbm: Memory bandwidth monitoring event management
  2016-02-11  1:07 [PATCH V4 0/5] x86/mbm : Intel Memory Bandwidth monitoring support Vikas Shivappa
  2016-02-11  1:07 ` [PATCH 1/5] x86,perf/cqm : Fix cqm handling of grouping events into a cache_group Vikas Shivappa
  2016-02-11  1:07 ` [PATCH 2/5] x86/mbm: Intel Memory B/W Monitoring enumeration and init Vikas Shivappa
@ 2016-02-11  1:07 ` Vikas Shivappa
  2016-02-11  1:07 ` [PATCH 4/5] x86/mbm: RMID Recycling MBM changes Vikas Shivappa
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Vikas Shivappa @ 2016-02-11  1:07 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: x86, linux-kernel, hpa, tglx, vikas.shivappa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin

From: Tony Luck <tony.luck@intel.com>

Includes all the core infrastructure to measure total_bytes and
bandwidth.

We have per-socket counters for both total system-wide L3 external bytes
and local socket memory-controller bytes. The current bandwidth is
calculated over a minimum diff time (time since it was last counted) of
100ms. The OS writes MSR_IA32_QM_EVTSEL and reads MSR_IA32_QM_CTR to read
the counters, and uses the IA32_PQR_ASSOC MSR to associate the RMID with
the task. The tasks have a common RMID for cqm (cache quality of service
monitoring) and MBM, hence most of the scheduling code is reused from cqm.
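
A stripped-down sketch of the MSR sequence described above (the full version,
with wrap-around and interval accounting, is update_sample() in the diff
below; the helper name here is made up):

static u64 mbm_read_raw(u32 rmid, u32 evtid)
{
	u64 val;

	/* Select <event id, RMID>, then read the free-running counter. */
	wrmsr(MSR_IA32_QM_EVTSEL, evtid, rmid);
	rdmsrl(MSR_IA32_QM_CTR, val);

	if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
		return 0;	/* no valid data for this RMID (yet) */

	/* 24-bit hardware count, upscaled to bytes. */
	return (val & MBM_CNTR_MAX) * cqm_l3_scale;
}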

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_cqm.c | 159 ++++++++++++++++++++++++++++-
 1 file changed, 155 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
index e45f5aa..b1c9663 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -13,6 +13,11 @@
 #define MSR_IA32_QM_CTR		0x0c8e
 #define MSR_IA32_QM_EVTSEL	0x0c8d
 
+/*
+ * MBM Counter is 24bits wide. MBM_CNTR_MAX defines max counter
+ * value
+ */
+#define MBM_CNTR_MAX		0xffffff
 static u32 cqm_max_rmid = -1;
 static unsigned int cqm_l3_scale; /* supposedly cacheline size */
 static bool cqm_enabled, mbm_enabled;
@@ -69,6 +74,16 @@ static struct sample *mbm_total;
  */
 static struct sample *mbm_local;
 
+#define pkg_id	topology_physical_package_id(smp_processor_id())
+/*
+ * rmid_2_index returns the index for the rmid in mbm_local/mbm_total array.
+ * mbm_total[] and mbm_local[] are linearly indexed by socket# * max number of
+ * rmids per socket, an example is given below
+ * RMID1 of Socket0:  vrmid =  1
+ * RMID1 of Socket1:  vrmid =  1 * (cqm_max_rmid + 1) + 1
+ * RMID1 of Socket2:  vrmid =  2 * (cqm_max_rmid + 1) + 1
+ */
+#define rmid_2_index(rmid)  ((pkg_id * (cqm_max_rmid + 1)) + rmid)
 /*
  * Protects cache_cgroups and cqm_rmid_free_lru and cqm_rmid_limbo_lru.
  * Also protects event->hw.cqm_rmid
@@ -92,8 +107,19 @@ static cpumask_t cqm_cpumask;
 #define RMID_VAL_UNAVAIL	(1ULL << 62)
 
 #define QOS_L3_OCCUP_EVENT_ID	(1 << 0)
+/*
+ * MBM Event IDs as defined in SDM section 17.15.5
+ * Event IDs are used to program EVTSEL MSRs before reading mbm event counters
+ */
+enum mbm_evt_type {
+	QOS_MBM_TOTAL_EVENT_ID = 0x02,
+	QOS_MBM_LOCAL_EVENT_ID,
+	QOS_MBM_TOTAL_BW_EVENT_ID,
+	QOS_MBM_LOCAL_BW_EVENT_ID,
+};
 
-#define QOS_EVENT_MASK	QOS_L3_OCCUP_EVENT_ID
+#define QOS_MBM_BW_EVENT_MASK 0x04
+#define QOS_MBM_LOCAL_EVENT_MASK 0x01
 
 /*
  * This is central to the rotation algorithm in __intel_cqm_rmid_rotate().
@@ -423,9 +449,16 @@ static bool __conflict_event(struct perf_event *a, struct perf_event *b)
 struct rmid_read {
 	u32 rmid;
 	atomic64_t value;
+	enum mbm_evt_type evt_type;
 };
 
 static void __intel_cqm_event_count(void *info);
+static void init_mbm_sample(u32 rmid, enum mbm_evt_type evt_type);
+
+static bool is_mbm_event(int e)
+{
+	return (e >= QOS_MBM_TOTAL_EVENT_ID && e <= QOS_MBM_LOCAL_BW_EVENT_ID);
+}
 
 /*
  * Exchange the RMID of a group of events.
@@ -867,6 +900,98 @@ static void intel_cqm_rmid_rotate(struct work_struct *work)
 	schedule_delayed_work(&intel_cqm_rmid_work, delay);
 }
 
+static struct sample *update_sample(unsigned int rmid,
+				    enum mbm_evt_type evt_type, int first)
+{
+	ktime_t cur_time;
+	struct sample *mbm_current;
+	u32 vrmid = rmid_2_index(rmid);
+	u64 val, bytes, diff_time;
+	u32 eventid;
+
+	if (evt_type & QOS_MBM_LOCAL_EVENT_MASK) {
+		mbm_current = &mbm_local[vrmid];
+		eventid     =  QOS_MBM_LOCAL_EVENT_ID;
+	} else {
+		mbm_current = &mbm_total[vrmid];
+		eventid     = QOS_MBM_TOTAL_EVENT_ID;
+	}
+
+	cur_time = ktime_get();
+	wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
+	rdmsrl(MSR_IA32_QM_CTR, val);
+	if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
+		return mbm_current;
+	val &= MBM_CNTR_MAX;
+
+	if (first) {
+		mbm_current->interval_start = cur_time;
+		mbm_current->prev_msr = val;
+		mbm_current->total_bytes = 0;
+		mbm_current->interval_bytes = 0;
+		mbm_current->bandwidth = 0;
+		return mbm_current;
+	}
+
+	if (val < mbm_current->prev_msr)
+		bytes = MBM_CNTR_MAX - mbm_current->prev_msr + val + 1;
+	else
+		bytes = val - mbm_current->prev_msr;
+	bytes *= cqm_l3_scale;
+
+	mbm_current->total_bytes += bytes;
+	mbm_current->interval_bytes += bytes;
+	mbm_current->prev_msr = val;
+	diff_time = ktime_ms_delta(cur_time, mbm_current->interval_start);
+
+	/*
+	 * The b/w measured is really the most recent/current b/w.
+	 * We wait until enough time has passed to avoid arithmetic
+	 * rounding problems. Waiting for at least 100ms keeps such
+	 * errors at <= 1%.
+	 */
+	if (diff_time > 100) {
+		bytes = mbm_current->interval_bytes * MSEC_PER_SEC;
+		do_div(bytes, diff_time);
+		mbm_current->bandwidth = bytes;
+		mbm_current->interval_bytes = 0;
+		mbm_current->interval_start = cur_time;
+	}
+
+	return mbm_current;
+}
+
+static u64 rmid_read_mbm(unsigned int rmid, enum mbm_evt_type evt_type)
+{
+	struct sample *mbm_current;
+
+	mbm_current = update_sample(rmid, evt_type, 0);
+
+	if (evt_type & QOS_MBM_BW_EVENT_MASK)
+		return mbm_current->bandwidth;
+	else
+		return mbm_current->total_bytes;
+}
+
+static void __intel_mbm_event_init(void *info)
+{
+	struct rmid_read *rr = info;
+
+	update_sample(rr->rmid, rr->evt_type, 1);
+}
+
+static void init_mbm_sample(u32 rmid, enum mbm_evt_type evt_type)
+{
+	struct rmid_read rr = {
+		.value = ATOMIC64_INIT(0),
+	};
+
+	rr.rmid = rmid;
+	rr.evt_type = evt_type;
+	/* on each socket, init sample */
+	on_each_cpu_mask(&cqm_cpumask, __intel_mbm_event_init, &rr, 1);
+}
+
 /*
  * Find a group and setup RMID.
  *
@@ -887,6 +1012,9 @@ static void intel_cqm_setup_event(struct perf_event *event,
 			/* All tasks in a group share an RMID */
 			event->hw.cqm_rmid = rmid;
 			*group = iter;
+			if (is_mbm_event(event->attr.config) &&
+			    __rmid_valid(rmid))
+				init_mbm_sample(rmid, event->attr.config);
 			return;
 		}
 
@@ -903,6 +1031,9 @@ static void intel_cqm_setup_event(struct perf_event *event,
 	else
 		rmid = __get_rmid();
 
+	if (is_mbm_event(event->attr.config) && __rmid_valid(rmid))
+		init_mbm_sample(rmid, event->attr.config);
+
 	event->hw.cqm_rmid = rmid;
 }
 
@@ -924,7 +1055,10 @@ static void intel_cqm_event_read(struct perf_event *event)
 	if (!__rmid_valid(rmid))
 		goto out;
 
-	val = __rmid_read(rmid);
+	if (is_mbm_event(event->attr.config))
+		val = rmid_read_mbm(rmid, event->attr.config);
+	else
+		val = __rmid_read(rmid);
 
 	/*
 	 * Ignore this reading on error states and do not update the value.
@@ -955,6 +1089,17 @@ static inline bool cqm_group_leader(struct perf_event *event)
 	return !list_empty(&event->hw.cqm_groups_entry);
 }
 
+static void __intel_mbm_event_count(void *info)
+{
+	struct rmid_read *rr = info;
+	u64 val;
+
+	val = rmid_read_mbm(rr->rmid, rr->evt_type);
+	if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
+		return;
+	atomic64_add(val, &rr->value);
+}
+
 static u64 intel_cqm_event_count(struct perf_event *event)
 {
 	unsigned long flags;
@@ -1008,7 +1153,12 @@ static u64 intel_cqm_event_count(struct perf_event *event)
 	if (!__rmid_valid(rr.rmid))
 		goto out;
 
-	on_each_cpu_mask(&cqm_cpumask, __intel_cqm_event_count, &rr, 1);
+	if (is_mbm_event(event->attr.config)) {
+		rr.evt_type = event->attr.config;
+		on_each_cpu_mask(&cqm_cpumask, __intel_mbm_event_count, &rr, 1);
+	} else {
+		on_each_cpu_mask(&cqm_cpumask, __intel_cqm_event_count, &rr, 1);
+	}
 
 	raw_spin_lock_irqsave(&cache_lock, flags);
 	if (event->hw.cqm_rmid == rr.rmid)
@@ -1123,7 +1273,8 @@ static int intel_cqm_event_init(struct perf_event *event)
 	if (event->attr.type != intel_cqm_pmu.type)
 		return -ENOENT;
 
-	if (event->attr.config & ~QOS_EVENT_MASK)
+	if ((event->attr.config < QOS_L3_OCCUP_EVENT_ID) ||
+	     (event->attr.config > QOS_MBM_LOCAL_BW_EVENT_ID))
 		return -EINVAL;
 
 	/* unsupported modes and filters */
-- 
1.9.1

* [PATCH 4/5] x86/mbm: RMID Recycling MBM changes
  2016-02-11  1:07 [PATCH V4 0/5] x86/mbm : Intel Memory Bandwidth monitoring support Vikas Shivappa
                   ` (2 preceding siblings ...)
  2016-02-11  1:07 ` [PATCH 3/5] x86/mbm: Memory bandwidth monitoring event management Vikas Shivappa
@ 2016-02-11  1:07 ` Vikas Shivappa
  2016-02-11  1:07 ` [PATCH 5/5] x86/mbm: Add support for MBM counter overflow handling Vikas Shivappa
  2016-02-17 18:52 ` [PATCH V4 0/5] x86/mbm : Intel Memory Bandwidth monitoring support Vikas Shivappa
  5 siblings, 0 replies; 15+ messages in thread
From: Vikas Shivappa @ 2016-02-11  1:07 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: x86, linux-kernel, hpa, tglx, vikas.shivappa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin

An RMID can be allocated or deallocated as part of RMID recycling.
When an RMID is allocated for an mbm event, the mbm counter needs to be
initialized because the next time we read the counter we need the previous
value to account for the total bytes that went to the memory controller.
Similarly, when an RMID is deallocated, we need to update the ->count
variable.
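
A condensed sketch of the MBM side of that recycling (the helper name is made
up; the real logic is in intel_cqm_xchg_rmid() in the diff below):

static void mbm_recycle_rmid(struct perf_event *group, u32 old_rmid, u32 new_rmid)
{
	struct rmid_read rr = {
		.rmid	  = old_rmid,
		.evt_type = group->attr.config,
	};

	/* Fold what was counted under the old RMID into ->count ... */
	on_each_cpu_mask(&cqm_cpumask, __intel_mbm_event_count, &rr, 1);
	local64_set(&group->count, atomic64_read(&rr.value));

	/* ... and take a fresh baseline (prev_msr, interval_start) on the new RMID. */
	if (__rmid_valid(new_rmid))
		init_mbm_sample(new_rmid, group->attr.config);
}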

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_cqm.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
index b1c9663..fea22c8 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -454,6 +454,7 @@ struct rmid_read {
 
 static void __intel_cqm_event_count(void *info);
 static void init_mbm_sample(u32 rmid, enum mbm_evt_type evt_type);
+static void __intel_mbm_event_count(void *info);
 
 static bool is_mbm_event(int e)
 {
@@ -480,8 +481,14 @@ static u32 intel_cqm_xchg_rmid(struct perf_event *group, u32 rmid)
 			.rmid = old_rmid,
 		};
 
-		on_each_cpu_mask(&cqm_cpumask, __intel_cqm_event_count,
-				 &rr, 1);
+		if (is_mbm_event(group->attr.config)) {
+			rr.evt_type = group->attr.config;
+			on_each_cpu_mask(&cqm_cpumask, __intel_mbm_event_count,
+					 &rr, 1);
+		} else {
+			on_each_cpu_mask(&cqm_cpumask, __intel_cqm_event_count,
+					 &rr, 1);
+		}
 		local64_set(&group->count, atomic64_read(&rr.value));
 	}
 
@@ -493,6 +500,22 @@ static u32 intel_cqm_xchg_rmid(struct perf_event *group, u32 rmid)
 
 	raw_spin_unlock_irq(&cache_lock);
 
+	/*
+	 * If the allocation is for mbm, init the mbm stats.
+	 * Need to check if each event in the group is mbm event
+	 * because there could be multiple type of events in the same group.
+	 */
+	if (__rmid_valid(rmid)) {
+		event = group;
+		if (is_mbm_event(event->attr.config))
+			init_mbm_sample(rmid, event->attr.config);
+
+		list_for_each_entry(event, head, hw.cqm_group_entry) {
+			if (is_mbm_event(event->attr.config))
+				init_mbm_sample(rmid, event->attr.config);
+		}
+	}
+
 	return old_rmid;
 }
 
-- 
1.9.1

* [PATCH 5/5] x86/mbm: Add support for MBM counter overflow handling
  2016-02-11  1:07 [PATCH V4 0/5] x86/mbm : Intel Memory Bandwidth monitoring support Vikas Shivappa
                   ` (3 preceding siblings ...)
  2016-02-11  1:07 ` [PATCH 4/5] x86/mbm: RMID Recycling MBM changes Vikas Shivappa
@ 2016-02-11  1:07 ` Vikas Shivappa
  2016-02-24 10:36   ` Thomas Gleixner
  2016-02-17 18:52 ` [PATCH V4 0/5] x86/mbm : Intel Memory Bandwidth monitoring support Vikas Shivappa
  5 siblings, 1 reply; 15+ messages in thread
From: Vikas Shivappa @ 2016-02-11  1:07 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: x86, linux-kernel, hpa, tglx, vikas.shivappa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin

This patch adds a per-package timer which periodically updates the
memory bandwidth counters for the events that are currently active.
The current patch uses a periodic timer every 1s, since the SDM guarantees
that the counter will not overflow in 1s, but this interval can certainly
be improved by calibrating on the system. The overflow is really a function
of the maximum memory bandwidth the socket can support, the maximum counter
value and the scaling factor.
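
For a feel of the margin, a hedged back-of-the-envelope helper (the peak
bandwidth and scale numbers in the comment are assumptions, not from the SDM
or this patch):

#include <linux/math64.h>

/*
 * Worst-case time until the 24-bit counter wraps, for a given upscaling
 * factor and an assumed peak bandwidth. E.g. assuming a 64KB-per-unit
 * scale and ~100 GB/s peak, this is roughly 11s, so a 1s timer leaves a
 * comfortable margin.
 */
static u64 mbm_wrap_time_ms(u64 scale_bytes, u64 peak_bw_bytes_per_sec)
{
	u64 max_bytes = (u64)MBM_CNTR_MAX * scale_bytes;

	return div64_u64(max_bytes * MSEC_PER_SEC, peak_bw_bytes_per_sec);
}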

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_cqm.c | 111 ++++++++++++++++++++++++++++-
 1 file changed, 110 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
index fea22c8..8245dbd 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -18,6 +18,11 @@
  * value
  */
 #define MBM_CNTR_MAX		0xffffff
+/*
+ * Guaranteed time in ms as per SDM where MBM counters will not overflow.
+ */
+#define MBM_CTR_OVERFLOW_TIME	1000
+
 static u32 cqm_max_rmid = -1;
 static unsigned int cqm_l3_scale; /* supposedly cacheline size */
 static bool cqm_enabled, mbm_enabled;
@@ -49,6 +54,7 @@ struct intel_pqr_state {
  * interrupts disabled, which is sufficient for the protection.
  */
 static DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
+static struct hrtimer *mbm_timers;
 /**
  * struct sample - mbm event's (local or total) data
  * @interval_start Time this interval began
@@ -1123,6 +1129,84 @@ static void __intel_mbm_event_count(void *info)
 	atomic64_add(val, &rr->value);
 }
 
+static enum hrtimer_restart mbm_hrtimer_handle(struct hrtimer *hrtimer)
+{
+	struct perf_event *iter, *iter1;
+	struct list_head *head;
+	unsigned long flags;
+	u32 grp_rmid;
+
+	/*
+	 * Need to take cache_lock as the timer's Event Select MSR reads
+	 * can race with the mbm/cqm count() and mbm_init() reads.
+	 */
+	mutex_lock(&cache_mutex);
+	raw_spin_lock_irqsave(&cache_lock, flags);
+
+	if (list_empty(&cache_groups))
+		goto out;
+
+	list_for_each_entry(iter, &cache_groups, hw.cqm_groups_entry) {
+		grp_rmid = iter->hw.cqm_rmid;
+		if (!__rmid_valid(grp_rmid))
+			continue;
+		if (is_mbm_event(iter->attr.config))
+			update_sample(grp_rmid, iter->attr.config, 0);
+
+		head = &iter->hw.cqm_group_entry;
+		if (list_empty(head))
+			continue;
+		list_for_each_entry(iter1, head, hw.cqm_group_entry) {
+			if (!iter1->hw.is_group_event)
+				break;
+			if (is_mbm_event(iter1->attr.config))
+				update_sample(iter1->hw.cqm_rmid,
+					      iter1->attr.config, 0);
+		}
+	}
+
+out:
+	raw_spin_unlock_irqrestore(&cache_lock, flags);
+	mutex_unlock(&cache_mutex);
+
+	hrtimer_forward_now(hrtimer, ms_to_ktime(MBM_CTR_OVERFLOW_TIME));
+
+	return HRTIMER_RESTART;
+}
+
+static void __mbm_start_timer(void *info)
+{
+	hrtimer_start(&mbm_timers[pkg_id], ms_to_ktime(MBM_CTR_OVERFLOW_TIME),
+			     HRTIMER_MODE_REL_PINNED);
+}
+
+static void __mbm_stop_timer(void *info)
+{
+	hrtimer_cancel(&mbm_timers[pkg_id]);
+}
+
+static void mbm_start_timers(void)
+{
+	on_each_cpu_mask(&cqm_cpumask, __mbm_start_timer, NULL, 1);
+}
+
+static void mbm_stop_timers(void)
+{
+	on_each_cpu_mask(&cqm_cpumask, __mbm_stop_timer, NULL, 1);
+}
+
+static void mbm_hrtimer_init(void)
+{
+	struct hrtimer *hr;
+	int i;
+
+	for (i = 0; i < mbm_socket_max; i++) {
+		hr = &mbm_timers[i];
+		hrtimer_init(hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+		hr->function = mbm_hrtimer_handle;
+	}
+}
+
 static u64 intel_cqm_event_count(struct perf_event *event)
 {
 	unsigned long flags;
@@ -1285,6 +1369,12 @@ static void intel_cqm_event_destroy(struct perf_event *event)
 		}
 	}
 
+	/*
+	 * Stop the mbm overflow timers when the last event is destroyed.
+	 */
+	if (list_empty(&cache_groups))
+		mbm_stop_timers();
+
 	mutex_unlock(&cache_mutex);
 }
 
@@ -1317,6 +1407,12 @@ static int intel_cqm_event_init(struct perf_event *event)
 
 	mutex_lock(&cache_mutex);
 
+	/*
+	 * Start the mbm overflow timers when the first event is created.
+	 */
+	if (list_empty(&cache_groups))
+		mbm_start_timers();
+
 	/* Will also set rmid */
 	intel_cqm_setup_event(event, &group);
 
@@ -1615,10 +1711,23 @@ static int intel_mbm_init(void)
 
 	mbm_total = kmalloc(array_size, GFP_KERNEL);
 	if (!mbm_total) {
-		kfree(mbm_local);
 		ret = -ENOMEM;
+		goto out;
+	}
+
+	array_size = sizeof(struct hrtimer) * mbm_socket_max;
+	mbm_timers = kmalloc(array_size, GFP_KERNEL);
+	if (!mbm_timers) {
+		ret = -ENOMEM;
+		goto out;
 	}
+	mbm_hrtimer_init();
+
 out:
+	if (ret) {
+		kfree(mbm_local);
+		kfree(mbm_total);
+	}
 
 	return ret;
 }
-- 
1.9.1

* Re: [PATCH V4 0/5] x86/mbm : Intel Memory Bandwidth monitoring support
  2016-02-11  1:07 [PATCH V4 0/5] x86/mbm : Intel Memory Bandwidth monitoring support Vikas Shivappa
                   ` (4 preceding siblings ...)
  2016-02-11  1:07 ` [PATCH 5/5] x86/mbm: Add support for MBM counter overflow handling Vikas Shivappa
@ 2016-02-17 18:52 ` Vikas Shivappa
  2016-02-17 19:11   ` Thomas Gleixner
  5 siblings, 1 reply; 15+ messages in thread
From: Vikas Shivappa @ 2016-02-17 18:52 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, x86, linux-kernel, hpa, tglx, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin


Hello Thomas,

Was wondering if you had any feedback on the MBM patches. These are almost a
complete rewrite (the core part, patch 3/5, is from Tony Luck). The reused
parts are mostly the declarations of data structures. This version tries to
address all your comments on V3 about the commenting, the core calculation,
init, and the many unnecessary and confusing constants and code:

http://lkml.kernel.org/r/alpine.DEB.2.11.1508192243081.3873@nanos

Thanks,
Vikas

On Wed, 10 Feb 2016, Vikas Shivappa wrote:

> The V4 version of MBM is almost a complete rewrite of the prior
> versions. It tries to address all of Thomas' earlier comments.
>
> The patch series has one preparatory patch for cqm and then 4 MBM
> patches. *Patches apply on 4.5-rc1*.
>
> Memory bandwidth monitoring (MBM) provides the OS/VMM a way to monitor
> bandwidth from one level of cache to another. The current patches
> support L3 external bandwidth monitoring. Both 'local bandwidth' and
> 'total bandwidth' monitoring are supported per socket. Local bandwidth
> measures the amount of data sent through the memory controller on the
> socket and total bandwidth measures the total system bandwidth.
>
> Tasks are associated with a Resource Monitoring ID (RMID), just like in
> cqm, and the OS uses an MSR write to indicate the RMID of the task
> during scheduling.
>
>
> [PATCH 1/5] x86,perf/cqm : Fix cqm handling of grouping events into a
> [PATCH 2/5] x86/mbm: Intel Memory B/W Monitoring enumeration and init
> [PATCH 3/5] x86/mbm: Memory bandwidth monitoring event management
> [PATCH 4/5] x86/mbm: RMID Recycling MBM changes
> [PATCH 5/5] x86/mbm: Add support for MBM counter overflow handling
>

* Re: [PATCH V4 0/5] x86/mbm : Intel Memory Bandwidth monitoring support
  2016-02-17 18:52 ` [PATCH V4 0/5] x86/mbm : Intel Memory Bandwidth monitoring support Vikas Shivappa
@ 2016-02-17 19:11   ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2016-02-17 19:11 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: Vikas Shivappa, x86, linux-kernel, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin

Vikas,

On Wed, 17 Feb 2016, Vikas Shivappa wrote:
> Was wondering if you had any feedback on the MBM patches. These are almost a
> complete rewrite (the core part 3/5 from Tony Luck). The reused parts are
> mostly the declarations of datastructures. This has tried to address all your
> comments on V3 about the commenting, the core calculation, init, lot of
> unnecessary and confusing constants and code..

It's on my list. I'll tend to it once I've cleaned up the unholy mess in
perf_intel_* et al.

Thanks,

	tglx

* Re: [PATCH 2/5] x86/mbm: Intel Memory B/W Monitoring enumeration and init
  2016-02-11  1:07 ` [PATCH 2/5] x86/mbm: Intel Memory B/W Monitoring enumeration and init Vikas Shivappa
@ 2016-02-24 10:22   ` Thomas Gleixner
  2016-02-24 18:12     ` Vikas Shivappa
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2016-02-24 10:22 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, x86, linux-kernel, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin

On Wed, 10 Feb 2016, Vikas Shivappa wrote:
> +static int intel_mbm_init(void)
> +{
> +	int ret = 0, array_size, maxid = cqm_max_rmid + 1;
> +
> +	mbm_socket_max = cpumask_weight(&cqm_cpumask);

This should use the new topology_max_packages() function, so you can alloc
your array correctly even if not all sockets are online/plugged yet.

> +	array_size = sizeof(struct sample) * maxid * mbm_socket_max;
> +	mbm_local = kmalloc(array_size, GFP_KERNEL);
> +	if (!mbm_local) {
> +		ret = -ENOMEM;

return -ENOMEM is sufficient here.

> +		goto out;
> +	}
> +
> +	mbm_total = kmalloc(array_size, GFP_KERNEL);
> +	if (!mbm_total) {
> +		kfree(mbm_local);
> +		ret = -ENOMEM;
> +	}
> +out:
> +
> +	return ret;
> +}
> +
>  	cqm_l3_scale = boot_cpu_data.x86_cache_occ_scale;
> @@ -1384,14 +1520,34 @@ static int __init intel_cqm_init(void)
>  
>  	__perf_cpu_notifier(intel_cqm_cpu_notifier);

You really should register the notifier _AFTER_ registering the pmu. That
needs to be fixed anyway, because the existing code leaks the notifier AND
memory in case perf_pmu_register fails.
  
Thanks,

	tglx

* Re: [PATCH 5/5] x86/mbm: Add support for MBM counter overflow handling
  2016-02-11  1:07 ` [PATCH 5/5] x86/mbm: Add support for MBM counter overflow handling Vikas Shivappa
@ 2016-02-24 10:36   ` Thomas Gleixner
  2016-02-24 18:13     ` Vikas Shivappa
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2016-02-24 10:36 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, x86, linux-kernel, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin

On Wed, 10 Feb 2016, Vikas Shivappa wrote:
> +static enum hrtimer_restart mbm_hrtimer_handle(struct hrtimer *hrtimer)
> +{
> +	if (list_empty(&cache_groups))
> +		goto out;
> +
> +	list_for_each_entry(iter, &cache_groups, hw.cqm_groups_entry) {

....

> +		}
> +	}
> +
> +out:
> +	raw_spin_unlock_irqrestore(&cache_lock, flags);
> +	mutex_unlock(&cache_mutex);
> +
> +	hrtimer_forward_now(hrtimer, ms_to_ktime(MBM_CTR_OVERFLOW_TIME));
> +
> +	return HRTIMER_RESTART;

Why is that timer restarted if cache_groups is empty?

Thanks,

	tglx

* Re: [PATCH 2/5] x86/mbm: Intel Memory B/W Monitoring enumeration and init
  2016-02-24 10:22   ` Thomas Gleixner
@ 2016-02-24 18:12     ` Vikas Shivappa
  2016-02-24 19:21       ` Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: Vikas Shivappa @ 2016-02-24 18:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, x86, linux-kernel, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin



On Wed, 24 Feb 2016, Thomas Gleixner wrote:

> On Wed, 10 Feb 2016, Vikas Shivappa wrote:
>> +static int intel_mbm_init(void)
>> +{
>> +	int ret = 0, array_size, maxid = cqm_max_rmid + 1;
>> +
>> +	mbm_socket_max = cpumask_weight(&cqm_cpumask);
>
> This should use the new topology_max_packages() function, so you can alloc
> your array correctly even if not all sockets are online/plugged yet.

Will fix. Thanks for pointing out.

>
>> +	array_size = sizeof(struct sample) * maxid * mbm_socket_max;
>> +	mbm_local = kmalloc(array_size, GFP_KERNEL);
>> +	if (!mbm_local) {
>> +		ret = -ENOMEM;
>
> return -ENOMEM is sufficient here.

Will fix

>
>> +		goto out;
>> +	}
>> +
>> +	mbm_total = kmalloc(array_size, GFP_KERNEL);
>> +	if (!mbm_total) {
>> +		kfree(mbm_local);
>> +		ret = -ENOMEM;
>> +	}
>> +out:
>> +
>> +	return ret;
>> +}
>> +
>>  	cqm_l3_scale = boot_cpu_data.x86_cache_occ_scale;
>> @@ -1384,14 +1520,34 @@ static int __init intel_cqm_init(void)
>>
>>  	__perf_cpu_notifier(intel_cqm_cpu_notifier);
>
> You really should register the notifier _AFTER_ registering the pmu. That
> needs to be fixed anyway, because the existing code leaks the notifier AND
> memory in case perf_pmu_register fails.

Correct, Will fix the notifier leak

Thanks,
Vikas

>
> Thanks,
>
> 	tglx
>

* Re: [PATCH 5/5] x86/mbm: Add support for MBM counter overflow handling
  2016-02-24 10:36   ` Thomas Gleixner
@ 2016-02-24 18:13     ` Vikas Shivappa
  2016-02-24 19:21       ` Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: Vikas Shivappa @ 2016-02-24 18:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, x86, linux-kernel, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin



On Wed, 24 Feb 2016, Thomas Gleixner wrote:

> On Wed, 10 Feb 2016, Vikas Shivappa wrote:
>> +static enum hrtimer_restart mbm_hrtimer_handle(struct hrtimer *hrtimer)
>> +{
>> +	if (list_empty(&cache_groups))
>> +		goto out;
>> +
>> +	list_for_each_entry(iter, &cache_groups, hw.cqm_groups_entry) {
>
> ....
>
>> +		}
>> +	}
>> +
>> +out:
>> +	raw_spin_unlock_irqrestore(&cache_lock, flags);
>> +	mutex_unlock(&cache_mutex);
>> +
>> +	hrtimer_forward_now(hrtimer, ms_to_ktime(MBM_CTR_OVERFLOW_TIME));
>> +
>> +	return HRTIMER_RESTART;
>
> Why is that timer restarted if cache_groups is empty?

Will fix. It should not be restarted, as the timers may be stopped by then
as well. Also, this patch incorrectly uses a mutex (above) in the timer
context. Will fix that as well.

Thanks,
Vikas

>
> Thanks,
>
> 	tglx
>

* Re: [PATCH 2/5] x86/mbm: Intel Memory B/W Monitoring enumeration and init
  2016-02-24 18:12     ` Vikas Shivappa
@ 2016-02-24 19:21       ` Thomas Gleixner
  2016-02-24 20:05         ` Vikas Shivappa
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2016-02-24 19:21 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: Vikas Shivappa, x86, linux-kernel, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin

On Wed, 24 Feb 2016, Vikas Shivappa wrote:
> On Wed, 24 Feb 2016, Thomas Gleixner wrote:
> > You really should register the notifier _AFTER_ registering the pmu. That
> > needs to be fixed anyway, because the existing code leaks the notifier AND
> > memory in case perf_pmu_register fails.
> 
> Correct, Will fix the notifier leak

It's not only a notifier leak. The existing code also leaks memory. Please fix
that in a separate patch before adding the new stuff.

Thanks,

	tglx

* Re: [PATCH 5/5] x86/mbm: Add support for MBM counter overflow handling
  2016-02-24 18:13     ` Vikas Shivappa
@ 2016-02-24 19:21       ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2016-02-24 19:21 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: Vikas Shivappa, x86, linux-kernel, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin

On Wed, 24 Feb 2016, Vikas Shivappa wrote:
> On Wed, 24 Feb 2016, Thomas Gleixner wrote:
> > Why is that timer restarted if cache_groups is empty?
> 
> Will fix. It should not be as the timers may be stopped by now as well. Also
> this patch incorrectly uses mutex(above) in the timer context. Will fix that
> as well.

Indeed. Did not notice.

Thanks,

	tglx

* Re: [PATCH 2/5] x86/mbm: Intel Memory B/W Monitoring enumeration and init
  2016-02-24 19:21       ` Thomas Gleixner
@ 2016-02-24 20:05         ` Vikas Shivappa
  0 siblings, 0 replies; 15+ messages in thread
From: Vikas Shivappa @ 2016-02-24 20:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, Vikas Shivappa, x86, linux-kernel, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, kanaka.d.juvva,
	h.peter.anvin



On Wed, 24 Feb 2016, Thomas Gleixner wrote:

> On Wed, 24 Feb 2016, Vikas Shivappa wrote:
>> On Wed, 24 Feb 2016, Thomas Gleixner wrote:
>>> You really should register the notifier _AFTER_ registering the pmu. That
>>> needs to be fixed anyway, because the existing code leaks the notifier AND
>>> memory in case perf_pmu_register fails.
>>
>> Correct, Will fix the notifier leak
>
> It's not only a notifier leak. The existing code also leaks memory. Please fix
> that before adding the new stuff in a seperate patch.

Ok, makes sense. Will send the notifier and memory leak fix in a separate
patch as it's an existing cqm issue.

I had added a kfree(str) at the end for the memory leak fix in the current
patch:

out:
 	cpu_notifier_register_done();
 	if (ret) {
 		mbm_enabled = false;
 		cqm_enabled = false;
 		kfree(str);
 	}



>
> Thanks,
>
> 	tglx
>
