* [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management
From: Thomas Gleixner @ 2016-02-22 11:06 UTC
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

This series addresses the following issues:

 - Add proper error handling to uncore and rapl drivers

 - Get rid of the pseudo per-cpu-ness of these drivers and use proper
   per-package storage

 - Allow them to be modular

In order to do proper per-package storage, I added a facility which sanity
checks the physical package ids supplied by the BIOS and translates them into
logical package ids. That allows drivers to do allocations for the maximum
number of possible packages, independent of possible BIOS creativity.
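
A minimal usage sketch of that facility (illustrative only; the mydrv_* names
are made up, the topology_*() helpers are the ones introduced in patch 11):

	/* Allocate storage once, bounded by the possible package count */
	static struct mydrv_pkg **pkg_data;

	static int __init mydrv_init(void)
	{
		pkg_data = kcalloc(topology_max_packages(),
				   sizeof(*pkg_data), GFP_KERNEL);
		return pkg_data ? 0 : -ENOMEM;
	}

	/* Lookup is a plain array index, whatever ids the BIOS handed out */
	static struct mydrv_pkg *mydrv_pkg_data(unsigned int cpu)
	{
		return pkg_data[topology_logical_package_id(cpu)];
	}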

Changes vs. V1:

  - Add logical package management

  - Drop the exit_box callbacks for modern processors. This needs to be
    revisited on a case by case basis.

  - Fixup and convert the rapl driver as well

Thanks,

	tglx

---
 arch/x86/Kconfig                                    |    6 
 arch/x86/include/asm/processor.h                    |    2 
 arch/x86/include/asm/topology.h                     |   10 
 arch/x86/kernel/apic/apic.c                         |   14 
 arch/x86/kernel/cpu/Makefile                        |    7 
 arch/x86/kernel/cpu/common.c                        |    2 
 arch/x86/kernel/cpu/intel.c                         |   13 
 arch/x86/kernel/cpu/perf_event_intel_cqm.c          |   34 
 arch/x86/kernel/cpu/perf_event_intel_rapl.c         |  438 ++++++------
 arch/x86/kernel/cpu/perf_event_intel_uncore.c       |  694 ++++++++++----------
 arch/x86/kernel/cpu/perf_event_intel_uncore.h       |   51 +
 arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c |    6 
 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c   |   14 
 arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c |   10 
 arch/x86/kernel/cpu/proc.c                          |    1 
 arch/x86/kernel/smpboot.c                           |  100 ++
 b/arch/x86/Kconfig.perf                             |   23 
 include/linux/perf_event.h                          |    1 
 kernel/events/core.c                                |    1 
 lib/cpumask.c                                       |    1 
 20 files changed, 827 insertions(+), 601 deletions(-)


* [patch V2 01/28] x86/perf/intel/uncore: Remove pointless mask check
From: Thomas Gleixner @ 2016-02-22 11:06 UTC
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel_uncore--Remove-pointless-mask-check.patch --]
[-- Type: text/plain, Size: 619 bytes --]

uncore_cpumask_init() is only ever called from intel_uncore_init() where the
mask is guaranteed to be empty.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |    6 ------
 1 file changed, 6 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -1342,12 +1342,6 @@ static void __init uncore_cpumask_init(v
 {
 	int cpu;
 
-	/*
-	 * ony invoke once from msr or pci init code
-	 */
-	if (!cpumask_empty(&uncore_cpu_mask))
-		return;
-
 	cpu_notifier_register_begin();
 
 	for_each_online_cpu(cpu) {


* [patch V2 02/28] x86/perf/intel/uncore: Simplify error rollback
From: Thomas Gleixner @ 2016-02-22 11:06 UTC
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel_uncore--Simplify-error-rollback.patch --]
[-- Type: text/plain, Size: 3058 bytes --]

No point in doing partial rollbacks. Robustify uncore_type_exit() so it does
not dereference type->pmus unconditionally and remove all the partial rollback
hackery.

Preparatory patch for proper error handling.
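
The underlying idiom, reduced to a minimal sketch (hypothetical foo_* names,
not the actual uncore code): the exit function tolerates partially
initialized state, so a single invocation from the outermost error path
unwinds everything.

	struct foo {
		unsigned int	nr;
		int		*items;
	};

	static void foo_exit(struct foo *f)
	{
		/* May be called on a never initialized object */
		if (f->items) {
			kfree(f->items);
			f->items = NULL;
		}
	}

	static int foo_init(struct foo *f)
	{
		f->items = kcalloc(f->nr, sizeof(*f->items), GFP_KERNEL);
		if (!f->items)
			return -ENOMEM;	/* No local unwind required */
		return 0;
	}

	/* The caller does the full rollback exactly once */
	if (foo_init(f))
		foo_exit(f);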

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   45 +++++++++++++-------------
 1 file changed, 24 insertions(+), 21 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -767,10 +767,12 @@ static void __init uncore_type_exit(stru
 {
 	int i;
 
-	for (i = 0; i < type->num_boxes; i++)
-		free_percpu(type->pmus[i].box);
-	kfree(type->pmus);
-	type->pmus = NULL;
+	if (type->pmus) {
+		for (i = 0; i < type->num_boxes; i++)
+			free_percpu(type->pmus[i].box);
+		kfree(type->pmus);
+		type->pmus = NULL;
+	}
 	kfree(type->events_group);
 	type->events_group = NULL;
 }
@@ -778,6 +780,7 @@ static void __init uncore_type_exit(stru
 static void __init uncore_types_exit(struct intel_uncore_type **types)
 {
 	int i;
+
 	for (i = 0; types[i]; i++)
 		uncore_type_exit(types[i]);
 }
@@ -806,7 +809,7 @@ static int __init uncore_type_init(struc
 		INIT_LIST_HEAD(&pmus[i].box_list);
 		pmus[i].box = alloc_percpu(struct intel_uncore_box *);
 		if (!pmus[i].box)
-			goto fail;
+			return -ENOMEM;
 	}
 
 	if (type->event_descs) {
@@ -817,7 +820,7 @@ static int __init uncore_type_init(struc
 		attr_group = kzalloc(sizeof(struct attribute *) * (i + 1) +
 					sizeof(*attr_group), GFP_KERNEL);
 		if (!attr_group)
-			goto fail;
+			return -ENOMEM;
 
 		attrs = (struct attribute **)(attr_group + 1);
 		attr_group->name = "events";
@@ -831,9 +834,6 @@ static int __init uncore_type_init(struc
 
 	type->pmu_group = &uncore_pmu_attr_group;
 	return 0;
-fail:
-	uncore_type_exit(type);
-	return -ENOMEM;
 }
 
 static int __init uncore_types_init(struct intel_uncore_type **types)
@@ -843,13 +843,9 @@ static int __init uncore_types_init(stru
 	for (i = 0; types[i]; i++) {
 		ret = uncore_type_init(types[i]);
 		if (ret)
-			goto fail;
+			return ret;
 	}
 	return 0;
-fail:
-	while (--i >= 0)
-		uncore_type_exit(types[i]);
-	return ret;
 }
 
 /*
@@ -1007,17 +1003,21 @@ static int __init uncore_pci_init(void)
 
 	ret = uncore_types_init(uncore_pci_uncores);
 	if (ret)
-		return ret;
+		goto err;
 
 	uncore_pci_driver->probe = uncore_pci_probe;
 	uncore_pci_driver->remove = uncore_pci_remove;
 
 	ret = pci_register_driver(uncore_pci_driver);
-	if (ret == 0)
-		pcidrv_registered = true;
-	else
-		uncore_types_exit(uncore_pci_uncores);
+	if (ret)
+		goto err;
+
+	pcidrv_registered = true;
+	return 0;
 
+err:
+	uncore_types_exit(uncore_pci_uncores);
+	uncore_pci_uncores = empty_uncore;
 	return ret;
 }
 
@@ -1316,9 +1316,12 @@ static int __init uncore_cpu_init(void)
 
 	ret = uncore_types_init(uncore_msr_uncores);
 	if (ret)
-		return ret;
-
+		goto err;
 	return 0;
+err:
+	uncore_types_exit(uncore_msr_uncores);
+	uncore_msr_uncores = empty_uncore;
+	return ret;
 }
 
 static int __init uncore_pmus_register(void)


* [patch V2 03/28] x86/perf/intel/uncore: Fix error handling
From: Thomas Gleixner @ 2016-02-22 11:06 UTC
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel_uncore--Fix-error-handling.patch --]
[-- Type: text/plain, Size: 6930 bytes --]

This driver lacks any form of proper error handling. If initialization or
hotplug prepare fails, it leaves the facility half initialized.

Fix the state and memory leaks in a first step. As a second step we need to
undo the hardware state which is set via uncore_box_init() on some of the
uncore implementations.
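
One detail worth noting: returning an error from the CPU_UP_PREPARE notifier
now actually aborts the CPU bringup. notifier_from_errno() encodes a -errno
value into the notifier return format, roughly (from include/linux/notifier.h
of that era):

	static inline int notifier_from_errno(int err)
	{
		if (err)
			return NOTIFY_STOP_MASK | (NOTIFY_OK - err);

		return NOTIFY_OK;
	}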

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |  122 ++++++++++++++++++--------
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |   15 +--
 2 files changed, 94 insertions(+), 43 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -38,6 +38,16 @@ int uncore_pcibus_to_physid(struct pci_b
 	return phys_id;
 }
 
+static void uncore_free_pcibus_map(void)
+{
+	struct pci2phy_map *map, *tmp;
+
+	list_for_each_entry_safe(map, tmp, &pci2phy_map_head, list) {
+		list_del(&map->list);
+		kfree(map);
+	}
+}
+
 struct pci2phy_map *__find_pci2phy_map(int segment)
 {
 	struct pci2phy_map *map, *alloc = NULL;
@@ -760,16 +770,28 @@ static int uncore_pmu_register(struct in
 	}
 
 	ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
+	if (!ret)
+		pmu->registered = true;
 	return ret;
 }
 
+static void uncore_pmu_unregister(struct intel_uncore_pmu *pmu)
+{
+	if (!pmu->registered)
+		return;
+	perf_pmu_unregister(&pmu->pmu);
+	pmu->registered = false;
+}
+
 static void __init uncore_type_exit(struct intel_uncore_type *type)
 {
 	int i;
 
 	if (type->pmus) {
-		for (i = 0; i < type->num_boxes; i++)
+		for (i = 0; i < type->num_boxes; i++) {
+			uncore_pmu_unregister(&type->pmus[i]);
 			free_percpu(type->pmus[i].box);
+		}
 		kfree(type->pmus);
 		type->pmus = NULL;
 	}
@@ -856,8 +878,8 @@ static int uncore_pci_probe(struct pci_d
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
 	struct intel_uncore_type *type;
-	int phys_id;
 	bool first_box = false;
+	int phys_id, ret;
 
 	phys_id = uncore_pcibus_to_physid(pdev->bus);
 	if (phys_id < 0)
@@ -906,9 +928,18 @@ static int uncore_pci_probe(struct pci_d
 	list_add_tail(&box->list, &pmu->box_list);
 	raw_spin_unlock(&uncore_box_lock);
 
-	if (first_box)
-		uncore_pmu_register(pmu);
-	return 0;
+	if (!first_box)
+		return 0;
+
+	ret = uncore_pmu_register(pmu);
+	if (ret) {
+		pci_set_drvdata(pdev, NULL);
+		raw_spin_lock(&uncore_box_lock);
+		list_del(&box->list);
+		raw_spin_unlock(&uncore_box_lock);
+		kfree(box);
+	}
+	return ret;
 }
 
 static void uncore_pci_remove(struct pci_dev *pdev)
@@ -954,7 +985,7 @@ static void uncore_pci_remove(struct pci
 	kfree(box);
 
 	if (last_box)
-		perf_pmu_unregister(&pmu->pmu);
+		uncore_pmu_unregister(pmu);
 }
 
 static int __init uncore_pci_init(void)
@@ -1018,6 +1049,7 @@ static int __init uncore_pci_init(void)
 err:
 	uncore_types_exit(uncore_pci_uncores);
 	uncore_pci_uncores = empty_uncore;
+	uncore_free_pcibus_map();
 	return ret;
 }
 
@@ -1027,6 +1059,7 @@ static void __init uncore_pci_exit(void)
 		pcidrv_registered = false;
 		pci_unregister_driver(uncore_pci_driver);
 		uncore_types_exit(uncore_pci_uncores);
+		uncore_free_pcibus_map();
 	}
 }
 
@@ -1223,8 +1256,7 @@ static int uncore_cpu_notifier(struct no
 	/* allocate/free data structure for uncore box */
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_UP_PREPARE:
-		uncore_cpu_prepare(cpu, -1);
-		break;
+		return notifier_from_errno(uncore_cpu_prepare(cpu, -1));
 	case CPU_STARTING:
 		uncore_cpu_starting(cpu);
 		break;
@@ -1265,9 +1297,29 @@ static struct notifier_block uncore_cpu_
 	.priority	= CPU_PRI_PERF + 1,
 };
 
-static void __init uncore_cpu_setup(void *dummy)
+static int __init type_pmu_register(struct intel_uncore_type *type)
 {
-	uncore_cpu_starting(smp_processor_id());
+	int i, ret;
+
+	for (i = 0; i < type->num_boxes; i++) {
+		ret = uncore_pmu_register(&type->pmus[i]);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int __init uncore_msr_pmus_register(void)
+{
+	struct intel_uncore_type **types = uncore_msr_uncores;
+	int ret;
+
+	while (*types) {
+		ret = type_pmu_register(*types++);
+		if (ret)
+			return ret;
+	}
+	return 0;
 }
 
 static int __init uncore_cpu_init(void)
@@ -1317,6 +1369,10 @@ static int __init uncore_cpu_init(void)
 	ret = uncore_types_init(uncore_msr_uncores);
 	if (ret)
 		goto err;
+
+	ret = uncore_msr_pmus_register();
+	if (ret)
+		goto err;
 	return 0;
 err:
 	uncore_types_exit(uncore_msr_uncores);
@@ -1324,26 +1380,14 @@ static int __init uncore_cpu_init(void)
 	return ret;
 }
 
-static int __init uncore_pmus_register(void)
+static void __init uncore_cpu_setup(void *dummy)
 {
-	struct intel_uncore_pmu *pmu;
-	struct intel_uncore_type *type;
-	int i, j;
-
-	for (i = 0; uncore_msr_uncores[i]; i++) {
-		type = uncore_msr_uncores[i];
-		for (j = 0; j < type->num_boxes; j++) {
-			pmu = &type->pmus[j];
-			uncore_pmu_register(pmu);
-		}
-	}
-
-	return 0;
+	uncore_cpu_starting(smp_processor_id());
 }
 
-static void __init uncore_cpumask_init(void)
+static int __init uncore_cpumask_init(void)
 {
-	int cpu;
+	int cpu, ret = 0;
 
 	cpu_notifier_register_begin();
 
@@ -1359,17 +1403,20 @@ static void __init uncore_cpumask_init(v
 		if (phys_id < 0)
 			continue;
 
-		uncore_cpu_prepare(cpu, phys_id);
+		ret = uncore_cpu_prepare(cpu, phys_id);
+		if (ret)
+			goto out;
 		uncore_event_init_cpu(cpu);
 	}
 	on_each_cpu(uncore_cpu_setup, NULL, 1);
 
 	__register_cpu_notifier(&uncore_cpu_nb);
 
+out:
 	cpu_notifier_register_done();
+	return ret;
 }
 
-
 static int __init intel_uncore_init(void)
 {
 	int ret;
@@ -1382,17 +1429,20 @@ static int __init intel_uncore_init(void
 
 	ret = uncore_pci_init();
 	if (ret)
-		goto fail;
+		return ret;
 	ret = uncore_cpu_init();
-	if (ret) {
-		uncore_pci_exit();
-		goto fail;
-	}
-	uncore_cpumask_init();
+	if (ret)
+		goto errpci;
+	ret = uncore_cpumask_init();
+	if (ret)
+		goto errcpu;
 
-	uncore_pmus_register();
 	return 0;
-fail:
+
+errcpu:
+	uncore_types_exit(uncore_msr_uncores);
+errpci:
+	uncore_pci_exit();
 	return ret;
 }
 device_initcall(intel_uncore_init);
Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -73,13 +73,14 @@ struct intel_uncore_ops {
 };
 
 struct intel_uncore_pmu {
-	struct pmu pmu;
-	char name[UNCORE_PMU_NAME_LEN];
-	int pmu_idx;
-	int func_id;
-	struct intel_uncore_type *type;
-	struct intel_uncore_box ** __percpu box;
-	struct list_head box_list;
+	struct pmu			pmu;
+	char				name[UNCORE_PMU_NAME_LEN];
+	int				pmu_idx;
+	int				func_id;
+	bool				registered;
+	struct intel_uncore_type	*type;
+	struct intel_uncore_box		** __percpu box;
+	struct list_head		box_list;
 };
 
 struct intel_uncore_extra_reg {


* [patch V2 04/28] x86/perf/intel/uncore: Add sanity checks for pci dev package id
From: Thomas Gleixner @ 2016-02-22 11:06 UTC
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel_uncore--Add-sanity-checks-for-package-id.patch --]
[-- Type: text/plain, Size: 742 bytes --]

The storage array is size limited, but the probe code lacks a sanity check on
the physical package id used to index it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -882,7 +882,7 @@ static int uncore_pci_probe(struct pci_d
 	int phys_id, ret;
 
 	phys_id = uncore_pcibus_to_physid(pdev->bus);
-	if (phys_id < 0)
+	if (phys_id < 0 || phys_id >= UNCORE_SOCKET_MAX)
 		return -ENODEV;
 
 	if (UNCORE_PCI_DEV_TYPE(id->driver_data) == UNCORE_EXTRA_PCI_DEV) {


* [patch V2 05/28] x86/perf/intel_uncore: Cleanup hardware on exit
From: Thomas Gleixner @ 2016-02-22 11:06 UTC
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel_uncore--Cleanup-hardware-on-exit.patch --]
[-- Type: text/plain, Size: 4817 bytes --]

When tearing down the boxes, nothing undoes the hardware state which was set
up by box->init_box(). Add a box->exit_box() callback and implement it for the
uncores which have an init_box() callback.

This misses the cleanup in the error exit paths, but I cannot be bothered to
implement it before cleaning up the rest of the driver, which makes that task
way simpler.
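
For reference, the init side already guards against double invocation with
the same flag, so init_box()/exit_box() each run at most once per box. The
existing helper, as the file looks at this point in the series:

	static inline void uncore_box_init(struct intel_uncore_box *box)
	{
		if (!test_and_set_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags)) {
			if (box->pmu->type->ops->init_box)
				box->pmu->type->ops->init_box(box);
		}
	}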

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Removed the code which affects IVB and newer as Liang pointed out that
    it's incorrect
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c       |    6 +++++-
 arch/x86/kernel/cpu/perf_event_intel_uncore.h       |    9 +++++++++
 arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c |    6 ++++++
 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c   |   13 +++++++++++++
 4 files changed, 33 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -937,6 +937,7 @@ static int uncore_pci_probe(struct pci_d
 		raw_spin_lock(&uncore_box_lock);
 		list_del(&box->list);
 		raw_spin_unlock(&uncore_box_lock);
+		uncore_box_exit(box);
 		kfree(box);
 	}
 	return ret;
@@ -982,6 +983,7 @@ static void uncore_pci_remove(struct pci
 	}
 
 	WARN_ON_ONCE(atomic_read(&box->refcnt) != 1);
+	uncore_box_exit(box);
 	kfree(box);
 
 	if (last_box)
@@ -1091,8 +1093,10 @@ static void uncore_cpu_dying(int cpu)
 			pmu = &type->pmus[j];
 			box = *per_cpu_ptr(pmu->box, cpu);
 			*per_cpu_ptr(pmu->box, cpu) = NULL;
-			if (box && atomic_dec_and_test(&box->refcnt))
+			if (box && atomic_dec_and_test(&box->refcnt)) {
 				list_add(&box->list, &boxes_to_free);
+				uncore_box_exit(box);
+			}
 		}
 	}
 }
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -61,6 +61,7 @@ struct intel_uncore_type {
 
 struct intel_uncore_ops {
 	void (*init_box)(struct intel_uncore_box *);
+	void (*exit_box)(struct intel_uncore_box *);
 	void (*disable_box)(struct intel_uncore_box *);
 	void (*enable_box)(struct intel_uncore_box *);
 	void (*disable_event)(struct intel_uncore_box *, struct perf_event *);
@@ -306,6 +307,14 @@ static inline void uncore_box_init(struc
 	}
 }
 
+static inline void uncore_box_exit(struct intel_uncore_box *box)
+{
+	if (test_and_clear_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags)) {
+		if (box->pmu->type->ops->exit_box)
+			box->pmu->type->ops->exit_box(box);
+	}
+}
+
 static inline bool uncore_box_is_fake(struct intel_uncore_box *box)
 {
 	return (box->phys_id < 0);
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c
@@ -201,6 +201,11 @@ static void nhmex_uncore_msr_init_box(st
 	wrmsrl(NHMEX_U_MSR_PMON_GLOBAL_CTL, NHMEX_U_PMON_GLOBAL_EN_ALL);
 }
 
+static void nhmex_uncore_msr_exit_box(struct intel_uncore_box *box)
+{
+	wrmsrl(NHMEX_U_MSR_PMON_GLOBAL_CTL, 0);
+}
+
 static void nhmex_uncore_msr_disable_box(struct intel_uncore_box *box)
 {
 	unsigned msr = uncore_msr_box_ctl(box);
@@ -250,6 +255,7 @@ static void nhmex_uncore_msr_enable_even
 
 #define NHMEX_UNCORE_OPS_COMMON_INIT()				\
 	.init_box	= nhmex_uncore_msr_init_box,		\
+	.exit_box	= nhmex_uncore_msr_exit_box,		\
 	.disable_box	= nhmex_uncore_msr_disable_box,		\
 	.enable_box	= nhmex_uncore_msr_enable_box,		\
 	.disable_event	= nhmex_uncore_msr_disable_event,	\
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
@@ -95,6 +95,12 @@ static void snb_uncore_msr_init_box(stru
 	}
 }
 
+static void snb_uncore_msr_exit_box(struct intel_uncore_box *box)
+{
+	if (box->pmu->pmu_idx == 0)
+		wrmsrl(SNB_UNC_PERF_GLOBAL_CTL, 0);
+}
+
 static struct uncore_event_desc snb_uncore_events[] = {
 	INTEL_UNCORE_EVENT_DESC(clockticks, "event=0xff,umask=0x00"),
 	{ /* end: all zeroes */ },
@@ -116,6 +122,7 @@ static struct attribute_group snb_uncore
 
 static struct intel_uncore_ops snb_uncore_msr_ops = {
 	.init_box	= snb_uncore_msr_init_box,
+	.exit_box	= snb_uncore_msr_exit_box,
 	.disable_event	= snb_uncore_msr_disable_event,
 	.enable_event	= snb_uncore_msr_enable_event,
 	.read_counter	= uncore_msr_read_counter,
@@ -231,6 +238,11 @@ static void snb_uncore_imc_init_box(stru
 	box->hrtimer_duration = UNCORE_SNB_IMC_HRTIMER_INTERVAL;
 }
 
+static void snb_uncore_imc_exit_box(struct intel_uncore_box *box)
+{
+	iounmap(box->io_addr);
+}
+
 static void snb_uncore_imc_enable_box(struct intel_uncore_box *box)
 {}
 
@@ -458,6 +470,7 @@ static struct pmu snb_uncore_imc_pmu = {
 
 static struct intel_uncore_ops snb_uncore_imc_ops = {
 	.init_box	= snb_uncore_imc_init_box,
+	.exit_box	= snb_uncore_imc_exit_box,
 	.enable_box	= snb_uncore_imc_enable_box,
 	.disable_box	= snb_uncore_imc_disable_box,
 	.disable_event	= snb_uncore_imc_disable_event,


* [patch V2 06/28] x86/perf/intel/uncore: Drop pointless hotplug priority
From: Thomas Gleixner @ 2016-02-22 11:06 UTC
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel-uncore--Drop-pointless-hotplug-priority.patch --]
[-- Type: text/plain, Size: 713 bytes --]

The perf core doesn't do anything related to hardware pmus on hotplug. Drop
the priority.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |    5 -----
 1 file changed, 5 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -1294,11 +1294,6 @@ static int uncore_cpu_notifier(struct no
 
 static struct notifier_block uncore_cpu_nb = {
 	.notifier_call	= uncore_cpu_notifier,
-	/*
-	 * to migrate uncore events, our notifier should be executed
-	 * before perf core's notifier.
-	 */
-	.priority	= CPU_PRI_PERF + 1,
 };
 
 static int __init type_pmu_register(struct intel_uncore_type *type)


* [patch V2 07/28] x86/perf/intel_uncore: Make code readable
From: Thomas Gleixner @ 2016-02-22 11:06 UTC
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel_uncore--Make-code-readable.patch --]
[-- Type: text/plain, Size: 5486 bytes --]

Clean up the code a bit before reworking it completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   95 +++++++++++++-------------
 1 file changed, 50 insertions(+), 45 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -217,7 +217,8 @@ u64 uncore_shared_reg_config(struct inte
 	return config;
 }
 
-static void uncore_assign_hw_event(struct intel_uncore_box *box, struct perf_event *event, int idx)
+static void uncore_assign_hw_event(struct intel_uncore_box *box,
+				   struct perf_event *event, int idx)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
@@ -312,18 +313,19 @@ static void uncore_pmu_init_hrtimer(stru
 	box->hrtimer.function = uncore_pmu_hrtimer;
 }
 
-static struct intel_uncore_box *uncore_alloc_box(struct intel_uncore_type *type, int node)
+static struct intel_uncore_box *uncore_alloc_box(struct intel_uncore_type *type,
+						 int node)
 {
+	int i, size, numshared = type->num_shared_regs ;
 	struct intel_uncore_box *box;
-	int i, size;
 
-	size = sizeof(*box) + type->num_shared_regs * sizeof(struct intel_uncore_extra_reg);
+	size = sizeof(*box) + numshared * sizeof(struct intel_uncore_extra_reg);
 
 	box = kzalloc_node(size, GFP_KERNEL, node);
 	if (!box)
 		return NULL;
 
-	for (i = 0; i < type->num_shared_regs; i++)
+	for (i = 0; i < numshared; i++)
 		raw_spin_lock_init(&box->shared_regs[i].lock);
 
 	uncore_pmu_init_hrtimer(box);
@@ -351,7 +353,8 @@ static bool is_uncore_event(struct perf_
 }
 
 static int
-uncore_collect_events(struct intel_uncore_box *box, struct perf_event *leader, bool dogrp)
+uncore_collect_events(struct intel_uncore_box *box, struct perf_event *leader,
+		      bool dogrp)
 {
 	struct perf_event *event;
 	int n, max_count;
@@ -412,7 +415,8 @@ uncore_get_event_constraint(struct intel
 	return &type->unconstrainted;
 }
 
-static void uncore_put_event_constraint(struct intel_uncore_box *box, struct perf_event *event)
+static void uncore_put_event_constraint(struct intel_uncore_box *box,
+					struct perf_event *event)
 {
 	if (box->pmu->type->ops->put_constraint)
 		box->pmu->type->ops->put_constraint(box, event);
@@ -592,7 +596,7 @@ static void uncore_pmu_event_del(struct
 		if (event == box->event_list[i]) {
 			uncore_put_event_constraint(box, event);
 
-			while (++i < box->n_events)
+			for (++i; i < box->n_events; i++)
 				box->event_list[i - 1] = box->event_list[i];
 
 			--box->n_events;
@@ -801,10 +805,8 @@ static void __init uncore_type_exit(stru
 
 static void __init uncore_types_exit(struct intel_uncore_type **types)
 {
-	int i;
-
-	for (i = 0; types[i]; i++)
-		uncore_type_exit(types[i]);
+	for (; *types; types++)
+		uncore_type_exit(*types);
 }
 
 static int __init uncore_type_init(struct intel_uncore_type *type)
@@ -908,9 +910,11 @@ static int uncore_pci_probe(struct pci_d
 	 * some device types. Hence PCI device idx would be 0 for all devices.
 	 * So increment pmu pointer to point to an unused array element.
 	 */
-	if (boot_cpu_data.x86_model == 87)
+	if (boot_cpu_data.x86_model == 87) {
 		while (pmu->func_id >= 0)
 			pmu++;
+	}
+
 	if (pmu->func_id < 0)
 		pmu->func_id = pdev->devfn;
 	else
@@ -1170,44 +1174,45 @@ static int uncore_cpu_prepare(int cpu, i
 	return 0;
 }
 
-static void
-uncore_change_context(struct intel_uncore_type **uncores, int old_cpu, int new_cpu)
+static void uncore_change_type_ctx(struct intel_uncore_type *type, int old_cpu,
+				   int new_cpu)
 {
-	struct intel_uncore_type *type;
-	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_pmu *pmu = type->pmus;
 	struct intel_uncore_box *box;
-	int i, j;
-
-	for (i = 0; uncores[i]; i++) {
-		type = uncores[i];
-		for (j = 0; j < type->num_boxes; j++) {
-			pmu = &type->pmus[j];
-			if (old_cpu < 0)
-				box = uncore_pmu_to_box(pmu, new_cpu);
-			else
-				box = uncore_pmu_to_box(pmu, old_cpu);
-			if (!box)
-				continue;
+	int i;
 
-			if (old_cpu < 0) {
-				WARN_ON_ONCE(box->cpu != -1);
-				box->cpu = new_cpu;
-				continue;
-			}
+	for (i = 0; i < type->num_boxes; i++, pmu++) {
+		if (old_cpu < 0)
+			box = uncore_pmu_to_box(pmu, new_cpu);
+		else
+			box = uncore_pmu_to_box(pmu, old_cpu);
+		if (!box)
+			continue;
 
-			WARN_ON_ONCE(box->cpu != old_cpu);
-			if (new_cpu >= 0) {
-				uncore_pmu_cancel_hrtimer(box);
-				perf_pmu_migrate_context(&pmu->pmu,
-						old_cpu, new_cpu);
-				box->cpu = new_cpu;
-			} else {
-				box->cpu = -1;
-			}
+		if (old_cpu < 0) {
+			WARN_ON_ONCE(box->cpu != -1);
+			box->cpu = new_cpu;
+			continue;
 		}
+
+		WARN_ON_ONCE(box->cpu != old_cpu);
+		box->cpu = -1;
+		if (new_cpu < 0)
+			continue;
+
+		uncore_pmu_cancel_hrtimer(box);
+		perf_pmu_migrate_context(&pmu->pmu, old_cpu, new_cpu);
+		box->cpu = new_cpu;
 	}
 }
 
+static void uncore_change_context(struct intel_uncore_type **uncores,
+				  int old_cpu, int new_cpu)
+{
+	for (; *uncores; uncores++)
+		uncore_change_type_ctx(*uncores, old_cpu, new_cpu);
+}
+
 static void uncore_event_exit_cpu(int cpu)
 {
 	int i, phys_id, target;
@@ -1313,8 +1318,8 @@ static int __init uncore_msr_pmus_regist
 	struct intel_uncore_type **types = uncore_msr_uncores;
 	int ret;
 
-	while (*types) {
-		ret = type_pmu_register(*types++);
+	for (; *types; types++) {
+		ret = type_pmu_register(*types);
 		if (ret)
 			return ret;
 	}


* [patch V2 08/28] x86/perf/uncore: Make uncore_pcibus_to_physid static
From: Thomas Gleixner @ 2016-02-22 11:06 UTC
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-uncore--Make-uncore_pcibus_to_physid-static.patch --]
[-- Type: text/plain, Size: 1233 bytes --]

No users outside of this file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |    2 +-
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |    1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -21,7 +21,7 @@ static struct event_constraint uncore_co
 struct event_constraint uncore_constraint_empty =
 	EVENT_CONSTRAINT(0, 0, 0);
 
-int uncore_pcibus_to_physid(struct pci_bus *bus)
+static int uncore_pcibus_to_physid(struct pci_bus *bus)
 {
 	struct pci2phy_map *map;
 	int phys_id = -1;
Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -125,7 +125,6 @@ struct pci2phy_map {
 	int pbus_to_physid[256];
 };
 
-int uncore_pcibus_to_physid(struct pci_bus *bus);
 struct pci2phy_map *__find_pci2phy_map(int segment);
 
 ssize_t uncore_event_show(struct kobject *kobj,


* [patch V2 09/28] perf: Allow storage of pmu private data in event
From: Thomas Gleixner @ 2016-02-22 11:06 UTC
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: perf--Allow-storage-of-pmu-private-data-in-event.patch --]
[-- Type: text/plain, Size: 726 bytes --]

For pmus which are not per cpu but e.g. per package/socket, we want to be
able to store a reference to the underlying per-package/socket facility in
the event at init time, so we can avoid magic storage constructs in the pmu
drivers.

This allows us to get rid of the per cpu dance in the intel uncore and rapl
drivers.
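
A sketch of the intended usage (hypothetical my_* names; the uncore
conversion later in this series does exactly this):

	/* Resolve the per-package object once, at event init time ... */
	static int my_event_init(struct perf_event *event)
	{
		struct my_box *box = my_lookup_box(event->pmu, event->cpu);

		if (!box)
			return -EINVAL;
		event->pmu_private = box;
		return 0;
	}

	/* ... and fetch it in the hot paths without per-cpu indirection */
	static struct my_box *my_event_to_box(struct perf_event *event)
	{
		return event->pmu_private;
	}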

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/perf_event.h |    1 +
 1 file changed, 1 insertion(+)

--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -467,6 +467,7 @@ struct perf_event {
 	int				group_flags;
 	struct perf_event		*group_leader;
 	struct pmu			*pmu;
+	void				*pmu_private;
 
 	enum perf_event_active_state	state;
 	unsigned int			attach_state;


* [patch V2 11/28] x86/topology: Create logical package id
From: Thomas Gleixner @ 2016-02-22 11:06 UTC
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-topology--Create-logical-package-id.patch --]
[-- Type: text/plain, Size: 8848 bytes --]

For per-package oriented services we must be able to rely on the number of cpu
packages to be within bounds. Create a tracking facility, which

- calculates the number of possible packages depending on nr_cpu_ids after
  boot

- makes sure that the package id is within the number of possible packages. If
  the BIOS supplied id is outside that range, it is mapped to a spare logical
  package id if there is enough space available.

Provide interfaces for drivers to query the mapping and do translations from
physical to logical ids.
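
A worked example of the mapping, assuming CPUID leaf 0x1 reports 16 apic ids
reserved per package (EBX[23:16] = 16; the numbers are made up):

	x86_coreid_bits = get_count_order(16);	/* = 4 */

	/* The physical package id is the apic id sans the low bits */
	pkg = apicid >> x86_coreid_bits;	/* apicid 0x23 -> package 2 */

	/*
	 * If the BIOS hands out sparse ids, e.g. physical packages 0 and 2
	 * on a two socket system, the map converts them to dense logical
	 * ids: physical 0 -> logical 0, physical 2 -> logical 1.
	 */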

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    2 
 arch/x86/include/asm/topology.h  |   10 +++
 arch/x86/kernel/apic/apic.c      |   14 +++++
 arch/x86/kernel/cpu/common.c     |    2 
 arch/x86/kernel/cpu/intel.c      |   13 +++++
 arch/x86/kernel/cpu/proc.c       |    1 
 arch/x86/kernel/smpboot.c        |  100 +++++++++++++++++++++++++++++++++++++++
 7 files changed, 142 insertions(+)

Index: b/arch/x86/include/asm/processor.h
===================================================================
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -129,6 +129,8 @@ struct cpuinfo_x86 {
 	u16			booted_cores;
 	/* Physical processor id: */
 	u16			phys_proc_id;
+	/* Logical processor id: */
+	u16			logical_proc_id;
 	/* Core id: */
 	u16			cpu_core_id;
 	/* Compute unit id */
Index: b/arch/x86/include/asm/topology.h
===================================================================
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -119,12 +119,22 @@ static inline void setup_node_to_cpumask
 
 extern const struct cpumask *cpu_coregroup_mask(int cpu);
 
+#define topology_logical_package_id(cpu)	(cpu_data(cpu).logical_proc_id)
 #define topology_physical_package_id(cpu)	(cpu_data(cpu).phys_proc_id)
 #define topology_core_id(cpu)			(cpu_data(cpu).cpu_core_id)
 
 #ifdef ENABLE_TOPO_DEFINES
 #define topology_core_cpumask(cpu)		(per_cpu(cpu_core_map, cpu))
 #define topology_sibling_cpumask(cpu)		(per_cpu(cpu_sibling_map, cpu))
+
+extern unsigned int __max_logical_packages;
+#define topology_max_packages()			(__max_logical_packages)
+int topology_update_package_map(unsigned int apicid, unsigned int cpu);
+extern int topology_phys_to_logical_pkg(unsigned int pkg);
+#else
+#define topology_max_packages()			(1)
+static inline int
+topology_update_package_map(unsigned int apicid, unsigned int cpu) { return 0; }
 #endif
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)
Index: b/arch/x86/kernel/apic/apic.c
===================================================================
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2078,6 +2078,20 @@ int generic_processor_info(int apicid, i
 		cpu = cpumask_next_zero(-1, cpu_present_mask);
 
 	/*
+	 * This can happen on physical hotplug. The sanity check at boot time
+	 * is done from native_smp_prepare_cpus() after num_possible_cpus() is
+	 * established.
+	 */
+	if (topology_update_package_map(apicid, cpu) < 0) {
+		int thiscpu = max + disabled_cpus;
+
+		pr_warning("ACPI: Package limit reached. Processor %d/0x%x ignored.\n",
+			   thiscpu, apicid);
+		disabled_cpus++;
+		return -ENOSPC;
+	}
+
+	/*
 	 * Validate version
 	 */
 	if (version == 0x0) {
Index: b/arch/x86/kernel/cpu/common.c
===================================================================
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -977,6 +977,8 @@ static void identify_cpu(struct cpuinfo_
 #ifdef CONFIG_NUMA
 	numa_add_cpu(smp_processor_id());
 #endif
+	/* The boot/hotplug time assigment got cleared, restore it */
+	c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);
 }
 
 /*
Index: b/arch/x86/kernel/cpu/intel.c
===================================================================
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -160,6 +160,19 @@ static void early_init_intel(struct cpui
 		pr_info("Disabling PGE capability bit\n");
 		setup_clear_cpu_cap(X86_FEATURE_PGE);
 	}
+
+	if (c->cpuid_level >= 0x00000001) {
+		u32 eax, ebx, ecx, edx;
+
+		cpuid(0x00000001, &eax, &ebx, &ecx, &edx);
+		/*
+		 * If HTT (EDX[28]) is set EBX[16:23] contain the number of
+		 * apicids which are reserved per package. Store the resulting
+		 * shift value for the package management code.
+		 */
+		if (edx & (1U << 28))
+			c->x86_coreid_bits = get_count_order((ebx >> 16) & 0xff);
+	}
 }
 
 #ifdef CONFIG_X86_32
Index: b/arch/x86/kernel/cpu/proc.c
===================================================================
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -12,6 +12,7 @@ static void show_cpuinfo_core(struct seq
 {
 #ifdef CONFIG_SMP
 	seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
+	seq_printf(m, "logical id\t: %d\n", c->logical_proc_id);
 	seq_printf(m, "siblings\t: %d\n",
 		   cpumask_weight(topology_core_cpumask(cpu)));
 	seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
Index: b/arch/x86/kernel/smpboot.c
===================================================================
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -97,6 +97,14 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t
 DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info);
 EXPORT_PER_CPU_SYMBOL(cpu_info);
 
+/* Logical package management. We might want to allocate that dynamically */
+static int *physical_to_logical_pkg __read_mostly;
+static unsigned long *physical_package_map __read_mostly;
+static unsigned long *logical_package_map  __read_mostly;
+static unsigned int max_physical_pkg_id __read_mostly;
+unsigned int __max_logical_packages __read_mostly;
+EXPORT_SYMBOL(__max_logical_packages);
+
 static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip)
 {
 	unsigned long flags;
@@ -251,6 +259,97 @@ static void notrace start_secondary(void
 	cpu_startup_entry(CPUHP_ONLINE);
 }
 
+int topology_update_package_map(unsigned int apicid, unsigned int cpu)
+{
+	unsigned int new, pkg = apicid >> boot_cpu_data.x86_coreid_bits;
+
+	/* Called from early boot ? */
+	if (!physical_package_map)
+		return 0;
+
+	if (pkg >= max_physical_pkg_id)
+		return -EINVAL;
+
+	/* Set the logical package id */
+	if (test_and_set_bit(pkg, physical_package_map))
+		goto found;
+
+	if (pkg < __max_logical_packages) {
+		set_bit(pkg, logical_package_map);
+		physical_to_logical_pkg[pkg] = pkg;
+		goto found;
+	}
+	new = find_first_zero_bit(logical_package_map, __max_logical_packages);
+	if (new >= __max_logical_packages) {
+		physical_to_logical_pkg[pkg] = -1;
+		pr_warn("APIC(%x) Package %u exceeds logical package map\n",
+			apicid, pkg);
+		return -ENOSPC;
+	}
+	set_bit(new, logical_package_map);
+	pr_info("APIC(%x) Converting physical %u to logical package %u\n",
+		apicid, pkg, new);
+	physical_to_logical_pkg[pkg] = new;
+
+found:
+	cpu_data(cpu).logical_proc_id = physical_to_logical_pkg[pkg];
+	return 0;
+}
+
+/**
+ * topology_phys_to_logical_pkg - Map a physical package id to a logical
+ *
+ * Returns logical package id or -1 if not found
+ */
+int topology_phys_to_logical_pkg(unsigned int phys_pkg)
+{
+	if (phys_pkg >= max_physical_pkg_id)
+		return -1;
+	return physical_to_logical_pkg[phys_pkg];
+}
+EXPORT_SYMBOL(topology_phys_to_logical_pkg);
+
+static void __init smp_init_package_map(void)
+{
+	unsigned int ncpus, cpu;
+	size_t size;
+
+	/*
+	 * Today neither Intel nor AMD support heterogenous systems. That
+	 * might change in the future....
+	 */
+	ncpus = boot_cpu_data.x86_max_cores * smp_num_siblings;
+	__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
+
+	/*
+	 * Possibly larger than what we need as the number of apic ids per
+	 * package can be smaller than the actual used apic ids.
+	 */
+	max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);
+	size = max_physical_pkg_id * sizeof(unsigned int);
+	physical_to_logical_pkg = kmalloc(size, GFP_KERNEL);
+	memset(physical_to_logical_pkg, 0xff, size);
+	size = BITS_TO_LONGS(max_physical_pkg_id);
+	physical_package_map = kzalloc(size, GFP_KERNEL);
+	size = BITS_TO_LONGS(__max_logical_packages);
+	logical_package_map = kzalloc(size, GFP_KERNEL);
+
+	pr_info("Max logical packages: %u\n", __max_logical_packages);
+
+	for_each_present_cpu(cpu) {
+		unsigned int apicid = apic->cpu_present_to_apicid(cpu);
+
+		if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
+			continue;
+		if (!topology_update_package_map(apicid, cpu))
+			continue;
+		pr_warn("CPU %u APICId %x disabled\n", cpu, apicid);
+		per_cpu(x86_bios_cpu_apicid, cpu) = BAD_APICID;
+		set_cpu_possible(cpu, false);
+		set_cpu_present(cpu, false);
+	}
+}
+
 void __init smp_store_boot_cpu_info(void)
 {
 	int id = 0; /* CPU 0 */
@@ -258,6 +357,7 @@ void __init smp_store_boot_cpu_info(void
 
 	*c = boot_cpu_data;
 	c->cpu_index = id;
+	smp_init_package_map();
 }
 
 /*


* [patch V2 10/28] x86/perf/intel_uncore: Store box in event->pmu_private
From: Thomas Gleixner @ 2016-02-22 11:06 UTC
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel_uncore--Store-box-in-event->pmu_private.patch --]
[-- Type: text/plain, Size: 3190 bytes --]

Store the box in event->pmu_private, so we can get rid of the per-cpu data
storage.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c     |   15 +--------------
 arch/x86/kernel/cpu/perf_event_intel_uncore.h     |   12 ++++++++++--
 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c |    1 +
 3 files changed, 12 insertions(+), 16 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -92,11 +92,6 @@ ssize_t uncore_event_show(struct kobject
 	return sprintf(buf, "%s", event->config);
 }
 
-struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
-{
-	return container_of(event->pmu, struct intel_uncore_pmu, pmu);
-}
-
 struct intel_uncore_box *uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
 {
 	struct intel_uncore_box *box;
@@ -122,15 +117,6 @@ struct intel_uncore_box *uncore_pmu_to_b
 	return *per_cpu_ptr(pmu->box, cpu);
 }
 
-struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
-{
-	/*
-	 * perf core schedules event on the basis of cpu, uncore events are
-	 * collected by one of the cpus inside a physical package.
-	 */
-	return uncore_pmu_to_box(uncore_event_to_pmu(event), smp_processor_id());
-}
-
 u64 uncore_msr_read_counter(struct intel_uncore_box *box, struct perf_event *event)
 {
 	u64 count;
@@ -690,6 +676,7 @@ static int uncore_pmu_event_init(struct
 	if (!box || box->cpu < 0)
 		return -EINVAL;
 	event->cpu = box->cpu;
+	event->pmu_private = box;
 
 	event->hw.idx = -1;
 	event->hw.last_tag = ~0ULL;
Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -319,9 +319,17 @@ static inline bool uncore_box_is_fake(st
 	return (box->phys_id < 0);
 }
 
-struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event);
+static inline struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
+{
+	return container_of(event->pmu, struct intel_uncore_pmu, pmu);
+}
+
+static inline struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
+{
+	return event->pmu_private;
+}
+
 struct intel_uncore_box *uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu);
-struct intel_uncore_box *uncore_event_to_box(struct perf_event *event);
 u64 uncore_msr_read_counter(struct intel_uncore_box *box, struct perf_event *event);
 void uncore_pmu_start_hrtimer(struct intel_uncore_box *box);
 void uncore_pmu_cancel_hrtimer(struct intel_uncore_box *box);
Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
@@ -313,6 +313,7 @@ static int snb_uncore_imc_event_init(str
 		return -EINVAL;
 
 	event->cpu = box->cpu;
+	event->pmu_private = box;
 
 	event->hw.idx = -1;
 	event->hw.last_tag = ~0ULL;


* [patch V2 12/28] x86/perf/uncore: Track packages not per cpu data
From: Thomas Gleixner @ 2016-02-22 11:06 UTC
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-uncore--Track-packages-not-per-cpu-data.patch --]
[-- Type: text/plain, Size: 22117 bytes --]

uncore is a per-package facility, but the code tries to mimic a per-cpu
facility with completely convoluted constructs.

Simplify the whole machinery by tracking per package information. While at it,
avoid the kfree/alloc dance when a cpu goes offline and online again. There is
no point in freeing the box after it was allocated. We just keep proper
refcounting and the first cpu which comes online in a package does the
initialization/activation of the box.
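
The resulting lifecycle, sketched from the description above (not the literal
patch code):

	/* The first cpu coming online in a package activates the box ... */
	if (atomic_inc_return(&box->refcnt) == 1)
		uncore_box_init(box);

	/*
	 * ... the last one going offline merely deactivates it. The box
	 * memory stays allocated for the next offline/online cycle.
	 */
	if (atomic_dec_and_test(&box->refcnt))
		uncore_box_exit(box);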

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c       |  420 ++++++++------------
 arch/x86/kernel/cpu/perf_event_intel_uncore.h       |   18 
 arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c |   10 
 3 files changed, 197 insertions(+), 251 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -9,9 +9,9 @@ struct pci_driver *uncore_pci_driver;
 /* pci bus to socket mapping */
 DEFINE_RAW_SPINLOCK(pci2phy_map_lock);
 struct list_head pci2phy_map_head = LIST_HEAD_INIT(pci2phy_map_head);
-struct pci_dev *uncore_extra_pci_dev[UNCORE_SOCKET_MAX][UNCORE_EXTRA_PCI_DEV_MAX];
+struct pci_extra_dev *uncore_extra_pci_dev;
+static int max_packages;
 
-static DEFINE_RAW_SPINLOCK(uncore_box_lock);
 /* mask of cpus that collect uncore events */
 static cpumask_t uncore_cpu_mask;
 
@@ -94,27 +94,7 @@ ssize_t uncore_event_show(struct kobject
 
 struct intel_uncore_box *uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
 {
-	struct intel_uncore_box *box;
-
-	box = *per_cpu_ptr(pmu->box, cpu);
-	if (box)
-		return box;
-
-	raw_spin_lock(&uncore_box_lock);
-	/* Recheck in lock to handle races. */
-	if (*per_cpu_ptr(pmu->box, cpu))
-		goto out;
-	list_for_each_entry(box, &pmu->box_list, list) {
-		if (box->phys_id == topology_physical_package_id(cpu)) {
-			atomic_inc(&box->refcnt);
-			*per_cpu_ptr(pmu->box, cpu) = box;
-			break;
-		}
-	}
-out:
-	raw_spin_unlock(&uncore_box_lock);
-
-	return *per_cpu_ptr(pmu->box, cpu);
+	return pmu->boxes[topology_logical_package_id(cpu)];
 }
 
 u64 uncore_msr_read_counter(struct intel_uncore_box *box, struct perf_event *event)
@@ -315,9 +295,9 @@ static struct intel_uncore_box *uncore_a
 		raw_spin_lock_init(&box->shared_regs[i].lock);
 
 	uncore_pmu_init_hrtimer(box);
-	atomic_set(&box->refcnt, 1);
 	box->cpu = -1;
-	box->phys_id = -1;
+	box->pci_phys_id = -1;
+	box->pkgid = -1;
 
 	/* set default hrtimer timeout */
 	box->hrtimer_duration = UNCORE_PMU_HRTIMER_INTERVAL;
@@ -774,14 +754,24 @@ static void uncore_pmu_unregister(struct
 	pmu->registered = false;
 }
 
+static void uncore_free_boxes(struct intel_uncore_pmu *pmu)
+{
+	int pkg;
+
+	for (pkg = 0; pkg < max_packages; pkg++)
+		kfree(pmu->boxes[pkg]);
+	kfree(pmu->boxes);
+}
+
 static void __init uncore_type_exit(struct intel_uncore_type *type)
 {
+	struct intel_uncore_pmu *pmu = type->pmus;
 	int i;
 
-	if (type->pmus) {
-		for (i = 0; i < type->num_boxes; i++) {
-			uncore_pmu_unregister(&type->pmus[i]);
-			free_percpu(type->pmus[i].box);
+	if (pmu) {
+		for (i = 0; i < type->num_boxes; i++, pmu++) {
+			uncore_pmu_unregister(pmu);
+			uncore_free_boxes(pmu);
 		}
 		kfree(type->pmus);
 		type->pmus = NULL;
@@ -796,37 +786,36 @@ static void __init uncore_types_exit(str
 		uncore_type_exit(*types);
 }
 
-static int __init uncore_type_init(struct intel_uncore_type *type)
+static int __init uncore_type_init(struct intel_uncore_type *type, bool setid)
 {
 	struct intel_uncore_pmu *pmus;
 	struct attribute_group *attr_group;
 	struct attribute **attrs;
+	size_t size;
 	int i, j;
 
 	pmus = kzalloc(sizeof(*pmus) * type->num_boxes, GFP_KERNEL);
 	if (!pmus)
 		return -ENOMEM;
 
-	type->pmus = pmus;
-
-	type->unconstrainted = (struct event_constraint)
-		__EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1,
-				0, type->num_counters, 0, 0);
+	size = max_packages * sizeof(struct intel_uncore_box *);
 
 	for (i = 0; i < type->num_boxes; i++) {
-		pmus[i].func_id = -1;
-		pmus[i].pmu_idx = i;
-		pmus[i].type = type;
-		INIT_LIST_HEAD(&pmus[i].box_list);
-		pmus[i].box = alloc_percpu(struct intel_uncore_box *);
-		if (!pmus[i].box)
+		pmus[i].func_id	= setid ? i : -1;
+		pmus[i].pmu_idx	= i;
+		pmus[i].type	= type;
+		pmus[i].boxes	= kzalloc(size, GFP_KERNEL);
+		if (!pmus[i].boxes)
 			return -ENOMEM;
 	}
 
+	type->pmus = pmus;
+	type->unconstrainted = (struct event_constraint)
+		__EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1,
+				0, type->num_counters, 0, 0);
+
 	if (type->event_descs) {
-		i = 0;
-		while (type->event_descs[i].attr.attr.name)
-			i++;
+		for (i = 0; type->event_descs[i].attr.attr.name; i++);
 
 		attr_group = kzalloc(sizeof(struct attribute *) * (i + 1) +
 					sizeof(*attr_group), GFP_KERNEL);
@@ -847,12 +836,13 @@ static int __init uncore_type_init(struc
 	return 0;
 }
 
-static int __init uncore_types_init(struct intel_uncore_type **types)
+static int __init
+uncore_types_init(struct intel_uncore_type **types, bool setid)
 {
-	int i, ret;
+	int ret;
 
-	for (i = 0; types[i]; i++) {
-		ret = uncore_type_init(types[i]);
+	for (; *types; types++) {
+		ret = uncore_type_init(*types, setid);
 		if (ret)
 			return ret;
 	}
@@ -864,28 +854,28 @@ static int __init uncore_types_init(stru
  */
 static int uncore_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
+	struct intel_uncore_type *type;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
-	struct intel_uncore_type *type;
-	bool first_box = false;
-	int phys_id, ret;
+	int phys_id, pkg, ret;
 
 	phys_id = uncore_pcibus_to_physid(pdev->bus);
-	if (phys_id < 0 || phys_id >= UNCORE_SOCKET_MAX)
+	if (phys_id < 0)
 		return -ENODEV;
 
+	pkg = topology_phys_to_logical_pkg(phys_id);
+	if (WARN_ON_ONCE(pkg < 0))
+		return -EINVAL;
+
 	if (UNCORE_PCI_DEV_TYPE(id->driver_data) == UNCORE_EXTRA_PCI_DEV) {
 		int idx = UNCORE_PCI_DEV_IDX(id->driver_data);
-		uncore_extra_pci_dev[phys_id][idx] = pdev;
+
+		uncore_extra_pci_dev[pkg].dev[idx] = pdev;
 		pci_set_drvdata(pdev, NULL);
 		return 0;
 	}
 
 	type = uncore_pci_uncores[UNCORE_PCI_DEV_TYPE(id->driver_data)];
-	box = uncore_alloc_box(type, NUMA_NO_NODE);
-	if (!box)
-		return -ENOMEM;
-
 	/*
 	 * for performance monitoring unit with multiple boxes,
 	 * each box has a different function id.
@@ -902,32 +892,35 @@ static int uncore_pci_probe(struct pci_d
 			pmu++;
 	}
 
+	if (WARN_ON_ONCE(pmu->boxes[pkg] != NULL))
+		return -EINVAL;
+
+	box = uncore_alloc_box(type, NUMA_NO_NODE);
+	if (!box)
+		return -ENOMEM;
+
 	if (pmu->func_id < 0)
 		pmu->func_id = pdev->devfn;
 	else
 		WARN_ON_ONCE(pmu->func_id != pdev->devfn);
 
-	box->phys_id = phys_id;
+	atomic_inc(&box->refcnt);
+	box->pci_phys_id = phys_id;
+	box->pkgid = pkg;
 	box->pci_dev = pdev;
 	box->pmu = pmu;
 	uncore_box_init(box);
 	pci_set_drvdata(pdev, box);
 
-	raw_spin_lock(&uncore_box_lock);
-	if (list_empty(&pmu->box_list))
-		first_box = true;
-	list_add_tail(&box->list, &pmu->box_list);
-	raw_spin_unlock(&uncore_box_lock);
-
-	if (!first_box)
+	pmu->boxes[pkg] = box;
+	if (atomic_inc_return(&pmu->activeboxes) > 1)
 		return 0;
 
+	/* First active box registers the pmu */
 	ret = uncore_pmu_register(pmu);
 	if (ret) {
 		pci_set_drvdata(pdev, NULL);
-		raw_spin_lock(&uncore_box_lock);
-		list_del(&box->list);
-		raw_spin_unlock(&uncore_box_lock);
+		pmu->boxes[pkg] = NULL;
 		uncore_box_exit(box);
 		kfree(box);
 	}
@@ -938,15 +931,16 @@ static void uncore_pci_remove(struct pci
 {
 	struct intel_uncore_box *box = pci_get_drvdata(pdev);
 	struct intel_uncore_pmu *pmu;
-	int i, cpu, phys_id;
-	bool last_box = false;
+	int i, phys_id, pkg;
 
 	phys_id = uncore_pcibus_to_physid(pdev->bus);
+	pkg = topology_phys_to_logical_pkg(phys_id);
+
 	box = pci_get_drvdata(pdev);
 	if (!box) {
 		for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
-			if (uncore_extra_pci_dev[phys_id][i] == pdev) {
-				uncore_extra_pci_dev[phys_id][i] = NULL;
+			if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
+				uncore_extra_pci_dev[pkg].dev[i] = NULL;
 				break;
 			}
 		}
@@ -955,34 +949,20 @@ static void uncore_pci_remove(struct pci
 	}
 
 	pmu = box->pmu;
-	if (WARN_ON_ONCE(phys_id != box->phys_id))
+	if (WARN_ON_ONCE(phys_id != box->pci_phys_id))
 		return;
 
 	pci_set_drvdata(pdev, NULL);
-
-	raw_spin_lock(&uncore_box_lock);
-	list_del(&box->list);
-	if (list_empty(&pmu->box_list))
-		last_box = true;
-	raw_spin_unlock(&uncore_box_lock);
-
-	for_each_possible_cpu(cpu) {
-		if (*per_cpu_ptr(pmu->box, cpu) == box) {
-			*per_cpu_ptr(pmu->box, cpu) = NULL;
-			atomic_dec(&box->refcnt);
-		}
-	}
-
-	WARN_ON_ONCE(atomic_read(&box->refcnt) != 1);
+	pmu->boxes[pkg] = NULL;
+	if (atomic_dec_return(&pmu->activeboxes) == 0)
+		uncore_pmu_unregister(pmu);
 	uncore_box_exit(box);
 	kfree(box);
-
-	if (last_box)
-		uncore_pmu_unregister(pmu);
 }
 
 static int __init uncore_pci_init(void)
 {
+	size_t size;
 	int ret;
 
 	switch (boot_cpu_data.x86_model) {
@@ -1025,24 +1005,34 @@ static int __init uncore_pci_init(void)
 	if (ret)
 		return ret;
 
-	ret = uncore_types_init(uncore_pci_uncores);
-	if (ret)
+	size = max_packages * sizeof(struct pci_extra_dev);
+	uncore_extra_pci_dev = kzalloc(size, GFP_KERNEL);
+	if (!uncore_extra_pci_dev) {
+		ret = -ENOMEM;
 		goto err;
+	}
+
+	ret = uncore_types_init(uncore_pci_uncores, false);
+	if (ret)
+		goto errtype;
 
 	uncore_pci_driver->probe = uncore_pci_probe;
 	uncore_pci_driver->remove = uncore_pci_remove;
 
 	ret = pci_register_driver(uncore_pci_driver);
 	if (ret)
-		goto err;
+		goto errtype;
 
 	pcidrv_registered = true;
 	return 0;
 
-err:
+errtype:
 	uncore_types_exit(uncore_pci_uncores);
-	uncore_pci_uncores = empty_uncore;
+	kfree(uncore_extra_pci_dev);
+	uncore_extra_pci_dev = NULL;
 	uncore_free_pcibus_map();
+err:
+	uncore_pci_uncores = empty_uncore;
 	return ret;
 }
 
@@ -1052,110 +1042,81 @@ static void __init uncore_pci_exit(void)
 		pcidrv_registered = false;
 		pci_unregister_driver(uncore_pci_driver);
 		uncore_types_exit(uncore_pci_uncores);
+		kfree(uncore_extra_pci_dev);
 		uncore_free_pcibus_map();
 	}
 }
 
-/* CPU hot plug/unplug are serialized by cpu_add_remove_lock mutex */
-static LIST_HEAD(boxes_to_free);
-
-static void uncore_kfree_boxes(void)
-{
-	struct intel_uncore_box *box;
-
-	while (!list_empty(&boxes_to_free)) {
-		box = list_entry(boxes_to_free.next,
-				 struct intel_uncore_box, list);
-		list_del(&box->list);
-		kfree(box);
-	}
-}
-
 static void uncore_cpu_dying(int cpu)
 {
-	struct intel_uncore_type *type;
+	struct intel_uncore_type *type, **types = uncore_msr_uncores;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
-	int i, j;
+	int i, pkg;
 
-	for (i = 0; uncore_msr_uncores[i]; i++) {
-		type = uncore_msr_uncores[i];
-		for (j = 0; j < type->num_boxes; j++) {
-			pmu = &type->pmus[j];
-			box = *per_cpu_ptr(pmu->box, cpu);
-			*per_cpu_ptr(pmu->box, cpu) = NULL;
-			if (box && atomic_dec_and_test(&box->refcnt)) {
-				list_add(&box->list, &boxes_to_free);
+	pkg = topology_logical_package_id(cpu);
+	for (; *types; types++) {
+		type = *types;
+		pmu = type->pmus;
+		for (i = 0; i < type->num_boxes; i++, pmu++) {
+			box = pmu->boxes[pkg];
+			if (box && atomic_dec_return(&box->refcnt) == 0)
 				uncore_box_exit(box);
-			}
 		}
 	}
 }
 
-static int uncore_cpu_starting(int cpu)
+static void uncore_cpu_starting(int cpu, bool init)
 {
-	struct intel_uncore_type *type;
+	struct intel_uncore_type *type, **types = uncore_msr_uncores;
 	struct intel_uncore_pmu *pmu;
-	struct intel_uncore_box *box, *exist;
-	int i, j, k, phys_id;
+	struct intel_uncore_box *box;
+	int i, pkg, ncpus = 1;
 
-	phys_id = topology_physical_package_id(cpu);
+	if (init) {
+		/*
+		 * On init we get the number of online cpus in the package
+		 * and set refcount for all of them.
+		 */
+		ncpus = cpumask_weight(topology_core_cpumask(cpu));
+	}
 
-	for (i = 0; uncore_msr_uncores[i]; i++) {
-		type = uncore_msr_uncores[i];
-		for (j = 0; j < type->num_boxes; j++) {
-			pmu = &type->pmus[j];
-			box = *per_cpu_ptr(pmu->box, cpu);
-			/* called by uncore_cpu_init? */
-			if (box && box->phys_id >= 0) {
-				uncore_box_init(box);
+	pkg = topology_logical_package_id(cpu);
+	for (; *types; types++) {
+		type = *types;
+		pmu = type->pmus;
+		for (i = 0; i < type->num_boxes; i++, pmu++) {
+			box = pmu->boxes[pkg];
+			if (!box)
 				continue;
-			}
-
-			for_each_online_cpu(k) {
-				exist = *per_cpu_ptr(pmu->box, k);
-				if (exist && exist->phys_id == phys_id) {
-					atomic_inc(&exist->refcnt);
-					*per_cpu_ptr(pmu->box, cpu) = exist;
-					if (box) {
-						list_add(&box->list,
-							 &boxes_to_free);
-						box = NULL;
-					}
-					break;
-				}
-			}
-
-			if (box) {
-				box->phys_id = phys_id;
+			/* The first cpu on a package activates the box */
+			if (atomic_add_return(ncpus, &box->refcnt) == ncpus)
 				uncore_box_init(box);
-			}
 		}
 	}
-	return 0;
 }
 
-static int uncore_cpu_prepare(int cpu, int phys_id)
+static int uncore_cpu_prepare(int cpu)
 {
-	struct intel_uncore_type *type;
+	struct intel_uncore_type *type, **types = uncore_msr_uncores;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
-	int i, j;
-
-	for (i = 0; uncore_msr_uncores[i]; i++) {
-		type = uncore_msr_uncores[i];
-		for (j = 0; j < type->num_boxes; j++) {
-			pmu = &type->pmus[j];
-			if (pmu->func_id < 0)
-				pmu->func_id = j;
+	int i, pkg;
 
+	pkg = topology_logical_package_id(cpu);
+	for (; *types; types++) {
+		type = *types;
+		pmu = type->pmus;
+		for (i = 0; i < type->num_boxes; i++, pmu++) {
+			if (pmu->boxes[pkg])
+				continue;
+			/* First cpu of a package allocates the box */
 			box = uncore_alloc_box(type, cpu_to_node(cpu));
 			if (!box)
 				return -ENOMEM;
-
 			box->pmu = pmu;
-			box->phys_id = phys_id;
-			*per_cpu_ptr(pmu->box, cpu) = box;
+			box->pkgid = pkg;
+			pmu->boxes[pkg] = box;
 		}
 	}
 	return 0;
@@ -1166,13 +1127,11 @@ static void uncore_change_type_ctx(struc
 {
 	struct intel_uncore_pmu *pmu = type->pmus;
 	struct intel_uncore_box *box;
-	int i;
+	int i, pkg;
 
+	pkg = topology_logical_package_id(old_cpu < 0 ? new_cpu : old_cpu);
 	for (i = 0; i < type->num_boxes; i++, pmu++) {
-		if (old_cpu < 0)
-			box = uncore_pmu_to_box(pmu, new_cpu);
-		else
-			box = uncore_pmu_to_box(pmu, old_cpu);
+		box = pmu->boxes[pkg];
 		if (!box)
 			continue;
 
@@ -1202,27 +1161,20 @@ static void uncore_change_context(struct
 
 static void uncore_event_exit_cpu(int cpu)
 {
-	int i, phys_id, target;
+	int target;
 
-	/* if exiting cpu is used for collecting uncore events */
+	/* Check if exiting cpu is used for collecting uncore events */
 	if (!cpumask_test_and_clear_cpu(cpu, &uncore_cpu_mask))
 		return;
 
-	/* find a new cpu to collect uncore events */
-	phys_id = topology_physical_package_id(cpu);
-	target = -1;
-	for_each_online_cpu(i) {
-		if (i == cpu)
-			continue;
-		if (phys_id == topology_physical_package_id(i)) {
-			target = i;
-			break;
-		}
-	}
+	/* Find a new cpu to collect uncore events */
+	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
 
-	/* migrate uncore events to the new cpu */
-	if (target >= 0)
+	/* Migrate uncore events to the new target */
+	if (target < nr_cpu_ids)
 		cpumask_set_cpu(target, &uncore_cpu_mask);
+	else
+		target = -1;
 
 	uncore_change_context(uncore_msr_uncores, cpu, target);
 	uncore_change_context(uncore_pci_uncores, cpu, target);
@@ -1230,13 +1182,15 @@ static void uncore_event_exit_cpu(int cp
 
 static void uncore_event_init_cpu(int cpu)
 {
-	int i, phys_id;
+	int target;
 
-	phys_id = topology_physical_package_id(cpu);
-	for_each_cpu(i, &uncore_cpu_mask) {
-		if (phys_id == topology_physical_package_id(i))
-			return;
-	}
+	/*
+	 * Check if there is an online cpu in the package
+	 * which collects uncore events already.
+	 */
+	target = cpumask_any_and(&uncore_cpu_mask, topology_core_cpumask(cpu));
+	if (target < nr_cpu_ids)
+		return;
 
 	cpumask_set_cpu(cpu, &uncore_cpu_mask);
 
@@ -1249,38 +1203,25 @@ static int uncore_cpu_notifier(struct no
 {
 	unsigned int cpu = (long)hcpu;
 
-	/* allocate/free data structure for uncore box */
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_UP_PREPARE:
-		return notifier_from_errno(uncore_cpu_prepare(cpu, -1));
+		return notifier_from_errno(uncore_cpu_prepare(cpu));
+
 	case CPU_STARTING:
-		uncore_cpu_starting(cpu);
+		uncore_cpu_starting(cpu, false);
+	case CPU_DOWN_FAILED:
+		uncore_event_init_cpu(cpu);
 		break;
+
 	case CPU_UP_CANCELED:
 	case CPU_DYING:
 		uncore_cpu_dying(cpu);
 		break;
-	case CPU_ONLINE:
-	case CPU_DEAD:
-		uncore_kfree_boxes();
-		break;
-	default:
-		break;
-	}
 
-	/* select the cpu that collects uncore events */
-	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_DOWN_FAILED:
-	case CPU_STARTING:
-		uncore_event_init_cpu(cpu);
-		break;
 	case CPU_DOWN_PREPARE:
 		uncore_event_exit_cpu(cpu);
 		break;
-	default:
-		break;
 	}
-
 	return NOTIFY_OK;
 }
 
@@ -1357,7 +1298,7 @@ static int __init uncore_cpu_init(void)
 		return 0;
 	}
 
-	ret = uncore_types_init(uncore_msr_uncores);
+	ret = uncore_types_init(uncore_msr_uncores, true);
 	if (ret)
 		goto err;
 
@@ -1373,39 +1314,34 @@ static int __init uncore_cpu_init(void)
 
 static void __init uncore_cpu_setup(void *dummy)
 {
-	uncore_cpu_starting(smp_processor_id());
+	uncore_cpu_starting(smp_processor_id(), true);
 }
 
+/* Lazy to avoid allocation of a few bytes for the normal case */
+static __initdata DECLARE_BITMAP(packages, MAX_LOCAL_APIC);
+
 static int __init uncore_cpumask_init(void)
 {
-	int cpu, ret = 0;
-
-	cpu_notifier_register_begin();
+	unsigned int cpu;
 
 	for_each_online_cpu(cpu) {
-		int i, phys_id = topology_physical_package_id(cpu);
+		unsigned int pkg = topology_logical_package_id(cpu);
+		int ret;
 
-		for_each_cpu(i, &uncore_cpu_mask) {
-			if (phys_id == topology_physical_package_id(i)) {
-				phys_id = -1;
-				break;
-			}
-		}
-		if (phys_id < 0)
+		if (test_and_set_bit(pkg, packages))
 			continue;
-
-		ret = uncore_cpu_prepare(cpu, phys_id);
+		/*
+		 * The first online cpu of each package takes the refcounts
+		 * for all other online cpus in that package.
+		 */
+		ret = uncore_cpu_prepare(cpu);
 		if (ret)
-			goto out;
+			return ret;
 		uncore_event_init_cpu(cpu);
+		smp_call_function_single(cpu, uncore_cpu_setup, NULL, 1);
 	}
-	on_each_cpu(uncore_cpu_setup, NULL, 1);
-
 	__register_cpu_notifier(&uncore_cpu_nb);
-
-out:
-	cpu_notifier_register_done();
-	return ret;
+	return 0;
 }
 
 static int __init intel_uncore_init(void)
@@ -1418,22 +1354,26 @@ static int __init intel_uncore_init(void
 	if (cpu_has_hypervisor)
 		return -ENODEV;
 
+	max_packages = topology_max_packages();
+
 	ret = uncore_pci_init();
 	if (ret)
 		return ret;
 	ret = uncore_cpu_init();
 	if (ret)
-		goto errpci;
+		goto err;
+
+	cpu_notifier_register_begin();
 	ret = uncore_cpumask_init();
 	if (ret)
-		goto errcpu;
-
+		goto err;
+	cpu_notifier_register_done();
 	return 0;
 
-errcpu:
+err:
 	uncore_types_exit(uncore_msr_uncores);
-errpci:
 	uncore_pci_exit();
+	cpu_notifier_register_done();
 	return ret;
 }
 device_initcall(intel_uncore_init);
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -19,11 +19,12 @@
 #define UNCORE_EXTRA_PCI_DEV		0xff
 #define UNCORE_EXTRA_PCI_DEV_MAX	3
 
-/* support up to 8 sockets */
-#define UNCORE_SOCKET_MAX		8
-
 #define UNCORE_EVENT_CONSTRAINT(c, n) EVENT_CONSTRAINT(c, n, 0xff)
 
+struct pci_extra_dev {
+	struct pci_dev *dev[UNCORE_EXTRA_PCI_DEV_MAX];
+};
+
 struct intel_uncore_ops;
 struct intel_uncore_pmu;
 struct intel_uncore_box;
@@ -79,9 +80,9 @@ struct intel_uncore_pmu {
 	int				pmu_idx;
 	int				func_id;
 	bool				registered;
+	atomic_t			activeboxes;
 	struct intel_uncore_type	*type;
-	struct intel_uncore_box		** __percpu box;
-	struct list_head		box_list;
+	struct intel_uncore_box		**boxes;
 };
 
 struct intel_uncore_extra_reg {
@@ -91,7 +92,8 @@ struct intel_uncore_extra_reg {
 };
 
 struct intel_uncore_box {
-	int phys_id;
+	int pci_phys_id;
+	int pkgid;
 	int n_active;	/* number of active events */
 	int n_events;
 	int cpu;	/* cpu to collect events */
@@ -316,7 +318,7 @@ static inline void uncore_box_exit(struc
 
 static inline bool uncore_box_is_fake(struct intel_uncore_box *box)
 {
-	return (box->phys_id < 0);
+	return (box->pkgid < 0);
 }
 
 static inline struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
@@ -345,7 +347,7 @@ extern struct intel_uncore_type **uncore
 extern struct pci_driver *uncore_pci_driver;
 extern raw_spinlock_t pci2phy_map_lock;
 extern struct list_head pci2phy_map_head;
-extern struct pci_dev *uncore_extra_pci_dev[UNCORE_SOCKET_MAX][UNCORE_EXTRA_PCI_DEV_MAX];
+extern struct pci_extra_dev *uncore_extra_pci_dev;
 extern struct event_constraint uncore_constraint_empty;
 
 /* perf_event_intel_uncore_snb.c */
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c
@@ -987,7 +987,9 @@ static void snbep_qpi_enable_event(struc
 
 	if (reg1->idx != EXTRA_REG_NONE) {
 		int idx = box->pmu->pmu_idx + SNBEP_PCI_QPI_PORT0_FILTER;
-		struct pci_dev *filter_pdev = uncore_extra_pci_dev[box->phys_id][idx];
+		int pkg = topology_phys_to_logical_pkg(box->pci_phys_id);
+		struct pci_dev *filter_pdev = uncore_extra_pci_dev[pkg].dev[idx];
+
 		if (filter_pdev) {
 			pci_write_config_dword(filter_pdev, reg1->reg,
 						(u32)reg1->config);
@@ -2521,14 +2523,16 @@ static struct intel_uncore_type *hswep_m
 
 void hswep_uncore_cpu_init(void)
 {
+	int pkg = topology_phys_to_logical_pkg(0);
+
 	if (hswep_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
 		hswep_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
 
 	/* Detect 6-8 core systems with only two SBOXes */
-	if (uncore_extra_pci_dev[0][HSWEP_PCI_PCU_3]) {
+	if (uncore_extra_pci_dev[pkg].dev[HSWEP_PCI_PCU_3]) {
 		u32 capid4;
 
-		pci_read_config_dword(uncore_extra_pci_dev[0][HSWEP_PCI_PCU_3],
+		pci_read_config_dword(uncore_extra_pci_dev[pkg].dev[HSWEP_PCI_PCU_3],
 				      0x94, &capid4);
 		if (((capid4 >> 6) & 0x3) == 0)
 			hswep_uncore_sbox.num_boxes = 2;

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 13/28] x86/perf/intel_uncore: Clear all hardware state on exit
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (11 preceding siblings ...)
  2016-02-22 11:06 ` [patch V2 12/28] x86/perf/uncore: Track packages not per cpu data Thomas Gleixner
@ 2016-02-22 11:06 ` Thomas Gleixner
  2016-02-22 11:58   ` Peter Zijlstra
  2016-02-22 11:06 ` [patch V2 14/28] x86/perf/intel_uncore: Make PCI and MSR uncore independent Thomas Gleixner
                   ` (15 subsequent siblings)
  28 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel_uncore--Clear-all-hardware-state-on-exit.patch --]
[-- Type: text/plain, Size: 1597 bytes --]

The only missing bit is to completely clear the hardware state on failure
exit. This is now a pretty simple exercise.

Undo the box->init_box() setup on all packages which have been initialized so
far.
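
A note on why the mask walk suffices (an editorial sketch, not part of the
patch): uncore_cpu_mask holds exactly one online cpu per package, so an IPI
to each cpu in it reaches every initialized box exactly once.
on_each_cpu_mask() with wait == 1 also runs the function locally when the
current cpu is in the mask and returns only after all invocations have
completed, which makes the rollback synchronous:

	/* Sketch of the failure path rollback; names match the patch below */
	on_each_cpu_mask(&uncore_cpu_mask, uncore_exit_boxes, NULL, 1);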

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -754,6 +754,30 @@ static void uncore_pmu_unregister(struct
 	pmu->registered = false;
 }
 
+static void __init __uncore_exit_boxes(struct intel_uncore_type *type, int cpu)
+{
+	struct intel_uncore_pmu *pmu = type->pmus;
+	struct intel_uncore_box *box;
+	int i, pkg;
+
+	if (pmu) {
+		pkg = topology_physical_package_id(cpu);
+		for (i = 0; i < type->num_boxes; i++, pmu++) {
+			box = pmu->boxes[pkg];
+			if (box)
+				uncore_box_exit(box);
+		}
+	}
+}
+
+static void __init uncore_exit_boxes(void *dummy)
+{
+	struct intel_uncore_type **types = uncore_msr_uncores;
+
+	while (*types)
+		__uncore_exit_boxes(*types++, smp_processor_id());
+}
+
 static void uncore_free_boxes(struct intel_uncore_pmu *pmu)
 {
 	int pkg;
@@ -1371,6 +1395,8 @@ static int __init intel_uncore_init(void
 	return 0;
 
 err:
+	/* Undo box->init_box() */
+	on_each_cpu_mask(&uncore_cpu_mask, uncore_exit_boxes, NULL, 1);
 	uncore_types_exit(uncore_msr_uncores);
 	uncore_pci_exit();
 	cpu_notifier_register_done();

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 15/28] cpumask: Export cpumask_any_but
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (13 preceding siblings ...)
  2016-02-22 11:06 ` [patch V2 14/28] x86/perf/intel_uncore: Make PCI and MSR uncore independent Thomas Gleixner
@ 2016-02-22 11:06 ` Thomas Gleixner
  2016-02-22 11:06 ` [patch V2 16/28] x86/perf/intel_uncore: Make it modular Thomas Gleixner
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: cpumask--Export-cpumask_any_but.patch --]
[-- Type: text/plain, Size: 560 bytes --]

Almost every cpumask function is exported, just not the one I need to make the
Intel uncore driver modular.
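
The semantics a modular user relies on (an aside, not from this patch):
cpumask_any_but(mask, cpu) returns an arbitrary cpu set in @mask other than
@cpu, or a value >= nr_cpu_ids when no such cpu exists. The uncore driver
uses it to pick a new event collector when the current one goes offline,
roughly like this sketch:

	/* Sketch: migrate uncore event collection away from an outgoing cpu */
	static void example_uncore_exit_cpu(unsigned int cpu)
	{
		int target;

		/* Any other online cpu in the same package */
		target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
		if (target < nr_cpu_ids)
			cpumask_set_cpu(target, &uncore_cpu_mask);
	}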

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 lib/cpumask.c |    1 +
 1 file changed, 1 insertion(+)

Index: b/lib/cpumask.c
===================================================================
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -41,6 +41,7 @@ int cpumask_any_but(const struct cpumask
 			break;
 	return i;
 }
+EXPORT_SYMBOL(cpumask_any_but);
 
 /* These are not inline because of header tangles. */
 #ifdef CONFIG_CPUMASK_OFFSTACK

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 14/28] x86/perf/intel_uncore: Make PCI and MSR uncore independent
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (12 preceding siblings ...)
  2016-02-22 11:06 ` [patch V2 13/28] x86/perf/intel_uncore: Clear all hardware state on exit Thomas Gleixner
@ 2016-02-22 11:06 ` Thomas Gleixner
  2016-02-22 11:06 ` [patch V2 15/28] cpumask: Export cpumask_any_but Thomas Gleixner
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan, Andi Kleen

[-- Attachment #1: x86-perf-intel_uncore--Make-PCI-and-MSR-uncore-independent.patch --]
[-- Type: text/plain, Size: 2608 bytes --]

Andi wanted to do this before, but the patch fell through the cracks. Implement
it with the proper error handling: either part may now fail on its own, and the
driver only gives up when both the PCI and the MSR based uncores fail to
initialize.

Requested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   34 +++++++++++++-------------
 1 file changed, 18 insertions(+), 16 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -1023,7 +1023,7 @@ static int __init uncore_pci_init(void)
 		ret = skl_uncore_pci_init();
 		break;
 	default:
-		return 0;
+		return -ENODEV;
 	}
 
 	if (ret)
@@ -1319,7 +1319,7 @@ static int __init uncore_cpu_init(void)
 		knl_uncore_cpu_init();
 		break;
 	default:
-		return 0;
+		return -ENODEV;
 	}
 
 	ret = uncore_types_init(uncore_msr_uncores, true);
@@ -1344,7 +1344,7 @@ static void __init uncore_cpu_setup(void
 /* Lazy to avoid allocation of a few bytes for the normal case */
 static __initdata DECLARE_BITMAP(packages, MAX_LOCAL_APIC);
 
-static int __init uncore_cpumask_init(void)
+static int __init uncore_cpumask_init(bool msr)
 {
 	unsigned int cpu;
 
@@ -1355,12 +1355,15 @@ static int __init uncore_cpumask_init(vo
 		if (test_and_set_bit(pkg, packages))
 			continue;
 		/*
-		 * The first online cpu of each package takes the refcounts
-		 * for all other online cpus in that package.
+		 * The first online cpu of each package allocates and takes
+		 * the refcounts for all other online cpus in that package.
+		 * If msrs are not enabled no allocation is required.
 		 */
-		ret = uncore_cpu_prepare(cpu);
-		if (ret)
-			return ret;
+		if (msr) {
+			ret = uncore_cpu_prepare(cpu);
+			if (ret)
+				return ret;
+		}
 		uncore_event_init_cpu(cpu);
 		smp_call_function_single(cpu, uncore_cpu_setup, NULL, 1);
 	}
@@ -1370,7 +1373,7 @@ static int __init uncore_cpumask_init(vo
 
 static int __init intel_uncore_init(void)
 {
-	int ret;
+	int pret, cret, ret;
 
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
 		return -ENODEV;
@@ -1380,15 +1383,14 @@ static int __init intel_uncore_init(void
 
 	max_packages = topology_max_packages();
 
-	ret = uncore_pci_init();
-	if (ret)
-		return ret;
-	ret = uncore_cpu_init();
-	if (ret)
-		goto err;
+	pret = uncore_pci_init();
+	cret = uncore_cpu_init();
+
+	if (cret && pret)
+		return -ENODEV;
 
 	cpu_notifier_register_begin();
-	ret = uncore_cpumask_init();
+	ret = uncore_cpumask_init(!cret);
 	if (ret)
 		goto err;
 	cpu_notifier_register_done();

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 16/28] x86/perf/intel_uncore: Make it modular
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (14 preceding siblings ...)
  2016-02-22 11:06 ` [patch V2 15/28] cpumask: Export cpumask_any_but Thomas Gleixner
@ 2016-02-22 11:06 ` Thomas Gleixner
  2016-02-22 11:06 ` [patch V2 17/28] x86/perf/cqm: Get rid of the silly for_each_cpu lookups Thomas Gleixner
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel_uncore--Make-it-modular.patch --]
[-- Type: text/plain, Size: 4533 bytes --]

Now that we have a proper cleanup all over the place, it's simple to make this
a modular driver.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig                              |    6 ++----
 arch/x86/Kconfig.perf                         |   13 +++++++++++++
 arch/x86/kernel/cpu/Makefile                  |    3 ++-
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   24 ++++++++++++++++++------
 4 files changed, 35 insertions(+), 11 deletions(-)

Index: b/arch/x86/Kconfig
===================================================================
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -160,10 +160,6 @@ config INSTRUCTION_DECODER
 	def_bool y
 	depends on KPROBES || PERF_EVENTS || UPROBES
 
-config PERF_EVENTS_INTEL_UNCORE
-	def_bool y
-	depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
-
 config OUTPUT_FORMAT
 	string
 	default "elf32-i386" if X86_32
@@ -1039,6 +1035,8 @@ config X86_THERMAL_VECTOR
 	def_bool y
 	depends on X86_MCE_INTEL
 
+source "arch/x86/Kconfig.perf"
+
 config X86_LEGACY_VM86
 	bool "Legacy VM86 support"
 	default n
Index: b/arch/x86/Kconfig.perf
===================================================================
--- /dev/null
+++ b/arch/x86/Kconfig.perf
@@ -0,0 +1,13 @@
+menu "Performance monitoring"
+
+config PERF_EVENTS_INTEL_UNCORE
+	tristate "Intel uncore performance events"
+	depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
+	default y
+	---help---
+	  Include support for Intel uncore performance events. These are
+	  available on NehalemEX and more modern processors.
+
+	  If unsure say y.
+
+endmenu
Index: b/arch/x86/kernel/cpu/Makefile
===================================================================
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -43,7 +43,8 @@ obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_eve
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_pt.o perf_event_intel_bts.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_cstate.o
 
-obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)	+= perf_event_intel_uncore.o \
+obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)	+= perf_event_intel_uncores.o
+perf_event_intel_uncores-objs		:= perf_event_intel_uncore.o \
 					   perf_event_intel_uncore_snb.o \
 					   perf_event_intel_uncore_snbep.o \
 					   perf_event_intel_uncore_nhmex.o
Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -21,6 +21,8 @@ static struct event_constraint uncore_co
 struct event_constraint uncore_constraint_empty =
 	EVENT_CONSTRAINT(0, 0, 0);
 
+MODULE_LICENSE("GPL");
+
 static int uncore_pcibus_to_physid(struct pci_bus *bus)
 {
 	struct pci2phy_map *map;
@@ -754,7 +756,7 @@ static void uncore_pmu_unregister(struct
 	pmu->registered = false;
 }
 
-static void __init __uncore_exit_boxes(struct intel_uncore_type *type, int cpu)
+static void __uncore_exit_boxes(struct intel_uncore_type *type, int cpu)
 {
 	struct intel_uncore_pmu *pmu = type->pmus;
 	struct intel_uncore_box *box;
@@ -770,7 +772,7 @@ static void __init __uncore_exit_boxes(s
 	}
 }
 
-static void __init uncore_exit_boxes(void *dummy)
+static void uncore_exit_boxes(void *dummy)
 {
 	struct intel_uncore_type **types = uncore_msr_uncores;
 
@@ -787,7 +789,7 @@ static void uncore_free_boxes(struct int
 	kfree(pmu->boxes);
 }
 
-static void __init uncore_type_exit(struct intel_uncore_type *type)
+static void uncore_type_exit(struct intel_uncore_type *type)
 {
 	struct intel_uncore_pmu *pmu = type->pmus;
 	int i;
@@ -804,7 +806,7 @@ static void __init uncore_type_exit(stru
 	type->events_group = NULL;
 }
 
-static void __init uncore_types_exit(struct intel_uncore_type **types)
+static void uncore_types_exit(struct intel_uncore_type **types)
 {
 	for (; *types; types++)
 		uncore_type_exit(*types);
@@ -1060,7 +1062,7 @@ static int __init uncore_pci_init(void)
 	return ret;
 }
 
-static void __init uncore_pci_exit(void)
+static void uncore_pci_exit(void)
 {
 	if (pcidrv_registered) {
 		pcidrv_registered = false;
@@ -1404,4 +1406,14 @@ static int __init intel_uncore_init(void
 	cpu_notifier_register_done();
 	return ret;
 }
-device_initcall(intel_uncore_init);
+module_init(intel_uncore_init);
+
+static void __exit intel_uncore_exit(void)
+{
+	cpu_notifier_register_begin();
+	__unregister_cpu_notifier(&uncore_cpu_nb);
+	uncore_types_exit(uncore_msr_uncores);
+	uncore_pci_exit();
+	cpu_notifier_register_done();
+}
+module_exit(intel_uncore_exit);

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 17/28] x86/perf/cqm: Get rid of the silly for_each_cpu lookups
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (15 preceding siblings ...)
  2016-02-22 11:06 ` [patch V2 16/28] x86/perf/intel_uncore: Make it modular Thomas Gleixner
@ 2016-02-22 11:06 ` Thomas Gleixner
  2016-02-22 11:06 ` [patch V2 18/28] x86/perf/intel_rapl: Make Knights Landings support functional Thomas Gleixner
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-cqm--Get-rid-of-the-silly-for_each_cpu-lookups.patch --]
[-- Type: text/plain, Size: 1909 bytes --]

CQM is a strict per package facility. Use the proper cpumasks to look up the
readers.
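
The cpumask guarantee this relies on (an aside, not part of the patch):
cpumask_any_and(m1, m2) returns an arbitrary cpu in the intersection of the
two masks, or a value >= nr_cpu_ids when the intersection is empty. That
turns the old open coded package scan into a single call:

	/* Sketch: claim the reader role iff the package has no reader yet */
	if (cpumask_any_and(&cqm_cpumask, topology_core_cpumask(cpu)) >= nr_cpu_ids)
		cpumask_set_cpu(cpu, &cqm_cpumask);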

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_cqm.c |   34 ++++++++++-------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -1244,15 +1244,12 @@ static struct pmu intel_cqm_pmu = {
 
 static inline void cqm_pick_event_reader(int cpu)
 {
-	int phys_id = topology_physical_package_id(cpu);
-	int i;
+	int reader;
 
-	for_each_cpu(i, &cqm_cpumask) {
-		if (phys_id == topology_physical_package_id(i))
-			return;	/* already got reader for this socket */
-	}
-
-	cpumask_set_cpu(cpu, &cqm_cpumask);
+	/* First online cpu in package becomes the reader */
+	reader = cpumask_any_and(&cqm_cpumask, topology_core_cpumask(cpu));
+	if (reader >= nr_cpu_ids)
+		cpumask_set_cpu(cpu, &cqm_cpumask);
 }
 
 static void intel_cqm_cpu_starting(unsigned int cpu)
@@ -1270,24 +1267,17 @@ static void intel_cqm_cpu_starting(unsig
 
 static void intel_cqm_cpu_exit(unsigned int cpu)
 {
-	int phys_id = topology_physical_package_id(cpu);
-	int i;
+	int target;
 
-	/*
-	 * Is @cpu a designated cqm reader?
-	 */
+	/* Is @cpu the current cqm reader for this package ? */
 	if (!cpumask_test_and_clear_cpu(cpu, &cqm_cpumask))
 		return;
 
-	for_each_online_cpu(i) {
-		if (i == cpu)
-			continue;
-
-		if (phys_id == topology_physical_package_id(i)) {
-			cpumask_set_cpu(i, &cqm_cpumask);
-			break;
-		}
-	}
+	/* Find another online reader in this package */
+	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+
+	if (target < nr_cpu_ids)
+		cpumask_set_cpu(target, &cqm_cpumask);
 }
 
 static int intel_cqm_cpu_notifier(struct notifier_block *nb,

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 19/28] x86/perf/intel/rapl: Add proper error handling
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (17 preceding siblings ...)
  2016-02-22 11:06 ` [patch V2 18/28] x86/perf/intel_rapl: Make Knights Landings support functional Thomas Gleixner
@ 2016-02-22 11:06 ` Thomas Gleixner
  2016-02-22 11:06 ` [patch V2 20/28] x86/perf/intel/rapl: Sanitize the quirk handling Thomas Gleixner
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel-rapl--Add-proper-error-handling.patch --]
[-- Type: text/plain, Size: 2385 bytes --]

Like uncore, the rapl driver lacks error handling. It leaks memory and leaves
the hotplug notifier registered when initialization fails.

Add the proper error checks, clean up the memory and register the hotplug
notifier only on success, so a half initialized driver can never receive a
hotplug event.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |   29 ++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -686,6 +686,14 @@ static int rapl_check_hw_unit(void)
 	return 0;
 }
 
+static void __init cleanup_rapl_pmus(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		kfree(per_cpu(rapl_pmu, cpu));
+}
+
 static const struct x86_cpu_id rapl_cpu_match[] = {
 	[0] = { .vendor = X86_VENDOR_INTEL, .family = 6 },
 	[1] = {},
@@ -702,7 +710,7 @@ static int __init rapl_pmu_init(void)
 	 * check for Intel processor family 6
 	 */
 	if (!x86_match_cpu(rapl_cpu_match))
-		return 0;
+		return -ENODEV;
 
 	/* check supported CPU */
 	switch (boot_cpu_data.x86_model) {
@@ -734,8 +742,9 @@ static int __init rapl_pmu_init(void)
 		break;
 	default:
 		/* unsupported */
-		return 0;
+		return -ENODEV;
 	}
+
 	ret = rapl_check_hw_unit();
 	if (ret)
 		return ret;
@@ -743,6 +752,7 @@ static int __init rapl_pmu_init(void)
 	/* run cpu model quirks */
 	for (quirk = rapl_quirks; quirk; quirk = quirk->next)
 		quirk->func();
+
 	cpu_notifier_register_begin();
 
 	for_each_online_cpu(cpu) {
@@ -752,15 +762,14 @@ static int __init rapl_pmu_init(void)
 		rapl_cpu_init(cpu);
 	}
 
-	__perf_cpu_notifier(rapl_cpu_notifier);
-
 	ret = perf_pmu_register(&rapl_pmu_class, "power", -1);
 	if (WARN_ON(ret)) {
 		pr_info("RAPL PMU detected, registration failed (%d), RAPL PMU disabled\n", ret);
-		cpu_notifier_register_done();
-		return -1;
+		goto out;
 	}
 
+	__perf_cpu_notifier(rapl_cpu_notifier);
+
 	pmu = __this_cpu_read(rapl_pmu);
 
 	pr_info("RAPL PMU detected,"
@@ -775,9 +784,13 @@ static int __init rapl_pmu_init(void)
 				rapl_domain_names[i], rapl_hw_unit[i]);
 		}
 	}
-out:
-	cpu_notifier_register_done();
 
+	cpu_notifier_register_done();
 	return 0;
+
+out:
+	cleanup_rapl_pmus();
+	cpu_notifier_register_done();
+	return ret;
 }
 device_initcall(rapl_pmu_init);

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 18/28] x86/perf/intel_rapl: Make Knights Landings support functional
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (16 preceding siblings ...)
  2016-02-22 11:06 ` [patch V2 17/28] x86/perf/cqm: Get rid of the silly for_each_cpu lookups Thomas Gleixner
@ 2016-02-22 11:06 ` Thomas Gleixner
  2016-02-22 20:38   ` Borislav Petkov
  2016-02-22 11:06 ` [patch V2 19/28] x86/perf/intel/rapl: Add proper error handling Thomas Gleixner
                   ` (10 subsequent siblings)
  28 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan,
	Dasaratharaman Chandramouli

[-- Attachment #1: x86-perf-intel_rapl--Make-Knights-Landings-support-functional.patch --]
[-- Type: text/plain, Size: 912 bytes --]

The Knights Landing support added the events and the detection case, but a
missing break lets the switch fall through to the default case, which returns
0 without actually initializing the driver.

Fixes: 3a2a7797326a4 "perf/x86/intel/rapl: Add support for Knights Landing (KNL)"
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -731,7 +731,7 @@ static int __init rapl_pmu_init(void)
 		rapl_add_quirk(rapl_hsw_server_quirk);
 		rapl_cntr_mask = RAPL_IDX_KNL;
 		rapl_pmu_events_group.attrs = rapl_events_knl_attr;
-
+		break;
 	default:
 		/* unsupported */
 		return 0;

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 20/28] x86/perf/intel/rapl: Sanitize the quirk handling
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (18 preceding siblings ...)
  2016-02-22 11:06 ` [patch V2 19/28] x86/perf/intel/rapl: Add proper error handling Thomas Gleixner
@ 2016-02-22 11:06 ` Thomas Gleixner
  2016-02-22 11:06 ` [patch V2 21/28] x86/perf/intel/rapl: Calculate timing once Thomas Gleixner
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel-rapl--Sanitize-the-quirk-handling.patch --]
[-- Type: text/plain, Size: 3850 bytes --]

There is no point in having a quirk machinery for a single possible
function. Get rid of it and move the quirk to a place where it actually makes
sense.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |   49 ++++++++++------------------
 1 file changed, 19 insertions(+), 30 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -133,7 +133,6 @@ static int rapl_cntr_mask;
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu);
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu_to_free);
 
-static struct x86_pmu_quirk *rapl_quirks;
 static inline u64 rapl_read_counter(struct perf_event *event)
 {
 	u64 raw;
@@ -141,15 +140,6 @@ static inline u64 rapl_read_counter(stru
 	return raw;
 }
 
-#define rapl_add_quirk(func_)						\
-do {									\
-	static struct x86_pmu_quirk __quirk __initdata = {		\
-		.func = func_,						\
-	};								\
-	__quirk.next = rapl_quirks;					\
-	rapl_quirks = &__quirk;						\
-} while (0)
-
 static inline u64 rapl_scale(u64 v, int cfg)
 {
 	if (cfg > NR_RAPL_DOMAINS) {
@@ -564,17 +554,6 @@ static void rapl_cpu_init(int cpu)
 	cpumask_set_cpu(cpu, &rapl_cpu_mask);
 }
 
-static __init void rapl_hsw_server_quirk(void)
-{
-	/*
-	 * DRAM domain on HSW server has fixed energy unit which can be
-	 * different than the unit from power unit MSR.
-	 * "Intel Xeon Processor E5-1600 and E5-2600 v3 Product Families, V2
-	 * of 2. Datasheet, September 2014, Reference Number: 330784-001 "
-	 */
-	rapl_hw_unit[RAPL_IDX_RAM_NRG_STAT] = 16;
-}
-
 static int rapl_cpu_prepare(int cpu)
 {
 	struct rapl_pmu *pmu = per_cpu(rapl_pmu, cpu);
@@ -672,7 +651,18 @@ static int rapl_cpu_notifier(struct noti
 	return NOTIFY_OK;
 }
 
-static int rapl_check_hw_unit(void)
+static __init void rapl_hsw_server_quirk(void)
+{
+	/*
+	 * DRAM domain on HSW server has fixed energy unit which can be
+	 * different than the unit from power unit MSR.
+	 * "Intel Xeon Processor E5-1600 and E5-2600 v3 Product Families, V2
+	 * of 2. Datasheet, September 2014, Reference Number: 330784-001 "
+	 */
+	rapl_hw_unit[RAPL_IDX_RAM_NRG_STAT] = 16;
+}
+
+static int rapl_check_hw_unit(void (*quirk)(void))
 {
 	u64 msr_rapl_power_unit_bits;
 	int i;
@@ -683,6 +673,9 @@ static int rapl_check_hw_unit(void)
 	for (i = 0; i < NR_RAPL_DOMAINS; i++)
 		rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
 
+	/* Apply cpu model quirk */
+	if (quirk)
+		quirk();
 	return 0;
 }
 
@@ -701,9 +694,9 @@ static const struct x86_cpu_id rapl_cpu_
 
 static int __init rapl_pmu_init(void)
 {
+	void (*quirk)(void) = NULL;
 	struct rapl_pmu *pmu;
 	int cpu, ret;
-	struct x86_pmu_quirk *quirk;
 	int i;
 
 	/*
@@ -720,7 +713,7 @@ static int __init rapl_pmu_init(void)
 		rapl_pmu_events_group.attrs = rapl_events_cln_attr;
 		break;
 	case 63: /* Haswell-Server */
-		rapl_add_quirk(rapl_hsw_server_quirk);
+		quirk = rapl_hsw_server_quirk;
 		rapl_cntr_mask = RAPL_IDX_SRV;
 		rapl_pmu_events_group.attrs = rapl_events_srv_attr;
 		break;
@@ -736,7 +729,7 @@ static int __init rapl_pmu_init(void)
 		rapl_pmu_events_group.attrs = rapl_events_srv_attr;
 		break;
 	case 87: /* Knights Landing */
-		rapl_add_quirk(rapl_hsw_server_quirk);
+		quirk = rapl_hsw_server_quirk;
 		rapl_cntr_mask = RAPL_IDX_KNL;
 		rapl_pmu_events_group.attrs = rapl_events_knl_attr;
 		break;
@@ -745,14 +738,10 @@ static int __init rapl_pmu_init(void)
 		return -ENODEV;
 	}
 
-	ret = rapl_check_hw_unit();
+	ret = rapl_check_hw_unit(quirk);
 	if (ret)
 		return ret;
 
-	/* run cpu model quirks */
-	for (quirk = rapl_quirks; quirk; quirk = quirk->next)
-		quirk->func();
-
 	cpu_notifier_register_begin();
 
 	for_each_online_cpu(cpu) {

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 21/28] x86/perf/intel/rapl: Calculate timing once
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (19 preceding siblings ...)
  2016-02-22 11:06 ` [patch V2 20/28] x86/perf/intel/rapl: Sanitize the quirk handling Thomas Gleixner
@ 2016-02-22 11:06 ` Thomas Gleixner
  2016-02-22 11:06 ` [patch V2 23/28] x86/perf/intel/rapl: Refactor code some more Thomas Gleixner
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel-rapl--Calculate-timing-once.patch --]
[-- Type: text/plain, Size: 2976 bytes --]

No point in doing the same calculation over and over. Do it once in
rapl_check_hw_unit().
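
Worked through once, using only the 200W reference from the comment in the
patch: with a 32-bit counter whose LSB is 2^-rapl_hw_unit[0] Joules, the
counter wraps after 2^(32 - hw_unit) Joules, i.e. after
2^(32 - hw_unit) / 200 seconds at 200W. Halving that interval to avoid
lockstep gives exactly what the code computes:

	rapl_timer_ms = (1000 / (2 * 100)) * 2^(32 - hw_unit - 1)
	              = 5 * 2^(31 - hw_unit)

For a typical energy unit of 2^-16 Joules that is 5 * 2^15 ms, roughly 164
seconds; for units of 2^-32 Joules and finer the 2 ms fallback applies.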

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |   39 ++++++++++++----------------
 1 file changed, 18 insertions(+), 21 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -129,6 +129,7 @@ static int rapl_hw_unit[NR_RAPL_DOMAINS]
 static struct pmu rapl_pmu_class;
 static cpumask_t rapl_cpu_mask;
 static int rapl_cntr_mask;
+static u64 rapl_timer_ms;
 
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu);
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu_to_free);
@@ -558,7 +559,6 @@ static int rapl_cpu_prepare(int cpu)
 {
 	struct rapl_pmu *pmu = per_cpu(rapl_pmu, cpu);
 	int phys_id = topology_physical_package_id(cpu);
-	u64 ms;
 
 	if (pmu)
 		return 0;
@@ -575,19 +575,7 @@ static int rapl_cpu_prepare(int cpu)
 
 	pmu->pmu = &rapl_pmu_class;
 
-	/*
-	 * use reference of 200W for scaling the timeout
-	 * to avoid missing counter overflows.
-	 * 200W = 200 Joules/sec
-	 * divide interval by 2 to avoid lockstep (2 * 100)
-	 * if hw unit is 32, then we use 2 ms 1/200/2
-	 */
-	if (rapl_hw_unit[0] < 32)
-		ms = (1000 / (2 * 100)) * (1ULL << (32 - rapl_hw_unit[0] - 1));
-	else
-		ms = 2;
-
-	pmu->timer_interval = ms_to_ktime(ms);
+	pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
 
 	rapl_hrtimer_init(pmu);
 
@@ -676,6 +664,19 @@ static int rapl_check_hw_unit(void (*qui
 	/* Apply cpu model quirk */
 	if (quirk)
 		quirk();
+
+	/*
+	 * Calculate the timer rate:
+	 * Use reference of 200W for scaling the timeout to avoid counter
+	 * overflows. 200W = 200 Joules/sec
+	 * Divide interval by 2 to avoid lockstep (2 * 100)
+	 * if hw unit is 32, then we use 2 ms 1/200/2
+	 */
+	rapl_timer_ms = 2;
+	if (rapl_hw_unit[0] < 32) {
+		rapl_timer_ms = (1000 / (2 * 100));
+		rapl_timer_ms *= (1ULL << (32 - rapl_hw_unit[0] - 1));
+	}
 	return 0;
 }
 
@@ -695,9 +696,7 @@ static const struct x86_cpu_id rapl_cpu_
 static int __init rapl_pmu_init(void)
 {
 	void (*quirk)(void) = NULL;
-	struct rapl_pmu *pmu;
-	int cpu, ret;
-	int i;
+	int cpu, ret, i;
 
 	/*
 	 * check for Intel processor family 6
@@ -758,15 +757,14 @@ static int __init rapl_pmu_init(void)
 	}
 
 	__perf_cpu_notifier(rapl_cpu_notifier);
-
-	pmu = __this_cpu_read(rapl_pmu);
+	cpu_notifier_register_done();
 
 	pr_info("RAPL PMU detected,"
 		" API unit is 2^-32 Joules,"
 		" %d fixed counters"
 		" %llu ms ovfl timer\n",
 		hweight32(rapl_cntr_mask),
-		ktime_to_ms(pmu->timer_interval));
+		rapl_timer_ms);
 	for (i = 0; i < NR_RAPL_DOMAINS; i++) {
 		if (rapl_cntr_mask & (1 << i)) {
 			pr_info("hw unit of domain %s 2^-%d Joules\n",
@@ -774,7 +772,6 @@ static int __init rapl_pmu_init(void)
 		}
 	}
 
-	cpu_notifier_register_done();
 	return 0;
 
 out:

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 22/28] x86/perf/intel/rapl: Cleanup the printk output
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (21 preceding siblings ...)
  2016-02-22 11:06 ` [patch V2 23/28] x86/perf/intel/rapl: Refactor code some more Thomas Gleixner
@ 2016-02-22 11:06 ` Thomas Gleixner
  2016-02-22 18:44   ` Andi Kleen
  2016-02-22 11:07 ` [patch V2 24/28] x86/perf/intel/rapl: Make pmu lock raw Thomas Gleixner
                   ` (5 subsequent siblings)
  28 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel-rapl--Cleanup-the-printk-output.patch --]
[-- Type: text/plain, Size: 2722 bytes --]

The output is inconsistent. Use a proper pr_fmt prefix and split out the
advertisement into a separate function.

Remove the WARN_ON in the failure case. It's pointless as we already know
where it failed.
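
For reference, pr_fmt() is a general kernel idiom, not something specific to
this patch: printk.h expands pr_info(fmt, ...) to
printk(KERN_INFO pr_fmt(fmt), ...), so defining the macro before the first
include prefixes every pr_*() string in the file:

	#define pr_fmt(fmt) "RAPL PMU: " fmt

	/*
	 * pr_warn("Invalid domain %d, ...") now emits
	 * "RAPL PMU: Invalid domain %d, ..."
	 */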

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |   42 +++++++++++++++-------------
 1 file changed, 23 insertions(+), 19 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -44,6 +44,9 @@
  * the duration of the measurement. Tools may use a function such as
  * ldexp(raw_count, -32);
  */
+
+#define pr_fmt(fmt) "RAPL PMU: " fmt
+
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/perf_event.h>
@@ -144,7 +147,7 @@ static inline u64 rapl_read_counter(stru
 static inline u64 rapl_scale(u64 v, int cfg)
 {
 	if (cfg > NR_RAPL_DOMAINS) {
-		pr_warn("invalid domain %d, failed to scale data\n", cfg);
+		pr_warn("Invalid domain %d, failed to scale data\n", cfg);
 		return v;
 	}
 	/*
@@ -680,6 +683,21 @@ static int rapl_check_hw_unit(void (*qui
 	return 0;
 }
 
+static void __init rapl_advertise(void)
+{
+	int i;
+
+	pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms ovfl timer\n",
+		hweight32(rapl_cntr_mask), rapl_timer_ms);
+
+	for (i = 0; i < NR_RAPL_DOMAINS; i++) {
+		if (rapl_cntr_mask & (1 << i)) {
+			pr_info("hw unit of domain %s 2^-%d Joules\n",
+				rapl_domain_names[i], rapl_hw_unit[i]);
+		}
+	}
+}
+
 static void __init cleanup_rapl_pmus(void)
 {
 	int cpu;
@@ -696,7 +714,7 @@ static const struct x86_cpu_id rapl_cpu_
 static int __init rapl_pmu_init(void)
 {
 	void (*quirk)(void) = NULL;
-	int cpu, ret, i;
+	int cpu, ret;
 
 	/*
 	 * check for Intel processor family 6
@@ -751,30 +769,16 @@ static int __init rapl_pmu_init(void)
 	}
 
 	ret = perf_pmu_register(&rapl_pmu_class, "power", -1);
-	if (WARN_ON(ret)) {
-		pr_info("RAPL PMU detected, registration failed (%d), RAPL PMU disabled\n", ret);
+	if (ret)
 		goto out;
-	}
 
 	__perf_cpu_notifier(rapl_cpu_notifier);
 	cpu_notifier_register_done();
-
-	pr_info("RAPL PMU detected,"
-		" API unit is 2^-32 Joules,"
-		" %d fixed counters"
-		" %llu ms ovfl timer\n",
-		hweight32(rapl_cntr_mask),
-		rapl_timer_ms);
-	for (i = 0; i < NR_RAPL_DOMAINS; i++) {
-		if (rapl_cntr_mask & (1 << i)) {
-			pr_info("hw unit of domain %s 2^-%d Joules\n",
-				rapl_domain_names[i], rapl_hw_unit[i]);
-		}
-	}
-
+	rapl_advertise();
 	return 0;
 
 out:
+	pr_warn("Initialization failed (%d), disabled\n", ret);
 	cleanup_rapl_pmus();
 	cpu_notifier_register_done();
 	return ret;

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 23/28] x86/perf/intel/rapl: Refactor code some more
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (20 preceding siblings ...)
  2016-02-22 11:06 ` [patch V2 21/28] x86/perf/intel/rapl: Calculate timing once Thomas Gleixner
@ 2016-02-22 11:06 ` Thomas Gleixner
  2016-02-22 11:06 ` [patch V2 22/28] x86/perf/intel/rapl: Cleanup the printk output Thomas Gleixner
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:06 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel-rapl--Refactor-code-some-more.patch --]
[-- Type: text/plain, Size: 4346 bytes --]

Split out code from init into separate functions. Tidy up the code and get rid
of pointless comments. I wish there were comments for code which is not
obvious...

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |   61 ++++++++++++++--------------
 1 file changed, 31 insertions(+), 30 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -110,7 +110,7 @@ static ssize_t __rapl_##_var##_show(stru
 static struct kobj_attribute format_attr_##_var =		\
 	__ATTR(_name, 0444, __rapl_##_var##_show, NULL)
 
-#define RAPL_CNTR_WIDTH 32 /* 32-bit rapl counters */
+#define RAPL_CNTR_WIDTH 32
 
 #define RAPL_EVENT_ATTR_STR(_name, v, str)					\
 static struct perf_pmu_events_attr event_attr_##v = {				\
@@ -120,15 +120,16 @@ static struct perf_pmu_events_attr event
 };
 
 struct rapl_pmu {
-	spinlock_t	 lock;
-	int		 n_active; /* number of active events */
-	struct list_head active_list;
-	struct pmu	 *pmu; /* pointer to rapl_pmu_class */
-	ktime_t		 timer_interval; /* in ktime_t unit */
-	struct hrtimer   hrtimer;
+	spinlock_t		lock;
+	int			n_active;
+	struct list_head	active_list;
+	struct pmu		*pmu;
+	ktime_t			timer_interval;
+	struct hrtimer		hrtimer;
 };
 
-static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;  /* 1/2^hw_unit Joule */
+ /* 1/2^hw_unit Joule */
+static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;
 static struct pmu rapl_pmu_class;
 static cpumask_t rapl_cpu_mask;
 static int rapl_cntr_mask;
@@ -200,11 +201,6 @@ static void rapl_start_hrtimer(struct ra
 		     HRTIMER_MODE_REL_PINNED);
 }
 
-static void rapl_stop_hrtimer(struct rapl_pmu *pmu)
-{
-	hrtimer_cancel(&pmu->hrtimer);
-}
-
 static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer)
 {
 	struct rapl_pmu *pmu = __this_cpu_read(rapl_pmu);
@@ -216,9 +212,8 @@ static enum hrtimer_restart rapl_hrtimer
 
 	spin_lock_irqsave(&pmu->lock, flags);
 
-	list_for_each_entry(event, &pmu->active_list, active_entry) {
+	list_for_each_entry(event, &pmu->active_list, active_entry)
 		rapl_event_update(event);
-	}
 
 	spin_unlock_irqrestore(&pmu->lock, flags);
 
@@ -275,7 +270,7 @@ static void rapl_pmu_event_stop(struct p
 		WARN_ON_ONCE(pmu->n_active <= 0);
 		pmu->n_active--;
 		if (pmu->n_active == 0)
-			rapl_stop_hrtimer(pmu);
+			hrtimer_cancel(&pmu->hrtimer);
 
 		list_del(&event->active_entry);
 
@@ -542,7 +537,7 @@ static void rapl_cpu_exit(int cpu)
 		perf_pmu_migrate_context(pmu->pmu, cpu, target);
 
 	/* cancel overflow polling timer for CPU */
-	rapl_stop_hrtimer(pmu);
+	hrtimer_cancel(&pmu->hrtimer);
 }
 
 static void rapl_cpu_init(int cpu)
@@ -698,6 +693,20 @@ static void __init rapl_advertise(void)
 	}
 }
 
+static int __init rapl_prepare_cpus(void)
+{
+	unsigned int cpu;
+	int ret;
+
+	for_each_online_cpu(cpu) {
+		ret = rapl_cpu_prepare(cpu);
+		if (ret)
+			return ret;
+		rapl_cpu_init(cpu);
+	}
+	return 0;
+}
+
 static void __init cleanup_rapl_pmus(void)
 {
 	int cpu;
@@ -706,7 +715,7 @@ static void __init cleanup_rapl_pmus(voi
 		kfree(per_cpu(rapl_pmu, cpu));
 }
 
-static const struct x86_cpu_id rapl_cpu_match[] = {
+static const struct x86_cpu_id rapl_cpu_match[] __initconst = {
 	[0] = { .vendor = X86_VENDOR_INTEL, .family = 6 },
 	[1] = {},
 };
@@ -714,15 +723,11 @@ static const struct x86_cpu_id rapl_cpu_
 static int __init rapl_pmu_init(void)
 {
 	void (*quirk)(void) = NULL;
-	int cpu, ret;
+	int ret;
 
-	/*
-	 * check for Intel processor family 6
-	 */
 	if (!x86_match_cpu(rapl_cpu_match))
 		return -ENODEV;
 
-	/* check supported CPU */
 	switch (boot_cpu_data.x86_model) {
 	case 42: /* Sandy Bridge */
 	case 58: /* Ivy Bridge */
@@ -751,7 +756,6 @@ static int __init rapl_pmu_init(void)
 		rapl_pmu_events_group.attrs = rapl_events_knl_attr;
 		break;
 	default:
-		/* unsupported */
 		return -ENODEV;
 	}
 
@@ -761,12 +765,9 @@ static int __init rapl_pmu_init(void)
 
 	cpu_notifier_register_begin();
 
-	for_each_online_cpu(cpu) {
-		ret = rapl_cpu_prepare(cpu);
-		if (ret)
-			goto out;
-		rapl_cpu_init(cpu);
-	}
+	ret = rapl_prepare_cpus();
+	if (ret)
+		goto out;
 
 	ret = perf_pmu_register(&rapl_pmu_class, "power", -1);
 	if (ret)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 24/28] x86/perf/intel/rapl: Make pmu lock raw
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (22 preceding siblings ...)
  2016-02-22 11:06 ` [patch V2 22/28] x86/perf/intel/rapl: Cleanup the printk output Thomas Gleixner
@ 2016-02-22 11:07 ` Thomas Gleixner
  2016-02-22 11:07 ` [patch V2 25/28] x86/perf/intel/rapl: Utilize event->pmu_private Thomas Gleixner
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:07 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel-rapl--Make-pmu-lock-raw.patch --]
[-- Type: text/plain, Size: 2862 bytes --]

This lock is taken in hard interrupt context even on Preempt-RT. Make it raw
so RT does not have to patch it.
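
Background, as a general PREEMPT_RT rule rather than anything specific to
this driver: on RT a plain spinlock_t is substituted by a sleeping,
rtmutex based lock which must not be taken from hard interrupt context such
as the hrtimer callback below. A raw_spinlock_t keeps the truly spinning
semantics:

	raw_spinlock_t		lock;	/* usable from hard irq context */

	raw_spin_lock_irqsave(&pmu->lock, flags);
	/* ... update the active events ... */
	raw_spin_unlock_irqrestore(&pmu->lock, flags);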

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -120,7 +120,7 @@ static struct perf_pmu_events_attr event
 };
 
 struct rapl_pmu {
-	spinlock_t		lock;
+	raw_spinlock_t		lock;
 	int			n_active;
 	struct list_head	active_list;
 	struct pmu		*pmu;
@@ -210,12 +210,12 @@ static enum hrtimer_restart rapl_hrtimer
 	if (!pmu->n_active)
 		return HRTIMER_NORESTART;
 
-	spin_lock_irqsave(&pmu->lock, flags);
+	raw_spin_lock_irqsave(&pmu->lock, flags);
 
 	list_for_each_entry(event, &pmu->active_list, active_entry)
 		rapl_event_update(event);
 
-	spin_unlock_irqrestore(&pmu->lock, flags);
+	raw_spin_unlock_irqrestore(&pmu->lock, flags);
 
 	hrtimer_forward_now(hrtimer, pmu->timer_interval);
 
@@ -252,9 +252,9 @@ static void rapl_pmu_event_start(struct
 	struct rapl_pmu *pmu = __this_cpu_read(rapl_pmu);
 	unsigned long flags;
 
-	spin_lock_irqsave(&pmu->lock, flags);
+	raw_spin_lock_irqsave(&pmu->lock, flags);
 	__rapl_pmu_event_start(pmu, event);
-	spin_unlock_irqrestore(&pmu->lock, flags);
+	raw_spin_unlock_irqrestore(&pmu->lock, flags);
 }
 
 static void rapl_pmu_event_stop(struct perf_event *event, int mode)
@@ -263,7 +263,7 @@ static void rapl_pmu_event_stop(struct p
 	struct hw_perf_event *hwc = &event->hw;
 	unsigned long flags;
 
-	spin_lock_irqsave(&pmu->lock, flags);
+	raw_spin_lock_irqsave(&pmu->lock, flags);
 
 	/* mark event as deactivated and stopped */
 	if (!(hwc->state & PERF_HES_STOPPED)) {
@@ -288,7 +288,7 @@ static void rapl_pmu_event_stop(struct p
 		hwc->state |= PERF_HES_UPTODATE;
 	}
 
-	spin_unlock_irqrestore(&pmu->lock, flags);
+	raw_spin_unlock_irqrestore(&pmu->lock, flags);
 }
 
 static int rapl_pmu_event_add(struct perf_event *event, int mode)
@@ -297,14 +297,14 @@ static int rapl_pmu_event_add(struct per
 	struct hw_perf_event *hwc = &event->hw;
 	unsigned long flags;
 
-	spin_lock_irqsave(&pmu->lock, flags);
+	raw_spin_lock_irqsave(&pmu->lock, flags);
 
 	hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
 
 	if (mode & PERF_EF_START)
 		__rapl_pmu_event_start(pmu, event);
 
-	spin_unlock_irqrestore(&pmu->lock, flags);
+	raw_spin_unlock_irqrestore(&pmu->lock, flags);
 
 	return 0;
 }
@@ -567,7 +567,7 @@ static int rapl_cpu_prepare(int cpu)
 	pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
 	if (!pmu)
 		return -1;
-	spin_lock_init(&pmu->lock);
+	raw_spin_lock_init(&pmu->lock);
 
 	INIT_LIST_HEAD(&pmu->active_list);
 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 25/28] x86/perf/intel/rapl: Utilize event->pmu_private
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (23 preceding siblings ...)
  2016-02-22 11:07 ` [patch V2 24/28] x86/perf/intel/rapl: Make pmu lock raw Thomas Gleixner
@ 2016-02-22 11:07 ` Thomas Gleixner
  2016-02-22 11:07 ` [patch V2 26/28] x86/perf/intel/rapl: Convert it to a per package facility Thomas Gleixner
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:07 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel-rapl--Utilize-event->pmu_private.patch --]
[-- Type: text/plain, Size: 2899 bytes --]

Store the pmu in event->pmu_private and use it instead of the per cpu
data. Preparatory step to get rid of the per cpu allocations.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -122,6 +122,7 @@ static struct perf_pmu_events_attr event
 struct rapl_pmu {
 	raw_spinlock_t		lock;
 	int			n_active;
+	int			cpu;
 	struct list_head	active_list;
 	struct pmu		*pmu;
 	ktime_t			timer_interval;
@@ -203,7 +204,7 @@ static void rapl_start_hrtimer(struct ra
 
 static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer)
 {
-	struct rapl_pmu *pmu = __this_cpu_read(rapl_pmu);
+	struct rapl_pmu *pmu = container_of(hrtimer, struct rapl_pmu, hrtimer);
 	struct perf_event *event;
 	unsigned long flags;
 
@@ -249,7 +250,7 @@ static void __rapl_pmu_event_start(struc
 
 static void rapl_pmu_event_start(struct perf_event *event, int mode)
 {
-	struct rapl_pmu *pmu = __this_cpu_read(rapl_pmu);
+	struct rapl_pmu *pmu = event->pmu_private;
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&pmu->lock, flags);
@@ -259,7 +260,7 @@ static void rapl_pmu_event_start(struct
 
 static void rapl_pmu_event_stop(struct perf_event *event, int mode)
 {
-	struct rapl_pmu *pmu = __this_cpu_read(rapl_pmu);
+	struct rapl_pmu *pmu = event->pmu_private;
 	struct hw_perf_event *hwc = &event->hw;
 	unsigned long flags;
 
@@ -293,7 +294,7 @@ static void rapl_pmu_event_stop(struct p
 
 static int rapl_pmu_event_add(struct perf_event *event, int mode)
 {
-	struct rapl_pmu *pmu = __this_cpu_read(rapl_pmu);
+	struct rapl_pmu *pmu = event->pmu_private;
 	struct hw_perf_event *hwc = &event->hw;
 	unsigned long flags;
 
@@ -316,6 +317,7 @@ static void rapl_pmu_event_del(struct pe
 
 static int rapl_pmu_event_init(struct perf_event *event)
 {
+	struct rapl_pmu *pmu = __this_cpu_read(rapl_pmu);
 	u64 cfg = event->attr.config & RAPL_EVENT_MASK;
 	int bit, msr, ret = 0;
 
@@ -327,6 +329,9 @@ static int rapl_pmu_event_init(struct pe
 	if (event->attr.config & ~RAPL_EVENT_MASK)
 		return -EINVAL;
 
+	if (event->cpu < 0)
+		return -EINVAL;
+
 	/*
 	 * check event is known (determines counter)
 	 */
@@ -365,6 +370,8 @@ static int rapl_pmu_event_init(struct pe
 		return -EINVAL;
 
 	/* must be done before validate_group */
+	event->cpu = pmu->cpu;
+	event->pmu_private = pmu;
 	event->hw.event_base = msr;
 	event->hw.config = cfg;
 	event->hw.idx = bit;
@@ -572,6 +579,7 @@ static int rapl_cpu_prepare(int cpu)
 	INIT_LIST_HEAD(&pmu->active_list);
 
 	pmu->pmu = &rapl_pmu_class;
+	pmu->cpu = cpu;
 
 	pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 26/28] x86/perf/intel/rapl: Convert it to a per package facility
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (24 preceding siblings ...)
  2016-02-22 11:07 ` [patch V2 25/28] x86/perf/intel/rapl: Utilize event->pmu_private Thomas Gleixner
@ 2016-02-22 11:07 ` Thomas Gleixner
  2016-02-22 12:08   ` Peter Zijlstra
  2016-02-22 11:07 ` [patch V2 27/28] perf: Export perf_event_sysfs_show Thomas Gleixner
                   ` (2 subsequent siblings)
  28 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:07 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel-rapl--Convert-it-to-a-per-package-facility.patch --]
[-- Type: text/plain, Size: 8306 bytes --]

RAPL is a per package facility and we already have a mechanism for a dedicated
per package reader, so there is no point in having multiple CPUs do the same
work. The current implementation actually starts two timers on two cpus if one
does:

	perf stat -C1,2 -e power/energy-pkg ....

which makes the whole concept of 1 reader per package moot.

What's worse is that the above returns double the actual energy consumption,
but that's a different problem to address and cannot be solved by removing the
pointless per cpuness of that mechanism.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |  196 ++++++++++++----------------
 1 file changed, 87 insertions(+), 109 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -129,15 +129,23 @@ struct rapl_pmu {
 	struct hrtimer		hrtimer;
 };
 
+struct rapl_pmus {
+	struct pmu		pmu;
+	unsigned int		maxpkg;
+	struct rapl_pmu		*pmus[];
+};
+
  /* 1/2^hw_unit Joule */
 static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;
-static struct pmu rapl_pmu_class;
+static struct rapl_pmus *rapl_pmus;
 static cpumask_t rapl_cpu_mask;
-static int rapl_cntr_mask;
+static unsigned int rapl_cntr_mask;
 static u64 rapl_timer_ms;
 
-static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu);
-static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu_to_free);
+static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
+{
+	return rapl_pmus->pmus[topology_logical_package_id(cpu)];
+}
 
 static inline u64 rapl_read_counter(struct perf_event *event)
 {
@@ -317,12 +325,12 @@ static void rapl_pmu_event_del(struct pe
 
 static int rapl_pmu_event_init(struct perf_event *event)
 {
-	struct rapl_pmu *pmu = __this_cpu_read(rapl_pmu);
 	u64 cfg = event->attr.config & RAPL_EVENT_MASK;
 	int bit, msr, ret = 0;
+	struct rapl_pmu *pmu;
 
 	/* only look at RAPL events */
-	if (event->attr.type != rapl_pmu_class.type)
+	if (event->attr.type != rapl_pmus->pmu.type)
 		return -ENOENT;
 
 	/* check only supported bits are set */
@@ -370,6 +378,7 @@ static int rapl_pmu_event_init(struct pe
 		return -EINVAL;
 
 	/* must be done before validate_group */
+	pmu = cpu_to_rapl_pmu(event->cpu);
 	event->cpu = pmu->cpu;
 	event->pmu_private = pmu;
 	event->hw.event_base = msr;
@@ -502,116 +511,62 @@ const struct attribute_group *rapl_attr_
 	NULL,
 };
 
-static struct pmu rapl_pmu_class = {
-	.attr_groups	= rapl_attr_groups,
-	.task_ctx_nr	= perf_invalid_context, /* system-wide only */
-	.event_init	= rapl_pmu_event_init,
-	.add		= rapl_pmu_event_add, /* must have */
-	.del		= rapl_pmu_event_del, /* must have */
-	.start		= rapl_pmu_event_start,
-	.stop		= rapl_pmu_event_stop,
-	.read		= rapl_pmu_event_read,
-};
-
 static void rapl_cpu_exit(int cpu)
 {
-	struct rapl_pmu *pmu = per_cpu(rapl_pmu, cpu);
-	int i, phys_id = topology_physical_package_id(cpu);
-	int target = -1;
-
-	/* find a new cpu on same package */
-	for_each_online_cpu(i) {
-		if (i == cpu)
-			continue;
-		if (phys_id == topology_physical_package_id(i)) {
-			target = i;
-			break;
-		}
-	}
-	/*
-	 * clear cpu from cpumask
-	 * if was set in cpumask and still some cpu on package,
-	 * then move to new cpu
-	 */
-	if (cpumask_test_and_clear_cpu(cpu, &rapl_cpu_mask) && target >= 0)
-		cpumask_set_cpu(target, &rapl_cpu_mask);
+	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
+	int target;
 
-	WARN_ON(cpumask_empty(&rapl_cpu_mask));
-	/*
-	 * migrate events and context to new cpu
-	 */
-	if (target >= 0)
-		perf_pmu_migrate_context(pmu->pmu, cpu, target);
+	/* Check if exiting cpu is used for collecting rapl events */
+	if (!cpumask_test_and_clear_cpu(cpu, &rapl_cpu_mask))
+		return;
+
+	pmu->cpu = -1;
+	/* Find a new cpu to collect rapl events */
+	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
 
-	/* cancel overflow polling timer for CPU */
-	hrtimer_cancel(&pmu->hrtimer);
+	/* Migrate rapl events to the new target */
+	if (target < nr_cpu_ids) {
+		cpumask_set_cpu(target, &rapl_cpu_mask);
+		pmu->cpu = target;
+		perf_pmu_migrate_context(pmu->pmu, cpu, target);
+	}
 }
 
 static void rapl_cpu_init(int cpu)
 {
-	int i, phys_id = topology_physical_package_id(cpu);
+	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
+	int target;
+
+	/*
+	 * Check if there is an online cpu in the package which collects rapl
+	 * events already.
+	 */
+	target = cpumask_any_and(&rapl_cpu_mask, topology_core_cpumask(cpu));
+	if (target < nr_cpu_ids)
+		return;
 
-	/* check if phys_is is already covered */
-	for_each_cpu(i, &rapl_cpu_mask) {
-		if (phys_id == topology_physical_package_id(i))
-			return;
-	}
-	/* was not found, so add it */
 	cpumask_set_cpu(cpu, &rapl_cpu_mask);
+	pmu->cpu = cpu;
 }
 
 static int rapl_cpu_prepare(int cpu)
 {
-	struct rapl_pmu *pmu = per_cpu(rapl_pmu, cpu);
-	int phys_id = topology_physical_package_id(cpu);
+	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
 
 	if (pmu)
 		return 0;
 
-	if (phys_id < 0)
-		return -1;
-
 	pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
 	if (!pmu)
-		return -1;
-	raw_spin_lock_init(&pmu->lock);
+		return -ENOMEM;
 
+	raw_spin_lock_init(&pmu->lock);
 	INIT_LIST_HEAD(&pmu->active_list);
-
-	pmu->pmu = &rapl_pmu_class;
-	pmu->cpu = cpu;
-
+	pmu->pmu = &rapl_pmus->pmu;
 	pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
-
+	pmu->cpu = -1;
 	rapl_hrtimer_init(pmu);
-
-	/* set RAPL pmu for this cpu for now */
-	per_cpu(rapl_pmu, cpu) = pmu;
-	per_cpu(rapl_pmu_to_free, cpu) = NULL;
-
-	return 0;
-}
-
-static void rapl_cpu_kfree(int cpu)
-{
-	struct rapl_pmu *pmu = per_cpu(rapl_pmu_to_free, cpu);
-
-	kfree(pmu);
-
-	per_cpu(rapl_pmu_to_free, cpu) = NULL;
-}
-
-static int rapl_cpu_dying(int cpu)
-{
-	struct rapl_pmu *pmu = per_cpu(rapl_pmu, cpu);
-
-	if (!pmu)
-		return 0;
-
-	per_cpu(rapl_pmu, cpu) = NULL;
-
-	per_cpu(rapl_pmu_to_free, cpu) = pmu;
-
+	rapl_pmus->pmus[topology_logical_package_id(cpu)] = pmu;
 	return 0;
 }
 
@@ -624,24 +579,16 @@ static int rapl_cpu_notifier(struct noti
 	case CPU_UP_PREPARE:
 		rapl_cpu_prepare(cpu);
 		break;
-	case CPU_STARTING:
-		rapl_cpu_init(cpu);
-		break;
-	case CPU_UP_CANCELED:
-	case CPU_DYING:
-		rapl_cpu_dying(cpu);
-		break;
+
+	case CPU_DOWN_FAILED:
 	case CPU_ONLINE:
-	case CPU_DEAD:
-		rapl_cpu_kfree(cpu);
+		rapl_cpu_init(cpu);
 		break;
+
 	case CPU_DOWN_PREPARE:
 		rapl_cpu_exit(cpu);
 		break;
-	default:
-		break;
 	}
-
 	return NOTIFY_OK;
 }
 
@@ -703,10 +650,14 @@ static void __init rapl_advertise(void)
 
 static int __init rapl_prepare_cpus(void)
 {
-	unsigned int cpu;
+	unsigned int cpu, pkg;
 	int ret;
 
 	for_each_online_cpu(cpu) {
+		pkg = topology_logical_package_id(cpu);
+		if (rapl_pmus->pmus[pkg])
+			continue;
+
 		ret = rapl_cpu_prepare(cpu);
 		if (ret)
 			return ret;
@@ -717,10 +668,33 @@ static int __init rapl_prepare_cpus(void
 
 static void __init cleanup_rapl_pmus(void)
 {
-	int cpu;
+	int i;
 
-	for_each_online_cpu(cpu)
-		kfree(per_cpu(rapl_pmu, cpu));
+	for (i = 0; i < rapl_pmus->maxpkg; i++)
+		kfree(rapl_pmus->pmus[i]);
+	kfree(rapl_pmus);
+}
+
+static int __init init_rapl_pmus(void)
+{
+	int maxpkg = topology_max_packages();
+	size_t size;
+
+	size = sizeof(*rapl_pmus) + maxpkg * sizeof(struct rapl_pmu *);
+	rapl_pmus = kzalloc(size, GFP_KERNEL);
+	if (!rapl_pmus)
+		return -ENOMEM;
+
+	rapl_pmus->maxpkg		= maxpkg;
+	rapl_pmus->pmu.attr_groups	= rapl_attr_groups;
+	rapl_pmus->pmu.task_ctx_nr	= perf_invalid_context;
+	rapl_pmus->pmu.event_init	= rapl_pmu_event_init;
+	rapl_pmus->pmu.add		= rapl_pmu_event_add;
+	rapl_pmus->pmu.del		= rapl_pmu_event_del;
+	rapl_pmus->pmu.start		= rapl_pmu_event_start;
+	rapl_pmus->pmu.stop		= rapl_pmu_event_stop;
+	rapl_pmus->pmu.read		= rapl_pmu_event_read;
+	return 0;
 }
 
 static const struct x86_cpu_id rapl_cpu_match[] __initconst = {
@@ -771,13 +745,17 @@ static int __init rapl_pmu_init(void)
 	if (ret)
 		return ret;
 
+	ret = init_rapl_pmus();
+	if (ret)
+		return ret;
+
 	cpu_notifier_register_begin();
 
 	ret = rapl_prepare_cpus();
 	if (ret)
 		goto out;
 
-	ret = perf_pmu_register(&rapl_pmu_class, "power", -1);
+	ret = perf_pmu_register(&rapl_pmus->pmu, "power", -1);
 	if (ret)
 		goto out;
 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 28/28] x86/perf/intel/rapl: Make it modular
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (26 preceding siblings ...)
  2016-02-22 11:07 ` [patch V2 27/28] perf: Export perf_event_sysfs_show Thomas Gleixner
@ 2016-02-22 11:07 ` Thomas Gleixner
  2016-02-22 18:40   ` Andi Kleen
  2016-02-22 18:55 ` [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Andi Kleen
  28 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:07 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: x86-perf-intel-rapl--Make-it-modular.patch --]
[-- Type: text/plain, Size: 2919 bytes --]

Add the necessary exit functions so it can be built as a module.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig.perf                       |   10 ++++++++++
 arch/x86/kernel/cpu/Makefile                |    4 +++-
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |   22 +++++++++++++++++++---
 3 files changed, 32 insertions(+), 4 deletions(-)

--- a/arch/x86/Kconfig.perf
+++ b/arch/x86/Kconfig.perf
@@ -10,4 +10,14 @@ config PERF_EVENTS_INTEL_UNCORE
 
 	  If unsure say y.
 
+config PERF_EVENTS_INTEL_RAPL
+	tristate "Intel rapl performance events"
+	depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
+	default y
+	---help---
+	  Include support for Intel rapl performance events for power
+	  monitoring on modern processors.
+
+	  If unsure say y.
+
 endmenu
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -39,10 +39,12 @@ obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event
 endif
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_knc.o perf_event_p4.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o
-obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_rapl.o perf_event_intel_cqm.o
+obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_cqm.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_pt.o perf_event_intel_bts.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_cstate.o
 
+obj-$(CONFIG_PERF_EVENTS_INTEL_RAPL)	+= perf_event_intel_rapl.o
+
 obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)	+= perf_event_intel_uncores.o
 perf_event_intel_uncores-objs		:= perf_event_intel_uncore.o \
 					   perf_event_intel_uncore_snb.o \
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -53,6 +53,8 @@
 #include <asm/cpu_device_id.h>
 #include "perf_event.h"
 
+MODULE_LICENSE("GPL");
+
 /*
  * RAPL energy status counters
  */
@@ -592,6 +594,10 @@ static int rapl_cpu_notifier(struct noti
 	return NOTIFY_OK;
 }
 
+static struct notifier_block rapl_cpu_nb = {
+	.notifier_call	= rapl_cpu_notifier,
+};
+
 static __init void rapl_hsw_server_quirk(void)
 {
 	/*
@@ -666,7 +672,7 @@ static int __init rapl_prepare_cpus(void
 	return 0;
 }
 
-static void __init cleanup_rapl_pmus(void)
+static void cleanup_rapl_pmus(void)
 {
 	int i;
 
@@ -759,7 +765,7 @@ static int __init rapl_pmu_init(void)
 	if (ret)
 		goto out;
 
-	__perf_cpu_notifier(rapl_cpu_notifier);
+	__register_cpu_notifier(&rapl_cpu_nb);
 	cpu_notifier_register_done();
 	rapl_advertise();
 	return 0;
@@ -770,4 +776,14 @@ static int __init rapl_pmu_init(void)
 	cpu_notifier_register_done();
 	return ret;
 }
-device_initcall(rapl_pmu_init);
+module_init(rapl_pmu_init);
+
+static void __exit intel_rapl_exit(void)
+{
+	cpu_notifier_register_begin();
+	__unregister_cpu_notifier(&rapl_cpu_nb);
+	perf_pmu_unregister(&rapl_pmus->pmu);
+	cleanup_rapl_pmus();
+	cpu_notifier_register_done();
+}
+module_exit(intel_rapl_exit);

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 27/28] perf: Export perf_event_sysfs_show
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (25 preceding siblings ...)
  2016-02-22 11:07 ` [patch V2 26/28] x86/perf/intel/rapl: Convert it to a per package facility Thomas Gleixner
@ 2016-02-22 11:07 ` Thomas Gleixner
  2016-02-22 12:09   ` Peter Zijlstra
  2016-02-22 11:07 ` [patch V2 28/28] x86/perf/intel/rapl: Make it modular Thomas Gleixner
  2016-02-22 18:55 ` [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Andi Kleen
  28 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:07 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

[-- Attachment #1: perf--Export-perf_event_sysfs_show.patch --]
[-- Type: text/plain, Size: 390 bytes --]

Required to use it in modular perf drivers.
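
A sketch of why (attribute name and event string are illustrative):
PMU_EVENT_ATTR_STRING() references perf_event_sysfs_show() directly, so a
modular driver using it needs the export:

	#include <linux/perf_event.h>

	/* Expands to a struct perf_pmu_events_attr whose show method is
	 * perf_event_sysfs_show(), hence the required export. */
	PMU_EVENT_ATTR_STRING(energy-pkg, evattr_energy_pkg, "event=0x02");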

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/events/core.c |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9335,6 +9335,7 @@ ssize_t perf_event_sysfs_show(struct dev
 
 	return 0;
 }
+EXPORT_SYMBOL(perf_event_sysfs_show);
 
 static int __init perf_event_sysfs_init(void)
 {

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 10/28] x86/perf/intel_uncore: Store box in event->pmu_private
  2016-02-22 11:06 ` [patch V2 10/28] x86/perf/intel_uncore: Store box in event->pmu_private Thomas Gleixner
@ 2016-02-22 11:45   ` Peter Zijlstra
  2016-02-22 11:58     ` Thomas Gleixner
  2016-02-22 11:52   ` Peter Zijlstra
  1 sibling, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2016-02-22 11:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Borislav Petkov, Stephane Eranian, Harish Chegondi,
	Kan Liang, Andi Kleen, Jacob Pan

On Mon, Feb 22, 2016 at 11:06:50AM -0000, Thomas Gleixner wrote:
> Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
> ===================================================================
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
> @@ -313,6 +313,7 @@ static int snb_uncore_imc_event_init(str
>  		return -EINVAL;
>  
>  	event->cpu = box->cpu;
> +	event->pmu_private = pmu;

This seems rather pointless, did you want that to be '= box' instead?

>  
>  	event->hw.idx = -1;
>  	event->hw.last_tag = ~0ULL;
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 10/28] x86/perf/intel_uncore: Store box in event->pmu_private
  2016-02-22 11:06 ` [patch V2 10/28] x86/perf/intel_uncore: Store box in event->pmu_private Thomas Gleixner
  2016-02-22 11:45   ` Peter Zijlstra
@ 2016-02-22 11:52   ` Peter Zijlstra
  2016-02-22 12:00     ` Thomas Gleixner
  1 sibling, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2016-02-22 11:52 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Borislav Petkov, Stephane Eranian, Harish Chegondi,
	Kan Liang, Andi Kleen, Jacob Pan

On Mon, Feb 22, 2016 at 11:06:50AM -0000, Thomas Gleixner wrote:
> +	event->pmu_private = box;

> +static inline struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
> +{
> +	return event->pmu_private;
> +}

Do you really need this? That is, what is wrong with:

static inline struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
{
	return uncore_pmu_to_box(event->pmu, event->cpu);
}

Which, after patch 12, should be fairly trivial, right?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 13/28] x86/perf/intel_uncore: Clear all hardware state on exit
  2016-02-22 11:06 ` [patch V2 13/28] x86/perf/intel_uncore: Clear all hardware state on exit Thomas Gleixner
@ 2016-02-22 11:58   ` Peter Zijlstra
  2016-02-22 11:59     ` Thomas Gleixner
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2016-02-22 11:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Borislav Petkov, Stephane Eranian, Harish Chegondi,
	Kan Liang, Andi Kleen, Jacob Pan

On Mon, Feb 22, 2016 at 11:06:52AM -0000, Thomas Gleixner wrote:
> +static void __init uncore_exit_boxes(void *dummy)
> +{
> +	struct intel_uncore_type **types = uncore_msr_uncores;
> +
> +	while (*types)
> +		__uncore_exit_boxes(*types++, smp_processor_id());

	for (types = uncore_msr_uncores; *types; types++)
		__uncore_exit_boxes(*types, smp_processor_id());

> +}

You flipped to for() loops in the other places, figures this one should
match?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 10/28] x86/perf/intel_uncore: Store box in event->pmu_private
  2016-02-22 11:45   ` Peter Zijlstra
@ 2016-02-22 11:58     ` Thomas Gleixner
  0 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, x86, Borislav Petkov, Stephane Eranian, Harish Chegondi,
	Kan Liang, Andi Kleen, Jacob Pan

On Mon, 22 Feb 2016, Peter Zijlstra wrote:

> On Mon, Feb 22, 2016 at 11:06:50AM -0000, Thomas Gleixner wrote:
> > Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
> > ===================================================================
> > --- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
> > +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
> > @@ -313,6 +313,7 @@ static int snb_uncore_imc_event_init(str
> >  		return -EINVAL;
> >  
> >  	event->cpu = box->cpu;
> > +	event->pmu_private = pmu;
> 
> This seems rather pointless, did you want that to be '= box' instead?

Duh, yes.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 13/28] x86/perf/intel_uncore: Clear all hardware state on exit
  2016-02-22 11:58   ` Peter Zijlstra
@ 2016-02-22 11:59     ` Thomas Gleixner
  0 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 11:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, x86, Borislav Petkov, Stephane Eranian, Harish Chegondi,
	Kan Liang, Andi Kleen, Jacob Pan

On Mon, 22 Feb 2016, Peter Zijlstra wrote:

> On Mon, Feb 22, 2016 at 11:06:52AM -0000, Thomas Gleixner wrote:
> > +static void __init uncore_exit_boxes(void *dummy)
> > +{
> > +	struct intel_uncore_type **types = uncore_msr_uncores;
> > +
> > +	while (*types)
> > +		__uncore_exit_boxes(*types++, smp_processor_id());
> 
> 	for (types = uncore_msr_uncores; *types; types++)
> 		__uncore_exit_boxes(*types, smp_processor_id());
> 
> > +}
> 
> You flipped to for() loops in the other places, figures this one should
> match?

Yes. Forgot about that one.
 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 10/28] x86/perf/intel_uncore: Store box in event->pmu_private
  2016-02-22 11:52   ` Peter Zijlstra
@ 2016-02-22 12:00     ` Thomas Gleixner
  0 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 12:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, x86, Borislav Petkov, Stephane Eranian, Harish Chegondi,
	Kan Liang, Andi Kleen, Jacob Pan

On Mon, 22 Feb 2016, Peter Zijlstra wrote:
> On Mon, Feb 22, 2016 at 11:06:50AM -0000, Thomas Gleixner wrote:
> > +	event->pmu_private = box;
> 
> > +static inline struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
> > +{
> > +	return event->pmu_private;
> > +}
> 
> Do you really need this? That is, what is wrong with:
> 
> static inline struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
> {
> 	return uncore_pmu_to_box(event->pmu, event->cpu);
> }
> 
> Which, after patch 12, should be fairly trivial, right?
 
Yes.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 26/28] x86/perf/intel/rapl: Convert it to a per package facility
  2016-02-22 11:07 ` [patch V2 26/28] x86/perf/intel/rapl: Convert it to a per package facility Thomas Gleixner
@ 2016-02-22 12:08   ` Peter Zijlstra
  2016-02-22 15:52     ` Thomas Gleixner
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2016-02-22 12:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Borislav Petkov, Stephane Eranian, Harish Chegondi,
	Kan Liang, Andi Kleen, Jacob Pan

On Mon, Feb 22, 2016 at 11:07:02AM -0000, Thomas Gleixner wrote:
> +static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
> +{
> +	return rapl_pmus->pmus[topology_logical_package_id(cpu)];
> +}

> @@ -370,6 +378,7 @@ static int rapl_pmu_event_init(struct pe
>  		return -EINVAL;
>  
>  	/* must be done before validate_group */
> +	pmu = cpu_to_rapl_pmu(event->cpu);
>  	event->cpu = pmu->cpu;
>  	event->pmu_private = pmu;

This again looks like pmu_private is 'trivially' replacable with
something like:

static inline struct rapl_pmu *event_to_rapl_pmu(struct perf_event *event)
{
	return cpu_to_rapl_pmu(event->cpu);
}

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 27/28] perf: Export perf_event_sysfs_show
  2016-02-22 11:07 ` [patch V2 27/28] perf: Export perf_event_sysfs_show Thomas Gleixner
@ 2016-02-22 12:09   ` Peter Zijlstra
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2016-02-22 12:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Borislav Petkov, Stephane Eranian, Harish Chegondi,
	Kan Liang, Andi Kleen, Jacob Pan

On Mon, Feb 22, 2016 at 11:07:03AM -0000, Thomas Gleixner wrote:
> Required to use it in modular perf drivers.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/events/core.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9335,6 +9335,7 @@ ssize_t perf_event_sysfs_show(struct dev
>  
>  	return 0;
>  }
> +EXPORT_SYMBOL(perf_event_sysfs_show);

perf exports are _GPL.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 26/28] x86/perf/intel/rapl: Convert it to a per package facility
  2016-02-22 12:08   ` Peter Zijlstra
@ 2016-02-22 15:52     ` Thomas Gleixner
  2016-02-22 16:04       ` Peter Zijlstra
  0 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 15:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, x86, Borislav Petkov, Stephane Eranian, Harish Chegondi,
	Kan Liang, Andi Kleen, Jacob Pan

On Mon, 22 Feb 2016, Peter Zijlstra wrote:

> On Mon, Feb 22, 2016 at 11:07:02AM -0000, Thomas Gleixner wrote:
> > +static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
> > +{
> > +	return rapl_pmus->pmus[topology_logical_package_id(cpu)];
> > +}
> 
> > @@ -370,6 +378,7 @@ static int rapl_pmu_event_init(struct pe
> >  		return -EINVAL;
> >  
> >  	/* must be done before validate_group */
> > +	pmu = cpu_to_rapl_pmu(event->cpu);
> >  	event->cpu = pmu->cpu;
> >  	event->pmu_private = pmu;
> 
> This again looks like pmu_private is 'trivially' replacable with
> something like:
> 
> static inline struct rapl_pmu *event_to_rapl_pmu(struct perf_event *event)
> {
> 	return cpu_to_rapl_pmu(event->cpu);
> }

Yes, it is. But that's 3 loads versus 1 and we have that in the perf fastpath,
so I prefer to keep the pmu_private add on.
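
For readers following along, a rough sketch of the two lookup variants (the
function names are mine, the data structures are the ones from patch 26):

	/* pmu_private: a single load in the event fastpath */
	static struct rapl_pmu *event_to_pmu_fast(struct perf_event *event)
	{
		return event->pmu_private;
	}

	/* versus three dependent loads: rapl_pmus, the topology data
	 * behind topology_logical_package_id() and the pmus[] slot */
	static struct rapl_pmu *event_to_pmu_slow(struct perf_event *event)
	{
		return rapl_pmus->pmus[topology_logical_package_id(event->cpu)];
	}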

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 26/28] x86/perf/intel/rapl: Convert it to a per package facility
  2016-02-22 15:52     ` Thomas Gleixner
@ 2016-02-22 16:04       ` Peter Zijlstra
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2016-02-22 16:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Borislav Petkov, Stephane Eranian, Harish Chegondi,
	Kan Liang, Andi Kleen, Jacob Pan

On Mon, Feb 22, 2016 at 04:52:28PM +0100, Thomas Gleixner wrote:
> On Mon, 22 Feb 2016, Peter Zijlstra wrote:
> 
> > On Mon, Feb 22, 2016 at 11:07:02AM -0000, Thomas Gleixner wrote:
> > > +static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
> > > +{
> > > +	return rapl_pmus->pmus[topology_logical_package_id(cpu)];
> > > +}
> > 
> > > @@ -370,6 +378,7 @@ static int rapl_pmu_event_init(struct pe
> > >  		return -EINVAL;
> > >  
> > >  	/* must be done before validate_group */
> > > +	pmu = cpu_to_rapl_pmu(event->cpu);
> > >  	event->cpu = pmu->cpu;
> > >  	event->pmu_private = pmu;
> > 
> > This again looks like pmu_private is 'trivially' replacable with
> > something like:
> > 
> > static inline struct rapl_pmu *event_to_rapl_pmu(struct perf_event *event)
> > {
> > 	return cpu_to_rapl_pmu(event->cpu);
> > }
> 
> Yes, it is. But that's 3 loads versus 1 and we have that in the perf fastpath,
> so I prefer to keep the pmu_private add on.

OK, fair enough.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 28/28] x86/perf/intel/rapl: Make it modular
  2016-02-22 11:07 ` [patch V2 28/28] x86/perf/intel/rapl: Make it modular Thomas Gleixner
@ 2016-02-22 18:40   ` Andi Kleen
  2016-02-22 19:11     ` Thomas Gleixner
  0 siblings, 1 reply; 47+ messages in thread
From: Andi Kleen @ 2016-02-22 18:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

Thomas Gleixner <tglx@linutronix.de> writes:

> Add the necessary exit functions so it can be built as a module.

If you make it a module you also need to add MODULE_DEVICE_TABLE
entries for the PCI IDs and x86_cpu_id tables/annotations for the
model numbers.

Otherwise the module would always need to be loaded manually
which would break existing setups.
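
A minimal sketch of such an annotation (the model list is illustrative,
not exhaustive):

	#include <linux/module.h>
	#include <asm/cpu_device_id.h>

	static const struct x86_cpu_id rapl_module_ids[] = {
		{ .vendor = X86_VENDOR_INTEL, .family = 6, .model = 42 }, /* SNB */
		{ .vendor = X86_VENDOR_INTEL, .family = 6, .model = 58 }, /* IVB */
		{}
	};
	/* Emits the "x86cpu:..." aliases udev needs for autoloading */
	MODULE_DEVICE_TABLE(x86cpu, rapl_module_ids);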

Of course it's a bit of a waste because in many cases the uncore
driver will not be needed. One alternative would be to only
add some module aliases and then add code in the perf core
to probe the alias when that PMU is accessed. That would
likely work for most of perf, except for perf list (which does
not know what pmu to probe).

Other than that it's very useful; making these drivers modular was
long overdue.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 22/28] x86/perf/intel/rapl: Cleanup the printk output
  2016-02-22 11:06 ` [patch V2 22/28] x86/perf/intel/rapl: Cleanup the printk output Thomas Gleixner
@ 2016-02-22 18:44   ` Andi Kleen
  2016-02-22 18:58     ` Thomas Gleixner
  0 siblings, 1 reply; 47+ messages in thread
From: Andi Kleen @ 2016-02-22 18:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

Thomas Gleixner <tglx@linutronix.de> writes:
>  
> +static void __init rapl_advertise(void)
> +{
> +	int i;
> +
> +	pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms ovfl timer\n",
> +		hweight32(rapl_cntr_mask), rapl_timer_ms);

Does the final resulting line contain RAPL for grepping?

Ideally it would be similar to before, in case someone has a parser
for this.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 11/28] x86/topology: Create logical package id
  2016-02-22 11:06 ` [patch V2 11/28] x86/topology: Create logical package id Thomas Gleixner
@ 2016-02-22 18:54   ` Andi Kleen
  2016-02-22 19:05     ` Thomas Gleixner
  0 siblings, 1 reply; 47+ messages in thread
From: Andi Kleen @ 2016-02-22 18:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

Thomas Gleixner <tglx@linutronix.de> writes:


> +
> +	if (c->cpuid_level >= 0x00000001) {
> +		u32 eax, ebx, ecx, edx;
> +
> +		cpuid(0x00000001, &eax, &ebx, &ecx, &edx);

Use cpuid_edx()

> +		/*
> +		 * If HTT (EDX[28]) is set EBX[16:23] contain the number of
> +		 * apicids which are reserved per package. Store the resulting
> +		 * shift value for the package management code.
> +		 */
> +		if (edx & (1U << 28))
> +			c->x86_coreid_bits = get_count_order((ebx >> 16) & 0xff);
> +	}
> +++ b/arch/x86/kernel/cpu/proc.c
> @@ -12,6 +12,7 @@ static void show_cpuinfo_core(struct seq
>  {
>  #ifdef CONFIG_SMP
>  	seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
> +	seq_printf(m, "logical id\t: %d\n", c->logical_proc_id);


I'm not sure it makes sense to export this. What good would it be for
the user?

If it was it would need to be documented somewhere. But I would
just drop it and keep it kernel internal.

FWIW every time something is added to this file it usually breaks
some (dumb) programs.

> +	/*
> +	 * Today neither Intel nor AMD support heterogenous systems. That
> +	 * might change in the future....
> +	 */
> +	ncpus = boot_cpu_data.x86_max_cores * smp_num_siblings;
> +	__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);

FWIW Hypervisors can do nearly everything today.

I assume your code handles it.

Let's hope that the hypervisors always set up the correct CPUID for
their sibling configuration. If they don't, with this change some CPUs
would suddenly be lost.

Would it be worth having a kernel option where the maximum can be
overridden in case this happens?
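
Something like the following, purely hypothetical sketch could do it (the
parameter name and the plumbing are invented for illustration):

	#include <linux/init.h>
	#include <linux/kernel.h>

	static unsigned int max_pkgs_override __initdata;

	/* Hypothetical "max_logical_packages=" command line override */
	static int __init parse_max_pkgs(char *arg)
	{
		return kstrtouint(arg, 0, &max_pkgs_override);
	}
	early_param("max_logical_packages", parse_max_pkgs);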


-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management
  2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
                   ` (27 preceding siblings ...)
  2016-02-22 11:07 ` [patch V2 28/28] x86/perf/intel/rapl: Make it modular Thomas Gleixner
@ 2016-02-22 18:55 ` Andi Kleen
  28 siblings, 0 replies; 47+ messages in thread
From: Andi Kleen @ 2016-02-22 18:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

Thomas Gleixner <tglx@linutronix.de> writes:

> This series addresses the following issues:

FWIW I did a quick read through the series, and except where commented
it looks good to me.

Reviewed-by: Andi Kleen <ak@linux.intel.com>

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 22/28] x86/perf/intel/rapl: Cleanup the printk output
  2016-02-22 18:44   ` Andi Kleen
@ 2016-02-22 18:58     ` Thomas Gleixner
  0 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 18:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: LKML, Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

Andi,

On Mon, 22 Feb 2016, Andi Kleen wrote:

> Thomas Gleixner <tglx@linutronix.de> writes:
> >  
> > +static void __init rapl_advertise(void)
> > +{
> > +	int i;
> > +
> > +	pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms ovfl timer\n",
> > +		hweight32(rapl_cntr_mask), rapl_timer_ms);
> 
> Does the final resulting line contain RAPL for grepping?

Yes. That's what the pr_fmt prepends.

[35282.080376] calling  rapl_pmu_init+0x0/0xff5 [perf_event_intel_rapl] @ 10523
[35282.080679] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 655360 ms ovfl timer
[35282.088634] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
[35282.094394] RAPL PMU: hw unit of domain package 2^-14 Joules
[35282.111347] RAPL PMU: hw unit of domain dram 2^-16 Joules
[35282.127514] initcall rapl_pmu_init+0x0/0xff5 [perf_event_intel_rapl] returned 0 after 46023 usecs

> Ideally it would be similar to before, in case someone has a parser
> for this.

It is.
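
For completeness, the mechanism is roughly this (a sketch matching the
prefix shown in the output above):

	/* Must come before the first include that pulls in printk.h */
	#define pr_fmt(fmt) "RAPL PMU: " fmt
	#include <linux/printk.h>

	static void __init demo(void)
	{
		pr_info("API unit is 2^-32 Joules\n");
		/* dmesg: "RAPL PMU: API unit is 2^-32 Joules" */
	}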

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 11/28] x86/topology: Create logical package id
  2016-02-22 18:54   ` Andi Kleen
@ 2016-02-22 19:05     ` Thomas Gleixner
  0 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 19:05 UTC (permalink / raw)
  To: Andi Kleen
  Cc: LKML, Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

Andi,

On Mon, 22 Feb 2016, Andi Kleen wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> > +	if (c->cpuid_level >= 0x00000001) {
> > +		u32 eax, ebx, ecx, edx;
> > +
> > +		cpuid(0x00000001, &eax, &ebx, &ecx, &edx);
> 
> Use cpuid_edx()

That does not give me EBX ...
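
I.e., each cpuid_*() accessor reruns the CPUID instruction for a single
register, so one cpuid() call is used when both EBX and EDX are needed.
A sketch mirroring the quoted hunk (the helper name is made up):

	#include <asm/processor.h>
	#include <linux/bitops.h>

	static unsigned int smt_shift(void)
	{
		u32 eax, ebx, ecx, edx;

		/* One CPUID fills all four registers; cpuid_ebx() plus
		 * cpuid_edx() would execute CPUID twice. */
		cpuid(0x00000001, &eax, &ebx, &ecx, &edx);
		if (edx & (1U << 28))			/* HTT */
			return get_count_order((ebx >> 16) & 0xff);
		return 0;
	}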
 
> > +		/*
> > +		 * If HTT (EDX[28]) is set EBX[16:23] contain the number of
> > +		 * apicids which are reserved per package. Store the resulting
> > +		 * shift value for the package management code.
> > +		 */
> > +		if (edx & (1U << 28))
> > +			c->x86_coreid_bits = get_count_order((ebx >> 16) & 0xff);
> > +	}
> > +++ b/arch/x86/kernel/cpu/proc.c
> > @@ -12,6 +12,7 @@ static void show_cpuinfo_core(struct seq
> >  {
> >  #ifdef CONFIG_SMP
> >  	seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
> > +	seq_printf(m, "logical id\t: %d\n", c->logical_proc_id);
> 
> 
> I'm not sure it makes sense to export this. What good would it be for
> the user?
>
> If it was it would need to be documented somewhere. But I would
> just drop it and keep it kernel internal.

You are right. We already print when the package number changes, so it can
be retrieved from dmesg.
 
> FWIW every time something is added to this file it usually breaks
> some (dumb) programs.

Ok, did not think about that.
 
> > +	/*
> > +	 * Today neither Intel nor AMD support heterogenous systems. That
> > +	 * might change in the future....
> > +	 */
> > +	ncpus = boot_cpu_data.x86_max_cores * smp_num_siblings;
> > +	__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
> 
> FWIW Hypervisors can do nearly everything today.
> 
> I assume your code handles it.

It should. At least as long as nr_cpu_ids is sufficiently large.
 
> Let's hope that the hypervisors always set up the correct CPUID for
> their sibling configuration. If they don't, with this change some CPUs
> would suddenly be lost.

The ones I looked at are doing it sanely. Famous last words :)
 
> Would it be worth having a kernel option where the maximum can be
> overridden in case this happens?

Let's wait for it to happen. It's done in no time, but if not needed it's just
ballast.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 28/28] x86/perf/intel/rapl: Make it modular
  2016-02-22 18:40   ` Andi Kleen
@ 2016-02-22 19:11     ` Thomas Gleixner
  0 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2016-02-22 19:11 UTC (permalink / raw)
  To: Andi Kleen
  Cc: LKML, Peter Zijlstra, x86, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Jacob Pan

Andi,

On Mon, 22 Feb 2016, Andi Kleen wrote:

> Thomas Gleixner <tglx@linutronix.de> writes:
> 
> > Add the necessary exit functions so it can be built as a module.
> 
> If you make it a module you also need to add MODULE_DEVICE_TABLE
> entries for the PCI IDs and x86_cpu_id tables/annotations for the
> model numbers.
> 
> Otherwise the module would always need to be loaded manually
> which would break existing setups.

I know. That's why I made the default Yes. We can drop that patch or add some
explanation to the help text that 'M' is for manual module loading until
someone adds the necessary magic for auto probing.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch V2 18/28] x86/perf/intel_rapl: Make Knights Landings support functional
  2016-02-22 11:06 ` [patch V2 18/28] x86/perf/intel_rapl: Make Knights Landings support functional Thomas Gleixner
@ 2016-02-22 20:38   ` Borislav Petkov
  0 siblings, 0 replies; 47+ messages in thread
From: Borislav Petkov @ 2016-02-22 20:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, x86, Stephane Eranian, Harish Chegondi,
	Kan Liang, Andi Kleen, Jacob Pan, Dasaratharaman Chandramouli

On Mon, Feb 22, 2016 at 11:06:56AM -0000, Thomas Gleixner wrote:
> The Knights Landings support added the events and the detection case, but then
> returns 0 without actually initializing the driver.
> 
> Fixes: 3a2a7797326a4 "perf/x86/intel/rapl: Add support for Knights Landing (KNL)"
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
> ---
>  arch/x86/kernel/cpu/perf_event_intel_rapl.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
> ===================================================================
> --- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
> @@ -731,7 +731,7 @@ static int __init rapl_pmu_init(void)
>  		rapl_add_quirk(rapl_hsw_server_quirk);
>  		rapl_cntr_mask = RAPL_IDX_KNL;
>  		rapl_pmu_events_group.attrs = rapl_events_knl_attr;
> -
> +		break;
>  	default:
>  		/* unsupported */
>  		return 0;

Cc: <stable@vger.kernel.org> # 4.3

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2016-02-22 20:39 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-22 11:06 [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Thomas Gleixner
2016-02-22 11:06 ` [patch V2 01/28] x86/perf/intel/uncore: Remove pointless mask check Thomas Gleixner
2016-02-22 11:06 ` [patch V2 03/28] x86/perf/intel/uncore: Fix error handling Thomas Gleixner
2016-02-22 11:06 ` [patch V2 02/28] x86/perf/intel/uncore: Simplify error rollback Thomas Gleixner
2016-02-22 11:06 ` [patch V2 04/28] x86/perf/intel/uncore: Add sanity checks for pci dev package id Thomas Gleixner
2016-02-22 11:06 ` [patch V2 05/28] x86/perf/intel_uncore: Cleanup hardware on exit Thomas Gleixner
2016-02-22 11:06 ` [patch V2 06/28] x86/perf/intel/uncore: Drop pointless hotplug priority Thomas Gleixner
2016-02-22 11:06 ` [patch V2 07/28] x86/perf/intel_uncore: Make code readable Thomas Gleixner
2016-02-22 11:06 ` [patch V2 08/28] x86/perf/uncore: Make uncore_pcibus_to_physid static Thomas Gleixner
2016-02-22 11:06 ` [patch V2 09/28] perf: Allow storage of pmu private data in event Thomas Gleixner
2016-02-22 11:06 ` [patch V2 11/28] x86/topology: Create logical package id Thomas Gleixner
2016-02-22 18:54   ` Andi Kleen
2016-02-22 19:05     ` Thomas Gleixner
2016-02-22 11:06 ` [patch V2 10/28] x86/perf/intel_uncore: Store box in event->pmu_private Thomas Gleixner
2016-02-22 11:45   ` Peter Zijlstra
2016-02-22 11:58     ` Thomas Gleixner
2016-02-22 11:52   ` Peter Zijlstra
2016-02-22 12:00     ` Thomas Gleixner
2016-02-22 11:06 ` [patch V2 12/28] x86/perf/uncore: Track packages not per cpu data Thomas Gleixner
2016-02-22 11:06 ` [patch V2 13/28] x86/perf/intel_uncore: Clear all hardware state on exit Thomas Gleixner
2016-02-22 11:58   ` Peter Zijlstra
2016-02-22 11:59     ` Thomas Gleixner
2016-02-22 11:06 ` [patch V2 14/28] x86/perf/intel_uncore: Make PCI and MSR uncore independent Thomas Gleixner
2016-02-22 11:06 ` [patch V2 15/28] cpumask: Export cpumask_any_but Thomas Gleixner
2016-02-22 11:06 ` [patch V2 16/28] x86/perf/intel_uncore: Make it modular Thomas Gleixner
2016-02-22 11:06 ` [patch V2 17/28] x86/perf/cqm: Get rid of the silly for_each_cpu lookups Thomas Gleixner
2016-02-22 11:06 ` [patch V2 18/28] x86/perf/intel_rapl: Make Knights Landings support functional Thomas Gleixner
2016-02-22 20:38   ` Borislav Petkov
2016-02-22 11:06 ` [patch V2 19/28] x86/perf/intel/rapl: Add proper error handling Thomas Gleixner
2016-02-22 11:06 ` [patch V2 20/28] x86/perf/intel/rapl: Sanitize the quirk handling Thomas Gleixner
2016-02-22 11:06 ` [patch V2 21/28] x86/perf/intel/rapl: Calculate timing once Thomas Gleixner
2016-02-22 11:06 ` [patch V2 23/28] x86/perf/intel/rapl: Refactor code some more Thomas Gleixner
2016-02-22 11:06 ` [patch V2 22/28] x86/perf/intel/rapl: Cleanup the printk output Thomas Gleixner
2016-02-22 18:44   ` Andi Kleen
2016-02-22 18:58     ` Thomas Gleixner
2016-02-22 11:07 ` [patch V2 24/28] x86/perf/intel/rapl: Make pmu lock raw Thomas Gleixner
2016-02-22 11:07 ` [patch V2 25/28] x86/perf/intel/rapl: Utilize event->pmu_private Thomas Gleixner
2016-02-22 11:07 ` [patch V2 26/28] x86/perf/intel/rapl: Convert it to a per package facility Thomas Gleixner
2016-02-22 12:08   ` Peter Zijlstra
2016-02-22 15:52     ` Thomas Gleixner
2016-02-22 16:04       ` Peter Zijlstra
2016-02-22 11:07 ` [patch V2 27/28] perf: Export perf_event_sysfs_show Thomas Gleixner
2016-02-22 12:09   ` Peter Zijlstra
2016-02-22 11:07 ` [patch V2 28/28] x86/perf/intel/rapl: Make it modular Thomas Gleixner
2016-02-22 18:40   ` Andi Kleen
2016-02-22 19:11     ` Thomas Gleixner
2016-02-22 18:55 ` [patch V2 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management Andi Kleen
