* [patch 00/11] x86/perf/intel_uncore: Cleanup and enhancements
From: Thomas Gleixner @ 2016-02-17 13:47 UTC
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen

While working on the hotplug rewrite I stumbled over the uncore drivers. The
intel_uncore driver in particular is a complete trainwreck:

 - Lacks any form of proper error handling. Most errors are simply ignored.

 - Leaks memory and hardware state in case of failures.

 - Tries to mimic per cpu machinery for a facility which is strictly per
   package. That is implemented with convoluted alloc/free dancing during cpu
   hotplug with magic loops over the online cpus.

The series cleans up the mess:

 - Implement proper error handling

 - Switch to a per package storage model (see the sketch below)

 - Make MSR and PCI independent as far as possible

 - Allow it to build as a module
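
The sketch below condenses the storage model switch; it is lifted from patch
7, where uncore_pmu_to_box() becomes a plain array lookup:

	/* Before: per cpu pointer plus a global box list, filled lazily */
	struct intel_uncore_box		** __percpu box;
	struct list_head		box_list;

	/* After: one slot per physical package */
	struct intel_uncore_box		**boxes;

	struct intel_uncore_box *uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
	{
		return pmu->boxes[topology_physical_package_id(cpu)];
	}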

Thanks,

	tglx
---
 arch/x86/Kconfig                                    |   14 
 arch/x86/include/asm/topology.h                     |    3 
 arch/x86/kernel/cpu/Makefile                        |    3 
 arch/x86/kernel/cpu/perf_event_intel_uncore.c       |  587 +++++++++-----------
 arch/x86/kernel/cpu/perf_event_intel_uncore.h       |   24 
 arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c |    6 
 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c   |   13 
 arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c |   57 +
 lib/cpumask.c                                       |    1 
 9 files changed, 400 insertions(+), 308 deletions(-)

* [patch 01/11] x86/perf/intel_uncore: Remove pointless mask check
From: Thomas Gleixner @ 2016-02-17 13:47 UTC
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen

[-- Attachment #1: x86-perf-intel_uncore--Remove-pointless-mask-check.patch --]
[-- Type: text/plain, Size: 619 bytes --]

uncore_cpumask_init() is only ever called from intel_uncore_init() where the
mask is guaranteed to be empty.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |    6 ------
 1 file changed, 6 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -1342,12 +1342,6 @@ static void __init uncore_cpumask_init(v
 {
 	int cpu;
 
-	/*
-	 * ony invoke once from msr or pci init code
-	 */
-	if (!cpumask_empty(&uncore_cpu_mask))
-		return;
-
 	cpu_notifier_register_begin();
 
 	for_each_online_cpu(cpu) {

* [patch 02/11] x86/perf/intel_uncore: Simplify error rollback
From: Thomas Gleixner @ 2016-02-17 13:47 UTC
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen

[-- Attachment #1: x86-perf-intel_uncore--Simplify-error-rollback.patch --]
[-- Type: text/plain, Size: 3058 bytes --]

No point in doing partial rollbacks. Robustify uncore_type_exit() so it does
not dereference type->pmus unconditionally and remove all the partial rollback
hackery.

Preparatory patch for proper error handling.
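
A minimal sketch of the resulting idiom, where free_pmu_state() merely stands
in for the per-box freeing loop: the exit path tolerates a half-initialized
type, so the init paths can bail out with a bare return and leave the single
full rollback to the caller.

	static void uncore_type_exit(struct intel_uncore_type *type)
	{
		if (type->pmus) {		/* NULL if init failed early */
			free_pmu_state(type);	/* hypothetical helper */
			type->pmus = NULL;
		}
		kfree(type->events_group);	/* kfree(NULL) is a no-op */
		type->events_group = NULL;
	}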

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   45 +++++++++++++-------------
 1 file changed, 24 insertions(+), 21 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -767,10 +767,12 @@ static void __init uncore_type_exit(stru
 {
 	int i;
 
-	for (i = 0; i < type->num_boxes; i++)
-		free_percpu(type->pmus[i].box);
-	kfree(type->pmus);
-	type->pmus = NULL;
+	if (type->pmus) {
+		for (i = 0; i < type->num_boxes; i++)
+			free_percpu(type->pmus[i].box);
+		kfree(type->pmus);
+		type->pmus = NULL;
+	}
 	kfree(type->events_group);
 	type->events_group = NULL;
 }
@@ -778,6 +780,7 @@ static void __init uncore_type_exit(stru
 static void __init uncore_types_exit(struct intel_uncore_type **types)
 {
 	int i;
+
 	for (i = 0; types[i]; i++)
 		uncore_type_exit(types[i]);
 }
@@ -806,7 +809,7 @@ static int __init uncore_type_init(struc
 		INIT_LIST_HEAD(&pmus[i].box_list);
 		pmus[i].box = alloc_percpu(struct intel_uncore_box *);
 		if (!pmus[i].box)
-			goto fail;
+			return -ENOMEM;
 	}
 
 	if (type->event_descs) {
@@ -817,7 +820,7 @@ static int __init uncore_type_init(struc
 		attr_group = kzalloc(sizeof(struct attribute *) * (i + 1) +
 					sizeof(*attr_group), GFP_KERNEL);
 		if (!attr_group)
-			goto fail;
+			return -ENOMEM;
 
 		attrs = (struct attribute **)(attr_group + 1);
 		attr_group->name = "events";
@@ -831,9 +834,6 @@ static int __init uncore_type_init(struc
 
 	type->pmu_group = &uncore_pmu_attr_group;
 	return 0;
-fail:
-	uncore_type_exit(type);
-	return -ENOMEM;
 }
 
 static int __init uncore_types_init(struct intel_uncore_type **types)
@@ -843,13 +843,9 @@ static int __init uncore_types_init(stru
 	for (i = 0; types[i]; i++) {
 		ret = uncore_type_init(types[i]);
 		if (ret)
-			goto fail;
+			return ret;
 	}
 	return 0;
-fail:
-	while (--i >= 0)
-		uncore_type_exit(types[i]);
-	return ret;
 }
 
 /*
@@ -1007,17 +1003,21 @@ static int __init uncore_pci_init(void)
 
 	ret = uncore_types_init(uncore_pci_uncores);
 	if (ret)
-		return ret;
+		goto err;
 
 	uncore_pci_driver->probe = uncore_pci_probe;
 	uncore_pci_driver->remove = uncore_pci_remove;
 
 	ret = pci_register_driver(uncore_pci_driver);
-	if (ret == 0)
-		pcidrv_registered = true;
-	else
-		uncore_types_exit(uncore_pci_uncores);
+	if (ret)
+		goto err;
+
+	pcidrv_registered = true;
+	return 0;
 
+err:
+	uncore_types_exit(uncore_pci_uncores);
+	uncore_pci_uncores = empty_uncore;
 	return ret;
 }
 
@@ -1316,9 +1316,12 @@ static int __init uncore_cpu_init(void)
 
 	ret = uncore_types_init(uncore_msr_uncores);
 	if (ret)
-		return ret;
-
+		goto err;
 	return 0;
+err:
+	uncore_types_exit(uncore_msr_uncores);
+	uncore_msr_uncores = empty_uncore;
+	return ret;
 }
 
 static int __init uncore_pmus_register(void)

* [patch 04/11] x86/perf/intel_uncore: Cleanup hardware on exit
From: Thomas Gleixner @ 2016-02-17 13:47 UTC
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen

[-- Attachment #1: x86-perf-intel_uncore--Cleanup-hardware-on-exit.patch --]
[-- Type: text/plain, Size: 10907 bytes --]

When tearing down the boxes nothing undoes the hardware state which was set up
by box->init_box(). Add a box->exit_box() callback and implement it for the
uncores which have an init_box() callback.

This misses the cleanup in the error exit paths, but I cannot be bothered to
implement it before cleaning up the rest of the driver, which makes that task
way simpler.
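
For context, the new uncore_box_exit() mirrors the guard in the existing
uncore_box_init() (shown abridged): init sets UNCORE_BOX_FLAG_INITIATED at
most once, exit clears it again, so each callback runs exactly once per
init/exit cycle no matter how often the helpers are invoked.

	static inline void uncore_box_init(struct intel_uncore_box *box)
	{
		if (!test_and_set_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags)) {
			if (box->pmu->type->ops->init_box)
				box->pmu->type->ops->init_box(box);
		}
	}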

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c       |    6 +-
 arch/x86/kernel/cpu/perf_event_intel_uncore.h       |    9 +++
 arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c |    6 ++
 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c   |   13 ++++
 arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c |   57 +++++++++++++++++++-
 5 files changed, 88 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -927,6 +927,7 @@ static int uncore_pci_probe(struct pci_d
 		raw_spin_lock(&uncore_box_lock);
 		list_del(&box->list);
 		raw_spin_unlock(&uncore_box_lock);
+		uncore_box_exit(box);
 		kfree(box);
 	}
 	return ret;
@@ -972,6 +973,7 @@ static void uncore_pci_remove(struct pci
 	}
 
 	WARN_ON_ONCE(atomic_read(&box->refcnt) != 1);
+	uncore_box_exit(box);
 	kfree(box);
 
 	if (last_box)
@@ -1079,8 +1081,10 @@ static void uncore_cpu_dying(int cpu)
 			pmu = &type->pmus[j];
 			box = *per_cpu_ptr(pmu->box, cpu);
 			*per_cpu_ptr(pmu->box, cpu) = NULL;
-			if (box && atomic_dec_and_test(&box->refcnt))
+			if (box && atomic_dec_and_test(&box->refcnt)) {
 				list_add(&box->list, &boxes_to_free);
+				uncore_box_exit(box);
+			}
 		}
 	}
 }
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -61,6 +61,7 @@ struct intel_uncore_type {
 
 struct intel_uncore_ops {
 	void (*init_box)(struct intel_uncore_box *);
+	void (*exit_box)(struct intel_uncore_box *);
 	void (*disable_box)(struct intel_uncore_box *);
 	void (*enable_box)(struct intel_uncore_box *);
 	void (*disable_event)(struct intel_uncore_box *, struct perf_event *);
@@ -306,6 +307,14 @@ static inline void uncore_box_init(struc
 	}
 }
 
+static inline void uncore_box_exit(struct intel_uncore_box *box)
+{
+	if (test_and_clear_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags)) {
+		if (box->pmu->type->ops->exit_box)
+			box->pmu->type->ops->exit_box(box);
+	}
+}
+
 static inline bool uncore_box_is_fake(struct intel_uncore_box *box)
 {
 	return (box->phys_id < 0);
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c
@@ -201,6 +201,11 @@ static void nhmex_uncore_msr_init_box(st
 	wrmsrl(NHMEX_U_MSR_PMON_GLOBAL_CTL, NHMEX_U_PMON_GLOBAL_EN_ALL);
 }
 
+static void nhmex_uncore_msr_exit_box(struct intel_uncore_box *box)
+{
+	wrmsrl(NHMEX_U_MSR_PMON_GLOBAL_CTL, 0);
+}
+
 static void nhmex_uncore_msr_disable_box(struct intel_uncore_box *box)
 {
 	unsigned msr = uncore_msr_box_ctl(box);
@@ -250,6 +255,7 @@ static void nhmex_uncore_msr_enable_even
 
 #define NHMEX_UNCORE_OPS_COMMON_INIT()				\
 	.init_box	= nhmex_uncore_msr_init_box,		\
+	.exit_box	= nhmex_uncore_msr_exit_box,		\
 	.disable_box	= nhmex_uncore_msr_disable_box,		\
 	.enable_box	= nhmex_uncore_msr_enable_box,		\
 	.disable_event	= nhmex_uncore_msr_disable_event,	\
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
@@ -95,6 +95,12 @@ static void snb_uncore_msr_init_box(stru
 	}
 }
 
+static void snb_uncore_msr_exit_box(struct intel_uncore_box *box)
+{
+	if (box->pmu->pmu_idx == 0)
+		wrmsrl(SNB_UNC_PERF_GLOBAL_CTL, 0);
+}
+
 static struct uncore_event_desc snb_uncore_events[] = {
 	INTEL_UNCORE_EVENT_DESC(clockticks, "event=0xff,umask=0x00"),
 	{ /* end: all zeroes */ },
@@ -116,6 +122,7 @@ static struct attribute_group snb_uncore
 
 static struct intel_uncore_ops snb_uncore_msr_ops = {
 	.init_box	= snb_uncore_msr_init_box,
+	.exit_box	= snb_uncore_msr_exit_box,
 	.disable_event	= snb_uncore_msr_disable_event,
 	.enable_event	= snb_uncore_msr_enable_event,
 	.read_counter	= uncore_msr_read_counter,
@@ -231,6 +238,11 @@ static void snb_uncore_imc_init_box(stru
 	box->hrtimer_duration = UNCORE_SNB_IMC_HRTIMER_INTERVAL;
 }
 
+static void snb_uncore_imc_exit_box(struct intel_uncore_box *box)
+{
+	iounmap(box->io_addr);
+}
+
 static void snb_uncore_imc_enable_box(struct intel_uncore_box *box)
 {}
 
@@ -458,6 +470,7 @@ static struct pmu snb_uncore_imc_pmu = {
 
 static struct intel_uncore_ops snb_uncore_imc_ops = {
 	.init_box	= snb_uncore_imc_init_box,
+	.exit_box	= snb_uncore_imc_exit_box,
 	.enable_box	= snb_uncore_imc_enable_box,
 	.disable_box	= snb_uncore_imc_disable_box,
 	.disable_event	= snb_uncore_imc_disable_event,
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c
@@ -387,6 +387,14 @@ static void snbep_uncore_pci_init_box(st
 	pci_write_config_dword(pdev, box_ctl, SNBEP_PMON_BOX_CTL_INT);
 }
 
+static void snbep_uncore_pci_exit_box(struct intel_uncore_box *box)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	int box_ctl = uncore_pci_box_ctl(box);
+
+	pci_write_config_dword(pdev, box_ctl, 0);
+}
+
 static void snbep_uncore_msr_disable_box(struct intel_uncore_box *box)
 {
 	u64 config;
@@ -440,6 +448,14 @@ static void snbep_uncore_msr_init_box(st
 		wrmsrl(msr, SNBEP_PMON_BOX_CTL_INT);
 }
 
+static void snbep_uncore_msr_exit_box(struct intel_uncore_box *box)
+{
+	unsigned msr = uncore_msr_box_ctl(box);
+
+	if (msr)
+		wrmsrl(msr, 0);
+}
+
 static struct attribute *snbep_uncore_formats_attr[] = {
 	&format_attr_event.attr,
 	&format_attr_umask.attr,
@@ -567,7 +583,8 @@ static struct attribute_group snbep_unco
 
 #define SNBEP_UNCORE_MSR_OPS_COMMON_INIT()			\
 	__SNBEP_UNCORE_MSR_OPS_COMMON_INIT(),			\
-	.init_box	= snbep_uncore_msr_init_box		\
+	.init_box	= snbep_uncore_msr_init_box,		\
+	.exit_box	= snbep_uncore_msr_exit_box		\
 
 static struct intel_uncore_ops snbep_uncore_msr_ops = {
 	SNBEP_UNCORE_MSR_OPS_COMMON_INIT(),
@@ -575,6 +592,7 @@ static struct intel_uncore_ops snbep_unc
 
 #define SNBEP_UNCORE_PCI_OPS_COMMON_INIT()			\
 	.init_box	= snbep_uncore_pci_init_box,		\
+	.exit_box	= snbep_uncore_pci_exit_box,		\
 	.disable_box	= snbep_uncore_pci_disable_box,		\
 	.enable_box	= snbep_uncore_pci_enable_box,		\
 	.disable_event	= snbep_uncore_pci_disable_event,	\
@@ -1236,10 +1254,19 @@ int snbep_uncore_pci_init(void)
 static void ivbep_uncore_msr_init_box(struct intel_uncore_box *box)
 {
 	unsigned msr = uncore_msr_box_ctl(box);
+
 	if (msr)
 		wrmsrl(msr, IVBEP_PMON_BOX_CTL_INT);
 }
 
+static void ivbep_uncore_msr_exit_box(struct intel_uncore_box *box)
+{
+	unsigned msr = uncore_msr_box_ctl(box);
+
+	if (msr)
+		wrmsrl(msr, 0);
+}
+
 static void ivbep_uncore_pci_init_box(struct intel_uncore_box *box)
 {
 	struct pci_dev *pdev = box->pci_dev;
@@ -1247,8 +1274,16 @@ static void ivbep_uncore_pci_init_box(st
 	pci_write_config_dword(pdev, SNBEP_PCI_PMON_BOX_CTL, IVBEP_PMON_BOX_CTL_INT);
 }
 
+static void ivbep_uncore_pci_exit_box(struct intel_uncore_box *box)
+{
+	struct pci_dev *pdev = box->pci_dev;
+
+	pci_write_config_dword(pdev, SNBEP_PCI_PMON_BOX_CTL, 0);
+}
+
 #define IVBEP_UNCORE_MSR_OPS_COMMON_INIT()			\
 	.init_box	= ivbep_uncore_msr_init_box,		\
+	.exit_box	= ivbep_uncore_msr_exit_box,		\
 	.disable_box	= snbep_uncore_msr_disable_box,		\
 	.enable_box	= snbep_uncore_msr_enable_box,		\
 	.disable_event	= snbep_uncore_msr_disable_event,	\
@@ -1261,6 +1296,7 @@ static struct intel_uncore_ops ivbep_unc
 
 static struct intel_uncore_ops ivbep_uncore_pci_ops = {
 	.init_box	= ivbep_uncore_pci_init_box,
+	.exit_box	= ivbep_uncore_pci_exit_box,
 	.disable_box	= snbep_uncore_pci_disable_box,
 	.enable_box	= snbep_uncore_pci_enable_box,
 	.disable_event	= snbep_uncore_pci_disable_event,
@@ -1497,6 +1533,7 @@ static void ivbep_cbox_enable_event(stru
 
 static struct intel_uncore_ops ivbep_uncore_cbox_ops = {
 	.init_box		= ivbep_uncore_msr_init_box,
+	.exit_box		= ivbep_uncore_msr_exit_box,
 	.disable_box		= snbep_uncore_msr_disable_box,
 	.enable_box		= snbep_uncore_msr_enable_box,
 	.disable_event		= snbep_uncore_msr_disable_event,
@@ -1613,6 +1650,7 @@ static u64 ivbep_uncore_irp_read_counter
 
 static struct intel_uncore_ops ivbep_uncore_irp_ops = {
 	.init_box	= ivbep_uncore_pci_init_box,
+	.exit_box	= ivbep_uncore_pci_exit_box,
 	.disable_box	= snbep_uncore_pci_disable_box,
 	.enable_box	= snbep_uncore_pci_enable_box,
 	.disable_event	= ivbep_uncore_irp_disable_event,
@@ -1633,6 +1671,7 @@ static struct intel_uncore_type ivbep_un
 
 static struct intel_uncore_ops ivbep_uncore_qpi_ops = {
 	.init_box	= ivbep_uncore_pci_init_box,
+	.exit_box	= ivbep_uncore_pci_exit_box,
 	.disable_box	= snbep_uncore_pci_disable_box,
 	.enable_box	= snbep_uncore_pci_enable_box,
 	.disable_event	= snbep_uncore_pci_disable_event,
@@ -1914,6 +1953,7 @@ static void hswep_cbox_enable_event(stru
 
 static struct intel_uncore_ops knl_uncore_cha_ops = {
 	.init_box		= snbep_uncore_msr_init_box,
+	.exit_box		= snbep_uncore_msr_exit_box,
 	.disable_box		= snbep_uncore_msr_disable_box,
 	.enable_box		= snbep_uncore_msr_enable_box,
 	.disable_event		= snbep_uncore_msr_disable_event,
@@ -2008,6 +2048,7 @@ static void knl_uncore_imc_enable_event(
 
 static struct intel_uncore_ops knl_uncore_imc_ops = {
 	.init_box	= snbep_uncore_pci_init_box,
+	.exit_box	= snbep_uncore_pci_exit_box,
 	.disable_box	= snbep_uncore_pci_disable_box,
 	.enable_box	= knl_uncore_imc_enable_box,
 	.read_counter	= snbep_uncore_pci_read_counter,
@@ -2397,6 +2438,7 @@ static void hswep_cbox_enable_event(stru
 
 static struct intel_uncore_ops hswep_uncore_cbox_ops = {
 	.init_box		= snbep_uncore_msr_init_box,
+	.exit_box		= snbep_uncore_msr_exit_box,
 	.disable_box		= snbep_uncore_msr_disable_box,
 	.enable_box		= snbep_uncore_msr_enable_box,
 	.disable_event		= snbep_uncore_msr_disable_event,
@@ -2442,9 +2484,19 @@ static void hswep_uncore_sbox_msr_init_b
 	}
 }
 
+static void hswep_uncore_sbox_msr_exit_box(struct intel_uncore_box *box)
+{
+	unsigned msr = uncore_msr_box_ctl(box);
+
+	/* CHECKME: Does this need the bit dance like init() ? */
+	if (msr)
+		wrmsrl(msr, 0);
+}
+
 static struct intel_uncore_ops hswep_uncore_sbox_msr_ops = {
 	__SNBEP_UNCORE_MSR_OPS_COMMON_INIT(),
-	.init_box		= hswep_uncore_sbox_msr_init_box
+	.init_box		= hswep_uncore_sbox_msr_init_box,
+	.exit_box		= hswep_uncore_sbox_msr_exit_box
 };
 
 static struct attribute *hswep_uncore_sbox_formats_attr[] = {
@@ -2584,6 +2636,7 @@ static u64 hswep_uncore_irp_read_counter
 
 static struct intel_uncore_ops hswep_uncore_irp_ops = {
 	.init_box	= snbep_uncore_pci_init_box,
+	.exit_box	= snbep_uncore_pci_exit_box,
 	.disable_box	= snbep_uncore_pci_disable_box,
 	.enable_box	= snbep_uncore_pci_enable_box,
 	.disable_event	= ivbep_uncore_irp_disable_event,

* [patch 03/11] x86/perf/intel_uncore: Fix error handling
From: Thomas Gleixner @ 2016-02-17 13:47 UTC
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen

[-- Attachment #1: x86-perf-intel_uncore--Fix-error-handling.patch --]
[-- Type: text/plain, Size: 5899 bytes --]

This driver lacks any form of proper error handling. If initialization fails
or hotplug prepare fails, it leaves the facility around with half-initialized
state.

Fix the state and memory leaks in a first step. As a second step we need to
undo the hardware state which is set via uncore_box_init() on some of the
uncore implementations.
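
The key behavioural change on the hotplug side, annotated (the line itself is
from the hunk below):

	case CPU_UP_PREPARE:
		/*
		 * Propagate e.g. -ENOMEM from uncore_cpu_prepare(). The
		 * hotplug core decodes it with notifier_to_errno() and
		 * aborts the cpu bring-up instead of ignoring the failure.
		 */
		return notifier_from_errno(uncore_cpu_prepare(cpu, -1));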

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |  110 +++++++++++++++++---------
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |   15 +--
 2 files changed, 82 insertions(+), 43 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -760,16 +760,28 @@ static int uncore_pmu_register(struct in
 	}
 
 	ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
+	if (!ret)
+		pmu->registered = true;
 	return ret;
 }
 
+static void uncore_pmu_unregister(struct intel_uncore_pmu *pmu)
+{
+	if (!pmu->registered)
+		return;
+	perf_pmu_unregister(&pmu->pmu);
+	pmu->registered = false;
+}
+
 static void __init uncore_type_exit(struct intel_uncore_type *type)
 {
 	int i;
 
 	if (type->pmus) {
-		for (i = 0; i < type->num_boxes; i++)
+		for (i = 0; i < type->num_boxes; i++) {
+			uncore_pmu_unregister(&type->pmus[i]);
 			free_percpu(type->pmus[i].box);
+		}
 		kfree(type->pmus);
 		type->pmus = NULL;
 	}
@@ -856,8 +868,8 @@ static int uncore_pci_probe(struct pci_d
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
 	struct intel_uncore_type *type;
-	int phys_id;
 	bool first_box = false;
+	int phys_id, ret;
 
 	phys_id = uncore_pcibus_to_physid(pdev->bus);
 	if (phys_id < 0)
@@ -906,9 +918,18 @@ static int uncore_pci_probe(struct pci_d
 	list_add_tail(&box->list, &pmu->box_list);
 	raw_spin_unlock(&uncore_box_lock);
 
-	if (first_box)
-		uncore_pmu_register(pmu);
-	return 0;
+	if (!first_box)
+		return 0;
+
+	ret = uncore_pmu_register(pmu);
+	if (ret) {
+		pci_set_drvdata(pdev, NULL);
+		raw_spin_lock(&uncore_box_lock);
+		list_del(&box->list);
+		raw_spin_unlock(&uncore_box_lock);
+		kfree(box);
+	}
+	return ret;
 }
 
 static void uncore_pci_remove(struct pci_dev *pdev)
@@ -954,7 +975,7 @@ static void uncore_pci_remove(struct pci
 	kfree(box);
 
 	if (last_box)
-		perf_pmu_unregister(&pmu->pmu);
+		uncore_pmu_unregister(pmu);
 }
 
 static int __init uncore_pci_init(void)
@@ -1223,8 +1244,7 @@ static int uncore_cpu_notifier(struct no
 	/* allocate/free data structure for uncore box */
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_UP_PREPARE:
-		uncore_cpu_prepare(cpu, -1);
-		break;
+		return notifier_from_errno(uncore_cpu_prepare(cpu, -1));
 	case CPU_STARTING:
 		uncore_cpu_starting(cpu);
 		break;
@@ -1265,9 +1285,29 @@ static struct notifier_block uncore_cpu_
 	.priority	= CPU_PRI_PERF + 1,
 };
 
-static void __init uncore_cpu_setup(void *dummy)
+static int __init type_pmu_register(struct intel_uncore_type *type)
 {
-	uncore_cpu_starting(smp_processor_id());
+	int i, ret;
+
+	for (i = 0; i < type->num_boxes; i++) {
+		ret = uncore_pmu_register(&type->pmus[i]);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int __init uncore_msr_pmus_register(void)
+{
+	struct intel_uncore_type **types = uncore_msr_uncores;
+	int ret;
+
+	while (*types) {
+		ret = type_pmu_register(*types++);
+		if (ret)
+			return ret;
+	}
+	return 0;
 }
 
 static int __init uncore_cpu_init(void)
@@ -1317,6 +1357,10 @@ static int __init uncore_cpu_init(void)
 	ret = uncore_types_init(uncore_msr_uncores);
 	if (ret)
 		goto err;
+
+	ret = uncore_msr_pmus_register();
+	if (ret)
+		goto err;
 	return 0;
 err:
 	uncore_types_exit(uncore_msr_uncores);
@@ -1324,26 +1368,14 @@ static int __init uncore_cpu_init(void)
 	return ret;
 }
 
-static int __init uncore_pmus_register(void)
+static void __init uncore_cpu_setup(void *dummy)
 {
-	struct intel_uncore_pmu *pmu;
-	struct intel_uncore_type *type;
-	int i, j;
-
-	for (i = 0; uncore_msr_uncores[i]; i++) {
-		type = uncore_msr_uncores[i];
-		for (j = 0; j < type->num_boxes; j++) {
-			pmu = &type->pmus[j];
-			uncore_pmu_register(pmu);
-		}
-	}
-
-	return 0;
+	uncore_cpu_starting(smp_processor_id());
 }
 
-static void __init uncore_cpumask_init(void)
+static int __init uncore_cpumask_init(void)
 {
-	int cpu;
+	int cpu, ret = 0;
 
 	cpu_notifier_register_begin();
 
@@ -1359,17 +1391,20 @@ static void __init uncore_cpumask_init(v
 		if (phys_id < 0)
 			continue;
 
-		uncore_cpu_prepare(cpu, phys_id);
+		ret = uncore_cpu_prepare(cpu, phys_id);
+		if (ret)
+			goto out;
 		uncore_event_init_cpu(cpu);
 	}
 	on_each_cpu(uncore_cpu_setup, NULL, 1);
 
 	__register_cpu_notifier(&uncore_cpu_nb);
 
+out:
 	cpu_notifier_register_done();
+	return ret;
 }
 
-
 static int __init intel_uncore_init(void)
 {
 	int ret;
@@ -1382,17 +1417,20 @@ static int __init intel_uncore_init(void
 
 	ret = uncore_pci_init();
 	if (ret)
-		goto fail;
+		return ret;
 	ret = uncore_cpu_init();
-	if (ret) {
-		uncore_pci_exit();
-		goto fail;
-	}
-	uncore_cpumask_init();
+	if (ret)
+		goto errpci;
+	ret = uncore_cpumask_init();
+	if (ret)
+		goto errcpu;
 
-	uncore_pmus_register();
 	return 0;
-fail:
+
+errcpu:
+	uncore_types_exit(uncore_msr_uncores);
+errpci:
+	uncore_pci_exit();
 	return ret;
 }
 device_initcall(intel_uncore_init);
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -73,13 +73,14 @@ struct intel_uncore_ops {
 };
 
 struct intel_uncore_pmu {
-	struct pmu pmu;
-	char name[UNCORE_PMU_NAME_LEN];
-	int pmu_idx;
-	int func_id;
-	struct intel_uncore_type *type;
-	struct intel_uncore_box ** __percpu box;
-	struct list_head box_list;
+	struct pmu			pmu;
+	char				name[UNCORE_PMU_NAME_LEN];
+	int				pmu_idx;
+	int				func_id;
+	bool				registered;
+	struct intel_uncore_type	*type;
+	struct intel_uncore_box		** __percpu box;
+	struct list_head		box_list;
 };
 
 struct intel_uncore_extra_reg {

* [patch 05/11] x86/perf/intel_uncore: Make code readable
From: Thomas Gleixner @ 2016-02-17 13:47 UTC
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen

[-- Attachment #1: x86-perf-intel_uncore--Make-code-readable.patch --]
[-- Type: text/plain, Size: 2905 bytes --]

Clean up the code a bit before reworking it completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   71 +++++++++++++-------------
 1 file changed, 36 insertions(+), 35 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -791,10 +791,8 @@ static void __init uncore_type_exit(stru
 
 static void __init uncore_types_exit(struct intel_uncore_type **types)
 {
-	int i;
-
-	for (i = 0; types[i]; i++)
-		uncore_type_exit(types[i]);
+	while (*types)
+		uncore_type_exit(*types++);
 }
 
 static int __init uncore_type_init(struct intel_uncore_type *type)
@@ -898,9 +896,11 @@ static int uncore_pci_probe(struct pci_d
 	 * some device types. Hence PCI device idx would be 0 for all devices.
 	 * So increment pmu pointer to point to an unused array element.
 	 */
-	if (boot_cpu_data.x86_model == 87)
+	if (boot_cpu_data.x86_model == 87) {
 		while (pmu->func_id >= 0)
 			pmu++;
+	}
+
 	if (pmu->func_id < 0)
 		pmu->func_id = pdev->devfn;
 	else
@@ -1158,44 +1158,45 @@ static int uncore_cpu_prepare(int cpu, i
 	return 0;
 }
 
-static void
-uncore_change_context(struct intel_uncore_type **uncores, int old_cpu, int new_cpu)
+static void uncore_change_type_ctx(struct intel_uncore_type *type, int old_cpu,
+				   int new_cpu)
 {
-	struct intel_uncore_type *type;
-	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_pmu *pmu = type->pmus;
 	struct intel_uncore_box *box;
-	int i, j;
+	int i;
 
-	for (i = 0; uncores[i]; i++) {
-		type = uncores[i];
-		for (j = 0; j < type->num_boxes; j++) {
-			pmu = &type->pmus[j];
-			if (old_cpu < 0)
-				box = uncore_pmu_to_box(pmu, new_cpu);
-			else
-				box = uncore_pmu_to_box(pmu, old_cpu);
-			if (!box)
-				continue;
+	for (i = 0; i < type->num_boxes; i++, pmu++) {
+		if (old_cpu < 0)
+			box = uncore_pmu_to_box(pmu, new_cpu);
+		else
+			box = uncore_pmu_to_box(pmu, old_cpu);
+		if (!box)
+			continue;
+
+		if (old_cpu < 0) {
+			WARN_ON_ONCE(box->cpu != -1);
+			box->cpu = new_cpu;
+			continue;
+		}
 
-			if (old_cpu < 0) {
-				WARN_ON_ONCE(box->cpu != -1);
-				box->cpu = new_cpu;
-				continue;
-			}
+		WARN_ON_ONCE(box->cpu != old_cpu);
+		box->cpu = -1;
+		if (new_cpu < 0)
+			continue;
 
-			WARN_ON_ONCE(box->cpu != old_cpu);
-			if (new_cpu >= 0) {
-				uncore_pmu_cancel_hrtimer(box);
-				perf_pmu_migrate_context(&pmu->pmu,
-						old_cpu, new_cpu);
-				box->cpu = new_cpu;
-			} else {
-				box->cpu = -1;
-			}
-		}
+		uncore_pmu_cancel_hrtimer(box);
+		perf_pmu_migrate_context(&pmu->pmu, old_cpu, new_cpu);
+		box->cpu = new_cpu;
 	}
 }
 
+static void uncore_change_context(struct intel_uncore_type **uncores,
+				  int old_cpu, int new_cpu)
+{
+	while (*uncores)
+		uncore_change_type_ctx(*uncores++, old_cpu, new_cpu);
+}
+
 static void uncore_event_exit_cpu(int cpu)
 {
 	int i, phys_id, target;

* [patch 07/11] x86/perf/uncore: Track packages not per cpu data
From: Thomas Gleixner @ 2016-02-17 13:47 UTC
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen

[-- Attachment #1: x86-perf-uncore--Track-packages-not-per-cpu-data.patch --]
[-- Type: text/plain, Size: 16556 bytes --]

uncore is a per package facility, but the code tries to mimic a per cpu
facility with completely convoluted constructs.

Simplify the whole machinery by tracking per package information. While at it,
avoid the kfree/alloc dance when a cpu goes offline and online again. There is
no point in freeing the box after it was allocated. We just keep proper
refcounting and the first cpu which comes online in a package does the
initialization/activation of the box.
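
Condensed from the hunks below, the per package lifecycle becomes:

	/* CPU_UP_PREPARE: the first cpu of a package allocates the box */
	if (!pmu->boxes[pkg]) {
		box = uncore_alloc_box(type, cpu_to_node(cpu));
		if (!box)
			return -ENOMEM;
		box->pmu = pmu;
		box->phys_id = pkg;
		pmu->boxes[pkg] = box;
	}

	/* CPU_STARTING: the first cpu of a package initializes the hardware */
	if (atomic_add_return(ncpus, &box->refcnt) == ncpus)
		uncore_box_init(box);

	/* CPU_DYING: the last cpu of a package shuts the hardware down */
	if (box && atomic_dec_return(&box->refcnt) == 0)
		uncore_box_exit(box);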

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |  355 ++++++++++----------------
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |    4 
 2 files changed, 139 insertions(+), 220 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -11,7 +11,6 @@ DEFINE_RAW_SPINLOCK(pci2phy_map_lock);
 struct list_head pci2phy_map_head = LIST_HEAD_INIT(pci2phy_map_head);
 struct pci_dev *uncore_extra_pci_dev[UNCORE_SOCKET_MAX][UNCORE_EXTRA_PCI_DEV_MAX];
 
-static DEFINE_RAW_SPINLOCK(uncore_box_lock);
 /* mask of cpus that collect uncore events */
 static cpumask_t uncore_cpu_mask;
 
@@ -89,27 +88,7 @@ struct intel_uncore_pmu *uncore_event_to
 
 struct intel_uncore_box *uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
 {
-	struct intel_uncore_box *box;
-
-	box = *per_cpu_ptr(pmu->box, cpu);
-	if (box)
-		return box;
-
-	raw_spin_lock(&uncore_box_lock);
-	/* Recheck in lock to handle races. */
-	if (*per_cpu_ptr(pmu->box, cpu))
-		goto out;
-	list_for_each_entry(box, &pmu->box_list, list) {
-		if (box->phys_id == topology_physical_package_id(cpu)) {
-			atomic_inc(&box->refcnt);
-			*per_cpu_ptr(pmu->box, cpu) = box;
-			break;
-		}
-	}
-out:
-	raw_spin_unlock(&uncore_box_lock);
-
-	return *per_cpu_ptr(pmu->box, cpu);
+	return pmu->boxes[topology_physical_package_id(cpu)];
 }
 
 struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
@@ -317,7 +296,6 @@ static struct intel_uncore_box *uncore_a
 		raw_spin_lock_init(&box->shared_regs[i].lock);
 
 	uncore_pmu_init_hrtimer(box);
-	atomic_set(&box->refcnt, 1);
 	box->cpu = -1;
 	box->phys_id = -1;
 
@@ -773,14 +751,24 @@ static void uncore_pmu_unregister(struct
 	pmu->registered = false;
 }
 
+static void uncore_free_boxes(struct intel_uncore_pmu *pmu)
+{
+	int pkg, maxpkg = topology_max_packages();
+
+	for (pkg = 0; pkg < maxpkg; pkg++)
+		kfree(pmu->boxes[pkg]);
+	kfree(pmu->boxes);
+}
+
 static void __init uncore_type_exit(struct intel_uncore_type *type)
 {
+	struct intel_uncore_pmu *pmu = type->pmus;
 	int i;
 
-	if (type->pmus) {
-		for (i = 0; i < type->num_boxes; i++) {
-			uncore_pmu_unregister(&type->pmus[i]);
-			free_percpu(type->pmus[i].box);
+	if (pmu) {
+		for (i = 0; i < type->num_boxes; i++, pmu++) {
+			uncore_pmu_unregister(pmu);
+			uncore_free_boxes(pmu);
 		}
 		kfree(type->pmus);
 		type->pmus = NULL;
@@ -795,33 +783,34 @@ static void __init uncore_types_exit(str
 		uncore_type_exit(*types++);
 }
 
-static int __init uncore_type_init(struct intel_uncore_type *type)
+static int __init uncore_type_init(struct intel_uncore_type *type, bool setid)
 {
 	struct intel_uncore_pmu *pmus;
 	struct attribute_group *attr_group;
 	struct attribute **attrs;
+	size_t size;
 	int i, j;
 
 	pmus = kzalloc(sizeof(*pmus) * type->num_boxes, GFP_KERNEL);
 	if (!pmus)
 		return -ENOMEM;
 
-	type->pmus = pmus;
-
-	type->unconstrainted = (struct event_constraint)
-		__EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1,
-				0, type->num_counters, 0, 0);
+	size = topology_max_packages() * sizeof(struct intel_uncore_box *);
 
 	for (i = 0; i < type->num_boxes; i++) {
-		pmus[i].func_id = -1;
-		pmus[i].pmu_idx = i;
-		pmus[i].type = type;
-		INIT_LIST_HEAD(&pmus[i].box_list);
-		pmus[i].box = alloc_percpu(struct intel_uncore_box *);
-		if (!pmus[i].box)
+		pmus[i].func_id	= setid ? i : -1;
+		pmus[i].pmu_idx	= i;
+		pmus[i].type	= type;
+		pmus[i].boxes	= kzalloc(size, GFP_KERNEL);
+		if (!pmus[i].boxes)
 			return -ENOMEM;
 	}
 
+	type->pmus = pmus;
+	type->unconstrainted = (struct event_constraint)
+		__EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1,
+				0, type->num_counters, 0, 0);
+
 	if (type->event_descs) {
 		i = 0;
 		while (type->event_descs[i].attr.attr.name)
@@ -846,12 +835,13 @@ static int __init uncore_type_init(struc
 	return 0;
 }
 
-static int __init uncore_types_init(struct intel_uncore_type **types)
+static int __init
+uncore_types_init(struct intel_uncore_type **types, bool setid)
 {
-	int i, ret;
+	int ret;
 
-	for (i = 0; types[i]; i++) {
-		ret = uncore_type_init(types[i]);
+	while (*types) {
+		ret = uncore_type_init(*types++, setid);
 		if (ret)
 			return ret;
 	}
@@ -863,10 +853,9 @@ static int __init uncore_types_init(stru
  */
 static int uncore_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
+	struct intel_uncore_type *type;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
-	struct intel_uncore_type *type;
-	bool first_box = false;
 	int phys_id, ret;
 
 	phys_id = uncore_pcibus_to_physid(pdev->bus);
@@ -906,27 +895,24 @@ static int uncore_pci_probe(struct pci_d
 	else
 		WARN_ON_ONCE(pmu->func_id != pdev->devfn);
 
+	WARN_ON_ONCE(pmu->boxes[phys_id] != NULL);
+
+	atomic_inc(&box->refcnt);
 	box->phys_id = phys_id;
 	box->pci_dev = pdev;
 	box->pmu = pmu;
 	uncore_box_init(box);
 	pci_set_drvdata(pdev, box);
 
-	raw_spin_lock(&uncore_box_lock);
-	if (list_empty(&pmu->box_list))
-		first_box = true;
-	list_add_tail(&box->list, &pmu->box_list);
-	raw_spin_unlock(&uncore_box_lock);
-
-	if (!first_box)
+	pmu->boxes[phys_id] = box;
+	if (atomic_inc_return(&pmu->activeboxes) > 1)
 		return 0;
 
+	/* First active box registers the pmu */
 	ret = uncore_pmu_register(pmu);
 	if (ret) {
 		pci_set_drvdata(pdev, NULL);
-		raw_spin_lock(&uncore_box_lock);
-		list_del(&box->list);
-		raw_spin_unlock(&uncore_box_lock);
+		pmu->boxes[phys_id] = NULL;
 		uncore_box_exit(box);
 		kfree(box);
 	}
@@ -937,8 +923,7 @@ static void uncore_pci_remove(struct pci
 {
 	struct intel_uncore_box *box = pci_get_drvdata(pdev);
 	struct intel_uncore_pmu *pmu;
-	int i, cpu, phys_id;
-	bool last_box = false;
+	int i, phys_id;
 
 	phys_id = uncore_pcibus_to_physid(pdev->bus);
 	box = pci_get_drvdata(pdev);
@@ -958,26 +943,13 @@ static void uncore_pci_remove(struct pci
 		return;
 
 	pci_set_drvdata(pdev, NULL);
+	pmu->boxes[phys_id] = NULL;
 
-	raw_spin_lock(&uncore_box_lock);
-	list_del(&box->list);
-	if (list_empty(&pmu->box_list))
-		last_box = true;
-	raw_spin_unlock(&uncore_box_lock);
-
-	for_each_possible_cpu(cpu) {
-		if (*per_cpu_ptr(pmu->box, cpu) == box) {
-			*per_cpu_ptr(pmu->box, cpu) = NULL;
-			atomic_dec(&box->refcnt);
-		}
-	}
+	if (atomic_dec_return(&pmu->activeboxes) == 0)
+		uncore_pmu_unregister(pmu);
 
-	WARN_ON_ONCE(atomic_read(&box->refcnt) != 1);
 	uncore_box_exit(box);
 	kfree(box);
-
-	if (last_box)
-		uncore_pmu_unregister(pmu);
 }
 
 static int __init uncore_pci_init(void)
@@ -1024,7 +996,7 @@ static int __init uncore_pci_init(void)
 	if (ret)
 		return ret;
 
-	ret = uncore_types_init(uncore_pci_uncores);
+	ret = uncore_types_init(uncore_pci_uncores, false);
 	if (ret)
 		goto err;
 
@@ -1053,106 +1025,76 @@ static void __init uncore_pci_exit(void)
 	}
 }
 
-/* CPU hot plug/unplug are serialized by cpu_add_remove_lock mutex */
-static LIST_HEAD(boxes_to_free);
-
-static void uncore_kfree_boxes(void)
-{
-	struct intel_uncore_box *box;
-
-	while (!list_empty(&boxes_to_free)) {
-		box = list_entry(boxes_to_free.next,
-				 struct intel_uncore_box, list);
-		list_del(&box->list);
-		kfree(box);
-	}
-}
-
 static void uncore_cpu_dying(int cpu)
 {
-	struct intel_uncore_type *type;
+	struct intel_uncore_type *type, **types = uncore_msr_uncores;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
-	int i, j;
+	int i, pkg;
 
-	for (i = 0; uncore_msr_uncores[i]; i++) {
-		type = uncore_msr_uncores[i];
-		for (j = 0; j < type->num_boxes; j++) {
-			pmu = &type->pmus[j];
-			box = *per_cpu_ptr(pmu->box, cpu);
-			*per_cpu_ptr(pmu->box, cpu) = NULL;
-			if (box && atomic_dec_and_test(&box->refcnt)) {
-				list_add(&box->list, &boxes_to_free);
+	pkg = topology_physical_package_id(cpu);
+	while (*types) {
+		type = *types++;
+		pmu = type->pmus;
+		for (i = 0; i < type->num_boxes; i++, pmu++) {
+			box = pmu->boxes[pkg];
+			if (box && atomic_dec_return(&box->refcnt) == 0)
 				uncore_box_exit(box);
-			}
 		}
 	}
 }
 
-static int uncore_cpu_starting(int cpu)
+static void uncore_cpu_starting(int cpu, bool init)
 {
-	struct intel_uncore_type *type;
+	struct intel_uncore_type *type, **types = uncore_msr_uncores;
 	struct intel_uncore_pmu *pmu;
-	struct intel_uncore_box *box, *exist;
-	int i, j, k, phys_id;
+	struct intel_uncore_box *box;
+	int i, pkg, ncpus = 1;
 
-	phys_id = topology_physical_package_id(cpu);
+	if (init) {
+		/*
+		 * On init we get the number of online cpus in the package
+		 * and set refcount for all of them.
+		 */
+		ncpus = cpumask_weight(topology_core_cpumask(cpu));
+	}
 
-	for (i = 0; uncore_msr_uncores[i]; i++) {
-		type = uncore_msr_uncores[i];
-		for (j = 0; j < type->num_boxes; j++) {
-			pmu = &type->pmus[j];
-			box = *per_cpu_ptr(pmu->box, cpu);
-			/* called by uncore_cpu_init? */
-			if (box && box->phys_id >= 0) {
-				uncore_box_init(box);
+	pkg = topology_physical_package_id(cpu);
+	while (*types) {
+		type = *types++;
+		pmu = type->pmus;
+		for (i = 0; i < type->num_boxes; i++, pmu++) {
+			box = pmu->boxes[pkg];
+			if (!box)
 				continue;
-			}
-
-			for_each_online_cpu(k) {
-				exist = *per_cpu_ptr(pmu->box, k);
-				if (exist && exist->phys_id == phys_id) {
-					atomic_inc(&exist->refcnt);
-					*per_cpu_ptr(pmu->box, cpu) = exist;
-					if (box) {
-						list_add(&box->list,
-							 &boxes_to_free);
-						box = NULL;
-					}
-					break;
-				}
-			}
-
-			if (box) {
-				box->phys_id = phys_id;
+			/* The first cpu on a package activates the box */
+			if (atomic_add_return(ncpus, &box->refcnt) == ncpus)
 				uncore_box_init(box);
-			}
 		}
 	}
-	return 0;
 }
 
-static int uncore_cpu_prepare(int cpu, int phys_id)
+static int uncore_cpu_prepare(int cpu)
 {
-	struct intel_uncore_type *type;
+	struct intel_uncore_type *type, **types = uncore_msr_uncores;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
-	int i, j;
-
-	for (i = 0; uncore_msr_uncores[i]; i++) {
-		type = uncore_msr_uncores[i];
-		for (j = 0; j < type->num_boxes; j++) {
-			pmu = &type->pmus[j];
-			if (pmu->func_id < 0)
-				pmu->func_id = j;
+	int i, pkg;
 
+	pkg = topology_physical_package_id(cpu);
+	while (*types) {
+		type = *types++;
+		pmu = type->pmus;
+		for (i = 0; i < type->num_boxes; i++, pmu++) {
+			if (pmu->boxes[pkg])
+				continue;
+			/* First cpu of a package allocates the box */
 			box = uncore_alloc_box(type, cpu_to_node(cpu));
 			if (!box)
 				return -ENOMEM;
-
 			box->pmu = pmu;
-			box->phys_id = phys_id;
-			*per_cpu_ptr(pmu->box, cpu) = box;
+			box->phys_id = pkg;
+			pmu->boxes[pkg] = box;
 		}
 	}
 	return 0;
@@ -1163,13 +1105,11 @@ static void uncore_change_type_ctx(struc
 {
 	struct intel_uncore_pmu *pmu = type->pmus;
 	struct intel_uncore_box *box;
-	int i;
+	int i, pkg;
 
+	pkg = topology_physical_package_id(old_cpu < 0 ? new_cpu : old_cpu);
 	for (i = 0; i < type->num_boxes; i++, pmu++) {
-		if (old_cpu < 0)
-			box = uncore_pmu_to_box(pmu, new_cpu);
-		else
-			box = uncore_pmu_to_box(pmu, old_cpu);
+		box = pmu->boxes[pkg];
 		if (!box)
 			continue;
 
@@ -1199,27 +1139,20 @@ static void uncore_change_context(struct
 
 static void uncore_event_exit_cpu(int cpu)
 {
-	int i, phys_id, target;
+	int target;
 
-	/* if exiting cpu is used for collecting uncore events */
+	/* Check if exiting cpu is used for collecting uncore events */
 	if (!cpumask_test_and_clear_cpu(cpu, &uncore_cpu_mask))
 		return;
 
-	/* find a new cpu to collect uncore events */
-	phys_id = topology_physical_package_id(cpu);
-	target = -1;
-	for_each_online_cpu(i) {
-		if (i == cpu)
-			continue;
-		if (phys_id == topology_physical_package_id(i)) {
-			target = i;
-			break;
-		}
-	}
+	/* Find a new cpu to collect uncore events */
+	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
 
-	/* migrate uncore events to the new cpu */
-	if (target >= 0)
+	/* Migrate uncore events to the new target */
+	if (target < nr_cpu_ids)
 		cpumask_set_cpu(target, &uncore_cpu_mask);
+	else
+		target = -1;
 
 	uncore_change_context(uncore_msr_uncores, cpu, target);
 	uncore_change_context(uncore_pci_uncores, cpu, target);
@@ -1227,13 +1160,15 @@ static void uncore_event_exit_cpu(int cp
 
 static void uncore_event_init_cpu(int cpu)
 {
-	int i, phys_id;
+	int target;
 
-	phys_id = topology_physical_package_id(cpu);
-	for_each_cpu(i, &uncore_cpu_mask) {
-		if (phys_id == topology_physical_package_id(i))
-			return;
-	}
+	/*
+	 * Check if there is an online cpu in the package
+	 * which collects uncore events already.
+	 */
+	target = cpumask_any_and(topology_core_cpumask(cpu), &uncore_cpu_mask);
+	if (target < nr_cpu_ids)
+		return;
 
 	cpumask_set_cpu(cpu, &uncore_cpu_mask);
 
@@ -1246,38 +1181,25 @@ static int uncore_cpu_notifier(struct no
 {
 	unsigned int cpu = (long)hcpu;
 
-	/* allocate/free data structure for uncore box */
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_UP_PREPARE:
-		return notifier_from_errno(uncore_cpu_prepare(cpu, -1));
+		return notifier_from_errno(uncore_cpu_prepare(cpu));
+
 	case CPU_STARTING:
-		uncore_cpu_starting(cpu);
+		uncore_cpu_starting(cpu, false);
+	case CPU_DOWN_FAILED:
+		uncore_event_init_cpu(cpu);
 		break;
+
 	case CPU_UP_CANCELED:
 	case CPU_DYING:
 		uncore_cpu_dying(cpu);
 		break;
-	case CPU_ONLINE:
-	case CPU_DEAD:
-		uncore_kfree_boxes();
-		break;
-	default:
-		break;
-	}
 
-	/* select the cpu that collects uncore events */
-	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_DOWN_FAILED:
-	case CPU_STARTING:
-		uncore_event_init_cpu(cpu);
-		break;
 	case CPU_DOWN_PREPARE:
 		uncore_event_exit_cpu(cpu);
 		break;
-	default:
-		break;
 	}
-
 	return NOTIFY_OK;
 }
 
@@ -1359,7 +1281,7 @@ static int __init uncore_cpu_init(void)
 		return 0;
 	}
 
-	ret = uncore_types_init(uncore_msr_uncores);
+	ret = uncore_types_init(uncore_msr_uncores, true);
 	if (ret)
 		goto err;
 
@@ -1375,39 +1297,34 @@ static int __init uncore_cpu_init(void)
 
 static void __init uncore_cpu_setup(void *dummy)
 {
-	uncore_cpu_starting(smp_processor_id());
+	uncore_cpu_starting(smp_processor_id(), true);
 }
 
+/* Lazy to avoid allocation of a few bytes for the normal case */
+static __initdata DECLARE_BITMAP(packages, NR_CPUS);
+
 static int __init uncore_cpumask_init(void)
 {
-	int cpu, ret = 0;
-
-	cpu_notifier_register_begin();
+	unsigned int cpu;
 
 	for_each_online_cpu(cpu) {
-		int i, phys_id = topology_physical_package_id(cpu);
+		unsigned int pkg = topology_physical_package_id(cpu);
+		int ret;
 
-		for_each_cpu(i, &uncore_cpu_mask) {
-			if (phys_id == topology_physical_package_id(i)) {
-				phys_id = -1;
-				break;
-			}
-		}
-		if (phys_id < 0)
+		if (test_and_set_bit(pkg, packages))
 			continue;
-
-		ret = uncore_cpu_prepare(cpu, phys_id);
+		/*
+		 * The first online cpu of each package takes the refcounts
+		 * for all other online cpus in that package.
+		 */
+		ret = uncore_cpu_prepare(cpu);
 		if (ret)
-			goto out;
+			return ret;
 		uncore_event_init_cpu(cpu);
+		smp_call_function_single(cpu, uncore_cpu_setup, NULL, 1);
 	}
-	on_each_cpu(uncore_cpu_setup, NULL, 1);
-
 	__register_cpu_notifier(&uncore_cpu_nb);
-
-out:
-	cpu_notifier_register_done();
-	return ret;
+	return 0;
 }
 
 static int __init intel_uncore_init(void)
@@ -1425,17 +1342,19 @@ static int __init intel_uncore_init(void
 		return ret;
 	ret = uncore_cpu_init();
 	if (ret)
-		goto errpci;
+		goto err;
+
+	cpu_notifier_register_begin();
 	ret = uncore_cpumask_init();
 	if (ret)
-		goto errcpu;
-
+		goto err;
+	cpu_notifier_register_done();
 	return 0;
 
-errcpu:
+err:
 	uncore_types_exit(uncore_msr_uncores);
-errpci:
 	uncore_pci_exit();
+	cpu_notifier_register_done();
 	return ret;
 }
 device_initcall(intel_uncore_init);
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -79,9 +79,9 @@ struct intel_uncore_pmu {
 	int				pmu_idx;
 	int				func_id;
 	bool				registered;
+	atomic_t			activeboxes;
 	struct intel_uncore_type	*type;
-	struct intel_uncore_box		** __percpu box;
-	struct list_head		box_list;
+	struct intel_uncore_box		**boxes;
 };
 
 struct intel_uncore_extra_reg {

* [patch 06/11] x86/topology: Provide helper to retrieve number of cpu packages
From: Thomas Gleixner @ 2016-02-17 13:47 UTC
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen

[-- Attachment #1: x86-topology--Provide-helper-to-get-number-of-cpu-packages.patch --]
[-- Type: text/plain, Size: 727 bytes --]

For services which are per package, e.g. intel_uncore, we need to know the
number of possible packages in the system. Provide a helper. Unfortunately it
has to be a macro to avoid include hell.
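
A worked example with made-up numbers: on a 2-socket machine with 18 cores
per socket and hyperthreading enabled,

	num_possible_cpus() = 72, x86_max_cores = 18, smp_num_siblings = 2
	topology_max_packages() = DIV_ROUND_UP(72, 18 * 2) = 2

The result only sizes per package arrays, so an overestimate merely wastes a
few pointer sized slots.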

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/topology.h |    3 +++
 1 file changed, 3 insertions(+)

--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -127,6 +127,9 @@ extern const struct cpumask *cpu_coregro
 #define topology_sibling_cpumask(cpu)		(per_cpu(cpu_sibling_map, cpu))
 #endif
 
+#define topology_max_packages()			\
+	DIV_ROUND_UP(num_possible_cpus(), boot_cpu_data.x86_max_cores * smp_num_siblings)
+
 static inline void arch_fix_phys_package_id(int num, u32 slot)
 {
 }

* [patch 08/11] x86/perf/intel_uncore: Clear all hardware state on exit
From: Thomas Gleixner @ 2016-02-17 13:47 UTC
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen

[-- Attachment #1: x86-perf-intel_uncore--Clear-all-hardware-state-on-exit.patch --]
[-- Type: text/plain, Size: 1508 bytes --]

The only missing bit is to completely clear the hardware state on failure
exit. This is now a pretty simple exercise.

Undo the box->init_box() setup on all packages which have been initialized so
far.
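
The error path boils down to a single cross call; uncore_cpu_mask contains
exactly one online cpu per initialized package, so running the exit handler
on that mask undoes init_box() everywhere it ran:

	/* Undo box->init_box() */
	on_each_cpu_mask(&uncore_cpu_mask, uncore_exit_boxes, NULL, 1);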

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -751,6 +751,30 @@ static void uncore_pmu_unregister(struct
 	pmu->registered = false;
 }
 
+static void __init __uncore_exit_boxes(struct intel_uncore_type *type, int cpu)
+{
+	struct intel_uncore_pmu *pmu = type->pmus;
+	struct intel_uncore_box *box;
+	int i, pkg;
+
+	if (pmu) {
+		pkg = topology_physical_package_id(cpu);
+		for (i = 0; i < type->num_boxes; i++, pmu++) {
+			box = pmu->boxes[pkg];
+			if (box)
+				uncore_box_exit(box);
+		}
+	}
+}
+
+static void __init uncore_exit_boxes(void *dummy)
+{
+	struct intel_uncore_type **types = uncore_msr_uncores;
+
+	while (*types)
+		__uncore_exit_boxes(*types++, smp_processor_id());
+}
+
 static void uncore_free_boxes(struct intel_uncore_pmu *pmu)
 {
 	int pkg, maxpkg = topology_max_packages();
@@ -1352,6 +1376,8 @@ static int __init intel_uncore_init(void
 	return 0;
 
 err:
+	/* Undo box->init_box() */
+	on_each_cpu_mask(&uncore_cpu_mask, uncore_exit_boxes, NULL, 1);
 	uncore_types_exit(uncore_msr_uncores);
 	uncore_pci_exit();
 	cpu_notifier_register_done();

* [patch 09/11] x86/perf/intel_uncore: Make PCI and MSR uncore independent
From: Thomas Gleixner @ 2016-02-17 13:47 UTC
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen, Andi Kleen

[-- Attachment #1: x86-perf-intel_uncore--Make-PCI-and-MSR-uncore-independent.patch --]
[-- Type: text/plain, Size: 2479 bytes --]

Andi wanted to do this before, but the patch fell through the cracks. Implement
it with the proper error handling.
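
The core of the change, annotated (both calls are in the hunk below): the
driver now comes up if either side initializes and only gives up when both
fail.

	pret = uncore_pci_init();
	cret = uncore_cpu_init();

	/* Bail out only if neither the PCI nor the MSR side is usable */
	if (cret && pret)
		return -ENODEV;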

Requested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   34 +++++++++++++-------------
 1 file changed, 18 insertions(+), 16 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -1014,7 +1014,7 @@ static int __init uncore_pci_init(void)
 		ret = skl_uncore_pci_init();
 		break;
 	default:
-		return 0;
+		return -ENODEV;
 	}
 
 	if (ret)
@@ -1302,7 +1302,7 @@ static int __init uncore_cpu_init(void)
 		knl_uncore_cpu_init();
 		break;
 	default:
-		return 0;
+		return -ENODEV;
 	}
 
 	ret = uncore_types_init(uncore_msr_uncores, true);
@@ -1327,7 +1327,7 @@ static void __init uncore_cpu_setup(void
 /* Lazy to avoid allocation of a few bytes for the normal case */
 static __initdata DECLARE_BITMAP(packages, NR_CPUS);
 
-static int __init uncore_cpumask_init(void)
+static int __init uncore_cpumask_init(bool msr)
 {
 	unsigned int cpu;
 
@@ -1338,12 +1338,15 @@ static int __init uncore_cpumask_init(vo
 		if (test_and_set_bit(pkg, packages))
 			continue;
 		/*
-		 * The first online cpu of each package takes the refcounts
-		 * for all other online cpus in that package.
+		 * The first online cpu of each package allocates and takes
+		 * the refcounts for all other online cpus in that package.
+		 * If msrs are not enabled no allocation is required.
 		 */
-		ret = uncore_cpu_prepare(cpu);
-		if (ret)
-			return ret;
+		if (msr) {
+			ret = uncore_cpu_prepare(cpu);
+			if (ret)
+				return ret;
+		}
 		uncore_event_init_cpu(cpu);
 		smp_call_function_single(cpu, uncore_cpu_setup, NULL, 1);
 	}
@@ -1353,7 +1356,7 @@ static int __init uncore_cpumask_init(vo
 
 static int __init intel_uncore_init(void)
 {
-	int ret;
+	int pret, cret, ret;
 
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
 		return -ENODEV;
@@ -1361,15 +1364,14 @@ static int __init intel_uncore_init(void
 	if (cpu_has_hypervisor)
 		return -ENODEV;
 
-	ret = uncore_pci_init();
-	if (ret)
-		return ret;
-	ret = uncore_cpu_init();
-	if (ret)
-		goto err;
+	pret = uncore_pci_init();
+	cret = uncore_cpu_init();
+
+	if (cret && pret)
+		return -ENODEV;
 
 	cpu_notifier_register_begin();
-	ret = uncore_cpumask_init();
+	ret = uncore_cpumask_init(!cret);
 	if (ret)
 		goto err;
 	cpu_notifier_register_done();

* [patch 10/11] cpumask: Export cpumask_any_but
From: Thomas Gleixner @ 2016-02-17 13:47 UTC
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen

[-- Attachment #1: cpumask--Export-cpumask_any_but.patch --]
[-- Type: text/plain, Size: 469 bytes --]

Almost every cpumask function is exported, just not the one I need to make the
intel uncore driver modular.
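
The user is uncore_event_exit_cpu() from the package tracking rework (patch
7), which picks a new event collector within the dying cpu's package:

	/* Find a new cpu to collect uncore events */
	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);

	/* Migrate uncore events to the new target */
	if (target < nr_cpu_ids)
		cpumask_set_cpu(target, &uncore_cpu_mask);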

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 lib/cpumask.c |    1 +
 1 file changed, 1 insertion(+)

--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -41,6 +41,7 @@ int cpumask_any_but(const struct cpumask
 			break;
 	return i;
 }
+EXPORT_SYMBOL(cpumask_any_but);
 
 /* These are not inline because of header tangles. */
 #ifdef CONFIG_CPUMASK_OFFSTACK

* [patch 11/11] x86/perf/intel_uncore: Make it modular
From: Thomas Gleixner @ 2016-02-17 13:47 UTC
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Stephane Eranian,
	Harish Chegondi, Kan Liang, Andi Kleen

[-- Attachment #1: x86-perf-intel_uncore--Make-it-modular.patch --]
[-- Type: text/plain, Size: 3903 bytes --]

Now that we have a proper cleanup all over the place, it's simple to make this
a modular driver.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig                              |   14 ++++++++++----
 arch/x86/kernel/cpu/Makefile                  |    3 ++-
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   24 ++++++++++++++++++------
 3 files changed, 30 insertions(+), 11 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -160,10 +160,6 @@ config INSTRUCTION_DECODER
 	def_bool y
 	depends on KPROBES || PERF_EVENTS || UPROBES
 
-config PERF_EVENTS_INTEL_UNCORE
-	def_bool y
-	depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
-
 config OUTPUT_FORMAT
 	string
 	default "elf32-i386" if X86_32
@@ -1039,6 +1035,16 @@ config X86_THERMAL_VECTOR
 	def_bool y
 	depends on X86_MCE_INTEL
 
+config PERF_EVENTS_INTEL_UNCORE
+	tristate "Intel uncore performance events"
+	depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
+	default y
+	---help---
+	  Include support for Intel uncore performance events. These are
+	  available on NehalemEX and more modern processors.
+
+	  If unsure say y.
+
 config X86_LEGACY_VM86
 	bool "Legacy VM86 support"
 	default n
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -43,7 +43,8 @@ obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_eve
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_pt.o perf_event_intel_bts.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_cstate.o
 
-obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)	+= perf_event_intel_uncore.o \
+obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)	+= perf_event_intel_uncores.o
+perf_event_intel_uncores-objs		:= perf_event_intel_uncore.o \
 					   perf_event_intel_uncore_snb.o \
 					   perf_event_intel_uncore_snbep.o \
 					   perf_event_intel_uncore_nhmex.o
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -20,6 +20,8 @@ static struct event_constraint uncore_co
 struct event_constraint uncore_constraint_empty =
 	EVENT_CONSTRAINT(0, 0, 0);
 
+MODULE_LICENSE("GPL");
+
 int uncore_pcibus_to_physid(struct pci_bus *bus)
 {
 	struct pci2phy_map *map;
@@ -751,7 +753,7 @@ static void uncore_pmu_unregister(struct
 	pmu->registered = false;
 }
 
-static void __init __uncore_exit_boxes(struct intel_uncore_type *type, int cpu)
+static void __uncore_exit_boxes(struct intel_uncore_type *type, int cpu)
 {
 	struct intel_uncore_pmu *pmu = type->pmus;
 	struct intel_uncore_box *box;
@@ -767,7 +769,7 @@ static void __init __uncore_exit_boxes(s
 	}
 }
 
-static void __init uncore_exit_boxes(void *dummy)
+static void uncore_exit_boxes(void *dummy)
 {
 	struct intel_uncore_type **types = uncore_msr_uncores;
 
@@ -784,7 +786,7 @@ static void uncore_free_boxes(struct int
 	kfree(pmu->boxes);
 }
 
-static void __init uncore_type_exit(struct intel_uncore_type *type)
+static void uncore_type_exit(struct intel_uncore_type *type)
 {
 	struct intel_uncore_pmu *pmu = type->pmus;
 	int i;
@@ -801,7 +803,7 @@ static void __init uncore_type_exit(stru
 	type->events_group = NULL;
 }
 
-static void __init uncore_types_exit(struct intel_uncore_type **types)
+static void uncore_types_exit(struct intel_uncore_type **types)
 {
 	while (*types)
 		uncore_type_exit(*types++);
@@ -1040,7 +1042,7 @@ static int __init uncore_pci_init(void)
 	return ret;
 }
 
-static void __init uncore_pci_exit(void)
+static void uncore_pci_exit(void)
 {
 	if (pcidrv_registered) {
 		pcidrv_registered = false;
@@ -1385,4 +1387,14 @@ static int __init intel_uncore_init(void
 	cpu_notifier_register_done();
 	return ret;
 }
-device_initcall(intel_uncore_init);
+module_init(intel_uncore_init);
+
+static void __exit intel_uncore_exit(void)
+{
+	cpu_notifier_register_begin();
+	__unregister_cpu_notifier(&uncore_cpu_nb);
+	uncore_types_exit(uncore_msr_uncores);
+	uncore_pci_exit();
+	cpu_notifier_register_done();
+}
+module_exit(intel_uncore_exit);

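The conversion follows the usual built-in-to-module pattern; a generic sketch
(hypothetical foo_* names, not part of the patch) of what changes:

#include <linux/module.h>

/*
 * Before: device_initcall(foo_init) with no way to unload.
 * After: a module_init()/module_exit() pair; everything reachable
 * from the exit path loses its __init annotation, as done above.
 */
static int __init foo_init(void)
{
	return 0;	/* register pmus, notifiers, pci driver */
}
module_init(foo_init);

static void __exit foo_exit(void)
{
	/* unregister everything, in reverse order of foo_init() */
}
module_exit(foo_exit);

MODULE_LICENSE("GPL");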
^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch 04/11] x86/perf/intel_uncore: Cleanup hardware on exit
  2016-02-17 13:47 ` [patch 04/11] x86/perf/intel_uncore: Cleanup hardware on exit Thomas Gleixner
@ 2016-02-17 15:49   ` Liang, Kan
  2016-02-17 18:16     ` Thomas Gleixner
  0 siblings, 1 reply; 30+ messages in thread
From: Liang, Kan @ 2016-02-17 15:49 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Stephane Eranian,
	Chegondi, Harish, Kleen, Andi



> 
> When tearing down the boxes nothing undoes the hardware state which
> was set up by box->init_box(). Add a box->exit_box() callback and
> implement it for the uncores which have an init_box() callback.

I don't think we need exit_box, because in disable_box we already
freeze the box.

Also, writing 0 does not clear the hardware state. It will unfreeze the
box, and the counter will start to count.

> 
> This misses the cleanup in the error exit paths, but I cannot be bothered
> to implement it before cleaning up the rest of the driver, which makes that
> task way simpler.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/kernel/cpu/perf_event_intel_uncore.c       |    6 +-
>  arch/x86/kernel/cpu/perf_event_intel_uncore.h       |    9 +++
>  arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c |    6 ++
>  arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c   |   13 ++++
>  arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c |   57 +++++++++++++++++++++++-
>  5 files changed, 88 insertions(+), 3 deletions(-)
> 
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> @@ -927,6 +927,7 @@ static int uncore_pci_probe(struct pci_d
>  		raw_spin_lock(&uncore_box_lock);
>  		list_del(&box->list);
>  		raw_spin_unlock(&uncore_box_lock);
> +		uncore_box_exit(box);
>  		kfree(box);
>  	}
>  	return ret;
> @@ -972,6 +973,7 @@ static void uncore_pci_remove(struct pci
>  	}
> 
>  	WARN_ON_ONCE(atomic_read(&box->refcnt) != 1);
> +	uncore_box_exit(box);
>  	kfree(box);
> 
>  	if (last_box)
> @@ -1079,8 +1081,10 @@ static void uncore_cpu_dying(int cpu)
>  			pmu = &type->pmus[j];
>  			box = *per_cpu_ptr(pmu->box, cpu);
>  			*per_cpu_ptr(pmu->box, cpu) = NULL;
> -			if (box && atomic_dec_and_test(&box->refcnt))
> +			if (box && atomic_dec_and_test(&box->refcnt)) {
>  				list_add(&box->list, &boxes_to_free);
> +				uncore_box_exit(box);
> +			}
>  		}
>  	}
>  }
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> @@ -61,6 +61,7 @@ struct intel_uncore_type {
> 
>  struct intel_uncore_ops {
>  	void (*init_box)(struct intel_uncore_box *);
> +	void (*exit_box)(struct intel_uncore_box *);
>  	void (*disable_box)(struct intel_uncore_box *);
>  	void (*enable_box)(struct intel_uncore_box *);
>  	void (*disable_event)(struct intel_uncore_box *, struct
> perf_event *);
> @@ -306,6 +307,14 @@ static inline void uncore_box_init(struc
>  	}
>  }
> 
> +static inline void uncore_box_exit(struct intel_uncore_box *box)
> +{
> +	if (test_and_clear_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags)) {
> +		if (box->pmu->type->ops->exit_box)
> +			box->pmu->type->ops->exit_box(box);
> +	}
> +}
> +
>  static inline bool uncore_box_is_fake(struct intel_uncore_box *box)
>  {
>  	return (box->phys_id < 0);
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c
> @@ -201,6 +201,11 @@ static void nhmex_uncore_msr_init_box(st
>  	wrmsrl(NHMEX_U_MSR_PMON_GLOBAL_CTL, NHMEX_U_PMON_GLOBAL_EN_ALL);
>  }
> 
> +static void nhmex_uncore_msr_exit_box(struct intel_uncore_box *box)
> +{
> +	wrmsrl(NHMEX_U_MSR_PMON_GLOBAL_CTL, 0);
> +}
> +
>  static void nhmex_uncore_msr_disable_box(struct intel_uncore_box *box)
>  {
>  	unsigned msr = uncore_msr_box_ctl(box);
> @@ -250,6 +255,7 @@ static void nhmex_uncore_msr_enable_even
> 
>  #define NHMEX_UNCORE_OPS_COMMON_INIT()			\
>  	.init_box	= nhmex_uncore_msr_init_box,		\
> +	.exit_box	= nhmex_uncore_msr_exit_box,		\
>  	.disable_box	= nhmex_uncore_msr_disable_box,		\
>  	.enable_box	= nhmex_uncore_msr_enable_box,		\
>  	.disable_event	= nhmex_uncore_msr_disable_event,	\
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
> @@ -95,6 +95,12 @@ static void snb_uncore_msr_init_box(stru
>  	}
>  }
> 
> +static void snb_uncore_msr_exit_box(struct intel_uncore_box *box)
> +{
> +	if (box->pmu->pmu_idx == 0)
> +		wrmsrl(SNB_UNC_PERF_GLOBAL_CTL, 0);
> +}
> +
>  static struct uncore_event_desc snb_uncore_events[] = {
>  	INTEL_UNCORE_EVENT_DESC(clockticks, "event=0xff,umask=0x00"),
>  	{ /* end: all zeroes */ },
> @@ -116,6 +122,7 @@ static struct attribute_group snb_uncore
> 
>  static struct intel_uncore_ops snb_uncore_msr_ops = {
>  	.init_box	= snb_uncore_msr_init_box,
> +	.exit_box	= snb_uncore_msr_exit_box,
>  	.disable_event	= snb_uncore_msr_disable_event,
>  	.enable_event	= snb_uncore_msr_enable_event,
>  	.read_counter	= uncore_msr_read_counter,
> @@ -231,6 +238,11 @@ static void snb_uncore_imc_init_box(stru
>  	box->hrtimer_duration = UNCORE_SNB_IMC_HRTIMER_INTERVAL;
>  }
> 
> +static void snb_uncore_imc_exit_box(struct intel_uncore_box *box)
> +{
> +	iounmap(box->io_addr);
> +}
> +
>  static void snb_uncore_imc_enable_box(struct intel_uncore_box *box)
>  {}
> 
> @@ -458,6 +470,7 @@ static struct pmu snb_uncore_imc_pmu = {
> 
>  static struct intel_uncore_ops snb_uncore_imc_ops = {
>  	.init_box	= snb_uncore_imc_init_box,
> +	.exit_box	= snb_uncore_imc_exit_box,
>  	.enable_box	= snb_uncore_imc_enable_box,
>  	.disable_box	= snb_uncore_imc_disable_box,
>  	.disable_event	= snb_uncore_imc_disable_event,
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c
> @@ -387,6 +387,14 @@ static void snbep_uncore_pci_init_box(st
>  	pci_write_config_dword(pdev, box_ctl, SNBEP_PMON_BOX_CTL_INT);
>  }
> 
> +static void snbep_uncore_pci_exit_box(struct intel_uncore_box *box)
> +{
> +	struct pci_dev *pdev = box->pci_dev;
> +	int box_ctl = uncore_pci_box_ctl(box);
> +
> +	pci_write_config_dword(pdev, box_ctl, 0);
> +}
> +
>  static void snbep_uncore_msr_disable_box(struct intel_uncore_box *box)
> {
>  	u64 config;
> @@ -440,6 +448,14 @@ static void snbep_uncore_msr_init_box(st
>  		wrmsrl(msr, SNBEP_PMON_BOX_CTL_INT);
>  }
> 
> +static void snbep_uncore_msr_exit_box(struct intel_uncore_box *box)
> +{
> +	unsigned msr = uncore_msr_box_ctl(box);
> +
> +	if (msr)
> +		wrmsrl(msr, 0);
> +}
> +
>  static struct attribute *snbep_uncore_formats_attr[] = {
>  	&format_attr_event.attr,
>  	&format_attr_umask.attr,
> @@ -567,7 +583,8 @@ static struct attribute_group snbep_unco
> 
>  #define SNBEP_UNCORE_MSR_OPS_COMMON_INIT()			\
>  	__SNBEP_UNCORE_MSR_OPS_COMMON_INIT(),			\
> -	.init_box	= snbep_uncore_msr_init_box		\
> +	.init_box	= snbep_uncore_msr_init_box,		\
> +	.exit_box	= snbep_uncore_msr_exit_box		\
> 
>  static struct intel_uncore_ops snbep_uncore_msr_ops = {
>  	SNBEP_UNCORE_MSR_OPS_COMMON_INIT(),
> @@ -575,6 +592,7 @@ static struct intel_uncore_ops snbep_unc
> 
>  #define SNBEP_UNCORE_PCI_OPS_COMMON_INIT()			\
>  	.init_box	= snbep_uncore_pci_init_box,		\
> +	.exit_box	= snbep_uncore_pci_exit_box,		\
>  	.disable_box	= snbep_uncore_pci_disable_box,		\
>  	.enable_box	= snbep_uncore_pci_enable_box,		\
>  	.disable_event	= snbep_uncore_pci_disable_event,	\
> @@ -1236,10 +1254,19 @@ int snbep_uncore_pci_init(void)
>  static void ivbep_uncore_msr_init_box(struct intel_uncore_box *box)
>  {
>  	unsigned msr = uncore_msr_box_ctl(box);
> +
>  	if (msr)
>  		wrmsrl(msr, IVBEP_PMON_BOX_CTL_INT);
>  }
> 
> +static void ivbep_uncore_msr_exit_box(struct intel_uncore_box *box)
> +{
> +	unsigned msr = uncore_msr_box_ctl(box);
> +
> +	if (msr)
> +		wrmsrl(msr, 0);
> +}
> +
>  static void ivbep_uncore_pci_init_box(struct intel_uncore_box *box)
>  {
>  	struct pci_dev *pdev = box->pci_dev;
> @@ -1247,8 +1274,16 @@ static void ivbep_uncore_pci_init_box(st
>  	pci_write_config_dword(pdev, SNBEP_PCI_PMON_BOX_CTL, IVBEP_PMON_BOX_CTL_INT);
>  }
> 
> +static void ivbep_uncore_pci_exit_box(struct intel_uncore_box *box)
> +{
> +	struct pci_dev *pdev = box->pci_dev;
> +
> +	pci_write_config_dword(pdev, SNBEP_PCI_PMON_BOX_CTL, 0);
> +}
> +
>  #define IVBEP_UNCORE_MSR_OPS_COMMON_INIT()			\
>  	.init_box	= ivbep_uncore_msr_init_box,		\
> +	.exit_box	= ivbep_uncore_msr_exit_box,		\
>  	.disable_box	= snbep_uncore_msr_disable_box,		\
>  	.enable_box	= snbep_uncore_msr_enable_box,		\
>  	.disable_event	= snbep_uncore_msr_disable_event,	\
> @@ -1261,6 +1296,7 @@ static struct intel_uncore_ops ivbep_unc
> 
>  static struct intel_uncore_ops ivbep_uncore_pci_ops = {
>  	.init_box	= ivbep_uncore_pci_init_box,
> +	.exit_box	= ivbep_uncore_pci_exit_box,
>  	.disable_box	= snbep_uncore_pci_disable_box,
>  	.enable_box	= snbep_uncore_pci_enable_box,
>  	.disable_event	= snbep_uncore_pci_disable_event,
> @@ -1497,6 +1533,7 @@ static void ivbep_cbox_enable_event(stru
> 
>  static struct intel_uncore_ops ivbep_uncore_cbox_ops = {
>  	.init_box		= ivbep_uncore_msr_init_box,
> +	.exit_box		= ivbep_uncore_msr_exit_box,
>  	.disable_box		= snbep_uncore_msr_disable_box,
>  	.enable_box		= snbep_uncore_msr_enable_box,
>  	.disable_event		= snbep_uncore_msr_disable_event,
> @@ -1613,6 +1650,7 @@ static u64 ivbep_uncore_irp_read_counter
> 
>  static struct intel_uncore_ops ivbep_uncore_irp_ops = {
>  	.init_box	= ivbep_uncore_pci_init_box,
> +	.exit_box	= ivbep_uncore_pci_exit_box,
>  	.disable_box	= snbep_uncore_pci_disable_box,
>  	.enable_box	= snbep_uncore_pci_enable_box,
>  	.disable_event	= ivbep_uncore_irp_disable_event,
> @@ -1633,6 +1671,7 @@ static struct intel_uncore_type ivbep_un
> 
>  static struct intel_uncore_ops ivbep_uncore_qpi_ops = {
>  	.init_box	= ivbep_uncore_pci_init_box,
> +	.exit_box	= ivbep_uncore_pci_exit_box,
>  	.disable_box	= snbep_uncore_pci_disable_box,
>  	.enable_box	= snbep_uncore_pci_enable_box,
>  	.disable_event	= snbep_uncore_pci_disable_event,
> @@ -1914,6 +1953,7 @@ static void hswep_cbox_enable_event(stru
> 
>  static struct intel_uncore_ops knl_uncore_cha_ops = {
>  	.init_box		= snbep_uncore_msr_init_box,
> +	.exit_box		= snbep_uncore_msr_exit_box,
>  	.disable_box		= snbep_uncore_msr_disable_box,
>  	.enable_box		= snbep_uncore_msr_enable_box,
>  	.disable_event		= snbep_uncore_msr_disable_event,
> @@ -2008,6 +2048,7 @@ static void knl_uncore_imc_enable_event(
> 
>  static struct intel_uncore_ops knl_uncore_imc_ops = {
>  	.init_box	= snbep_uncore_pci_init_box,
> +	.exit_box	= snbep_uncore_pci_exit_box,
>  	.disable_box	= snbep_uncore_pci_disable_box,
>  	.enable_box	= knl_uncore_imc_enable_box,
>  	.read_counter	= snbep_uncore_pci_read_counter,
> @@ -2397,6 +2438,7 @@ static void hswep_cbox_enable_event(stru
> 
>  static struct intel_uncore_ops hswep_uncore_cbox_ops = {
>  	.init_box		= snbep_uncore_msr_init_box,
> +	.exit_box		= snbep_uncore_msr_exit_box,
>  	.disable_box		= snbep_uncore_msr_disable_box,
>  	.enable_box		= snbep_uncore_msr_enable_box,
>  	.disable_event		= snbep_uncore_msr_disable_event,
> @@ -2442,9 +2484,19 @@ static void hswep_uncore_sbox_msr_init_b
>  	}
>  }
> 
> +static void hswep_uncore_sbox_msr_exit_box(struct intel_uncore_box *box)
> +{
> +	unsigned msr = uncore_msr_box_ctl(box);
> +
> +	/* CHECKME: Does this need the bit dance like init() ? */
> +	if (msr)
> +		wrmsrl(msr, 0);
> +}
> +
>  static struct intel_uncore_ops hswep_uncore_sbox_msr_ops = {
>  	__SNBEP_UNCORE_MSR_OPS_COMMON_INIT(),
> -	.init_box		= hswep_uncore_sbox_msr_init_box
> +	.init_box		= hswep_uncore_sbox_msr_init_box,
> +	.exit_box		= hswep_uncore_sbox_msr_exit_box
>  };
> 
>  static struct attribute *hswep_uncore_sbox_formats_attr[] = {
> @@ -2584,6 +2636,7 @@ static u64 hswep_uncore_irp_read_counter
> 
>  static struct intel_uncore_ops hswep_uncore_irp_ops = {
>  	.init_box	= snbep_uncore_pci_init_box,
> +	.exit_box	= snbep_uncore_pci_exit_box,
>  	.disable_box	= snbep_uncore_pci_disable_box,
>  	.enable_box	= snbep_uncore_pci_enable_box,
>  	.disable_event	= ivbep_uncore_irp_disable_event,
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch 04/11] x86/perf/intel_uncore: Cleanup hardware on exit
  2016-02-17 15:49   ` Liang, Kan
@ 2016-02-17 18:16     ` Thomas Gleixner
  2016-02-17 21:57       ` Liang, Kan
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2016-02-17 18:16 UTC (permalink / raw)
  To: Liang, Kan
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	Stephane Eranian, Chegondi, Harish, Kleen, Andi

On Wed, 17 Feb 2016, Liang, Kan wrote:
> > When tearing down the boxes nothing undoes the hardware state which
> > was setup by box->init_box(). Add a box->exit_box() callback and
> > implement it for the uncores which have an init_box() callback.
> 
> I don't think we need exit_box.
> Because in disable_box we already freezes the box.

init_box() != enable_box() 
exit_box() != disable_box()

And we certainly want to clear state which gets enabled at a different
msr/config word location than the one the enable/disable callbacks touch.

I'm not a fan of leaving hardware in some random state.

> Also, writing 0 cannot clear hardware state. It will unfreeze the box.
> The counter will start to count.

Nonsense.
 
> > @@ -201,6 +201,11 @@ static void nhmex_uncore_msr_init_box(st
> >  	wrmsrl(NHMEX_U_MSR_PMON_GLOBAL_CTL, NHMEX_U_PMON_GLOBAL_EN_ALL);
> >  }

> > +static void nhmex_uncore_msr_exit_box(struct intel_uncore_box *box)
> > +{
> > +	wrmsrl(NHMEX_U_MSR_PMON_GLOBAL_CTL, 0);
> > +}

The reset value for this register is 0. So how is that wrong?

> > +static void snb_uncore_msr_exit_box(struct intel_uncore_box *box)
> > +{
> > +	if (box->pmu->pmu_idx == 0)
> > +		wrmsrl(SNB_UNC_PERF_GLOBAL_CTL, 0);
> > +}

Ditto.

> > +static void snb_uncore_imc_exit_box(struct intel_uncore_box *box)
> > +{
> > +	iounmap(box->io_addr);

That's definitely required, because it would leak a mapping.

I know Intel folks do not care about error handling and a few reference leaks,
but I care very much.

If there is a single instance of exit_box() in that patch which flips the
wrong bits, then please point it out with the proper reference in the manual
and not with such half-baked statements as above.

Thanks,

	tglx

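The pairing tglx insists on maps onto the ops table like this; a sketch of
the intended lifecycle, using the callbacks introduced by the patch (sketch
only, not driver code):

/*
 * Callback pairing:
 *   init_box()   / exit_box()     one-time hardware setup / teardown
 *   enable_box() / disable_box()  unfreeze / freeze around scheduling
 */
static void box_lifecycle_sketch(struct intel_uncore_box *box)
{
	uncore_box_init(box);	/* once, when the box is brought up */

	/* enable_box()/disable_box() may run many times at runtime */

	uncore_box_exit(box);	/* once, when the box is torn down */
}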
^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 07/11] x86/perf/uncore: Track packages not per cpu data
  2016-02-17 13:47 ` [patch 07/11] x86/perf/uncore: Track packages not per cpu data Thomas Gleixner
@ 2016-02-17 21:19   ` Stephane Eranian
  2016-02-17 21:24     ` Andi Kleen
  2016-02-17 21:25     ` Thomas Gleixner
  0 siblings, 2 replies; 30+ messages in thread
From: Stephane Eranian @ 2016-02-17 21:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	Harish Chegondi, Kan Liang, Andi Kleen

On Wed, Feb 17, 2016 at 5:47 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> uncore is a per package facility, but the code tries to mimick a per cpu
> facility with completely convoluted constructs.
>
> Simplify the whole machinery by tracking per package information. While at it,
> avoid the kfree/alloc dance when a cpu goes offline and online again. There is
> no point in freeing the box after it was allocated. We just keep proper
> refcounting and the first cpu which comes online in a package does the
> initialization/activation of the box.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/kernel/cpu/perf_event_intel_uncore.c |  355 ++++++++++----------------
>  arch/x86/kernel/cpu/perf_event_intel_uncore.h |    4
>  2 files changed, 139 insertions(+), 220 deletions(-)
>
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> @@ -11,7 +11,6 @@ DEFINE_RAW_SPINLOCK(pci2phy_map_lock);
>  struct list_head pci2phy_map_head = LIST_HEAD_INIT(pci2phy_map_head);
>  struct pci_dev *uncore_extra_pci_dev[UNCORE_SOCKET_MAX][UNCORE_EXTRA_PCI_DEV_MAX];
>
> -static DEFINE_RAW_SPINLOCK(uncore_box_lock);
>  /* mask of cpus that collect uncore events */
>  static cpumask_t uncore_cpu_mask;
>
> @@ -89,27 +88,7 @@ struct intel_uncore_pmu *uncore_event_to
>
>  struct intel_uncore_box *uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
>  {
> -       struct intel_uncore_box *box;
> -
> -       box = *per_cpu_ptr(pmu->box, cpu);
> -       if (box)
> -               return box;
> -
> -       raw_spin_lock(&uncore_box_lock);
> -       /* Recheck in lock to handle races. */
> -       if (*per_cpu_ptr(pmu->box, cpu))
> -               goto out;
> -       list_for_each_entry(box, &pmu->box_list, list) {
> -               if (box->phys_id == topology_physical_package_id(cpu)) {
> -                       atomic_inc(&box->refcnt);
> -                       *per_cpu_ptr(pmu->box, cpu) = box;
> -                       break;
> -               }
> -       }
> -out:
> -       raw_spin_unlock(&uncore_box_lock);
> -
> -       return *per_cpu_ptr(pmu->box, cpu);
> +       return pmu->boxes[topology_physical_package_id(cpu)];
>  }
>
>  struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
> @@ -317,7 +296,6 @@ static struct intel_uncore_box *uncore_a
>                 raw_spin_lock_init(&box->shared_regs[i].lock);
>
>         uncore_pmu_init_hrtimer(box);
> -       atomic_set(&box->refcnt, 1);
>         box->cpu = -1;
>         box->phys_id = -1;
>
> @@ -773,14 +751,24 @@ static void uncore_pmu_unregister(struct
>         pmu->registered = false;
>  }
>
> +static void uncore_free_boxes(struct intel_uncore_pmu *pmu)
> +{
> +       int pkg, maxpkg = topology_max_packages();
> +
> +       for (pkg = 0; pkg < maxpkg; pkg++)
> +               kfree(pmu->boxes[pkg]);
> +       kfree(pmu->boxes);
> +}
> +
>  static void __init uncore_type_exit(struct intel_uncore_type *type)
>  {
> +       struct intel_uncore_pmu *pmu = type->pmus;
>         int i;
>
> -       if (type->pmus) {
> -               for (i = 0; i < type->num_boxes; i++) {
> -                       uncore_pmu_unregister(&type->pmus[i]);
> -                       free_percpu(type->pmus[i].box);
> +       if (pmu) {
> +               for (i = 0; i < type->num_boxes; i++, pmu++) {
> +                       uncore_pmu_unregister(pmu);
> +                       uncore_free_boxes(pmu);
>                 }
>                 kfree(type->pmus);
>                 type->pmus = NULL;
> @@ -795,33 +783,34 @@ static void __init uncore_types_exit(str
>                 uncore_type_exit(*types++);
>  }
>
> -static int __init uncore_type_init(struct intel_uncore_type *type)
> +static int __init uncore_type_init(struct intel_uncore_type *type, bool setid)
>  {
>         struct intel_uncore_pmu *pmus;
>         struct attribute_group *attr_group;
>         struct attribute **attrs;
> +       size_t size;
>         int i, j;
>
>         pmus = kzalloc(sizeof(*pmus) * type->num_boxes, GFP_KERNEL);
>         if (!pmus)
>                 return -ENOMEM;
>
> -       type->pmus = pmus;
> -
> -       type->unconstrainted = (struct event_constraint)
> -               __EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1,
> -                               0, type->num_counters, 0, 0);
> +       size = topology_max_packages() * sizeof(struct intel_uncore_box *);
>
Let's assume you have a system with 48 cpus with HT on; then you have 2
processor sockets, which is correct. But then you further assume that
topology_physical_package_id() will return 0 or 1 in this case. In other
words, that physical ids are always assigned from 0 to N in a contiguous
manner.

Do we know for sure that this is always the case for all Intel systems,
and that no weird BIOS is assigning random numbers for phys_proc_id?


>         for (i = 0; i < type->num_boxes; i++) {
> -               pmus[i].func_id = -1;
> -               pmus[i].pmu_idx = i;
> -               pmus[i].type = type;
> -               INIT_LIST_HEAD(&pmus[i].box_list);
> -               pmus[i].box = alloc_percpu(struct intel_uncore_box *);
> -               if (!pmus[i].box)
> +               pmus[i].func_id = setid ? i : -1;
> +               pmus[i].pmu_idx = i;
> +               pmus[i].type    = type;
> +               pmus[i].boxes   = kzalloc(size, GFP_KERNEL);
> +               if (!pmus[i].boxes)
>                         return -ENOMEM;
>         }
>
> +       type->pmus = pmus;
> +       type->unconstrainted = (struct event_constraint)
> +               __EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1,
> +                               0, type->num_counters, 0, 0);
> +
>         if (type->event_descs) {
>                 i = 0;
>                 while (type->event_descs[i].attr.attr.name)
> @@ -846,12 +835,13 @@ static int __init uncore_type_init(struc
>         return 0;
>  }
>
> -static int __init uncore_types_init(struct intel_uncore_type **types)
> +static int __init
> +uncore_types_init(struct intel_uncore_type **types, bool setid)
>  {
> -       int i, ret;
> +       int ret;
>
> -       for (i = 0; types[i]; i++) {
> -               ret = uncore_type_init(types[i]);
> +       while (*types) {
> +               ret = uncore_type_init(*types++, setid);
>                 if (ret)
>                         return ret;
>         }
> @@ -863,10 +853,9 @@ static int __init uncore_types_init(stru
>   */
>  static int uncore_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
> +       struct intel_uncore_type *type;
>         struct intel_uncore_pmu *pmu;
>         struct intel_uncore_box *box;
> -       struct intel_uncore_type *type;
> -       bool first_box = false;
>         int phys_id, ret;
>
>         phys_id = uncore_pcibus_to_physid(pdev->bus);
> @@ -906,27 +895,24 @@ static int uncore_pci_probe(struct pci_d
>         else
>                 WARN_ON_ONCE(pmu->func_id != pdev->devfn);
>
> +       WARN_ON_ONCE(pmu->boxes[phys_id] != NULL);
> +
> +       atomic_inc(&box->refcnt);
>         box->phys_id = phys_id;
>         box->pci_dev = pdev;
>         box->pmu = pmu;
>         uncore_box_init(box);
>         pci_set_drvdata(pdev, box);
>
> -       raw_spin_lock(&uncore_box_lock);
> -       if (list_empty(&pmu->box_list))
> -               first_box = true;
> -       list_add_tail(&box->list, &pmu->box_list);
> -       raw_spin_unlock(&uncore_box_lock);
> -
> -       if (!first_box)
> +       pmu->boxes[phys_id] = box;
> +       if (atomic_inc_return(&pmu->activeboxes) > 1)
>                 return 0;
>
> +       /* First active box registers the pmu */
>         ret = uncore_pmu_register(pmu);
>         if (ret) {
>                 pci_set_drvdata(pdev, NULL);
> -               raw_spin_lock(&uncore_box_lock);
> -               list_del(&box->list);
> -               raw_spin_unlock(&uncore_box_lock);
> +               pmu->boxes[phys_id] = NULL;
>                 uncore_box_exit(box);
>                 kfree(box);
>         }
> @@ -937,8 +923,7 @@ static void uncore_pci_remove(struct pci
>  {
>         struct intel_uncore_box *box = pci_get_drvdata(pdev);
>         struct intel_uncore_pmu *pmu;
> -       int i, cpu, phys_id;
> -       bool last_box = false;
> +       int i, phys_id;
>
>         phys_id = uncore_pcibus_to_physid(pdev->bus);
>         box = pci_get_drvdata(pdev);
> @@ -958,26 +943,13 @@ static void uncore_pci_remove(struct pci
>                 return;
>
>         pci_set_drvdata(pdev, NULL);
> +       pmu->boxes[phys_id] = NULL;
>
> -       raw_spin_lock(&uncore_box_lock);
> -       list_del(&box->list);
> -       if (list_empty(&pmu->box_list))
> -               last_box = true;
> -       raw_spin_unlock(&uncore_box_lock);
> -
> -       for_each_possible_cpu(cpu) {
> -               if (*per_cpu_ptr(pmu->box, cpu) == box) {
> -                       *per_cpu_ptr(pmu->box, cpu) = NULL;
> -                       atomic_dec(&box->refcnt);
> -               }
> -       }
> +       if (atomic_dec_return(&pmu->activeboxes) == 0)
> +               uncore_pmu_unregister(pmu);
>
> -       WARN_ON_ONCE(atomic_read(&box->refcnt) != 1);
>         uncore_box_exit(box);
>         kfree(box);
> -
> -       if (last_box)
> -               uncore_pmu_unregister(pmu);
>  }
>
>  static int __init uncore_pci_init(void)
> @@ -1024,7 +996,7 @@ static int __init uncore_pci_init(void)
>         if (ret)
>                 return ret;
>
> -       ret = uncore_types_init(uncore_pci_uncores);
> +       ret = uncore_types_init(uncore_pci_uncores, false);
>         if (ret)
>                 goto err;
>
> @@ -1053,106 +1025,76 @@ static void __init uncore_pci_exit(void)
>         }
>  }
>
> -/* CPU hot plug/unplug are serialized by cpu_add_remove_lock mutex */
> -static LIST_HEAD(boxes_to_free);
> -
> -static void uncore_kfree_boxes(void)
> -{
> -       struct intel_uncore_box *box;
> -
> -       while (!list_empty(&boxes_to_free)) {
> -               box = list_entry(boxes_to_free.next,
> -                                struct intel_uncore_box, list);
> -               list_del(&box->list);
> -               kfree(box);
> -       }
> -}
> -
>  static void uncore_cpu_dying(int cpu)
>  {
> -       struct intel_uncore_type *type;
> +       struct intel_uncore_type *type, **types = uncore_msr_uncores;
>         struct intel_uncore_pmu *pmu;
>         struct intel_uncore_box *box;
> -       int i, j;
> +       int i, pkg;
>
> -       for (i = 0; uncore_msr_uncores[i]; i++) {
> -               type = uncore_msr_uncores[i];
> -               for (j = 0; j < type->num_boxes; j++) {
> -                       pmu = &type->pmus[j];
> -                       box = *per_cpu_ptr(pmu->box, cpu);
> -                       *per_cpu_ptr(pmu->box, cpu) = NULL;
> -                       if (box && atomic_dec_and_test(&box->refcnt)) {
> -                               list_add(&box->list, &boxes_to_free);
> +       pkg = topology_physical_package_id(cpu);
> +       while (*types) {
> +               type = *types++;
> +               pmu = type->pmus;
> +               for (i = 0; i < type->num_boxes; i++, pmu++) {
> +                       box = pmu->boxes[pkg];
> +                       if (box && atomic_dec_return(&box->refcnt) == 0)
>                                 uncore_box_exit(box);
> -                       }
>                 }
>         }
>  }
>
> -static int uncore_cpu_starting(int cpu)
> +static void uncore_cpu_starting(int cpu, bool init)
>  {
> -       struct intel_uncore_type *type;
> +       struct intel_uncore_type *type, **types = uncore_msr_uncores;
>         struct intel_uncore_pmu *pmu;
> -       struct intel_uncore_box *box, *exist;
> -       int i, j, k, phys_id;
> +       struct intel_uncore_box *box;
> +       int i, pkg, ncpus = 1;
>
> -       phys_id = topology_physical_package_id(cpu);
> +       if (init) {
> +               /*
> +                * On init we get the number of online cpus in the package
> +                * and set refcount for all of them.
> +                */
> +               ncpus = cpumask_weight(topology_core_cpumask(cpu));
> +       }
>
> -       for (i = 0; uncore_msr_uncores[i]; i++) {
> -               type = uncore_msr_uncores[i];
> -               for (j = 0; j < type->num_boxes; j++) {
> -                       pmu = &type->pmus[j];
> -                       box = *per_cpu_ptr(pmu->box, cpu);
> -                       /* called by uncore_cpu_init? */
> -                       if (box && box->phys_id >= 0) {
> -                               uncore_box_init(box);
> +       pkg = topology_physical_package_id(cpu);
> +       while (*types) {
> +               type = *types++;
> +               pmu = type->pmus;
> +               for (i = 0; i < type->num_boxes; i++, pmu++) {
> +                       box = pmu->boxes[pkg];
> +                       if (!box)
>                                 continue;
> -                       }
> -
> -                       for_each_online_cpu(k) {
> -                               exist = *per_cpu_ptr(pmu->box, k);
> -                               if (exist && exist->phys_id == phys_id) {
> -                                       atomic_inc(&exist->refcnt);
> -                                       *per_cpu_ptr(pmu->box, cpu) = exist;
> -                                       if (box) {
> -                                               list_add(&box->list,
> -                                                        &boxes_to_free);
> -                                               box = NULL;
> -                                       }
> -                                       break;
> -                               }
> -                       }
> -
> -                       if (box) {
> -                               box->phys_id = phys_id;
> +                       /* The first cpu on a package activates the box */
> +                       if (atomic_add_return(ncpus, &box->refcnt) == ncpus)
>                                 uncore_box_init(box);
> -                       }
>                 }
>         }
> -       return 0;
>  }
>
> -static int uncore_cpu_prepare(int cpu, int phys_id)
> +static int uncore_cpu_prepare(int cpu)
>  {
> -       struct intel_uncore_type *type;
> +       struct intel_uncore_type *type, **types = uncore_msr_uncores;
>         struct intel_uncore_pmu *pmu;
>         struct intel_uncore_box *box;
> -       int i, j;
> -
> -       for (i = 0; uncore_msr_uncores[i]; i++) {
> -               type = uncore_msr_uncores[i];
> -               for (j = 0; j < type->num_boxes; j++) {
> -                       pmu = &type->pmus[j];
> -                       if (pmu->func_id < 0)
> -                               pmu->func_id = j;
> +       int i, pkg;
>
> +       pkg = topology_physical_package_id(cpu);
> +       while (*types) {
> +               type = *types++;
> +               pmu = type->pmus;
> +               for (i = 0; i < type->num_boxes; i++, pmu++) {
> +                       if (pmu->boxes[pkg])
> +                               continue;
> +                       /* First cpu of a package allocates the box */
>                         box = uncore_alloc_box(type, cpu_to_node(cpu));
>                         if (!box)
>                                 return -ENOMEM;
> -
>                         box->pmu = pmu;
> -                       box->phys_id = phys_id;
> -                       *per_cpu_ptr(pmu->box, cpu) = box;
> +                       box->phys_id = pkg;
> +                       pmu->boxes[pkg] = box;
>                 }
>         }
>         return 0;
> @@ -1163,13 +1105,11 @@ static void uncore_change_type_ctx(struc
>  {
>         struct intel_uncore_pmu *pmu = type->pmus;
>         struct intel_uncore_box *box;
> -       int i;
> +       int i, pkg;
>
> +       pkg = topology_physical_package_id(old_cpu < 0 ? new_cpu : old_cpu);
>         for (i = 0; i < type->num_boxes; i++, pmu++) {
> -               if (old_cpu < 0)
> -                       box = uncore_pmu_to_box(pmu, new_cpu);
> -               else
> -                       box = uncore_pmu_to_box(pmu, old_cpu);
> +               box = pmu->boxes[pkg];
>                 if (!box)
>                         continue;
>
> @@ -1199,27 +1139,20 @@ static void uncore_change_context(struct
>
>  static void uncore_event_exit_cpu(int cpu)
>  {
> -       int i, phys_id, target;
> +       int target;
>
> -       /* if exiting cpu is used for collecting uncore events */
> +       /* Check if exiting cpu is used for collecting uncore events */
>         if (!cpumask_test_and_clear_cpu(cpu, &uncore_cpu_mask))
>                 return;
>
> -       /* find a new cpu to collect uncore events */
> -       phys_id = topology_physical_package_id(cpu);
> -       target = -1;
> -       for_each_online_cpu(i) {
> -               if (i == cpu)
> -                       continue;
> -               if (phys_id == topology_physical_package_id(i)) {
> -                       target = i;
> -                       break;
> -               }
> -       }
> +       /* Find a new cpu to collect uncore events */
> +       target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
>
> -       /* migrate uncore events to the new cpu */
> -       if (target >= 0)
> +       /* Migrate uncore events to the new target */
> +       if (target < nr_cpu_ids)
>                 cpumask_set_cpu(target, &uncore_cpu_mask);
> +       else
> +               target = -1;
>
>         uncore_change_context(uncore_msr_uncores, cpu, target);
>         uncore_change_context(uncore_pci_uncores, cpu, target);
> @@ -1227,13 +1160,15 @@ static void uncore_event_exit_cpu(int cp
>
>  static void uncore_event_init_cpu(int cpu)
>  {
> -       int i, phys_id;
> +       int target;
>
> -       phys_id = topology_physical_package_id(cpu);
> -       for_each_cpu(i, &uncore_cpu_mask) {
> -               if (phys_id == topology_physical_package_id(i))
> -                       return;
> -       }
> +       /*
> +        * Check if there is an online cpu in the package
> +        * which collects uncore events already.
> +        */
> +       target = cpumask_any_and(topology_core_cpumask(cpu), &uncore_cpu_mask);
> +       if (target < nr_cpu_ids)
> +               return;
>
>         cpumask_set_cpu(cpu, &uncore_cpu_mask);
>
> @@ -1246,38 +1181,25 @@ static int uncore_cpu_notifier(struct no
>  {
>         unsigned int cpu = (long)hcpu;
>
> -       /* allocate/free data structure for uncore box */
>         switch (action & ~CPU_TASKS_FROZEN) {
>         case CPU_UP_PREPARE:
> -               return notifier_from_errno(uncore_cpu_prepare(cpu, -1));
> +               return notifier_from_errno(uncore_cpu_prepare(cpu));
> +
>         case CPU_STARTING:
> -               uncore_cpu_starting(cpu);
> +               uncore_cpu_starting(cpu, false);
> +       case CPU_DOWN_FAILED:
> +               uncore_event_init_cpu(cpu);
>                 break;
> +
>         case CPU_UP_CANCELED:
>         case CPU_DYING:
>                 uncore_cpu_dying(cpu);
>                 break;
> -       case CPU_ONLINE:
> -       case CPU_DEAD:
> -               uncore_kfree_boxes();
> -               break;
> -       default:
> -               break;
> -       }
>
> -       /* select the cpu that collects uncore events */
> -       switch (action & ~CPU_TASKS_FROZEN) {
> -       case CPU_DOWN_FAILED:
> -       case CPU_STARTING:
> -               uncore_event_init_cpu(cpu);
> -               break;
>         case CPU_DOWN_PREPARE:
>                 uncore_event_exit_cpu(cpu);
>                 break;
> -       default:
> -               break;
>         }
> -
>         return NOTIFY_OK;
>  }
>
> @@ -1359,7 +1281,7 @@ static int __init uncore_cpu_init(void)
>                 return 0;
>         }
>
> -       ret = uncore_types_init(uncore_msr_uncores);
> +       ret = uncore_types_init(uncore_msr_uncores, true);
>         if (ret)
>                 goto err;
>
> @@ -1375,39 +1297,34 @@ static int __init uncore_cpu_init(void)
>
>  static void __init uncore_cpu_setup(void *dummy)
>  {
> -       uncore_cpu_starting(smp_processor_id());
> +       uncore_cpu_starting(smp_processor_id(), true);
>  }
>
> +/* Lazy to avoid allocation of a few bytes for the normal case */
> +static __initdata DECLARE_BITMAP(packages, NR_CPUS);
> +
>  static int __init uncore_cpumask_init(void)
>  {
> -       int cpu, ret = 0;
> -
> -       cpu_notifier_register_begin();
> +       unsigned int cpu;
>
>         for_each_online_cpu(cpu) {
> -               int i, phys_id = topology_physical_package_id(cpu);
> +               unsigned int pkg = topology_physical_package_id(cpu);
> +               int ret;
>
> -               for_each_cpu(i, &uncore_cpu_mask) {
> -                       if (phys_id == topology_physical_package_id(i)) {
> -                               phys_id = -1;
> -                               break;
> -                       }
> -               }
> -               if (phys_id < 0)
> +               if (test_and_set_bit(pkg, packages))
>                         continue;
> -
> -               ret = uncore_cpu_prepare(cpu, phys_id);
> +               /*
> +                * The first online cpu of each package takes the refcounts
> +                * for all other online cpus in that package.
> +                */
> +               ret = uncore_cpu_prepare(cpu);
>                 if (ret)
> -                       goto out;
> +                       return ret;
>                 uncore_event_init_cpu(cpu);
> +               smp_call_function_single(cpu, uncore_cpu_setup, NULL, 1);
>         }
> -       on_each_cpu(uncore_cpu_setup, NULL, 1);
> -
>         __register_cpu_notifier(&uncore_cpu_nb);
> -
> -out:
> -       cpu_notifier_register_done();
> -       return ret;
> +       return 0;
>  }
>
>  static int __init intel_uncore_init(void)
> @@ -1425,17 +1342,19 @@ static int __init intel_uncore_init(void
>                 return ret;
>         ret = uncore_cpu_init();
>         if (ret)
> -               goto errpci;
> +               goto err;
> +
> +       cpu_notifier_register_begin();
>         ret = uncore_cpumask_init();
>         if (ret)
> -               goto errcpu;
> -
> +               goto err;
> +       cpu_notifier_register_done();
>         return 0;
>
> -errcpu:
> +err:
>         uncore_types_exit(uncore_msr_uncores);
> -errpci:
>         uncore_pci_exit();
> +       cpu_notifier_register_done();
>         return ret;
>  }
>  device_initcall(intel_uncore_init);
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> @@ -79,9 +79,9 @@ struct intel_uncore_pmu {
>         int                             pmu_idx;
>         int                             func_id;
>         bool                            registered;
> +       atomic_t                        activeboxes;
>         struct intel_uncore_type        *type;
> -       struct intel_uncore_box         ** __percpu box;
> -       struct list_head                box_list;
> +       struct intel_uncore_box         **boxes;
>  };
>
>  struct intel_uncore_extra_reg {
>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 07/11] x86/perf/uncore: Track packages not per cpu data
  2016-02-17 21:19   ` Stephane Eranian
@ 2016-02-17 21:24     ` Andi Kleen
  2016-02-17 21:56       ` Thomas Gleixner
  2016-02-17 21:25     ` Thomas Gleixner
  1 sibling, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2016-02-17 21:24 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Borislav Petkov, Harish Chegondi, Kan Liang, Andi Kleen

Stephane Eranian <eranian@google.com> writes:

>>
> Let's assume you have a system with 48 cpus with HT on; then you have 2
> processor sockets, which is correct. But then you further assume that
> topology_physical_package_id() will return 0 or 1 in this case. In other
> words, that physical ids are always assigned from 0 to N in a contiguous
> manner.
>
> Do we know for sure that this is always the case for all Intel systems,
> and that no weird BIOS is assigning random numbers for phys_proc_id?

I don't believe that is a safe assumption.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 07/11] x86/perf/uncore: Track packages not per cpu data
  2016-02-17 21:19   ` Stephane Eranian
  2016-02-17 21:24     ` Andi Kleen
@ 2016-02-17 21:25     ` Thomas Gleixner
  1 sibling, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2016-02-17 21:25 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	Harish Chegondi, Kan Liang, Andi Kleen

Stephane,

On Wed, 17 Feb 2016, Stephane Eranian wrote:

Please trim your replies.

> On Wed, Feb 17, 2016 at 5:47 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > +       size = topology_max_packages() * sizeof(struct intel_uncore_box *);
> >
> Let's assume you have a system with 48 cpus with HT on; then you have 2
> processor sockets, which is correct. But then you further assume that
> topology_physical_package_id() will return 0 or 1 in this case. In other
> words, that physical ids are always assigned from 0 to N in a contiguous
> manner.
> 
> Do we know for sure that this is always the case for all Intel systems,
> and that no weird BIOS is assigning random numbers for phys_proc_id?

We have quite some stuff which depends on phys_proc_id < max_package_id.

But sure, we should add a sanity check somewhere if we don't have one
already. I'll go digging around in the cpu setup code.

Thanks,

	tglx

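The sanity check mentioned here could be as small as the following sketch
(hypothetical helper name; it assumes the per package arrays are sized by
topology_max_packages()):

static bool sketch_pkg_id_sane(unsigned int cpu)
{
	int pkg = topology_physical_package_id(cpu);

	if (pkg < 0 || pkg >= topology_max_packages()) {
		pr_warn_once("CPU%u: physical package id %d out of range\n",
			     cpu, pkg);
		return false;
	}
	return true;
}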
^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 07/11] x86/perf/uncore: Track packages not per cpu data
  2016-02-17 21:24     ` Andi Kleen
@ 2016-02-17 21:56       ` Thomas Gleixner
  2016-02-17 22:16         ` Andi Kleen
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2016-02-17 21:56 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Stephane Eranian, LKML, Peter Zijlstra, Ingo Molnar,
	Borislav Petkov, Harish Chegondi, Kan Liang, Andi Kleen

On Wed, 17 Feb 2016, Andi Kleen wrote:

> Stephane Eranian <eranian@google.com> writes:
> 
> >>
> > Let's assume you have a system with 48 cpus with HT on; then you have 2
> > processor sockets, which is correct. But then you further assume that
> > topology_physical_package_id() will return 0 or 1 in this case. In other
> > words, that physical ids are always assigned from 0 to N in a contiguous
> > manner.
> >
> > Do we know for sure that this is always the case for all Intel systems,
> > and that no weird BIOS is assigning random numbers for phys_proc_id?
> 
> I don't believe that is a safe assumption.

Do you have any data to back that up or is that just "believe" ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch 04/11] x86/perf/intel_uncore: Cleanup hardware on exit
  2016-02-17 18:16     ` Thomas Gleixner
@ 2016-02-17 21:57       ` Liang, Kan
  2016-02-17 22:00         ` Thomas Gleixner
  0 siblings, 1 reply; 30+ messages in thread
From: Liang, Kan @ 2016-02-17 21:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	Stephane Eranian, Chegondi, Harish, Kleen, Andi


> 
> > > @@ -201,6 +201,11 @@ static void nhmex_uncore_msr_init_box(st
> > >  	wrmsrl(NHMEX_U_MSR_PMON_GLOBAL_CTL, NHMEX_U_PMON_GLOBAL_EN_ALL);
> > >  }
> 
> > > +static void nhmex_uncore_msr_exit_box(struct intel_uncore_box *box)
> > > +{
> > > +	wrmsrl(NHMEX_U_MSR_PMON_GLOBAL_CTL, 0);
> > > +}
> 
> The reset value for this register is 0. So how is that wrong?
> 
> > > +static void snb_uncore_msr_exit_box(struct intel_uncore_box *box)
> > > +{
> > > +	if (box->pmu->pmu_idx == 0)
> > > +		wrmsrl(SNB_UNC_PERF_GLOBAL_CTL, 0);
> > > +}
> 
> Ditto.
> 
> > > +static void snb_uncore_imc_exit_box(struct intel_uncore_box *box)
> > > +{
> > > +	iounmap(box->io_addr);
> 
> That's definitely required, because it would leak a mapping.
> 
> I know Intel folks do not care about error handling and a few reference
> leaks, but I care very much.
> 
> If there is a single instance of exit_box() in that patch which flips the wrong
> bits, then please point it out with the proper reference in the manual and
> not with such half-baked statements as above.
> 

Sorry, I didn't make it clear.
For the older server platforms like nhmex and client platforms like snb, I agree
with you on nhmex_uncore_msr_exit_box and snb_uncore_imc_exit_box.

However, for the newer server platforms (starting from the IVB server), we cannot
write 0 to the reserved bits of the BOX control registers; the behavior is
undefined. The following code may have issues.
It looks like we also write 0 to the reserved bits in box_init. We may need to
fix that.

You can find all the server uncore documents here.
https://software.intel.com/en-us/blogs/2014/07/11/documentation-for-uncore-performance-monitoring-units

> +static void snbep_uncore_pci_exit_box(struct intel_uncore_box *box)
> +{
> +	struct pci_dev *pdev = box->pci_dev;
> +	int box_ctl = uncore_pci_box_ctl(box);
> +
> +	pci_write_config_dword(pdev, box_ctl, 0);
> +}
> +

> +static void snbep_uncore_msr_exit_box(struct intel_uncore_box *box)
> +{
> +	unsigned msr = uncore_msr_box_ctl(box);
> +
> +	if (msr)
> +		wrmsrl(msr, 0);
> +}

> +static void ivbep_uncore_msr_exit_box(struct intel_uncore_box *box)
> +{
> +	unsigned msr = uncore_msr_box_ctl(box);
> +
> +	if (msr)
> +		wrmsrl(msr, 0);
> +}

> +static void ivbep_uncore_pci_exit_box(struct intel_uncore_box *box)
> +{
> +	struct pci_dev *pdev = box->pci_dev;
> +
> +	pci_write_config_dword(pdev, SNBEP_PCI_PMON_BOX_CTL, 0);
> +}
> +

> +static void hswep_uncore_sbox_msr_exit_box(struct intel_uncore_box *box)
> +{
> +	unsigned msr = uncore_msr_box_ctl(box);
> +
> +	/* CHECKME: Does this need the bit dance like init() ? */
> +	if (msr)
> +		wrmsrl(msr, 0);
> +}
> +

Thanks,

Kan

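What is being asked for amounts to a read-modify-write that leaves the
reserved bits alone; a sketch, with the valid-bit mask left as a platform
specific assumption (hypothetical helper name, not driver code):

static void sketch_msr_exit_box(struct intel_uncore_box *box, u64 valid_mask)
{
	unsigned msr = uncore_msr_box_ctl(box);
	u64 config;

	if (!msr)
		return;

	rdmsrl(msr, config);
	config &= ~valid_mask;	/* clear only architecturally defined bits */
	wrmsrl(msr, config);	/* reserved bits are written back unchanged */
}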
^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch 04/11] x86/perf/intel_uncore: Cleanup hardware on exit
  2016-02-17 21:57       ` Liang, Kan
@ 2016-02-17 22:00         ` Thomas Gleixner
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2016-02-17 22:00 UTC (permalink / raw)
  To: Liang, Kan
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	Stephane Eranian, Chegondi, Harish, Kleen, Andi

On Wed, 17 Feb 2016, Liang, Kan wrote:
> > If there is a single instance of exit_box() in that patch which flips the wrong
> > bits, then please point it out with the proper reference in the manual and
> > not with such half baken statements as above.
> > 
> Sorry, I didn't make it clear.
> For the older server platforms like nhmex and client platforms like snb, I agree
> with you on nhmex_uncore_msr_exit_box and snb_uncore_imc_exit_box.
> 
> However, for newer server platforms (start from IVB server), we cannot write 0
> to rsv bit of BOX control registers. The behavior is undefined.
> The following codes may have issues.
> It looks we also write 0 to rsv bit in box_init. We may need to fix it.

Each platform can have its own init/exit callbacks or just omit them when not
needed. But we certainly want them for platforms where it is either needed
(i.e. undo ioremap) or makes sense from the hardware POV.

Thanks,

	tglx

 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 07/11] x86/perf/uncore: Track packages not per cpu data
  2016-02-17 21:56       ` Thomas Gleixner
@ 2016-02-17 22:16         ` Andi Kleen
  2016-02-17 22:31           ` Thomas Gleixner
                             ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Andi Kleen @ 2016-02-17 22:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andi Kleen, Stephane Eranian, LKML, Peter Zijlstra, Ingo Molnar,
	Borislav Petkov, Harish Chegondi, Kan Liang, Andi Kleen

> Do you have any data to back that up or is that just "believe" ?

I've seen systems with discontiguous apic ids before.

It is obvious if you consider setups with node hotplug.

BTW reading this thread you don't seem interested in any code review feedback,
attacking everyone, not bothering to look up data sheets,
not understanding the hardware you're changing the driver for, etc.
Why do you even bother to post the patches?

-Andi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 07/11] x86/perf/uncore: Track packages not per cpu data
  2016-02-17 22:16         ` Andi Kleen
@ 2016-02-17 22:31           ` Thomas Gleixner
  2016-02-18  7:50           ` Ingo Molnar
  2016-02-18  8:13           ` Peter Zijlstra
  2 siblings, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2016-02-17 22:31 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Stephane Eranian, LKML, Peter Zijlstra, Ingo Molnar,
	Borislav Petkov, Harish Chegondi, Kan Liang, Andi Kleen

On Wed, 17 Feb 2016, Andi Kleen wrote:

> > Do you have any data to back that up or is that just "believe" ?
> 
> I've seen systems with discontiguous apic ids before.
> 
> It is obvious if you consider setups with node hotplug.

Right, but if you have 1024 maximal cores and 
 
> BTW reading this thread you don't seem interested in any code review
> feedback, attacking everyone, not bothering to look up data sheets,

I did look up data sheets, but I assumed that IVB is the same. But that's what
reviews are for.

> not understanding the hardware you're changing the driver for, etc.

Oh well.

> Why do you even bother to post the patches?

Just to annoy you.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 07/11] x86/perf/uncore: Track packages not per cpu data
  2016-02-17 22:16         ` Andi Kleen
  2016-02-17 22:31           ` Thomas Gleixner
@ 2016-02-18  7:50           ` Ingo Molnar
  2016-02-18  8:13           ` Peter Zijlstra
  2 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2016-02-18  7:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Thomas Gleixner, Stephane Eranian, LKML, Peter Zijlstra,
	Borislav Petkov, Harish Chegondi, Kan Liang, Andi Kleen


* Andi Kleen <andi@firstfloor.org> wrote:

> > Do you have any data to back that up or is that just "believe" ?
> 
> I've seen systems with discontiguous apic ids before.
> 
> It is obvious if you consider setups with node hotplug.
> 
> BTW reading this thread you don't seem interested in any code review feedback, 
> attacking everyone, [...]

I've not seen Thomas attack anyone in this thread - I've seen him getting 
rightfully grumpy at sub-par code quality of the Intel uncore driver...

> [...] not bothering to look up data sheets, not understanding the hardware 
> you're changing the driver for, etc. Why do you even bother to post the patches?

Stop these personal attacks and stop insulting people who try to clean up the mess
you made of a driver!

Thomas found real bugs in the driver and presented a much cleaner model for 
representing uncore PMUs, which results in nice simplifications:

   2 files changed, 139 insertions(+), 220 deletions(-)

If these changes survive review and testing then they are very much 'worth it',
and that fact should be apparent to you as well.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 07/11] x86/perf/uncore: Track packages not per cpu data
  2016-02-17 22:16         ` Andi Kleen
  2016-02-17 22:31           ` Thomas Gleixner
  2016-02-18  7:50           ` Ingo Molnar
@ 2016-02-18  8:13           ` Peter Zijlstra
  2016-02-18  9:35             ` Stephane Eranian
  2016-02-18 10:22             ` Thomas Gleixner
  2 siblings, 2 replies; 30+ messages in thread
From: Peter Zijlstra @ 2016-02-18  8:13 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Thomas Gleixner, Stephane Eranian, LKML, Ingo Molnar,
	Borislav Petkov, Harish Chegondi, Kan Liang, Andi Kleen

On Wed, Feb 17, 2016 at 11:16:40PM +0100, Andi Kleen wrote:
> > Do you have any data to back that up or is that just "believe" ?
> 
> I've seen systems with discontiguous apic ids before.
> 
> It is obvious if you consider setups with node hotplug.

Those systems will also have a num_possible_cpus() that's inflated to
account for the possible hotplug of new sockets, right?

So while the phys_pkg_id of the present sockets might be 2 and 3
(leaving 0 and 1 available for hotplug), the num_possible_cpus() should
be big enough to actually allow for that hotplug to happen.

Because if num_possible_cpus() is too small, we could not accommodate
the hotplug operation.

And if num_possible_cpus() is of the right size, then the computed
max_packages() should be of the right size too.

Now clearly, BIOS can completely wreck things and indeed report too
small an apic_id range or whatever, and in this case we're up a creek
without a paddle.

But I think you can check for that at boot and report errors/warns
whatever, because if you trigger this, your system is not really
'correct' anyway.

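A sketch of the relationship described above, under the assumption that the
package count is derived from the possible cpu space; the formula here is
illustrative, not the kernel's actual derivation:

static unsigned int sketch_max_packages(unsigned int cores_per_pkg,
					unsigned int threads_per_core)
{
	/*
	 * If the firmware sized the possible cpu space to cover socket
	 * hotplug, the derived package count is large enough as well.
	 */
	return DIV_ROUND_UP(num_possible_cpus(),
			    cores_per_pkg * threads_per_core);
}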
^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 07/11] x86/perf/uncore: Track packages not per cpu data
  2016-02-18  8:13           ` Peter Zijlstra
@ 2016-02-18  9:35             ` Stephane Eranian
  2016-02-18  9:51               ` Peter Zijlstra
  2016-02-18 10:25               ` Thomas Gleixner
  2016-02-18 10:22             ` Thomas Gleixner
  1 sibling, 2 replies; 30+ messages in thread
From: Stephane Eranian @ 2016-02-18  9:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Thomas Gleixner, LKML, Ingo Molnar, Borislav Petkov,
	Harish Chegondi, Kan Liang, Andi Kleen

On Thu, Feb 18, 2016 at 12:13 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Feb 17, 2016 at 11:16:40PM +0100, Andi Kleen wrote:
> > > Do you have any data to back that up or is that just "believe" ?
> >
> > I've seen systems with discontiguous apic ids before.
> >
> > It is obvious if you consider setups with node hotplug.
>
> Those systems will also have a num_possible_cpus() that's inflated to
> account for the possible hotplug of new sockets, right?
>
> So while the phys_pkg_id of the present sockets might be 2 and 3
> (leaving 0 and 1 available for hotplug), the num_possible_cpus() should
> be big enough to actually allow for that hotplug to happen.
>
> Because if num_possible_cpus() is too small, we could not accommodate
> the hotplug operation.
>
> And if num_possible_cpus() is of the right size, then the computed
> max_packages() should be of the right size too.
>
> Now clearly, BIOS can completely wreck things and indeed report too
> small an apic_id range or whatever, and in this case we're up a creek
> without a paddle.
>
> But I think you can check for that at boot and report errors/warns
> whatever, because if you trigger this, your system is not really
> 'correct' anyway.
>
The example I was worried about is as follows: take an Ivytown (IVT)
system with 12 physical cores per package in a 2-socket configuration.

Thomas's topology_max_packages() macro would return 2, for 2 packages
covering CPU0..CPU47.

But let's assume that the BIOS does some weird mapping and that the id
for socket0 is indeed 0, but for socket1 it is 255. Then doing:

    pkg = topology_physical_package_id(cpu);
    pmu->boxes[pkg];

would crash, because boxes[] has space for 2 elements, not 256.

If we know this cannot happen, then the code is fine. If we are not
sure, then I believe a check should be added, and if a mismatch is
found, the uncore PMU init code should return an error. That's all I
was trying to point out. I think Thomas' code is indeed a good
simplification.
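
Roughly, the kind of check I mean. The identifiers are illustrative; in
particular, max_packages stands for whatever topology_max_packages()
ends up returning:

    pkg = topology_physical_package_id(cpu);
    if (pkg >= max_packages) {
            pr_err("uncore: pkg id %d out of range (max %d)\n",
                   pkg, max_packages);
            return -EINVAL;     /* fail uncore init cleanly */
    }
    box = pmu->boxes[pkg];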

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 07/11] x86/perf/uncore: Track packages not per cpu data
  2016-02-18  9:35             ` Stephane Eranian
@ 2016-02-18  9:51               ` Peter Zijlstra
  2016-02-18 10:25               ` Thomas Gleixner
  1 sibling, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2016-02-18  9:51 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Andi Kleen, Thomas Gleixner, LKML, Ingo Molnar, Borislav Petkov,
	Harish Chegondi, Kan Liang, Andi Kleen

On Thu, Feb 18, 2016 at 01:35:56AM -0800, Stephane Eranian wrote:
> But let's assume that the BIOS does some weird mappings and that the
> id for socket0 is indeed 0
> but for socket1 it is 255. Then doing:

Right, that would fail. But Andi was talking about physical hotplug,
and that should work if the BIOS isn't weird like that.

Now obviously we cannot trust the BIOS not to be weird, and tglx is
working on new topology bits that can indeed accommodate such BIOS
failures.

It would still be good to complain on boot if we find such weirdness,
even if it's not fatal anymore.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 07/11] x86/perf/uncore: Track packages not per cpu data
  2016-02-18  8:13           ` Peter Zijlstra
  2016-02-18  9:35             ` Stephane Eranian
@ 2016-02-18 10:22             ` Thomas Gleixner
  2016-02-18 10:54               ` Thomas Gleixner
  1 sibling, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2016-02-18 10:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Stephane Eranian, LKML, Ingo Molnar, Borislav Petkov,
	Harish Chegondi, Kan Liang, Andi Kleen

On Thu, 18 Feb 2016, Peter Zijlstra wrote:
> On Wed, Feb 17, 2016 at 11:16:40PM +0100, Andi Kleen wrote:
> > > Do you have any data to back that up or is that just "believe" ?
> > 
> > I've seen systems with discontiguous apic ids before.
> > 
> > It is obvious if you consider setups with node hotplug.
> 
> Those systems will also have a num_possible_cpus() that's inflated to
> account for the possible hotplug of new sockets, right?
> 
> So while the phys_pkg_id of the present sockets might be 2 and 3
> (leaving 0 and 1 available for hotplug), the num_possible_cpus() should
> be big enough to actually allow for that hotplug to happen.
> 
> Because if num_possible_cpus() is too small, we could not accommodate
> the hotplug operation.
> 
> And if num_possible_cpus() is of the right size, then the computed
> max_packages() should be of the right size too.
> 
> Now clearly, BIOS can completely wreck things and indeed report too
> small an apic_id range or whatever, and in this case we're up a creek
> without a paddle.
> 
> But I think you can check for that at boot and report errors/warns
> whatever, because if you trigger this, your system is not really
> 'correct' anyway.

Correct. Furthermore, we have a limitation on apic ids anyway. So there is
already a limit to creative apic id assignments.

Now if the system implementer can assign apic ids randomly in that space, the
assumption of

	   maxpackages = num_possible_cpus / cpus_per_package

is no longer true. But that's not a blocker for the approach of tracking per
package data. We can simply create logical package ids and use those.

There is another possible issue: the above is also not true if we get
heterogeneous systems, i.e. cpus_per_package is not constant. But if we create
logical package ids when building the system topology, then we can account for
that and adjust maxpackages accordingly.
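
To illustrate what I mean by logical package ids, a first-come
first-served mapping would be enough. Purely a sketch, all names and
sizes invented here:

	#define MAX_PHYS_PKG_ID	256	/* assumed size of the physical id space */

	/* Map sparse physical package ids to dense logical ids */
	static int phys_to_logical_pkg[MAX_PHYS_PKG_ID];	/* initialized to -1 */
	static int nr_logical_pkgs;

	static int get_logical_pkg_id(int phys_id)
	{
		if (phys_to_logical_pkg[phys_id] < 0)
			phys_to_logical_pkg[phys_id] = nr_logical_pkgs++;
		return phys_to_logical_pkg[phys_id];
	}

Per package storage is then indexed by the logical id, which stays
dense no matter how the BIOS numbered the sockets.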

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 07/11] x86/perf/uncore: Track packages not per cpu data
  2016-02-18  9:35             ` Stephane Eranian
  2016-02-18  9:51               ` Peter Zijlstra
@ 2016-02-18 10:25               ` Thomas Gleixner
  1 sibling, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2016-02-18 10:25 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, Andi Kleen, LKML, Ingo Molnar, Borislav Petkov,
	Harish Chegondi, Kan Liang, Andi Kleen

Stephane,

On Thu, 18 Feb 2016, Stephane Eranian wrote:
> On Thu, Feb 18, 2016 at 12:13 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > Now clearly, BIOS can completely wreck things and indeed report too
> > small an apic_id range or whatever, and in this case we're up a creek
> > without a paddle.
> >
> > But I think you can check for that at boot and report errors/warns
> > whatever, because if you trigger this, your system is not really
> > 'correct' anyway.
> >
> The example I was worried about is as follows: take an Ivytown (IVT)
> system with 12 physical cores per package in a 2-socket configuration.
> 
> Thomas's topology_max_packages() macro would return 2, for 2 packages
> covering CPU0..CPU47.
> 
> But let's assume that the BIOS does some weird mapping and that the id
> for socket0 is indeed 0, but for socket1 it is 255. Then doing:
> 
>     pkg = topology_physical_package_id(cpu);
>     pmu->boxes[pkg];
> 
> would crash, because boxes[] has space for 2 elements, not 256.

Right. It did not occur to me that this might be possible, but as I said in my
reply to Peter, we can simply create logical package ids and use them. That
makes us safe against BIOS tinkering.
 
> If we know this cannot happen, then the code is fine. If we are not
> sure, then I believe a check should be added
> and if a mismatch is found, then the uncore PMU init code should
> return an error. That's all I was trying to point
> out. I think Thomas' code is indeed a good simplification.

I'm glad you like it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 07/11] x86/perf/uncore: Track packages not per cpu data
  2016-02-18 10:22             ` Thomas Gleixner
@ 2016-02-18 10:54               ` Thomas Gleixner
  2016-02-19  8:39                 ` Thomas Gleixner
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2016-02-18 10:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Stephane Eranian, LKML, Ingo Molnar, Borislav Petkov,
	Harish Chegondi, Kan Liang, Andi Kleen

On Thu, 18 Feb 2016, Thomas Gleixner wrote:
> On Thu, 18 Feb 2016, Peter Zijlstra wrote:
> > On Wed, Feb 17, 2016 at 11:16:40PM +0100, Andi Kleen wrote:
> > > > Do you have any data to back that up or is that just "believe" ?
> > > 
> > > I've seen systems with discontiguous apic ids before.
> > > 
> > > It is obvious if you consider setups with node hotplug.
> > 
> > Those systems will also have a num_possible_cpus() that's inflated to
> > account for the possible hotplug of new sockets, right?
> > 
> > So while the phys_pkg_id of the present sockets might be 2 and 3
> > (leaving 0 and 1 available for hotplug), the num_possible_cpus() should
> > be big enough to actually allow for that hotplug to happen.
> > 
> > Because if num_possible_cpus() is too small, we could not accommodate
> > the hotplug operation.
> > 
> > And if num_possible_cpus() is of the right size, then the computed
> > max_packages() should be of the right size too.
> > 
> > Now clearly, BIOS can completely wreck things and indeed report too
> > small an apic_id range or whatever, and in this case we're up a creek
> > without a paddle.
> > 
> > But I think you can check for that at boot and report errors/warns
> > whatever, because if you trigger this, your system is not really
> > 'correct' anyway.
> 
> Correct. Furthermore, we have a limitation on apic ids anyway. So there is
> already a limit to creative apic id assignments.
> 
> Now if the system implementer can assign apic ids randomly in that space, the
> assumption of
> 
> 	   maxpackages = num_possible_cpus / cpus_per_package
> 
> is no longer true. But that's not a blocker for the approach of tracking per
> package data. We can simply create logical package ids and use those.
> 
> There is another possible issue: the above is also not true if we get
> heterogeneous systems, i.e. cpus_per_package is not constant. But if we create
> logical package ids when building the system topology, then we can account for
> that and adjust maxpackages accordingly.

Thinking more about it. Even on heterogeneous systems the implementer needs to
be sane with the apic ids. The APIC id has 4 levels

   [Cluster ID][Package ID][Core ID][SMT ID]

So for any given system, the number of [Core ID + SMT ID] bits must be the
same for every package, independent of the actual number of cores in the
package. If you violate this, then your APIC ID space is no longer
consistent. So we need to check for this anyway.

The SDM agrees with that:

Intel supports multi-threading systems where all physical processors report
identical values in CPUID leaf 0BH, CPUID.1:EBX[23:16], CPUID.4:EAX[31:26],
and CPUID.4:EAX[25:14].

So there is a limitation to BIOS creativity and all we need is to maintain a
logical package id.
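
For illustration, the decoding such a check relies on. Just a sketch
using the documented CPUID 0BH semantics, where EAX[4:0] of the core
level subleaf gives the number of bits to shift the apic id right to
obtain the package id:

	unsigned int eax, ebx, ecx, edx, pkg_shift, pkg_id;

	/* CPUID leaf 0BH, subleaf 1: core level topology enumeration */
	cpuid_count(0x0b, 1, &eax, &ebx, &ecx, &edx);
	pkg_shift = eax & 0x1f;		/* EAX[4:0]: shift to next level */
	pkg_id = apic_id >> pkg_shift;	/* apic_id assumed already read */

If that shift width differs between packages, the check should yell.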

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch 07/11] x86/perf/uncore: Track packages not per cpu data
  2016-02-18 10:54               ` Thomas Gleixner
@ 2016-02-19  8:39                 ` Thomas Gleixner
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2016-02-19  8:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Stephane Eranian, LKML, Ingo Molnar, Borislav Petkov,
	Harish Chegondi, Kan Liang, Andi Kleen

On Thu, 18 Feb 2016, Thomas Gleixner wrote:
> So there is a limitation to BIOS creativity and all we need is to maintain a
> logical package id.

Just for the record: this driver would fall apart today if the socket ids
were > 7. The uncore extra PCI devs assume a fixed mapping between the physid
in their config space and the physid of the socket. If the pci config space
id is > 7, then the storage space is exceeded.

#define UNCORE_SOCKET_MAX               8
struct pci_dev *uncore_extra_pci_dev[UNCORE_SOCKET_MAX][UNCORE_EXTRA_PCI_DEV_MAX];

Of course there is no range check for this either. uncore_pci_probe() sticks
it blindly into the array....
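
The missing check is a one-liner. Sketch only; whether failing the
probe with -EINVAL is the right reaction is a separate question:

	/* In uncore_pci_probe(), before touching the array: */
	if (phys_id >= UNCORE_SOCKET_MAX) {
		pr_warn("uncore: socket id %d exceeds UNCORE_SOCKET_MAX (%d)\n",
			phys_id, UNCORE_SOCKET_MAX);
		return -EINVAL;
	}
	/* ... only then store into uncore_extra_pci_dev[phys_id][...] */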

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread

Thread overview: 30+ messages
2016-02-17 13:47 [patch 00/11] x86/perf/intel_uncore: Cleanup and enhancements Thomas Gleixner
2016-02-17 13:47 ` [patch 01/11] x86/perf/intel_uncore: Remove pointless mask check Thomas Gleixner
2016-02-17 13:47 ` [patch 02/11] x86/perf/intel_uncore: Simplify error rollback Thomas Gleixner
2016-02-17 13:47 ` [patch 04/11] x86/perf/intel_uncore: Cleanup hardware on exit Thomas Gleixner
2016-02-17 15:49   ` Liang, Kan
2016-02-17 18:16     ` Thomas Gleixner
2016-02-17 21:57       ` Liang, Kan
2016-02-17 22:00         ` Thomas Gleixner
2016-02-17 13:47 ` [patch 03/11] x86/perf/intel_uncore: Fix error handling Thomas Gleixner
2016-02-17 13:47 ` [patch 05/11] x86/perf/intel_uncore: Make code readable Thomas Gleixner
2016-02-17 13:47 ` [patch 07/11] x86/perf/uncore: Track packages not per cpu data Thomas Gleixner
2016-02-17 21:19   ` Stephane Eranian
2016-02-17 21:24     ` Andi Kleen
2016-02-17 21:56       ` Thomas Gleixner
2016-02-17 22:16         ` Andi Kleen
2016-02-17 22:31           ` Thomas Gleixner
2016-02-18  7:50           ` Ingo Molnar
2016-02-18  8:13           ` Peter Zijlstra
2016-02-18  9:35             ` Stephane Eranian
2016-02-18  9:51               ` Peter Zijlstra
2016-02-18 10:25               ` Thomas Gleixner
2016-02-18 10:22             ` Thomas Gleixner
2016-02-18 10:54               ` Thomas Gleixner
2016-02-19  8:39                 ` Thomas Gleixner
2016-02-17 21:25     ` Thomas Gleixner
2016-02-17 13:47 ` [patch 06/11] x86/topology: Provide helper to retrieve number of cpu packages Thomas Gleixner
2016-02-17 13:47 ` [patch 08/11] x86/perf/intel_uncore: Clear all hardware state on exit Thomas Gleixner
2016-02-17 13:47 ` [patch 10/11] cpumask: Export cpumask_any_but Thomas Gleixner
2016-02-17 13:47 ` [patch 09/11] x86/perf/intel_uncore: Make PCI and MSR uncore independent Thomas Gleixner
2016-02-17 13:47 ` [patch 11/11] x86/perf/intel_uncore: Make it modular Thomas Gleixner
