nvdimm.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Add perf interface to expose nvdimm
@ 2021-06-08 11:56 Kajol Jain
  2021-06-08 11:56 ` [PATCH 1/4] drivers/nvdimm: Add nvdimm pmu structure Kajol Jain
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Kajol Jain @ 2021-06-08 11:56 UTC (permalink / raw)
  To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz
  Cc: maddy, santosh, aneesh.kumar, vaibhav, dan.j.williams, ira.weiny,
	atrajeev, tglx, kjain, rnsastry

Patchset adds performance stats reporting support for nvdimm.
Added interface includes support for pmu register/unregister
functions. A structure is added called nvdimm_pmu to be used for
adding arch/platform specific data such as supported events, cpumask
pmu event functions like event_init/add/read/del.
User could use the standard perf tool to access perf
events exposed via pmu.

Added implementation to expose IBM pseries platform nmem*
device performance stats using this interface.

Result from power9 pseries lpar with 2 nvdimm device:
command:# perf list nmem
  nmem0/cchrhcnt/                                    [Kernel PMU event]
  nmem0/cchwhcnt/                                    [Kernel PMU event]
  nmem0/critrscu/                                    [Kernel PMU event]
  nmem0/ctlresct/                                    [Kernel PMU event]
  nmem0/ctlrestm/                                    [Kernel PMU event]
  nmem0/fastwcnt/                                    [Kernel PMU event]
  nmem0/hostlcnt/                                    [Kernel PMU event]
  nmem0/hostldur/                                    [Kernel PMU event]
  nmem0/hostscnt/                                    [Kernel PMU event]
  nmem0/hostsdur/                                    [Kernel PMU event]
  nmem0/medrcnt/                                     [Kernel PMU event]
  nmem0/medrdur/                                     [Kernel PMU event]
  nmem0/medwcnt/                                     [Kernel PMU event]
  nmem0/medwdur/                                     [Kernel PMU event]
  nmem0/memlife/                                     [Kernel PMU event]
  nmem0/noopstat/                                    [Kernel PMU event]
  nmem0/ponsecs/                                     [Kernel PMU event]
  nmem1/cchrhcnt/                                    [Kernel PMU event]
  nmem1/cchwhcnt/                                    [Kernel PMU event]
  nmem1/critrscu/                                    [Kernel PMU event]
  ...
  nmem1/noopstat/                                    [Kernel PMU event]
  nmem1/ponsecs/                                     [Kernel PMU event]

Patch1:
        Introduces the nvdimm_pmu structure
Patch2:
	Adds common interface to add arch/platform specific data
	includes supported events, pmu event functions. It also
	adds code for cpu hotplug support.
Patch3:
        Add code in arch/powerpc/platform/pseries/papr_scm.c to expose
        nmem* pmu. It fills in the nvdimm_pmu structure with event attrs
        cpumask andevent functions and then registers the pmu by adding
        callbacks to register_nvdimm_pmu.
Patch4:
        Sysfs documentation patch

Changelog
---
RFC v3 -> PATCH
- Link to the RFC v3 patchset : https://lkml.org/lkml/2021/5/29/28

- Remove RFC tag.

- Add nvdimm_pmu_cpu_online function.

- A new variable 'arch_cpumask' is added to the struct nvdimm_pmu
  which can be used to provide cpumask by the arch specific code.
  It will used incase cpu hotplug is not handled by arch code.
  Now common interface first check for any active cpu in arch_cpumask
  to designate cpu to collect counter data and incase we dont have any
  active cpu in that mask, it will look into cpumask of the device
  numa node.

-Add code in papr_scm to fill arch_cpumask variable with required
 cpumask.

- Some optimizations/fixes from previous RFC code

v2 -> v3
- Link to the RFC v2 patchset : https://lkml.org/lkml/2021/5/25/591

- Moved hotplug code changes from papr_scm code to generic interface
  with required functionality as suggested by Peter Zijlstra

- Changed function parameter of unregister_nvdimm_pmu function from
  struct pmu to struct nvdimm_pmu.

- Now cpumask will get updated based on numa node of corresponding nvdimm
  device as suggested by Peter Zijlstra.

v1 -> v2
- Link to the RFC v1 patchset : https://lkml.org/lkml/2021/5/12/2747

- Removed intermediate functions nvdimm_pmu_read/nvdimm_pmu_add/
  nvdimm_pmu_del/nvdimm_pmu_event_init and directly assigned
  platfrom specific routines. Also add check for any NULL functions.
  Suggested by: Peter Zijlstra

- Add macros for event attribute array index which can be used to
  assign dynamically allocated attr_groups.

- New function 'nvdimm_pmu_mem_free' is added to free dynamic
  memory allocated for attr_groups in papr_scm.c

- PMU register call moved from papr_scm_nvdimm_init() to papr_scm_probe()

- Move addition of cpu/node/cpuhp_state attributes in struct nvdimm_pmu
  to patch 4 where cpu hotplug code added

- Removed device attribute from the attribute list of
  add/del/read/event_init functions in nvdimm_pmu structure
  as we need to assign them directly to pmu structure.

---

Kajol Jain (4):
  drivers/nvdimm: Add nvdimm pmu structure
  drivers/nvdimm: Add perf interface to expose nvdimm performance stats
  powerpc/papr_scm: Add perf interface support
  powerpc/papr_scm: Document papr_scm sysfs event format entries

 Documentation/ABI/testing/sysfs-bus-papr-pmem |  31 ++
 arch/powerpc/include/asm/device.h             |   5 +
 arch/powerpc/platforms/pseries/papr_scm.c     | 365 ++++++++++++++++++
 drivers/nvdimm/Makefile                       |   1 +
 drivers/nvdimm/nd_perf.c                      | 230 +++++++++++
 include/linux/nd.h                            |  46 +++
 6 files changed, 678 insertions(+)
 create mode 100644 drivers/nvdimm/nd_perf.c

-- 
2.27.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] drivers/nvdimm: Add nvdimm pmu structure
  2021-06-08 11:56 [PATCH 0/4] Add perf interface to expose nvdimm Kajol Jain
@ 2021-06-08 11:56 ` Kajol Jain
  2021-06-08 11:56 ` [PATCH 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats Kajol Jain
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Kajol Jain @ 2021-06-08 11:56 UTC (permalink / raw)
  To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz
  Cc: maddy, santosh, aneesh.kumar, vaibhav, dan.j.williams, ira.weiny,
	atrajeev, tglx, kjain, rnsastry

A structure is added, called nvdimm_pmu, for performance
stats reporting support of nvdimm devices. It can be used to add
nvdimm pmu data such as supported events and pmu event functions
like event_init/add/read/del with cpu hotplug support.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 include/linux/nd.h | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/include/linux/nd.h b/include/linux/nd.h
index ee9ad76afbba..712499cf7335 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -8,6 +8,8 @@
 #include <linux/ndctl.h>
 #include <linux/device.h>
 #include <linux/badblocks.h>
+#include <linux/platform_device.h>
+#include <linux/perf_event.h>
 
 enum nvdimm_event {
 	NVDIMM_REVALIDATE_POISON,
@@ -23,6 +25,47 @@ enum nvdimm_claim_class {
 	NVDIMM_CCLASS_UNKNOWN,
 };
 
+/* Event attribute array index */
+#define NVDIMM_PMU_FORMAT_ATTR		0
+#define NVDIMM_PMU_EVENT_ATTR		1
+#define NVDIMM_PMU_CPUMASK_ATTR		2
+#define NVDIMM_PMU_NULL_ATTR		3
+
+/**
+ * struct nvdimm_pmu - data structure for nvdimm perf driver
+ *
+ * @name: name of the nvdimm pmu device.
+ * @pmu: pmu data structure for nvdimm performance stats.
+ * @dev: nvdimm device pointer.
+ * @functions(event_init/add/del/read): platform specific pmu functions.
+ * @attr_groups: data structure for events, formats and cpumask
+ * @cpu: designated cpu for counter access.
+ * @node: node for cpu hotplug notifier link.
+ * @cpuhp_state: state for cpu hotplug notification.
+ * @arch_cpumask: cpumask to get designated cpu for counter access.
+ */
+struct nvdimm_pmu {
+	const char *name;
+	struct pmu pmu;
+	struct device *dev;
+	int (*event_init)(struct perf_event *event);
+	int  (*add)(struct perf_event *event, int flags);
+	void (*del)(struct perf_event *event, int flags);
+	void (*read)(struct perf_event *event);
+	/*
+	 * Attribute groups for the nvdimm pmu. Index 0 used for
+	 * format attribute, index 1 used for event attribute,
+	 * index 2 used for cpusmask attribute and index 3 kept as NULL.
+	 */
+	const struct attribute_group *attr_groups[4];
+	int cpu;
+	struct hlist_node node;
+	enum cpuhp_state cpuhp_state;
+
+	/* cpumask provided by arch/platform specific code */
+	struct cpumask arch_cpumask;
+};
+
 struct nd_device_driver {
 	struct device_driver drv;
 	unsigned long type;
-- 
2.27.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats
  2021-06-08 11:56 [PATCH 0/4] Add perf interface to expose nvdimm Kajol Jain
  2021-06-08 11:56 ` [PATCH 1/4] drivers/nvdimm: Add nvdimm pmu structure Kajol Jain
@ 2021-06-08 11:56 ` Kajol Jain
  2021-06-08 17:36   ` Peter Zijlstra
  2021-06-08 11:56 ` [PATCH 3/4] powerpc/papr_scm: Add perf interface support Kajol Jain
  2021-06-08 11:57 ` [PATCH 4/4] powerpc/papr_scm: Document papr_scm sysfs event format entries Kajol Jain
  3 siblings, 1 reply; 7+ messages in thread
From: Kajol Jain @ 2021-06-08 11:56 UTC (permalink / raw)
  To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz
  Cc: maddy, santosh, aneesh.kumar, vaibhav, dan.j.williams, ira.weiny,
	atrajeev, tglx, kjain, rnsastry

A common interface is added to get performance stats reporting
support for nvdimm devices. Added interface includes support for
pmu register/unregister functions, cpu hotplug and pmu event
functions like event_init/add/read/del.
User could use the standard perf tool to access perf
events exposed via pmu.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 drivers/nvdimm/Makefile  |   1 +
 drivers/nvdimm/nd_perf.c | 230 +++++++++++++++++++++++++++++++++++++++
 include/linux/nd.h       |   3 +
 3 files changed, 234 insertions(+)
 create mode 100644 drivers/nvdimm/nd_perf.c

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 29203f3d3069..25dba6095612 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -18,6 +18,7 @@ nd_e820-y := e820.o
 libnvdimm-y := core.o
 libnvdimm-y += bus.o
 libnvdimm-y += dimm_devs.o
+libnvdimm-y += nd_perf.o
 libnvdimm-y += dimm.o
 libnvdimm-y += region_devs.o
 libnvdimm-y += region.o
diff --git a/drivers/nvdimm/nd_perf.c b/drivers/nvdimm/nd_perf.c
new file mode 100644
index 000000000000..5cc1f1c65972
--- /dev/null
+++ b/drivers/nvdimm/nd_perf.c
@@ -0,0 +1,230 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * nd_perf.c: NVDIMM Device Performance Monitoring Unit support
+ *
+ * Perf interface to expose nvdimm performance stats.
+ *
+ * Copyright (C) 2021 IBM Corporation
+ */
+
+#define pr_fmt(fmt) "nvdimm_pmu: " fmt
+
+#include <linux/nd.h>
+
+static ssize_t nvdimm_pmu_cpumask_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct nvdimm_pmu *nd_pmu;
+
+	nd_pmu = container_of(pmu, struct nvdimm_pmu, pmu);
+
+	return cpumap_print_to_pagebuf(true, buf, cpumask_of(nd_pmu->cpu));
+}
+
+static int nvdimm_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
+{
+	struct nvdimm_pmu *nd_pmu;
+	u32 target;
+	int nodeid;
+	const struct cpumask *cpumask;
+
+	nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+	/* Clear it, incase given cpu is set in nd_pmu->arch_cpumask */
+	cpumask_test_and_clear_cpu(cpu, &nd_pmu->arch_cpumask);
+
+	/*
+	 * If given cpu is not same as current designated cpu for
+	 * counter access, just return.
+	 */
+	if (cpu != nd_pmu->cpu)
+		return 0;
+
+	/* Check for any active cpu in nd_pmu->arch_cpumask */
+	target = cpumask_any(&nd_pmu->arch_cpumask);
+	nd_pmu->cpu = target;
+
+	/*
+	 * Incase we don't have any active cpu in nd_pmu->arch_cpumask,
+	 * check in given cpu's numa node list.
+	 */
+	if (target >= nr_cpu_ids) {
+		nodeid = cpu_to_node(cpu);
+		cpumask = cpumask_of_node(nodeid);
+		target = cpumask_any_but(cpumask, cpu);
+		nd_pmu->cpu = target;
+
+		if (target >= nr_cpu_ids)
+			return -1;
+	}
+
+	return 0;
+}
+
+static int nvdimm_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+	struct nvdimm_pmu *nd_pmu;
+
+	nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+	if (nd_pmu->cpu >= nr_cpu_ids)
+		nd_pmu->cpu = cpu;
+
+	return 0;
+}
+
+static int create_cpumask_attr_group(struct nvdimm_pmu *nd_pmu)
+{
+	struct perf_pmu_events_attr *attr;
+	struct attribute **attrs;
+	struct attribute_group *nvdimm_pmu_cpumask_group;
+
+	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+	if (!attr)
+		return -ENOMEM;
+
+	attrs = kzalloc(2 * sizeof(struct attribute *), GFP_KERNEL);
+	if (!attrs) {
+		kfree(attr);
+		return -ENOMEM;
+	}
+
+	/* Allocate memory for cpumask attribute group */
+	nvdimm_pmu_cpumask_group = kzalloc(sizeof(*nvdimm_pmu_cpumask_group), GFP_KERNEL);
+	if (!nvdimm_pmu_cpumask_group) {
+		kfree(attr);
+		kfree(attrs);
+		return -ENOMEM;
+	}
+
+	sysfs_attr_init(&attr->attr.attr);
+	attr->attr.attr.name = "cpumask";
+	attr->attr.attr.mode = 0444;
+	attr->attr.show = nvdimm_pmu_cpumask_show;
+	attrs[0] = &attr->attr.attr;
+	attrs[1] = NULL;
+
+	nvdimm_pmu_cpumask_group->attrs = attrs;
+	nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR] = nvdimm_pmu_cpumask_group;
+	return 0;
+}
+
+static int nvdimm_pmu_cpu_hotplug_init(struct nvdimm_pmu *nd_pmu)
+{
+	int nodeid, rc;
+	const struct cpumask *cpumask;
+
+	/*
+	 * Incase cpu hotplug is not handled by arch specific code
+	 * they can still provide required cpumask which can be used
+	 * to get designatd cpu for counter access.
+	 * Check for any active cpu in nd_pmu->arch_cpumask.
+	 */
+	if (!cpumask_empty(&nd_pmu->arch_cpumask)) {
+		nd_pmu->cpu = cpumask_any(&nd_pmu->arch_cpumask);
+	} else {
+		/* pick active cpu from the cpumask of device numa node. */
+		nodeid = dev_to_node(nd_pmu->dev);
+		cpumask = cpumask_of_node(nodeid);
+		nd_pmu->cpu = cpumask_any(cpumask);
+	}
+
+	rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "perf/nvdimm:online",
+				     nvdimm_pmu_cpu_online, nvdimm_pmu_cpu_offline);
+
+	if (rc < 0)
+		return rc;
+
+	nd_pmu->cpuhp_state = rc;
+
+	/* Register the pmu instance for cpu hotplug */
+	rc = cpuhp_state_add_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+	if (rc) {
+		cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+		return rc;
+	}
+
+	/* Create cpumask attribute group */
+	rc = create_cpumask_attr_group(nd_pmu);
+	if (rc) {
+		cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+		cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+		return rc;
+	}
+
+	return 0;
+}
+
+void nvdimm_pmu_free_hotplug_memory(struct nvdimm_pmu *nd_pmu)
+{
+	cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+	cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+
+	if (nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR])
+		kfree(nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR]->attrs);
+	kfree(nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR]);
+}
+
+int register_nvdimm_pmu(struct nvdimm_pmu *nd_pmu, struct platform_device *pdev)
+{
+	int rc;
+
+	if (!nd_pmu || !pdev)
+		return -EINVAL;
+
+	/* event functions like add/del/read/event_init should not be NULL */
+	if (WARN_ON_ONCE(!(nd_pmu->event_init && nd_pmu->add && nd_pmu->del && nd_pmu->read)))
+		return -EINVAL;
+
+	nd_pmu->pmu.task_ctx_nr = perf_invalid_context;
+	nd_pmu->pmu.name = nd_pmu->name;
+	nd_pmu->pmu.event_init = nd_pmu->event_init;
+	nd_pmu->pmu.add = nd_pmu->add;
+	nd_pmu->pmu.del = nd_pmu->del;
+	nd_pmu->pmu.read = nd_pmu->read;
+
+	nd_pmu->pmu.attr_groups = nd_pmu->attr_groups;
+	nd_pmu->pmu.capabilities = PERF_PMU_CAP_NO_INTERRUPT |
+				PERF_PMU_CAP_NO_EXCLUDE;
+
+	/*
+	 * Add platform_device->dev pointer to nvdimm_pmu to access
+	 * device data in events functions.
+	 */
+	nd_pmu->dev = &pdev->dev;
+
+	/*
+	 * Incase cpumask attribute is set it means cpu
+	 * hotplug is handled by the arch specific code and
+	 * we can skip calling hotplug_init.
+	 */
+	if (!nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR]) {
+		/* init cpuhotplug */
+		rc = nvdimm_pmu_cpu_hotplug_init(nd_pmu);
+		if (rc) {
+			pr_info("cpu hotplug feature failed for device: %s\n", nd_pmu->name);
+			return rc;
+		}
+	}
+
+	rc = perf_pmu_register(&nd_pmu->pmu, nd_pmu->name, -1);
+	if (rc) {
+		nvdimm_pmu_free_hotplug_memory(nd_pmu);
+		return rc;
+	}
+
+	pr_info("%s NVDIMM performance monitor support registered\n",
+		nd_pmu->name);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(register_nvdimm_pmu);
+
+void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu)
+{
+	/* handle freeing of memory nd_pmu in arch specific code */
+	perf_pmu_unregister(&nd_pmu->pmu);
+	nvdimm_pmu_free_hotplug_memory(nd_pmu);
+}
+EXPORT_SYMBOL_GPL(unregister_nvdimm_pmu);
diff --git a/include/linux/nd.h b/include/linux/nd.h
index 712499cf7335..7d8b4f7d277d 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -66,6 +66,9 @@ struct nvdimm_pmu {
 	struct cpumask arch_cpumask;
 };
 
+int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device *pdev);
+void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu);
+
 struct nd_device_driver {
 	struct device_driver drv;
 	unsigned long type;
-- 
2.27.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 3/4] powerpc/papr_scm: Add perf interface support
  2021-06-08 11:56 [PATCH 0/4] Add perf interface to expose nvdimm Kajol Jain
  2021-06-08 11:56 ` [PATCH 1/4] drivers/nvdimm: Add nvdimm pmu structure Kajol Jain
  2021-06-08 11:56 ` [PATCH 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats Kajol Jain
@ 2021-06-08 11:56 ` Kajol Jain
  2021-06-08 11:57 ` [PATCH 4/4] powerpc/papr_scm: Document papr_scm sysfs event format entries Kajol Jain
  3 siblings, 0 replies; 7+ messages in thread
From: Kajol Jain @ 2021-06-08 11:56 UTC (permalink / raw)
  To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz
  Cc: maddy, santosh, aneesh.kumar, vaibhav, dan.j.williams, ira.weiny,
	atrajeev, tglx, kjain, rnsastry

Performance monitoring support for papr-scm nvdimm devices
via perf interface is added which includes addition of pmu
functions like add/del/read/event_init for nvdimm_pmu struture.

A new parameter 'priv' in added to the pdev_archdata structure to save
nvdimm_pmu device pointer, to handle the unregistering of pmu device.

papr_scm_pmu_register function populates the nvdimm_pmu structure
with events, cpumask, attribute groups along with event handling
functions. Finally the populated nvdimm_pmu structure is passed to
register the pmu device.
Event handling functions internally uses hcall to get events and
counter data.

Result in power9 machine with 2 nvdimm device:

Ex: List all event by perf list

command:# perf list nmem

  nmem0/cchrhcnt/                                    [Kernel PMU event]
  nmem0/cchwhcnt/                                    [Kernel PMU event]
  nmem0/critrscu/                                    [Kernel PMU event]
  nmem0/ctlresct/                                    [Kernel PMU event]
  nmem0/ctlrestm/                                    [Kernel PMU event]
  nmem0/fastwcnt/                                    [Kernel PMU event]
  nmem0/hostlcnt/                                    [Kernel PMU event]
  nmem0/hostldur/                                    [Kernel PMU event]
  nmem0/hostscnt/                                    [Kernel PMU event]
  nmem0/hostsdur/                                    [Kernel PMU event]
  nmem0/medrcnt/                                     [Kernel PMU event]
  nmem0/medrdur/                                     [Kernel PMU event]
  nmem0/medwcnt/                                     [Kernel PMU event]
  nmem0/medwdur/                                     [Kernel PMU event]
  nmem0/memlife/                                     [Kernel PMU event]
  nmem0/noopstat/                                    [Kernel PMU event]
  nmem0/ponsecs/                                     [Kernel PMU event]
  nmem1/cchrhcnt/                                    [Kernel PMU event]
  nmem1/cchwhcnt/                                    [Kernel PMU event]
  nmem1/critrscu/                                    [Kernel PMU event]
  ...
  nmem1/noopstat/                                    [Kernel PMU event]
  nmem1/ponsecs/                                     [Kernel PMU event]

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 arch/powerpc/include/asm/device.h         |   5 +
 arch/powerpc/platforms/pseries/papr_scm.c | 365 ++++++++++++++++++++++
 2 files changed, 370 insertions(+)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 219559d65864..47ed639f3b8f 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -48,6 +48,11 @@ struct dev_archdata {
 
 struct pdev_archdata {
 	u64 dma_mask;
+	/*
+	 * Pointer to nvdimm_pmu structure, to handle the unregistering
+	 * of pmu device
+	 */
+	void *priv;
 };
 
 #endif /* _ASM_POWERPC_DEVICE_H */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index ef26fe40efb0..92632b4a4a60 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -18,6 +18,8 @@
 #include <asm/plpar_wrappers.h>
 #include <asm/papr_pdsm.h>
 #include <asm/mce.h>
+#include <linux/perf_event.h>
+#include <linux/ctype.h>
 
 #define BIND_ANY_ADDR (~0ul)
 
@@ -67,6 +69,8 @@
 #define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
 #define PAPR_SCM_PERF_STATS_VERSION 0x1
 
+#define to_nvdimm_pmu(_pmu)	container_of(_pmu, struct nvdimm_pmu, pmu)
+
 /* Struct holding a single performance metric */
 struct papr_scm_perf_stat {
 	u8 stat_id[8];
@@ -116,6 +120,12 @@ struct papr_scm_priv {
 
 	/* length of the stat buffer as expected by phyp */
 	size_t stat_buffer_len;
+
+	 /* array to have event_code and stat_id mappings */
+	char **nvdimm_events_map;
+
+	/* count of supported events */
+	u32 total_events;
 };
 
 static int papr_scm_pmem_flush(struct nd_region *nd_region,
@@ -329,6 +339,354 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
 	return 0;
 }
 
+PMU_FORMAT_ATTR(event, "config:0-4");
+
+static struct attribute *nvdimm_pmu_format_attr[] = {
+	&format_attr_event.attr,
+	NULL,
+};
+
+static struct attribute_group nvdimm_pmu_format_group = {
+	.name = "format",
+	.attrs = nvdimm_pmu_format_attr,
+};
+
+static int papr_scm_pmu_get_value(struct perf_event *event, struct device *dev, u64 *count)
+{
+	struct papr_scm_perf_stat *stat;
+	struct papr_scm_perf_stats *stats;
+	struct papr_scm_priv *p = (struct papr_scm_priv *)dev->driver_data;
+	int rc, size;
+
+	/* Allocate request buffer enough to hold single performance stat */
+	size = sizeof(struct papr_scm_perf_stats) +
+		sizeof(struct papr_scm_perf_stat);
+
+	if (!p || !p->nvdimm_events_map)
+		return -EINVAL;
+
+	stats = kzalloc(size, GFP_KERNEL);
+	if (!stats)
+		return -ENOMEM;
+
+	stat = &stats->scm_statistic[0];
+	memcpy(&stat->stat_id,
+	       p->nvdimm_events_map[event->attr.config - 1],
+		sizeof(stat->stat_id));
+	stat->stat_val = 0;
+
+	rc = drc_pmem_query_stats(p, stats, 1);
+	if (rc < 0) {
+		kfree(stats);
+		return rc;
+	}
+
+	*count = be64_to_cpu(stat->stat_val);
+	kfree(stats);
+	return 0;
+}
+
+static int papr_scm_pmu_event_init(struct perf_event *event)
+{
+	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+	struct papr_scm_priv *p;
+
+	if (!nd_pmu)
+		return -EINVAL;
+
+	/* test the event attr type for PMU enumeration */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* it does not support event sampling mode */
+	if (is_sampling_event(event))
+		return -EOPNOTSUPP;
+
+	/* no branch sampling */
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	p = (struct papr_scm_priv *)nd_pmu->dev->driver_data;
+	if (!p)
+		return -EINVAL;
+
+	/* Invalid eventcode */
+	if (event->attr.config == 0 || event->attr.config > p->total_events)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int papr_scm_pmu_add(struct perf_event *event, int flags)
+{
+	u64 count;
+	int rc;
+	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+	if (!nd_pmu)
+		return -EINVAL;
+
+	if (flags & PERF_EF_START) {
+		rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &count);
+		if (rc)
+			return rc;
+
+		local64_set(&event->hw.prev_count, count);
+	}
+
+	return 0;
+}
+
+static void papr_scm_pmu_read(struct perf_event *event)
+{
+	u64 prev, now;
+	int rc;
+	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+	if (!nd_pmu)
+		return;
+
+	rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &now);
+	if (rc)
+		return;
+
+	prev = local64_xchg(&event->hw.prev_count, now);
+	local64_add(now - prev, &event->count);
+}
+
+static void papr_scm_pmu_del(struct perf_event *event, int flags)
+{
+	papr_scm_pmu_read(event);
+}
+
+static ssize_t device_show_string(struct device *dev, struct device_attribute *attr,
+				  char *buf)
+{
+	struct perf_pmu_events_attr *d;
+
+	d = container_of(attr, struct perf_pmu_events_attr, attr);
+
+	return sysfs_emit(buf, "%s\n", (char *)d->event_str);
+}
+
+static char *strtolower(char *updated_name)
+{
+	int i = 0;
+
+	while (updated_name[i]) {
+		if (isupper(updated_name[i]))
+			updated_name[i] = tolower(updated_name[i]);
+		i++;
+	}
+	updated_name[i] = '\0';
+	return strim(updated_name);
+}
+
+/* device_str_attr_create : Populate event "name" and string "str" in attribute */
+static struct attribute *device_str_attr_create_(char *name, char *str)
+{
+	struct perf_pmu_events_attr *attr;
+
+	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+	if (!attr)
+		return NULL;
+
+	sysfs_attr_init(&attr->attr.attr);
+	attr->event_str = str;
+	attr->attr.attr.name = strtolower(name);
+	attr->attr.attr.mode = 0444;
+	attr->attr.show = device_show_string;
+
+	return &attr->attr.attr;
+}
+
+static int papr_scm_pmu_check_events(struct papr_scm_priv *p, struct nvdimm_pmu *nd_pmu)
+{
+	struct papr_scm_perf_stat *stat;
+	struct papr_scm_perf_stats *stats, *single_stats;
+	int index, size, rc, count;
+	u32 available_events;
+	struct attribute **events;
+	char *eventcode, *eventname, *statid;
+	struct attribute_group *nvdimm_pmu_events_group;
+
+	if (!p->stat_buffer_len)
+		return -ENOENT;
+
+	available_events = (p->stat_buffer_len  - sizeof(struct papr_scm_perf_stats))
+			/ sizeof(struct papr_scm_perf_stat);
+
+	/* Allocate memory for events attribute group */
+	nvdimm_pmu_events_group = kzalloc(sizeof(*nvdimm_pmu_events_group), GFP_KERNEL);
+	if (!nvdimm_pmu_events_group)
+		return -ENOMEM;
+
+	/* Allocate the buffer for phyp where stats are written */
+	stats = kzalloc(p->stat_buffer_len, GFP_KERNEL);
+	if (!stats) {
+		rc = -ENOMEM;
+		goto out_nvdimm_pmu_events_group;
+	}
+
+	/* Allocate memory to nvdimm_event_map */
+	p->nvdimm_events_map = kcalloc(available_events, sizeof(char *), GFP_KERNEL);
+	if (!p->nvdimm_events_map) {
+		rc = -ENOMEM;
+		goto out_stats;
+	}
+
+	/* Called to get list of events supported */
+	rc = drc_pmem_query_stats(p, stats, 0);
+	if (rc)
+		goto out_nvdimm_events_map;
+
+	/* Allocate buffer to hold single performance stat */
+	size = sizeof(struct papr_scm_perf_stats) + sizeof(struct papr_scm_perf_stat);
+
+	single_stats = kzalloc(size, GFP_KERNEL);
+	if (!single_stats) {
+		rc = -ENOMEM;
+		goto out_nvdimm_events_map;
+	}
+
+	events = kzalloc(available_events * sizeof(struct attribute *), GFP_KERNEL);
+	if (!events) {
+		rc = -ENOMEM;
+		goto out_single_stats;
+	}
+
+	for (index = 0, stat = stats->scm_statistic, count = 0;
+		     index < available_events; index++, ++stat) {
+
+		single_stats->scm_statistic[0] = *stat;
+		rc = drc_pmem_query_stats(p, single_stats, 1);
+
+		if (rc < 0) {
+			pr_info("Event not supported %s for device %s\n",
+				stat->stat_id, nvdimm_name(p->nvdimm));
+		} else {
+			eventcode = kasprintf(GFP_KERNEL, "event=0x%x", count + 1);
+			eventname = kzalloc(strlen(stat->stat_id) + 1, GFP_KERNEL);
+			statid = kzalloc(strlen(stat->stat_id) + 1, GFP_KERNEL);
+
+			if (!eventname || !statid || !eventcode)
+				goto out;
+
+			strcpy(eventname, stat->stat_id);
+			events[count] = device_str_attr_create_(eventname,
+								eventcode);
+			if (!events[count])
+				goto out;
+
+			strcpy(statid, stat->stat_id);
+			p->nvdimm_events_map[count] = statid;
+			count++;
+			continue;
+out:
+			kfree(eventcode);
+			kfree(eventname);
+			kfree(statid);
+		}
+	}
+
+	if (!count)
+		goto out_events;
+
+	events[count] = NULL;
+	p->nvdimm_events_map[count] = NULL;
+	p->total_events = count;
+
+	nvdimm_pmu_events_group->name = "events";
+	nvdimm_pmu_events_group->attrs = events;
+
+	/* Fill attribute groups for the nvdimm pmu device */
+	nd_pmu->attr_groups[NVDIMM_PMU_FORMAT_ATTR] = &nvdimm_pmu_format_group;
+	nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR] = nvdimm_pmu_events_group;
+	nd_pmu->attr_groups[NVDIMM_PMU_NULL_ATTR] = NULL;
+
+	kfree(single_stats);
+	kfree(stats);
+	return 0;
+
+out_events:
+	kfree(events);
+out_single_stats:
+	kfree(single_stats);
+out_nvdimm_events_map:
+	kfree(p->nvdimm_events_map);
+out_stats:
+	kfree(stats);
+out_nvdimm_pmu_events_group:
+	kfree(nvdimm_pmu_events_group);
+	return rc;
+}
+
+/* Function to free the attr_groups which are dynamically allocated */
+static void papr_scm_pmu_mem_free(struct nvdimm_pmu *nd_pmu)
+{
+	if (nd_pmu) {
+		if (nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR])
+			kfree(nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR]->attrs);
+		kfree(nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR]);
+	}
+}
+
+static void papr_scm_pmu_register(struct papr_scm_priv *p)
+{
+	struct nvdimm_pmu *nd_pmu;
+	int rc, nodeid;
+
+	nd_pmu = kzalloc(sizeof(*nd_pmu), GFP_KERNEL);
+	if (!nd_pmu) {
+		rc = -ENOMEM;
+		goto pmu_err_print;
+	}
+
+	rc = papr_scm_pmu_check_events(p, nd_pmu);
+	if (rc)
+		goto pmu_check_events_err;
+
+	nd_pmu->name = nvdimm_name(p->nvdimm);
+	nd_pmu->event_init = papr_scm_pmu_event_init;
+	nd_pmu->read = papr_scm_pmu_read;
+	nd_pmu->add = papr_scm_pmu_add;
+	nd_pmu->del = papr_scm_pmu_del;
+
+	/*updating the cpumask variable */
+	nodeid = dev_to_node(&p->pdev->dev);
+	nd_pmu->arch_cpumask = *cpumask_of_node(nodeid);
+
+	/* cpumask should not be NULL */
+	WARN_ON_ONCE(cpumask_empty(&nd_pmu->arch_cpumask));
+
+	rc = register_nvdimm_pmu(nd_pmu, p->pdev);
+	if (rc)
+		goto pmu_register_err;
+
+	/*
+	 * Set archdata.priv value to nvdimm_pmu structure, to handle the
+	 * unregistering of pmu device.
+	 */
+	p->pdev->archdata.priv = nd_pmu;
+	return;
+
+pmu_register_err:
+	papr_scm_pmu_mem_free(nd_pmu);
+	kfree(p->nvdimm_events_map);
+pmu_check_events_err:
+	kfree(nd_pmu);
+pmu_err_print:
+	dev_info(&p->pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
+}
+
+static void papr_scm_pmu_uninit(struct nvdimm_pmu *nd_pmu)
+{
+	unregister_nvdimm_pmu(nd_pmu);
+	papr_scm_pmu_mem_free(nd_pmu);
+	kfree(nd_pmu);
+}
+
 /*
  * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
  * health information.
@@ -1177,6 +1535,7 @@ static int papr_scm_probe(struct platform_device *pdev)
 		goto err2;
 
 	platform_set_drvdata(pdev, p);
+	papr_scm_pmu_register(p);
 
 	return 0;
 
@@ -1195,6 +1554,12 @@ static int papr_scm_remove(struct platform_device *pdev)
 
 	nvdimm_bus_unregister(p->bus);
 	drc_pmem_unbind(p);
+
+	if (pdev->archdata.priv)
+		papr_scm_pmu_uninit(pdev->archdata.priv);
+
+	pdev->archdata.priv = NULL;
+	kfree(p->nvdimm_events_map);
 	kfree(p->bus_desc.provider_name);
 	kfree(p);
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 4/4] powerpc/papr_scm: Document papr_scm sysfs event format entries
  2021-06-08 11:56 [PATCH 0/4] Add perf interface to expose nvdimm Kajol Jain
                   ` (2 preceding siblings ...)
  2021-06-08 11:56 ` [PATCH 3/4] powerpc/papr_scm: Add perf interface support Kajol Jain
@ 2021-06-08 11:57 ` Kajol Jain
  3 siblings, 0 replies; 7+ messages in thread
From: Kajol Jain @ 2021-06-08 11:57 UTC (permalink / raw)
  To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz
  Cc: maddy, santosh, aneesh.kumar, vaibhav, dan.j.williams, ira.weiny,
	atrajeev, tglx, kjain, rnsastry

Details is added for the event, cpumask and format attributes
in the ABI documentation.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 Documentation/ABI/testing/sysfs-bus-papr-pmem | 31 +++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
index 92e2db0e2d3d..be91de341454 100644
--- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -59,3 +59,34 @@ Description:
 		* "CchRHCnt" : Cache Read Hit Count
 		* "CchWHCnt" : Cache Write Hit Count
 		* "FastWCnt" : Fast Write Count
+
+What:		/sys/devices/nmemX/format
+Date:		June 2021
+Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
+Description:	(RO) Attribute group to describe the magic bits
+                that go into perf_event_attr.config for a particular pmu.
+                (See ABI/testing/sysfs-bus-event_source-devices-format).
+
+                Each attribute under this group defines a bit range of the
+                perf_event_attr.config. Supported attribute is listed
+                below::
+
+		    event  = "config:0-4"  - event ID
+
+		For example::
+		    noopstat = "event=0x1"
+
+What:		/sys/devices/nmemX/events
+Date:		June 2021
+Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
+Description:    (RO) Attribute group to describe performance monitoring
+                events specific to papr-scm. Each attribute in this group describes
+                a single performance monitoring event supported by this nvdimm pmu.
+                The name of the file is the name of the event.
+                (See ABI/testing/sysfs-bus-event_source-devices-events).
+
+What:		/sys/devices/nmemX/cpumask
+Date:		June 2021
+Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
+Description:	(RO) This sysfs file exposes the cpumask which is designated to make
+                HCALLs to retrieve nvdimm pmu event counter data.
-- 
2.27.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats
  2021-06-08 11:56 ` [PATCH 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats Kajol Jain
@ 2021-06-08 17:36   ` Peter Zijlstra
  2021-06-09 12:12     ` kajoljain
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2021-06-08 17:36 UTC (permalink / raw)
  To: Kajol Jain
  Cc: mpe, linuxppc-dev, nvdimm, linux-kernel, maddy, santosh,
	aneesh.kumar, vaibhav, dan.j.williams, ira.weiny, atrajeev, tglx,
	rnsastry

On Tue, Jun 08, 2021 at 05:26:58PM +0530, Kajol Jain wrote:
> +static int nvdimm_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct nvdimm_pmu *nd_pmu;
> +	u32 target;
> +	int nodeid;
> +	const struct cpumask *cpumask;
> +
> +	nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
> +
> +	/* Clear it, incase given cpu is set in nd_pmu->arch_cpumask */
> +	cpumask_test_and_clear_cpu(cpu, &nd_pmu->arch_cpumask);
> +
> +	/*
> +	 * If given cpu is not same as current designated cpu for
> +	 * counter access, just return.
> +	 */
> +	if (cpu != nd_pmu->cpu)
> +		return 0;
> +
> +	/* Check for any active cpu in nd_pmu->arch_cpumask */
> +	target = cpumask_any(&nd_pmu->arch_cpumask);
> +	nd_pmu->cpu = target;
> +
> +	/*
> +	 * Incase we don't have any active cpu in nd_pmu->arch_cpumask,
> +	 * check in given cpu's numa node list.
> +	 */
> +	if (target >= nr_cpu_ids) {
> +		nodeid = cpu_to_node(cpu);
> +		cpumask = cpumask_of_node(nodeid);
> +		target = cpumask_any_but(cpumask, cpu);
> +		nd_pmu->cpu = target;
> +
> +		if (target >= nr_cpu_ids)
> +			return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int nvdimm_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct nvdimm_pmu *nd_pmu;
> +
> +	nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
> +
> +	if (nd_pmu->cpu >= nr_cpu_ids)
> +		nd_pmu->cpu = cpu;
> +
> +	return 0;
> +}

> +static int nvdimm_pmu_cpu_hotplug_init(struct nvdimm_pmu *nd_pmu)
> +{
> +	int nodeid, rc;
> +	const struct cpumask *cpumask;
> +
> +	/*
> +	 * Incase cpu hotplug is not handled by arch specific code
> +	 * they can still provide required cpumask which can be used
> +	 * to get designatd cpu for counter access.
> +	 * Check for any active cpu in nd_pmu->arch_cpumask.
> +	 */
> +	if (!cpumask_empty(&nd_pmu->arch_cpumask)) {
> +		nd_pmu->cpu = cpumask_any(&nd_pmu->arch_cpumask);
> +	} else {
> +		/* pick active cpu from the cpumask of device numa node. */
> +		nodeid = dev_to_node(nd_pmu->dev);
> +		cpumask = cpumask_of_node(nodeid);
> +		nd_pmu->cpu = cpumask_any(cpumask);
> +	}
> +
> +	rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "perf/nvdimm:online",
> +				     nvdimm_pmu_cpu_online, nvdimm_pmu_cpu_offline);
> +

Did you actually test this hotplug stuff?

That is, create a counter, unplug the CPU the counter was on, and
continue counting? "perf stat -I" is a good option for this, concurrent
with a hotplug.

Because I don't think it's actually correct. The thing is perf core is
strictly per-cpu, and it will place the event on a specific CPU context.
If you then unplug that CPU, nothing will touch the events on that CPU
anymore.

What drivers that span CPUs need to do is call
perf_pmu_migrate_context() whenever the CPU they were assigned to goes
away. Please have a look at arch/x86/events/rapl.c or
arch/x86/events/amd/power.c for relatively simple drivers that have this
property.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats
  2021-06-08 17:36   ` Peter Zijlstra
@ 2021-06-09 12:12     ` kajoljain
  0 siblings, 0 replies; 7+ messages in thread
From: kajoljain @ 2021-06-09 12:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mpe, linuxppc-dev, nvdimm, linux-kernel, maddy, santosh,
	aneesh.kumar, vaibhav, dan.j.williams, ira.weiny, atrajeev, tglx,
	rnsastry



On 6/8/21 11:06 PM, Peter Zijlstra wrote:
> On Tue, Jun 08, 2021 at 05:26:58PM +0530, Kajol Jain wrote:
>> +static int nvdimm_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
>> +{
>> +	struct nvdimm_pmu *nd_pmu;
>> +	u32 target;
>> +	int nodeid;
>> +	const struct cpumask *cpumask;
>> +
>> +	nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
>> +
>> +	/* Clear it, incase given cpu is set in nd_pmu->arch_cpumask */
>> +	cpumask_test_and_clear_cpu(cpu, &nd_pmu->arch_cpumask);
>> +
>> +	/*
>> +	 * If given cpu is not same as current designated cpu for
>> +	 * counter access, just return.
>> +	 */
>> +	if (cpu != nd_pmu->cpu)
>> +		return 0;
>> +
>> +	/* Check for any active cpu in nd_pmu->arch_cpumask */
>> +	target = cpumask_any(&nd_pmu->arch_cpumask);
>> +	nd_pmu->cpu = target;
>> +
>> +	/*
>> +	 * Incase we don't have any active cpu in nd_pmu->arch_cpumask,
>> +	 * check in given cpu's numa node list.
>> +	 */
>> +	if (target >= nr_cpu_ids) {
>> +		nodeid = cpu_to_node(cpu);
>> +		cpumask = cpumask_of_node(nodeid);
>> +		target = cpumask_any_but(cpumask, cpu);
>> +		nd_pmu->cpu = target;
>> +
>> +		if (target >= nr_cpu_ids)
>> +			return -1;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int nvdimm_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
>> +{
>> +	struct nvdimm_pmu *nd_pmu;
>> +
>> +	nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
>> +
>> +	if (nd_pmu->cpu >= nr_cpu_ids)
>> +		nd_pmu->cpu = cpu;
>> +
>> +	return 0;
>> +}
> 
>> +static int nvdimm_pmu_cpu_hotplug_init(struct nvdimm_pmu *nd_pmu)
>> +{
>> +	int nodeid, rc;
>> +	const struct cpumask *cpumask;
>> +
>> +	/*
>> +	 * Incase cpu hotplug is not handled by arch specific code
>> +	 * they can still provide required cpumask which can be used
>> +	 * to get designatd cpu for counter access.
>> +	 * Check for any active cpu in nd_pmu->arch_cpumask.
>> +	 */
>> +	if (!cpumask_empty(&nd_pmu->arch_cpumask)) {
>> +		nd_pmu->cpu = cpumask_any(&nd_pmu->arch_cpumask);
>> +	} else {
>> +		/* pick active cpu from the cpumask of device numa node. */
>> +		nodeid = dev_to_node(nd_pmu->dev);
>> +		cpumask = cpumask_of_node(nodeid);
>> +		nd_pmu->cpu = cpumask_any(cpumask);
>> +	}
>> +
>> +	rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "perf/nvdimm:online",
>> +				     nvdimm_pmu_cpu_online, nvdimm_pmu_cpu_offline);
>> +
> 
> Did you actually test this hotplug stuff?
> 
> That is, create a counter, unplug the CPU the counter was on, and
> continue counting? "perf stat -I" is a good option for this, concurrent
> with a hotplug.
>
> Because I don't think it's actually correct. The thing is perf core is
> strictly per-cpu, and it will place the event on a specific CPU context.
> If you then unplug that CPU, nothing will touch the events on that CPU
> anymore.
> 
> What drivers that span CPUs need to do is call
> perf_pmu_migrate_context() whenever the CPU they were assigned to goes
> away. Please have a look at arch/x86/events/rapl.c or
> arch/x86/events/amd/power.c for relatively simple drivers that have this
> property.
> 


Hi Peter,
    Primarily I tested off-lining multiple cpus and checking if cpumask file is updating as expected,
followed with perf stat commands.
But I missed the scenario where we are off-lining CPU while running perf stat. My bad, thanks
for pointing it out.
I will fix this issue and send new version of the patchset.

Thanks,
Kajol Jain
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-06-09 12:13 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-08 11:56 [PATCH 0/4] Add perf interface to expose nvdimm Kajol Jain
2021-06-08 11:56 ` [PATCH 1/4] drivers/nvdimm: Add nvdimm pmu structure Kajol Jain
2021-06-08 11:56 ` [PATCH 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats Kajol Jain
2021-06-08 17:36   ` Peter Zijlstra
2021-06-09 12:12     ` kajoljain
2021-06-08 11:56 ` [PATCH 3/4] powerpc/papr_scm: Add perf interface support Kajol Jain
2021-06-08 11:57 ` [PATCH 4/4] powerpc/papr_scm: Document papr_scm sysfs event format entries Kajol Jain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).