* [RFC 0/4] Add perf interface to expose nvdimm performance stats
@ 2021-05-12 16:38 Kajol Jain
  2021-05-12 16:38 ` [RFC 1/4] drivers/nvdimm: " Kajol Jain
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Kajol Jain @ 2021-05-12 16:38 UTC (permalink / raw)
  To: mpe, linuxppc-dev, linux-nvdimm, linux-kernel
  Cc: maddy, santosh, aneesh.kumar, vaibhav, dan.j.williams, ira.weiny,
	atrajeev, kjain, peterz, tglx

This patchset adds performance stats reporting support for nvdimm.
The added interface includes a pmu register function and
callbacks to be used by arch/platform specific drivers.
Users can use the standard perf tool to access perf events exposed via the pmu.

The patchset adds a structure called nvdimm_pmu, which can
hold platform specific data such as the supported event list and
callbacks for pmu functions like event_init/add/del/read.
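
As a rough illustration of the callback arrangement described above, the
following userspace sketch models a generic pmu embedded in an outer
nvdimm_pmu structure, with a generic wrapper that recovers the outer
structure and forwards to a platform callback. The struct layouts here are
simplified stand-ins and not the kernel definitions; demo_platform_read,
demo_register, demo_read_once and the hw_counter field are hypothetical
names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins; these are NOT the kernel definitions. */
struct pmu;

struct perf_event {
	struct pmu *pmu;
	uint64_t count;
};

struct pmu {
	void (*read)(struct perf_event *event);
};

struct device {
	uint64_t hw_counter;	/* hypothetical device state */
};

struct nvdimm_pmu {
	const char *name;
	struct pmu pmu;		/* embedded generic pmu */
	struct device *dev;
	void (*read)(struct perf_event *event, struct device *dev);
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define to_nvdimm_pmu(_pmu) container_of(_pmu, struct nvdimm_pmu, pmu)

/* Generic wrapper: recover the outer structure, then call the platform hook. */
static void nvdimm_pmu_read(struct perf_event *event)
{
	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);

	if (nd_pmu->read)
		nd_pmu->read(event, nd_pmu->dev);
}

/* Hypothetical platform callback: copy the device counter into the event. */
static void demo_platform_read(struct perf_event *event, struct device *dev)
{
	event->count = dev->hw_counter;
}

/* Mirrors the shape of a register step: install the generic wrapper. */
int demo_register(struct nvdimm_pmu *nd_pmu, struct device *dev)
{
	if (!nd_pmu || !dev)
		return -1;
	nd_pmu->pmu.read = nvdimm_pmu_read;
	nd_pmu->dev = dev;
	return 0;
}

/* Drive one read through the generic pmu pointer and return the count. */
uint64_t demo_read_once(uint64_t hw_value)
{
	struct device dev = { .hw_counter = hw_value };
	struct nvdimm_pmu nd = { .name = "nmem0", .read = demo_platform_read };
	struct perf_event ev = { .pmu = &nd.pmu, .count = 0 };
	int rc = demo_register(&nd, &dev);

	assert(rc == 0);
	ev.pmu->read(&ev);
	return ev.count;
}
```

In this sketch the caller only ever sees the generic pmu pointer; the
platform data travels with the outer structure and is handed to the
platform callback as an extra argument.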

The patchset includes an implementation that exposes IBM pseries platform
nmem* device performance stats using this interface.

Result from power9 pseries lpar with 2 nvdimm devices:
command:# perf list nmem
  nmem0/cchrhcnt/                                    [Kernel PMU event]
  nmem0/cchwhcnt/                                    [Kernel PMU event]
  nmem0/critrscu/                                    [Kernel PMU event]
  nmem0/ctlresct/                                    [Kernel PMU event]
  nmem0/ctlrestm/                                    [Kernel PMU event]
  nmem0/fastwcnt/                                    [Kernel PMU event]
  nmem0/hostlcnt/                                    [Kernel PMU event]
  nmem0/hostldur/                                    [Kernel PMU event]
  nmem0/hostscnt/                                    [Kernel PMU event]
  nmem0/hostsdur/                                    [Kernel PMU event]
  nmem0/medrcnt/                                     [Kernel PMU event]
  nmem0/medrdur/                                     [Kernel PMU event]
  nmem0/medwcnt/                                     [Kernel PMU event]
  nmem0/medwdur/                                     [Kernel PMU event]
  nmem0/memlife/                                     [Kernel PMU event]
  nmem0/noopstat/                                    [Kernel PMU event]
  nmem0/ponsecs/                                     [Kernel PMU event]
  nmem1/cchrhcnt/                                    [Kernel PMU event]
  nmem1/cchwhcnt/                                    [Kernel PMU event]
  nmem1/critrscu/                                    [Kernel PMU event]
  ...
  nmem1/noopstat/                                    [Kernel PMU event]
  nmem1/ponsecs/                                     [Kernel PMU event]

Patch1:
        Introduces the nvdimm_pmu structure and a common pmu register
        function, along with callback routine checks.
Patch2:
        Adds code in arch/powerpc/platforms/pseries/papr_scm.c to expose
        the nmem* pmu. It fills in the nvdimm_pmu structure with event attrs
        and event callback functions and then registers the pmu by
        calling register_nvdimm_pmu.
Patch3:
        Sysfs documentation patch
Patch4:
        Adds cpuhotplug support.

Kajol Jain (4):
  drivers/nvdimm: Add perf interface to expose nvdimm performance stats
  powerpc/papr_scm: Add perf interface support
  powerpc/papr_scm: Document papr_scm sysfs event format entries
  powerpc/papr_scm: Add cpu hotplug support for nvdimm pmu device

 Documentation/ABI/testing/sysfs-bus-papr-pmem |  31 ++
 arch/powerpc/include/asm/device.h             |   5 +
 arch/powerpc/platforms/pseries/papr_scm.c     | 346 +++++++++++++++++-
 drivers/nvdimm/Makefile                       |   1 +
 drivers/nvdimm/nd_perf.c                      | 111 ++++++
 include/linux/nd.h                            |  31 ++
 6 files changed, 524 insertions(+), 1 deletion(-)
 create mode 100644 drivers/nvdimm/nd_perf.c

-- 
2.27.0



* [RFC 1/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats
  2021-05-12 16:38 [RFC 0/4] Add perf interface to expose nvdimm performance stats Kajol Jain
@ 2021-05-12 16:38 ` Kajol Jain
  2021-05-12 17:27   ` Peter Zijlstra
  2021-05-12 16:38 ` [RFC 2/4] powerpc/papr_scm: Add perf interface support Kajol Jain
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Kajol Jain @ 2021-05-12 16:38 UTC (permalink / raw)
  To: mpe, linuxppc-dev, linux-nvdimm, linux-kernel
  Cc: maddy, santosh, aneesh.kumar, vaibhav, dan.j.williams, ira.weiny,
	atrajeev, kjain, peterz, tglx

This patch adds performance stats reporting support for nvdimm.
The added interface includes a pmu register function and
callbacks to be used by the arch/platform specific drivers.
Users can use the standard perf tool to access perf events exposed
via the pmu.

A structure called nvdimm_pmu is added, which can hold
platform specific data such as supported events and callbacks for pmu
functions like event_init/add/del/read. The patch also adds an
unregister_nvdimm_pmu function to handle unregistering of a pmu device.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 drivers/nvdimm/Makefile  |   1 +
 drivers/nvdimm/nd_perf.c | 111 +++++++++++++++++++++++++++++++++++++++
 include/linux/nd.h       |  31 +++++++++++
 3 files changed, 143 insertions(+)
 create mode 100644 drivers/nvdimm/nd_perf.c

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 29203f3d3069..25dba6095612 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -18,6 +18,7 @@ nd_e820-y := e820.o
 libnvdimm-y := core.o
 libnvdimm-y += bus.o
 libnvdimm-y += dimm_devs.o
+libnvdimm-y += nd_perf.o
 libnvdimm-y += dimm.o
 libnvdimm-y += region_devs.o
 libnvdimm-y += region.o
diff --git a/drivers/nvdimm/nd_perf.c b/drivers/nvdimm/nd_perf.c
new file mode 100644
index 000000000000..d28bec2b61a2
--- /dev/null
+++ b/drivers/nvdimm/nd_perf.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * nd_perf.c: NVDIMM Device Performance Monitoring Unit support
+ *
+ * Perf interface to expose nvdimm performance stats.
+ *
+ * Copyright (C) 2021 IBM Corporation
+ */
+
+#define pr_fmt(fmt) "nvdimm_pmu: " fmt
+
+#include <linux/nd.h>
+
+#define to_nvdimm_pmu(_pmu)	container_of(_pmu, struct nvdimm_pmu, pmu)
+
+static int nvdimm_pmu_event_init(struct perf_event *event)
+{
+	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+	/* test the event attr type for PMU enumeration */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* it does not support event sampling mode */
+	if (is_sampling_event(event))
+		return -EINVAL;
+
+	/* no branch sampling */
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	/* jump to arch/platform specific callbacks if any */
+	if (nd_pmu && nd_pmu->event_init)
+		return nd_pmu->event_init(event, nd_pmu->dev);
+
+	return 0;
+}
+
+static void nvdimm_pmu_read(struct perf_event *event)
+{
+	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+	/* jump to arch/platform specific callbacks if any */
+	if (nd_pmu && nd_pmu->read)
+		nd_pmu->read(event, nd_pmu->dev);
+}
+
+static void nvdimm_pmu_del(struct perf_event *event, int flags)
+{
+	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+	/* jump to arch/platform specific callbacks if any */
+	if (nd_pmu && nd_pmu->del)
+		nd_pmu->del(event, flags, nd_pmu->dev);
+}
+
+static int nvdimm_pmu_add(struct perf_event *event, int flags)
+{
+	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+	if (flags & PERF_EF_START)
+		/* jump to arch/platform specific callbacks if any */
+		if (nd_pmu && nd_pmu->add)
+			return nd_pmu->add(event, flags, nd_pmu->dev);
+	return 0;
+}
+
+int register_nvdimm_pmu(struct nvdimm_pmu *nd_pmu, struct platform_device *pdev)
+{
+	int rc;
+
+	if (!nd_pmu || !pdev)
+		return -EINVAL;
+
+	nd_pmu->pmu.task_ctx_nr = perf_invalid_context;
+	nd_pmu->pmu.event_init = nvdimm_pmu_event_init;
+	nd_pmu->pmu.add = nvdimm_pmu_add;
+	nd_pmu->pmu.del = nvdimm_pmu_del;
+	nd_pmu->pmu.read = nvdimm_pmu_read;
+	nd_pmu->pmu.name = nd_pmu->name;
+	nd_pmu->pmu.attr_groups = nd_pmu->attr_groups;
+	nd_pmu->pmu.capabilities = PERF_PMU_CAP_NO_INTERRUPT |
+				PERF_PMU_CAP_NO_EXCLUDE;
+
+	/*
+	 * Adding platform_device->dev pointer to nvdimm_pmu, so that we can
+	 * access that device data in PMU callbacks and also pass it to
+	 * arch/platform specific code.
+	 */
+	nd_pmu->dev = &pdev->dev;
+
+	rc = perf_pmu_register(&nd_pmu->pmu, nd_pmu->name, -1);
+	if (rc)
+		return rc;
+
+	pr_info("%s NVDIMM performance monitor support registered\n",
+		nd_pmu->name);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(register_nvdimm_pmu);
+
+void unregister_nvdimm_pmu(struct pmu *nd_pmu)
+{
+	/*
+	 * nd_pmu will get free in arch/platform specific code once
+	 * corresponding pmu get unregistered.
+	 */
+	perf_pmu_unregister(nd_pmu);
+}
+EXPORT_SYMBOL_GPL(unregister_nvdimm_pmu);
diff --git a/include/linux/nd.h b/include/linux/nd.h
index ee9ad76afbba..fa6e60b2b368 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -8,6 +8,8 @@
 #include <linux/ndctl.h>
 #include <linux/device.h>
 #include <linux/badblocks.h>
+#include <linux/platform_device.h>
+#include <linux/perf_event.h>
 
 enum nvdimm_event {
 	NVDIMM_REVALIDATE_POISON,
@@ -23,6 +25,35 @@ enum nvdimm_claim_class {
 	NVDIMM_CCLASS_UNKNOWN,
 };
 
+/**
+ * struct nvdimm_pmu - data structure for nvdimm perf driver
+ *
+ * @name: name of the nvdimm pmu device.
+ * @pmu: pmu data structure for nvdimm performance stats.
+ * @cpu: designated cpu for counter access.
+ * @dev: nvdimm device pointer.
+ * @functions(event_init/add/del/read): platform specific callbacks.
+ * @attr_groups: data structure for events/formats/cpumask.
+ * @node: node for cpu hotplug notifier link.
+ * @cpuhp_state: state for cpu hotplug notification.
+ */
+struct nvdimm_pmu {
+	const char *name;
+	struct pmu pmu;
+	int cpu;
+	struct device *dev;
+	int (*event_init)(struct perf_event *event,  struct device *dev);
+	int  (*add)(struct perf_event *event, int flags, struct device *dev);
+	void (*del)(struct perf_event *event, int flags, struct device *dev);
+	void (*read)(struct perf_event *event,  struct device *dev);
+	const struct attribute_group **attr_groups;
+	struct hlist_node node;
+	enum cpuhp_state cpuhp_state;
+};
+
+int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device *pdev);
+void unregister_nvdimm_pmu(struct pmu *pmu);
+
 struct nd_device_driver {
 	struct device_driver drv;
 	unsigned long type;
-- 
2.27.0



* [RFC 2/4] powerpc/papr_scm: Add perf interface support
  2021-05-12 16:38 [RFC 0/4] Add perf interface to expose nvdimm performance stats Kajol Jain
  2021-05-12 16:38 ` [RFC 1/4] drivers/nvdimm: " Kajol Jain
@ 2021-05-12 16:38 ` Kajol Jain
  2021-05-12 16:38 ` [RFC 3/4] powerpc/papr_scm: Document papr_scm sysfs event format entries Kajol Jain
  2021-05-12 16:38 ` [RFC 4/4] powerpc/papr_scm: Add cpu hotplug support for nvdimm pmu device Kajol Jain
  3 siblings, 0 replies; 9+ messages in thread
From: Kajol Jain @ 2021-05-12 16:38 UTC (permalink / raw)
  To: mpe, linuxppc-dev, linux-nvdimm, linux-kernel
  Cc: maddy, santosh, aneesh.kumar, vaibhav, dan.j.williams, ira.weiny,
	atrajeev, kjain, peterz, tglx

This patch adds support for performance monitoring of papr
nvdimm devices via the perf interface. It adds callback functions
like add/del/read/event_init for the nvdimm_pmu structure.

The patch adds a new member 'priv' to the pdev_archdata structure to save
the nvdimm_pmu device pointer, to handle the unregistering of the pmu device.

The papr_scm_pmu_register function populates the nvdimm_pmu structure
with events and attribute groups along with event handling functions.
The event handling functions internally use hcalls to get events and
counter data. Finally, the populated nvdimm_pmu structure is passed
to register the pmu device.
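
The add/read handling described above follows a snapshot-and-delta pattern:
add records the current raw counter as a baseline, and each read folds the
difference since the previous reading into the event total. A minimal
userspace sketch of that pattern is shown below. The demo_event structure
and the demo_event_add/demo_event_read helpers are hypothetical, and the
raw counter value is passed in directly instead of being fetched via an
hcall.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the per-event state kept in the kernel. */
struct demo_event {
	uint64_t prev_count;	/* last raw value seen from the device */
	uint64_t count;		/* accumulated delta reported to the user */
};

/* add: snapshot the current raw counter as the baseline. */
void demo_event_add(struct demo_event *ev, uint64_t raw_now)
{
	assert(ev);
	ev->prev_count = raw_now;
}

/* read: fold (now - prev) into the running total, then advance the baseline. */
void demo_event_read(struct demo_event *ev, uint64_t raw_now)
{
	uint64_t prev;

	assert(ev);
	prev = ev->prev_count;
	ev->prev_count = raw_now;

	/*
	 * Unsigned subtraction wraps modulo 2^64, so a single counter
	 * rollover between two reads still yields the expected delta.
	 */
	ev->count += raw_now - prev;
}
```

Because both values are unsigned 64-bit integers, the difference can never
be negative, so a check of the form "now - prev >= 0" does not filter
anything in this model; the wrap-around arithmetic is what keeps the delta
meaningful.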

Result on a power9 machine with 2 nvdimm devices:

Ex: List all events with perf list

command:# perf list nmem

  nmem0/cchrhcnt/                                    [Kernel PMU event]
  nmem0/cchwhcnt/                                    [Kernel PMU event]
  nmem0/critrscu/                                    [Kernel PMU event]
  nmem0/ctlresct/                                    [Kernel PMU event]
  nmem0/ctlrestm/                                    [Kernel PMU event]
  nmem0/fastwcnt/                                    [Kernel PMU event]
  nmem0/hostlcnt/                                    [Kernel PMU event]
  nmem0/hostldur/                                    [Kernel PMU event]
  nmem0/hostscnt/                                    [Kernel PMU event]
  nmem0/hostsdur/                                    [Kernel PMU event]
  nmem0/medrcnt/                                     [Kernel PMU event]
  nmem0/medrdur/                                     [Kernel PMU event]
  nmem0/medwcnt/                                     [Kernel PMU event]
  nmem0/medwdur/                                     [Kernel PMU event]
  nmem0/memlife/                                     [Kernel PMU event]
  nmem0/noopstat/                                    [Kernel PMU event]
  nmem0/ponsecs/                                     [Kernel PMU event]
  nmem1/cchrhcnt/                                    [Kernel PMU event]
  nmem1/cchwhcnt/                                    [Kernel PMU event]
  nmem1/critrscu/                                    [Kernel PMU event]
  ...
  nmem1/noopstat/                                    [Kernel PMU event]
  nmem1/ponsecs/                                     [Kernel PMU event]

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 arch/powerpc/include/asm/device.h         |   5 +
 arch/powerpc/platforms/pseries/papr_scm.c | 284 +++++++++++++++++++++-
 2 files changed, 288 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 219559d65864..47ed639f3b8f 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -48,6 +48,11 @@ struct dev_archdata {
 
 struct pdev_archdata {
 	u64 dma_mask;
+	/*
+	 * Pointer to nvdimm_pmu structure, to handle the unregistering
+	 * of pmu device
+	 */
+	void *priv;
 };
 
 #endif /* _ASM_POWERPC_DEVICE_H */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index ef26fe40efb0..997d379094d0 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -18,6 +18,8 @@
 #include <asm/plpar_wrappers.h>
 #include <asm/papr_pdsm.h>
 #include <asm/mce.h>
+#include <linux/perf_event.h>
+#include <linux/ctype.h>
 
 #define BIND_ANY_ADDR (~0ul)
 
@@ -116,6 +118,9 @@ struct papr_scm_priv {
 
 	/* length of the stat buffer as expected by phyp */
 	size_t stat_buffer_len;
+
+	 /* array to have event_code and stat_id mappings */
+	char **nvdimm_events_map;
 };
 
 static int papr_scm_pmem_flush(struct nd_region *nd_region,
@@ -329,6 +334,271 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
 	return 0;
 }
 
+static struct attribute_group nvdimm_pmu_events_group = {
+	.name = "events",
+	/* .attrs is set in papr_scm_pmu_check_events function */
+};
+
+PMU_FORMAT_ATTR(event, "config:0-37");
+
+static struct attribute *nvdimm_pmu_format_attr[] = {
+	&format_attr_event.attr,
+	NULL,
+};
+
+static struct attribute_group nvdimm_pmu_format_group = {
+	.name = "format",
+	.attrs = nvdimm_pmu_format_attr,
+};
+
+static const struct attribute_group *nvdimm_pmu_attr_groups[] = {
+	&nvdimm_pmu_format_group,
+	&nvdimm_pmu_events_group,
+	NULL,
+};
+
+static void papr_scm_pmu_get_value(struct perf_event *event, struct device *dev, u64 *count)
+{
+	struct papr_scm_perf_stat *stat;
+	struct papr_scm_perf_stats *stats;
+	struct papr_scm_priv *p = (struct papr_scm_priv *)dev->driver_data;
+	int rc, size;
+	u64 statval;
+
+	/* Allocate buffer to hold single performance stat */
+	size = sizeof(struct papr_scm_perf_stats) +
+		sizeof(struct papr_scm_perf_stat);
+
+	if (!p->nvdimm_events_map)
+		return;
+
+	stats = kzalloc(size, GFP_KERNEL);
+	if (!stats)
+		return;
+
+	stat = &stats->scm_statistic[0];
+	memcpy(&stat->stat_id,
+	       p->nvdimm_events_map[event->attr.config - 1],
+		sizeof(stat->stat_id));
+	stat->stat_val = 0;
+
+	rc = drc_pmem_query_stats(p, stats, 1);
+	if (rc < 0) {
+		kfree(stats);
+		return;
+	}
+
+	statval = be64_to_cpu(stat->stat_val);
+	*count = statval;
+	kfree(stats);
+}
+
+static int papr_scm_pmu_add(struct perf_event *event, int flags,  struct device *dev)
+{
+	u64 count = 0;
+
+	papr_scm_pmu_get_value(event, dev, &count);
+	local64_set(&event->hw.prev_count, count);
+	return 0;
+}
+
+static void papr_scm_pmu_read(struct perf_event *event, struct device *dev)
+{
+	u64 prev, now = 0;
+
+	papr_scm_pmu_get_value(event, dev, &now);
+	prev = local64_xchg(&event->hw.prev_count, now);
+
+	if (now - prev >= 0)
+		local64_add(now - prev, &event->count);
+}
+
+static void papr_scm_pmu_del(struct perf_event *event, int flags,  struct device *dev)
+{
+	papr_scm_pmu_read(event, dev);
+}
+
+static void nvdimm_pmu_uinit(struct nvdimm_pmu *nd_pmu)
+{
+	unregister_nvdimm_pmu(&nd_pmu->pmu);
+	kfree(nd_pmu);
+}
+
+static int papr_scm_pmu_register(struct papr_scm_priv *p)
+{
+	struct nvdimm_pmu *papr_scm_pmu;
+	int rc;
+
+	papr_scm_pmu = devm_kzalloc(&p->pdev->dev, sizeof(*papr_scm_pmu), GFP_KERNEL);
+	if (!papr_scm_pmu)
+		return -ENOMEM;
+
+	papr_scm_pmu->name = nvdimm_name(p->nvdimm);
+	papr_scm_pmu->read = papr_scm_pmu_read;
+	papr_scm_pmu->add = papr_scm_pmu_add;
+	papr_scm_pmu->del = papr_scm_pmu_del;
+	papr_scm_pmu->attr_groups = nvdimm_pmu_attr_groups;
+
+	rc = register_nvdimm_pmu(papr_scm_pmu, p->pdev);
+	if (rc)
+		goto pmu_register_err;
+
+	/*
+	 * Set archdata.priv value to nvdimm_pmu structure, to handle the
+	 * unregistering of pmu device.
+	 */
+	p->pdev->archdata.priv = papr_scm_pmu;
+	return 0;
+
+pmu_register_err:
+	kfree(papr_scm_pmu);
+	return rc;
+}
+
+static ssize_t device_show_string(struct device *dev, struct device_attribute *attr,
+				  char *buf)
+{
+	struct perf_pmu_events_attr *d;
+
+	d = container_of(attr, struct perf_pmu_events_attr, attr);
+
+	return sysfs_emit(buf, "%s\n", (char *)d->event_str);
+}
+
+static char *strtolower(char *updated_name)
+{
+	int i = 0;
+
+	while (updated_name[i]) {
+		if (isupper(updated_name[i]))
+			updated_name[i] = tolower(updated_name[i]);
+		i++;
+	}
+	updated_name[i] = '\0';
+	return strim(updated_name);
+}
+
+/* device_str_attr_create : Populate event "name" and string "str" in attribute */
+static struct attribute *device_str_attr_create_(char *name, char *str)
+{
+	struct perf_pmu_events_attr *attr;
+
+	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+	if (!attr)
+		return NULL;
+
+	sysfs_attr_init(&attr->attr.attr);
+	attr->event_str = str;
+	attr->attr.attr.name = strtolower(name);
+	attr->attr.attr.mode = 0444;
+	attr->attr.show = device_show_string;
+
+	return &attr->attr.attr;
+}
+
+static int papr_scm_pmu_check_events(struct papr_scm_priv *p)
+{
+	struct papr_scm_perf_stat *stat;
+	struct papr_scm_perf_stats *stats, *single_stats;
+	int index, size, rc, attrs;
+	u32 total_events;
+	struct attribute **events;
+	char *eventcode, *eventname, *statid;
+
+	if (!p->stat_buffer_len)
+		return -ENOENT;
+
+	total_events = (p->stat_buffer_len  - sizeof(struct papr_scm_perf_stats))
+			/ sizeof(struct papr_scm_perf_stat);
+
+	/* Allocate the buffer for phyp where stats are written */
+	stats = kzalloc(p->stat_buffer_len, GFP_KERNEL);
+	if (!stats)
+		return -ENOMEM;
+
+	/* Allocate memory to nvdimm_event_map */
+	p->nvdimm_events_map = kcalloc(total_events, sizeof(char *), GFP_KERNEL);
+	if (!p->nvdimm_events_map) {
+		rc = -ENOMEM;
+		goto out_stats;
+	}
+
+	/* Called to get list of events supported */
+	rc = drc_pmem_query_stats(p, stats, 0);
+	if (rc)
+		goto out_nvdimm_events_map;
+
+	/* Allocate buffer to hold single performance stat */
+	size = sizeof(struct papr_scm_perf_stats) + sizeof(struct papr_scm_perf_stat);
+
+	single_stats = kzalloc(size, GFP_KERNEL);
+	if (!single_stats) {
+		rc = -ENOMEM;
+		goto out_nvdimm_events_map;
+	}
+
+	events = kzalloc(total_events * sizeof(struct attribute *), GFP_KERNEL);
+	if (!events) {
+		rc = -ENOMEM;
+		goto out_single_stats;
+	}
+
+	for (index = 0, stat = stats->scm_statistic, attrs = 0;
+		     index < total_events; index++, ++stat) {
+
+		single_stats->scm_statistic[0] = *stat;
+		rc = drc_pmem_query_stats(p, single_stats, 1);
+
+		if (rc < 0) {
+			pr_info("Event not supported %s for device %s\n",
+				stat->stat_id, nvdimm_name(p->nvdimm));
+		} else {
+			eventcode = kasprintf(GFP_KERNEL, "event=0x%x", attrs + 1);
+			eventname = kzalloc(strlen(stat->stat_id) + 1, GFP_KERNEL);
+			statid = kzalloc(strlen(stat->stat_id) + 1, GFP_KERNEL);
+
+			if (!eventname || !statid || !eventcode)
+				goto out;
+
+			strcpy(eventname, stat->stat_id);
+			events[attrs] = device_str_attr_create_(eventname,
+								eventcode);
+			if (!events[attrs])
+				goto out;
+
+			strcpy(statid, stat->stat_id);
+			p->nvdimm_events_map[attrs] = statid;
+			attrs++;
+			continue;
+out:
+			kfree(eventcode);
+			kfree(eventname);
+			kfree(statid);
+		}
+	}
+	events[attrs] = NULL;
+	p->nvdimm_events_map[attrs] = NULL;
+
+	if (!attrs)
+		goto out_events;
+
+	nvdimm_pmu_events_group.attrs = events;
+	kfree(single_stats);
+	kfree(stats);
+	return 0;
+
+out_events:
+	kfree(events);
+out_single_stats:
+	kfree(single_stats);
+out_nvdimm_events_map:
+	kfree(p->nvdimm_events_map);
+out_stats:
+	kfree(stats);
+	return rc;
+}
+
 /*
  * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
  * health information.
@@ -923,7 +1193,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 	struct nd_mapping_desc mapping;
 	struct nd_region_desc ndr_desc;
 	unsigned long dimm_flags;
-	int target_nid, online_nid;
+	int target_nid, online_nid, rc;
 	ssize_t stat_size;
 
 	p->bus_desc.ndctl = papr_scm_ndctl;
@@ -1015,6 +1285,15 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 		p->stat_buffer_len = stat_size;
 		dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n",
 			p->stat_buffer_len);
+
+		rc = papr_scm_pmu_check_events(p);
+		if (rc) {
+			dev_info(&p->pdev->dev, "nvdimm pmu check events failed, rc=%d\n", rc);
+		} else {
+			rc = papr_scm_pmu_register(p);
+			if (rc)
+				dev_info(&p->pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
+		}
 	} else {
 		dev_info(&p->pdev->dev, "Dimm performance stats unavailable\n");
 	}
@@ -1195,7 +1474,10 @@ static int papr_scm_remove(struct platform_device *pdev)
 
 	nvdimm_bus_unregister(p->bus);
 	drc_pmem_unbind(p);
+	nvdimm_pmu_uinit(pdev->archdata.priv);
+	pdev->archdata.priv = NULL;
 	kfree(p->bus_desc.provider_name);
+	kfree(p->nvdimm_events_map);
 	kfree(p);
 
 	return 0;
-- 
2.27.0



* [RFC 3/4] powerpc/papr_scm: Document papr_scm sysfs event format entries
  2021-05-12 16:38 [RFC 0/4] Add perf interface to expose nvdimm performance stats Kajol Jain
  2021-05-12 16:38 ` [RFC 1/4] drivers/nvdimm: " Kajol Jain
  2021-05-12 16:38 ` [RFC 2/4] powerpc/papr_scm: Add perf interface support Kajol Jain
@ 2021-05-12 16:38 ` Kajol Jain
  2021-05-12 16:38 ` [RFC 4/4] powerpc/papr_scm: Add cpu hotplug support for nvdimm pmu device Kajol Jain
  3 siblings, 0 replies; 9+ messages in thread
From: Kajol Jain @ 2021-05-12 16:38 UTC (permalink / raw)
  To: mpe, linuxppc-dev, linux-nvdimm, linux-kernel
  Cc: maddy, santosh, aneesh.kumar, vaibhav, dan.j.williams, ira.weiny,
	atrajeev, kjain, peterz, tglx

This patch adds event format and event details to the ABI
documentation.
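
To make the documented strings concrete: the format entry "config:0-37"
names the bit range of perf_event_attr.config that carries the event ID,
and an events entry such as "event=0x1" supplies the value for that field.
The userspace sketch below parses both strings and builds the resulting
config word. The demo_parse_format, demo_parse_event and
demo_encode_config helpers are hypothetical and only illustrate the
encoding; they are not part of the perf tool or the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a format string such as "config:0-37" into its bit range.
 * Returns 0 on success, -1 if the string does not match.
 */
int demo_parse_format(const char *fmt, unsigned *lo, unsigned *hi)
{
	if (sscanf(fmt, "config:%u-%u", lo, hi) != 2)
		return -1;
	return (*lo <= *hi && *hi < 64) ? 0 : -1;
}

/* Parse an event string such as "event=0x1" into its numeric value. */
int demo_parse_event(const char *str, uint64_t *value)
{
	unsigned long long v;

	if (sscanf(str, "event=%llx", &v) != 1)
		return -1;
	*value = v;
	return 0;
}

/* Place the event value into the config word at the given bit range. */
int demo_encode_config(uint64_t value, unsigned lo, unsigned hi,
		       uint64_t *config)
{
	unsigned width;
	uint64_t mask;

	assert(lo <= hi && hi < 64);
	width = hi - lo + 1;
	mask = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);

	if (value & ~mask)
		return -1;	/* value does not fit in the field */
	*config = value << lo;
	return 0;
}
```

With the strings from this patch, the field starts at bit 0, so the config
word equals the event value itself; a value that needs more than 38 bits
is rejected by the sketch.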

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 Documentation/ABI/testing/sysfs-bus-papr-pmem | 25 +++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
index 8316c33862a0..216f70deca7e 100644
--- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -59,3 +59,28 @@ Description:
 		* "CchRHCnt" : Cache Read Hit Count
 		* "CchWHCnt" : Cache Write Hit Count
 		* "FastWCnt" : Fast Write Count
+
+What:		/sys/devices/nmemX/format
+Date:		May 2021
+Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
+Description:	(RO) Attribute group to describe the magic bits
+                that go into perf_event_attr.config for a particular pmu.
+                (See ABI/testing/sysfs-bus-event_source-devices-format).
+
+                Each attribute under this group defines a bit range of the
+                perf_event_attr.config. Supported attributes is listed
+                below::
+
+		    event  = "config:0-37"  - event ID
+
+		For example::
+		    noopstat = "event=0x1"
+
+What:		/sys/devices/nmemX/events
+Date:		May 2021
+Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
+Description:    (RO) Attribute group to describe performance monitoring
+                events specific to papr_pmem. Each attribute in this group describes
+                a single performance monitoring event supported by this nvdimm pmu.
+                The name of the file is the name of the event.
+                (See ABI/testing/sysfs-bus-event_source-devices-events).
-- 
2.27.0



* [RFC 4/4] powerpc/papr_scm: Add cpu hotplug support for nvdimm pmu device
  2021-05-12 16:38 [RFC 0/4] Add perf interface to expose nvdimm performance stats Kajol Jain
                   ` (2 preceding siblings ...)
  2021-05-12 16:38 ` [RFC 3/4] powerpc/papr_scm: Document papr_scm sysfs event format entries Kajol Jain
@ 2021-05-12 16:38 ` Kajol Jain
  3 siblings, 0 replies; 9+ messages in thread
From: Kajol Jain @ 2021-05-12 16:38 UTC (permalink / raw)
  To: mpe, linuxppc-dev, linux-nvdimm, linux-kernel
  Cc: maddy, santosh, aneesh.kumar, vaibhav, dan.j.williams, ira.weiny,
	atrajeev, kjain, peterz, tglx

This patch adds cpu hotplug functions to the nvdimm pmu.
It adds a cpumask to designate a cpu to make the HCALL that
collects the counter data for the nvdimm device, and
updates the ABI documentation accordingly.
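
The hotplug handling amounts to: if the cpu going offline is the
designated one, pick another active cpu and make it the new designated
cpu; otherwise do nothing. The userspace sketch below models that decision
with a 64-bit mask standing in for the active cpu mask. DEMO_NR_CPUS,
demo_cpumask_last, demo_pmu and demo_offline_cpu are hypothetical names,
and the sketch explicitly excludes the outgoing cpu from the candidates,
which is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_CPUS 64

/* Index of the highest set bit in the mask, or -1 if the mask is empty. */
int demo_cpumask_last(uint64_t mask)
{
	int cpu;

	for (cpu = DEMO_NR_CPUS - 1; cpu >= 0; cpu--)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/* Minimal stand-in holding only the designated cpu. */
struct demo_pmu {
	int cpu;
};

/*
 * Offline callback model: migrate the designated cpu when it goes away.
 * Returns 0 on success (or when nothing needs to change) and -1 when no
 * other active cpu is available.
 */
int demo_offline_cpu(struct demo_pmu *pmu, int going, uint64_t active)
{
	int target;

	assert(pmu);
	assert(going >= 0 && going < DEMO_NR_CPUS);

	if (going != pmu->cpu)
		return 0;

	/* Never pick the cpu that is on its way out. */
	target = demo_cpumask_last(active & ~(UINT64_C(1) << going));
	if (target < 0)
		return -1;

	pmu->cpu = target;
	return 0;
}
```

The designated cpu is left unchanged when the callback fails, so a caller
in this model can tell from the return value alone whether counter access
still has a valid home.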

Result in power9 lpar system:
command:# cat /sys/devices/nmem0/cpumask
0

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 Documentation/ABI/testing/sysfs-bus-papr-pmem |  6 ++
 arch/powerpc/platforms/pseries/papr_scm.c     | 62 +++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
index 216f70deca7e..a40fbec683a8 100644
--- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -76,6 +76,12 @@ Description:	(RO) Attribute group to describe the magic bits
 		For example::
 		    noopstat = "event=0x1"
 
+What:		/sys/devices/nmemX/cpumask
+Date:		May 2021
+Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
+Description:	(RO) This sysfs file exposes the cpumask which is designated to make
+                HCALLs to retrieve nvdimm pmu event counter data.
+
 What:		/sys/devices/nmemX/events
 Date:		May 2021
 Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 997d379094d0..6d94c2f260aa 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -334,6 +334,28 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
 	return 0;
 }
 
+static ssize_t cpumask_show(struct device *dev,
+			    struct device_attribute *attr, char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct nvdimm_pmu *nd_pmu;
+
+	nd_pmu = container_of(pmu, struct nvdimm_pmu, pmu);
+
+	return cpumap_print_to_pagebuf(true, buf, cpumask_of(nd_pmu->cpu));
+}
+
+static DEVICE_ATTR_RO(cpumask);
+
+static struct attribute *nvdimm_cpumask_attrs[] = {
+	&dev_attr_cpumask.attr,
+	NULL,
+};
+
+static const struct attribute_group nvdimm_pmu_cpumask_group = {
+	.attrs = nvdimm_cpumask_attrs,
+};
+
 static struct attribute_group nvdimm_pmu_events_group = {
 	.name = "events",
 	/* .attrs is set in papr_scm_pmu_check_events function */
@@ -354,6 +376,7 @@ static struct attribute_group nvdimm_pmu_format_group = {
 static const struct attribute_group *nvdimm_pmu_attr_groups[] = {
 	&nvdimm_pmu_format_group,
 	&nvdimm_pmu_events_group,
+	&nvdimm_pmu_cpumask_group,
 	NULL,
 };
 
@@ -418,10 +441,30 @@ static void papr_scm_pmu_del(struct perf_event *event, int flags,  struct device
 	papr_scm_pmu_read(event, dev);
 }
 
+static int nvdimm_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
+{
+	struct nvdimm_pmu *pmu;
+	int target;
+
+	pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+	if (cpu != pmu->cpu)
+		return 0;
+
+	target = cpumask_last(cpu_active_mask);
+	if (target < 0 || target >= nr_cpu_ids)
+		return -1;
+
+	pmu->cpu = target;
+	return 0;
+}
+
 static void nvdimm_pmu_uinit(struct nvdimm_pmu *nd_pmu)
 {
 	unregister_nvdimm_pmu(&nd_pmu->pmu);
 	kfree(nd_pmu);
+	cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+	cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
 }
 
 static int papr_scm_pmu_register(struct papr_scm_priv *p)
@@ -438,6 +481,22 @@ static int papr_scm_pmu_register(struct papr_scm_priv *p)
 	papr_scm_pmu->add = papr_scm_pmu_add;
 	papr_scm_pmu->del = papr_scm_pmu_del;
 	papr_scm_pmu->attr_groups = nvdimm_pmu_attr_groups;
+	papr_scm_pmu->cpu = raw_smp_processor_id();
+
+	rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+				     "perf/nvdimm:online",
+			      NULL, nvdimm_pmu_offline_cpu);
+	if (rc < 0) {
+		kfree(papr_scm_pmu);
+		return rc;
+	}
+
+	papr_scm_pmu->cpuhp_state = rc;
+
+	/* Register the pmu instance for cpu hotplug */
+	rc = cpuhp_state_add_instance_nocalls(papr_scm_pmu->cpuhp_state, &papr_scm_pmu->node);
+	if (rc)
+		goto cpuhp_instance_err;
 
 	rc = register_nvdimm_pmu(papr_scm_pmu, p->pdev);
 	if (rc)
@@ -451,6 +510,9 @@ static int papr_scm_pmu_register(struct papr_scm_priv *p)
 	return 0;
 
 pmu_register_err:
+	cpuhp_state_remove_instance_nocalls(papr_scm_pmu->cpuhp_state, &papr_scm_pmu->node);
+cpuhp_instance_err:
+	cpuhp_remove_multi_state(papr_scm_pmu->cpuhp_state);
 	kfree(papr_scm_pmu);
 	return rc;
 }
-- 
2.27.0



* Re: [RFC 1/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats
  2021-05-12 16:38 ` [RFC 1/4] drivers/nvdimm: " Kajol Jain
@ 2021-05-12 17:27   ` Peter Zijlstra
  2021-05-13 12:26     ` kajoljain
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2021-05-12 17:27 UTC (permalink / raw)
  To: Kajol Jain
  Cc: mpe, linuxppc-dev, linux-nvdimm, linux-kernel, maddy, santosh,
	aneesh.kumar, vaibhav, dan.j.williams, ira.weiny, atrajeev, tglx

On Wed, May 12, 2021 at 10:08:21PM +0530, Kajol Jain wrote:
> +static void nvdimm_pmu_read(struct perf_event *event)
> +{
> +	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
> +
> +	/* jump to arch/platform specific callbacks if any */
> +	if (nd_pmu && nd_pmu->read)
> +		nd_pmu->read(event, nd_pmu->dev);
> +}
> +
> +static void nvdimm_pmu_del(struct perf_event *event, int flags)
> +{
> +	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
> +
> +	/* jump to arch/platform specific callbacks if any */
> +	if (nd_pmu && nd_pmu->del)
> +		nd_pmu->del(event, flags, nd_pmu->dev);
> +}
> +
> +static int nvdimm_pmu_add(struct perf_event *event, int flags)
> +{
> +	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
> +
> +	if (flags & PERF_EF_START)
> +		/* jump to arch/platform specific callbacks if any */
> +		if (nd_pmu && nd_pmu->add)
> +			return nd_pmu->add(event, flags, nd_pmu->dev);
> +	return 0;
> +}

What's the value add here? Why can't you directly set driver pointers? I
also don't really believe ->{add,del,read} can be optional and still
have a sane driver.


* Re: [RFC 1/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats
  2021-05-12 17:27   ` Peter Zijlstra
@ 2021-05-13 12:26     ` kajoljain
  2021-05-14 11:47       ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: kajoljain @ 2021-05-13 12:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mpe, linuxppc-dev, linux-nvdimm, linux-kernel, maddy, santosh,
	aneesh.kumar, vaibhav, dan.j.williams, ira.weiny, atrajeev, tglx



On 5/12/21 10:57 PM, Peter Zijlstra wrote:
> On Wed, May 12, 2021 at 10:08:21PM +0530, Kajol Jain wrote:
>> +static void nvdimm_pmu_read(struct perf_event *event)
>> +{
>> +	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
>> +
>> +	/* jump to arch/platform specific callbacks if any */
>> +	if (nd_pmu && nd_pmu->read)
>> +		nd_pmu->read(event, nd_pmu->dev);
>> +}
>> +
>> +static void nvdimm_pmu_del(struct perf_event *event, int flags)
>> +{
>> +	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
>> +
>> +	/* jump to arch/platform specific callbacks if any */
>> +	if (nd_pmu && nd_pmu->del)
>> +		nd_pmu->del(event, flags, nd_pmu->dev);
>> +}
>> +
>> +static int nvdimm_pmu_add(struct perf_event *event, int flags)
>> +{
>> +	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
>> +
>> +	if (flags & PERF_EF_START)
>> +		/* jump to arch/platform specific callbacks if any */
>> +		if (nd_pmu && nd_pmu->add)
>> +			return nd_pmu->add(event, flags, nd_pmu->dev);
>> +	return 0;
>> +}
> 
> What's the value add here? Why can't you directly set driver pointers? I
> also don't really believe ->{add,del,read} can be optional and still
> have a sane driver.
> 

Hi Peter,

  The intent of these callbacks is to give arch/platform specific driver
code the flexibility to use its own routines for reading counter data or
for platform-specific checks and operations. Each platform may read its
counters differently; for example, the IBM pseries nmem* devices use a
hypervisor call (hcall).

But yes, the current read/add/del functions are not adding value. We
could add an arch/platform specific function that handles capturing
the counter data, and do the rest of the operation here. Would that
approach be better?

Thanks,
Kajol Jain





* Re: [RFC 1/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats
  2021-05-13 12:26     ` kajoljain
@ 2021-05-14 11:47       ` Peter Zijlstra
  2021-05-17  6:43         ` kajoljain
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2021-05-14 11:47 UTC (permalink / raw)
  To: kajoljain
  Cc: mpe, linuxppc-dev, linux-nvdimm, linux-kernel, maddy, santosh,
	aneesh.kumar, vaibhav, dan.j.williams, ira.weiny, atrajeev, tglx

On Thu, May 13, 2021 at 05:56:14PM +0530, kajoljain wrote:

> But yes the current read/add/del functions are not adding value. We
> could add an arch/platform specific function which could handle the
> capturing of the counter data and do the rest of the operation here,
> is this approach better?

Right; have your register_nvdimm_pmu() set pmu->{add,del,read} to
nd_pmu->{add,del,read} directly, don't bother with these intermediates.
Also you can WARN_ON_ONCE() if any of them are NULL and fail
registration at that point.
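
A minimal userspace sketch of the suggestion above, not the actual follow-up
patch. The struct definitions are cut-down stand-ins for the kernel types,
warn_on_once() is a stand-in for WARN_ON_ONCE(), and
register_nvdimm_pmu_sketch() and the demo_* callbacks are hypothetical names.
It also assumes the driver callbacks take the same arguments as the struct pmu
methods (no extra dev parameter), since otherwise the pointers could not be
assigned directly.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Cut-down stand-ins for the kernel types; only the fields used here. */
struct perf_event { long count; };

struct pmu {
	int  (*add)(struct perf_event *event, int flags);
	void (*del)(struct perf_event *event, int flags);
	void (*read)(struct perf_event *event);
};

struct nvdimm_pmu {
	struct pmu pmu;
	/* Callbacks supplied by the arch/platform driver. */
	int  (*add)(struct perf_event *event, int flags);
	void (*del)(struct perf_event *event, int flags);
	void (*read)(struct perf_event *event);
};

/*
 * Stand-in for WARN_ON_ONCE(): warns on the first failure only and
 * returns the condition. Unlike the kernel macro, the "once" state is
 * shared by all call sites.
 */
static int warn_on_once(int cond, const char *what)
{
	static int warned;

	if (cond && !warned) {
		warned = 1;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical registration helper: no intermediate wrapper functions. */
int register_nvdimm_pmu_sketch(struct nvdimm_pmu *nd_pmu)
{
	if (!nd_pmu)
		return -EINVAL;

	/* All three callbacks are mandatory; fail registration otherwise. */
	if (warn_on_once(!nd_pmu->add || !nd_pmu->del || !nd_pmu->read,
			 "nvdimm pmu is missing add/del/read"))
		return -EINVAL;

	/* Hand the driver callbacks to the pmu directly. */
	nd_pmu->pmu.add = nd_pmu->add;
	nd_pmu->pmu.del = nd_pmu->del;
	nd_pmu->pmu.read = nd_pmu->read;

	/* Kernel code would go on to register the pmu with perf here. */
	return 0;
}

/* Hypothetical platform callbacks, used only to exercise the sketch. */
static int demo_add(struct perf_event *event, int flags)
{
	(void)flags;
	event->count = 0;
	return 0;
}

static void demo_del(struct perf_event *event, int flags)
{
	(void)event;
	(void)flags;
}

static void demo_read(struct perf_event *event)
{
	event->count++;
}
```

With this shape a driver that leaves any of the three callbacks unset is
rejected at registration time, instead of the missing callback being silently
skipped on every event operation.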


* Re: [RFC 1/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats
  2021-05-14 11:47       ` Peter Zijlstra
@ 2021-05-17  6:43         ` kajoljain
  0 siblings, 0 replies; 9+ messages in thread
From: kajoljain @ 2021-05-17  6:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mpe, linuxppc-dev, linux-nvdimm, linux-kernel, maddy, santosh,
	aneesh.kumar, vaibhav, dan.j.williams, ira.weiny, atrajeev, tglx



On 5/14/21 5:17 PM, Peter Zijlstra wrote:
> On Thu, May 13, 2021 at 05:56:14PM +0530, kajoljain wrote:
> 
>> But yes the current read/add/del functions are not adding value. We
>> could add an arch/platform specific function which could handle the
>> capturing of the counter data and do the rest of the operation here,
>> is this approach better?
> 
> Right; have your register_nvdimm_pmu() set pmu->{add,del,read} to
> nd_pmu->{add,del,read} directly, don't bother with these intermediates.
> Also you can WARN_ON_ONCE() if any of them are NULL and fail
> registration at that point.
> 

Hi Peter,
    I will make all the required changes and send the next version of this patchset soon.

Thanks,
Kajol Jain


Thread overview: 9+ messages
2021-05-12 16:38 [RFC 0/4] Add perf interface to expose nvdimm performance stats Kajol Jain
2021-05-12 16:38 ` [RFC 1/4] drivers/nvdimm: " Kajol Jain
2021-05-12 17:27   ` Peter Zijlstra
2021-05-13 12:26     ` kajoljain
2021-05-14 11:47       ` Peter Zijlstra
2021-05-17  6:43         ` kajoljain
2021-05-12 16:38 ` [RFC 2/4] powerpc/papr_scm: Add perf interface support Kajol Jain
2021-05-12 16:38 ` [RFC 3/4] powerpc/papr_scm: Document papr_scm sysfs event format entries Kajol Jain
2021-05-12 16:38 ` [RFC 4/4] powerpc/papr_scm: Add cpu hotplug support for nvdimm pmu device Kajol Jain
