linux-nvdimm.lists.01.org archive mirror
* [PATCH 0/2] powerpc/papr_scm: add support for reporting NVDIMM 'life_used_percentage' metric
@ 2020-06-22  4:24 Vaibhav Jain
  2020-06-22  4:24 ` [PATCH 1/2] powerpc/papr_scm: Fetch nvdimm performance stats from PHYP Vaibhav Jain
  2020-06-22  4:24 ` [PATCH 2/2] powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric Vaibhav Jain
  0 siblings, 2 replies; 10+ messages in thread
From: Vaibhav Jain @ 2020-06-22  4:24 UTC (permalink / raw)
  To: linuxppc-dev, linux-nvdimm
  Cc: Vaibhav Jain, Aneesh Kumar K . V, Michael Ellerman

This small patchset implements kernel side support for reporting the
'life_used_percentage' metric in NDCTL dimm health output for
papr-scm NVDIMMs. With the corresponding NDCTL side changes [1] the
output should look like:

$ sudo ndctl list -DH
[
  {
    "dev":"nmem0",
    "health":{
      "health_state":"ok",
      "life_used_percentage":0,
      "shutdown_state":"clean"
    }
  }
]

PHYP supports the H_SCM_PERFORMANCE_STATS hcall through which an LPAR
can fetch various performance stats, including the 'fuel_gauge'
percentage for an NVDIMM. The 'fuel_gauge' metric indicates the usable
life remaining of an NVDIMM expressed as a percentage, and
'life_used_percentage' can be calculated as
'life_used_percentage = 100 - fuel_gauge'.

Structure of the patchset
=========================
The first patch implements the scaffolding needed to issue the
H_SCM_PERFORMANCE_STATS hcall and fetch the performance stats
catalogue. It also adds a 'perf_stats' sysfs attribute to report the
full catalogue of performance stats supported by PHYP.

The second and final patch implements support for sending this value
to libndctl by extending the PAPR_PDSM_HEALTH pdsm payload with a new
field named 'dimm_fuel_gauge'.

References
==========
[1]
https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v13_run_guage

Vaibhav Jain (2):
  powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
  powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric

 Documentation/ABI/testing/sysfs-bus-papr-pmem |  27 +++
 arch/powerpc/include/uapi/asm/papr_pdsm.h     |   9 +
 arch/powerpc/platforms/pseries/papr_scm.c     | 186 ++++++++++++++++++
 3 files changed, 222 insertions(+)

-- 
2.26.2
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org


* [PATCH 1/2] powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
  2020-06-22  4:24 [PATCH 0/2] powerpc/papr_scm: add support for reporting NVDIMM 'life_used_percentage' metric Vaibhav Jain
@ 2020-06-22  4:24 ` Vaibhav Jain
  2020-06-23  5:42   ` Aneesh Kumar K.V
  2020-06-23 19:02   ` Ira Weiny
  2020-06-22  4:24 ` [PATCH 2/2] powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric Vaibhav Jain
  1 sibling, 2 replies; 10+ messages in thread
From: Vaibhav Jain @ 2020-06-22  4:24 UTC (permalink / raw)
  To: linuxppc-dev, linux-nvdimm
  Cc: Vaibhav Jain, Aneesh Kumar K . V, Michael Ellerman

Update papr_scm.c to query dimm performance statistics from PHYP via
H_SCM_PERFORMANCE_STATS hcall and export them to user-space as PAPR
specific NVDIMM attribute 'perf_stats' in sysfs. The patch also
provides sysfs ABI documentation for the stats being reported and
their meanings.

During NVDIMM probe in papr_scm_nvdimm_init(), a special variant of
the H_SCM_PERFORMANCE_STATS hcall is issued to check whether
collection of performance statistics is supported. If successful,
PHYP returns the maximum possible buffer length needed to read all
performance stats. This returned value is stored in the per-nvdimm
attribute 'len_stat_buffer'.

The layout of request buffer for reading NVDIMM performance stats from
PHYP is defined in 'struct papr_scm_perf_stats' and 'struct
papr_scm_perf_stat'. These structs are used in newly introduced
drc_pmem_query_stats() that issues the H_SCM_PERFORMANCE_STATS hcall.

The sysfs access function perf_stats_show() uses the value of
'len_stat_buffer' to allocate a buffer large enough to hold all
possible NVDIMM performance stats and passes it to
drc_pmem_query_stats() to populate. Finally, the statistics reported
in the buffer are formatted into the sysfs access function's output
buffer.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
 Documentation/ABI/testing/sysfs-bus-papr-pmem |  27 ++++
 arch/powerpc/platforms/pseries/papr_scm.c     | 139 ++++++++++++++++++
 2 files changed, 166 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
index 5b10d036a8d4..c1a67275c43f 100644
--- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -25,3 +25,30 @@ Description:
 				  NVDIMM have been scrubbed.
 		* "locked"	: Indicating that NVDIMM contents cant
 				  be modified until next power cycle.
+
+What:		/sys/bus/nd/devices/nmemX/papr/perf_stats
+Date:		May, 2020
+KernelVersion:	v5.9
+Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
+Description:
+		(RO) Report various performance stats related to papr-scm NVDIMM
+		device.  Each stat is reported on a new line with each line
+		composed of a stat-identifier followed by its value. Below are
+		the currently known dimm performance stats that are reported:
+
+		* "CtlResCt" : Controller Reset Count
+		* "CtlResTm" : Controller Reset Elapsed Time
+		* "PonSecs " : Power-on Seconds
+		* "MemLife " : Life Remaining
+		* "CritRscU" : Critical Resource Utilization
+		* "HostLCnt" : Host Load Count
+		* "HostSCnt" : Host Store Count
+		* "HostSDur" : Host Store Duration
+		* "HostLDur" : Host Load Duration
+		* "MedRCnt " : Media Read Count
+		* "MedWCnt " : Media Write Count
+		* "MedRDur " : Media Read Duration
+		* "MedWDur " : Media Write Duration
+		* "CchRHCnt" : Cache Read Hit Count
+		* "CchWHCnt" : Cache Write Hit Count
+		* "FastWCnt" : Fast Write Count
\ No newline at end of file
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 9c569078a09f..cb3f9acc325b 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -62,6 +62,24 @@
 				    PAPR_PMEM_HEALTH_FATAL |	\
 				    PAPR_PMEM_HEALTH_UNHEALTHY)
 
+#define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
+#define PAPR_SCM_PERF_STATS_VERSION 0x1
+
+/* Struct holding a single performance metric */
+struct papr_scm_perf_stat {
+	u8 statistic_id[8];
+	u64 statistic_value;
+};
+
+/* Struct exchanged between kernel and PHYP for fetching drc perf stats */
+struct papr_scm_perf_stats {
+	u8 eye_catcher[8];
+	u32 stats_version;		/* Should be 0x01 */
+	u32 num_statistics;		/* Number of stats following */
+	/* zero or more performance metrics */
+	struct papr_scm_perf_stat scm_statistic[];
+} __packed;
+
 /* private struct associated with each region */
 struct papr_scm_priv {
 	struct platform_device *pdev;
@@ -89,6 +107,9 @@ struct papr_scm_priv {
 
 	/* Health information for the dimm */
 	u64 health_bitmap;
+
+	/* length of the stat buffer as expected by phyp */
+	size_t len_stat_buffer;
 };
 
 static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -194,6 +215,75 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
 	return drc_pmem_bind(p);
 }
 
+/*
+ * Query the DIMM performance stats from PHYP and copy them (if returned) to
+ * the provided struct papr_scm_perf_stats instance 'buff_stats' of 'size'
+ * bytes. The value of R4 is copied to 'out' if the pointer is provided.
+ */
+static int drc_pmem_query_stats(struct papr_scm_priv *p,
+				struct papr_scm_perf_stats *buff_stats,
+				size_t size, unsigned int num_stats,
+				uint64_t *out)
+{
+	unsigned long ret[PLPAR_HCALL_BUFSIZE];
+	struct papr_scm_perf_stat *stats;
+	s64 rc, i;
+
+	/* Setup the out buffer */
+	if (buff_stats) {
+		memcpy(buff_stats->eye_catcher,
+		       PAPR_SCM_PERF_STATS_EYECATCHER, 8);
+		buff_stats->stats_version =
+			cpu_to_be32(PAPR_SCM_PERF_STATS_VERSION);
+		buff_stats->num_statistics =
+			cpu_to_be32(num_stats);
+	} else {
+		/* In case of no out buffer ignore the size */
+		size = 0;
+	}
+
+	/*
+	 * Do the HCALL asking PHYP for info and if R4 was requested
+	 * return its value in 'out' variable.
+	 */
+	rc = plpar_hcall(H_SCM_PERFORMANCE_STATS, ret, p->drc_index,
+			 virt_to_phys(buff_stats), size);
+	if (out)
+		*out =  ret[0];
+
+	if (rc == H_PARTIAL) {
+		dev_err(&p->pdev->dev,
+			"Unknown performance stats, Err:0x%016lX\n", ret[0]);
+		return -ENOENT;
+	} else if (rc != H_SUCCESS) {
+		dev_err(&p->pdev->dev,
+			"Failed to query performance stats, Err:%lld\n", rc);
+		return -ENXIO;
+	}
+
+	/* Successfully fetched the requested stats from phyp */
+	if (size != 0) {
+		buff_stats->num_statistics =
+			be32_to_cpu(buff_stats->num_statistics);
+
+		/* Transform the stats buffer values from BE to cpu native */
+		for (i = 0, stats = buff_stats->scm_statistic;
+		     i < buff_stats->num_statistics; ++i) {
+			stats[i].statistic_value =
+				be64_to_cpu(stats[i].statistic_value);
+		}
+		dev_dbg(&p->pdev->dev,
+			"Performance stats returned %d stats\n",
+			buff_stats->num_statistics);
+	} else {
+		/* Handle case where stat buffer size was requested */
+		dev_dbg(&p->pdev->dev,
+			"Performance stats size %ld\n", ret[0]);
+	}
+
+	return 0;
+}
+
 /*
  * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
  * health information.
@@ -631,6 +721,45 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
 	return 0;
 }
 
+static ssize_t perf_stats_show(struct device *dev,
+			       struct device_attribute *attr, char *buf)
+{
+	int index, rc;
+	struct seq_buf s;
+	struct papr_scm_perf_stat *stat;
+	struct papr_scm_perf_stats *stats;
+	struct nvdimm *dimm = to_nvdimm(dev);
+	struct papr_scm_priv *p = nvdimm_provider_data(dimm);
+
+	if (!p->len_stat_buffer)
+		return -ENOENT;
+
+	/* Allocate the buffer for phyp where stats are written */
+	stats = kzalloc(p->len_stat_buffer, GFP_KERNEL);
+	if (!stats)
+		return -ENOMEM;
+
+	/* Ask phyp to return all dimm perf stats */
+	rc = drc_pmem_query_stats(p, stats, p->len_stat_buffer, 0, NULL);
+	if (!rc) {
+		/*
+		 * Go through the returned output buffer and print stats and
+		 * values. Since statistic_id is essentially a char string of
+		 * 8 bytes, simply use the string format specifier to print it.
+		 */
+		seq_buf_init(&s, buf, PAGE_SIZE);
+		for (index = 0, stat = stats->scm_statistic;
+		     index < stats->num_statistics; ++index, ++stat) {
+			seq_buf_printf(&s, "%.8s = 0x%016llX\n",
+				       stat->statistic_id, stat->statistic_value);
+		}
+	}
+
+	kfree(stats);
+	return rc ? rc : seq_buf_used(&s);
+}
+DEVICE_ATTR_RO(perf_stats);
+
 static ssize_t flags_show(struct device *dev,
 			  struct device_attribute *attr, char *buf)
 {
@@ -676,6 +805,7 @@ DEVICE_ATTR_RO(flags);
 /* papr_scm specific dimm attributes */
 static struct attribute *papr_nd_attributes[] = {
 	&dev_attr_flags.attr,
+	&dev_attr_perf_stats.attr,
 	NULL,
 };
 
@@ -696,6 +826,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 	struct nd_region_desc ndr_desc;
 	unsigned long dimm_flags;
 	int target_nid, online_nid;
+	u64 stat_size;
 
 	p->bus_desc.ndctl = papr_scm_ndctl;
 	p->bus_desc.module = THIS_MODULE;
@@ -759,6 +890,14 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 		dev_info(dev, "Region registered with target node %d and online node %d",
 			 target_nid, online_nid);
 
+	/* Try retrieving the stat buffer and see if it's supported */
+	if (!drc_pmem_query_stats(p, NULL, 0, 0, &stat_size)) {
+		p->len_stat_buffer = (size_t)stat_size;
+		dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n",
+			p->len_stat_buffer);
+	} else {
+		dev_info(&p->pdev->dev, "Limited dimm stat info available\n");
+	}
 	return 0;
 
 err:	nvdimm_bus_unregister(p->bus);
-- 
2.26.2


* [PATCH 2/2] powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
  2020-06-22  4:24 [PATCH 0/2] powerpc/papr_scm: add support for reporting NVDIMM 'life_used_percentage' metric Vaibhav Jain
  2020-06-22  4:24 ` [PATCH 1/2] powerpc/papr_scm: Fetch nvdimm performance stats from PHYP Vaibhav Jain
@ 2020-06-22  4:24 ` Vaibhav Jain
  2020-06-23 19:14   ` Ira Weiny
  1 sibling, 1 reply; 10+ messages in thread
From: Vaibhav Jain @ 2020-06-22  4:24 UTC (permalink / raw)
  To: linuxppc-dev, linux-nvdimm
  Cc: Vaibhav Jain, Aneesh Kumar K . V, Michael Ellerman

We add support for reporting the 'fuel-gauge' NVDIMM metric via the
PAPR_PDSM_HEALTH pdsm payload. The 'fuel-gauge' metric indicates the
usable life remaining of a papr-scm compatible NVDIMM. PHYP exposes
this metric via the H_SCM_PERFORMANCE_STATS hcall.

The metric value is returned from the pdsm by extending the return
payload 'struct nd_papr_pdsm_health' without breaking the ABI. A new
field 'dimm_fuel_gauge' holding the metric value is introduced at the
end of the payload struct, and its presence is indicated by the
extension flag PDSM_DIMM_HEALTH_RUN_GAUGE_VALID.

The patch introduces a new function papr_pdsm_fuel_gauge() that is
called from papr_pdsm_health(). If fetching NVDIMM performance stats
is supported then papr_pdsm_fuel_gauge() allocates an output buffer
large enough to hold a single performance stat and passes it to
drc_pmem_query_stats(), which issues the hcall to PHYP. The returned
stat value is then populated in the 'struct
nd_papr_pdsm_health.dimm_fuel_gauge' field, with the extension flag
'PDSM_DIMM_HEALTH_RUN_GAUGE_VALID' set in 'struct
nd_papr_pdsm_health.extension_flags'.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
 arch/powerpc/include/uapi/asm/papr_pdsm.h |  9 +++++
 arch/powerpc/platforms/pseries/papr_scm.c | 47 +++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
index 9ccecc1d6840..50ef95e2f5b1 100644
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -72,6 +72,11 @@
 #define PAPR_PDSM_DIMM_CRITICAL      2
 #define PAPR_PDSM_DIMM_FATAL         3
 
+/* struct nd_papr_pdsm_health.extension_flags field flags */
+
+/* Indicate that the 'dimm_fuel_gauge' field is valid */
+#define PDSM_DIMM_HEALTH_RUN_GAUGE_VALID 1
+
 /*
  * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
  * Various flags indicate the health status of the dimm.
@@ -84,6 +89,7 @@
  * dimm_locked		: Contents of the dimm cant be modified until CEC reboot
  * dimm_encrypted	: Contents of dimm are encrypted.
  * dimm_health		: Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
+ * dimm_fuel_gauge	: Life remaining of DIMM as a percentage from 0-100
  */
 struct nd_papr_pdsm_health {
 	union {
@@ -96,6 +102,9 @@ struct nd_papr_pdsm_health {
 			__u8 dimm_locked;
 			__u8 dimm_encrypted;
 			__u16 dimm_health;
+
+			/* Extension flag PDSM_DIMM_HEALTH_RUN_GAUGE_VALID */
+			__u16 dimm_fuel_gauge;
 		};
 		__u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
 	};
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index cb3f9acc325b..39527cd38d9c 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -506,6 +506,45 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
 	return 0;
 }
 
+static int papr_pdsm_fuel_gauge(struct papr_scm_priv *p,
+				union nd_pdsm_payload *payload)
+{
+	int rc, size;
+	struct papr_scm_perf_stat *stat;
+	struct papr_scm_perf_stats *stats;
+
+	/* Silently fail if fetching performance metrics isn't supported */
+	if (!p->len_stat_buffer)
+		return 0;
+
+	/* Allocate a request buffer large enough to hold one performance stat */
+	size = sizeof(struct papr_scm_perf_stats) +
+		sizeof(struct papr_scm_perf_stat);
+
+	stats = kzalloc(size, GFP_KERNEL);
+	if (!stats)
+		return -ENOMEM;
+
+	stat = &stats->scm_statistic[0];
+	memcpy(&stat->statistic_id, "MemLife ", sizeof(stat->statistic_id));
+	stat->statistic_value = 0;
+
+	/* Fetch the fuel gauge and populate it in payload */
+	rc = drc_pmem_query_stats(p, stats, size, 1, NULL);
+	if (!rc) {
+		dev_dbg(&p->pdev->dev,
+			"Fetched fuel-gauge %llu", stat->statistic_value);
+		payload->health.extension_flags |=
+			PDSM_DIMM_HEALTH_RUN_GAUGE_VALID;
+		payload->health.dimm_fuel_gauge = stat->statistic_value;
+
+		rc = sizeof(struct nd_papr_pdsm_health);
+	}
+
+	kfree(stats);
+	return rc;
+}
+
 /* Fetch the DIMM health info and populate it in provided package. */
 static int papr_pdsm_health(struct papr_scm_priv *p,
 			    union nd_pdsm_payload *payload)
@@ -546,6 +585,14 @@ static int papr_pdsm_health(struct papr_scm_priv *p,
 
 	/* struct populated hence can release the mutex now */
 	mutex_unlock(&p->health_mutex);
+
+	/* Populate the fuel gauge meter in the payload */
+	rc = papr_pdsm_fuel_gauge(p, payload);
+
+	/* Error fetching fuel gauge is not fatal */
+	if (rc < 0)
+		dev_dbg(&p->pdev->dev, "Err(%d) fetching fuel gauge\n", rc);
+
 	rc = sizeof(struct nd_papr_pdsm_health);
 
 out:
-- 
2.26.2


* Re: [PATCH 1/2] powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
  2020-06-22  4:24 ` [PATCH 1/2] powerpc/papr_scm: Fetch nvdimm performance stats from PHYP Vaibhav Jain
@ 2020-06-23  5:42   ` Aneesh Kumar K.V
  2020-06-23  5:52     ` Aneesh Kumar K.V
  2020-06-23 19:02   ` Ira Weiny
  1 sibling, 1 reply; 10+ messages in thread
From: Aneesh Kumar K.V @ 2020-06-23  5:42 UTC (permalink / raw)
  To: Vaibhav Jain, linuxppc-dev, linux-nvdimm; +Cc: Vaibhav Jain, Michael Ellerman

Vaibhav Jain <vaibhav@linux.ibm.com> writes:

> Update papr_scm.c to query dimm performance statistics from PHYP via
> H_SCM_PERFORMANCE_STATS hcall and export them to user-space as PAPR
> specific NVDIMM attribute 'perf_stats' in sysfs. The patch also
> provides sysfs ABI documentation for the stats being reported and
> their meanings.
>
> During NVDIMM probe in papr_scm_nvdimm_init(), a special variant of
> the H_SCM_PERFORMANCE_STATS hcall is issued to check whether
> collection of performance statistics is supported. If successful,
> PHYP returns the maximum possible buffer length needed to read all
> performance stats. This returned value is stored in the per-nvdimm
> attribute 'len_stat_buffer'.
>
> The layout of request buffer for reading NVDIMM performance stats from
> PHYP is defined in 'struct papr_scm_perf_stats' and 'struct
> papr_scm_perf_stat'. These structs are used in newly introduced
> drc_pmem_query_stats() that issues the H_SCM_PERFORMANCE_STATS hcall.
>
> The sysfs access function perf_stats_show() uses the value of
> 'len_stat_buffer' to allocate a buffer large enough to hold all
> possible NVDIMM performance stats and passes it to
> drc_pmem_query_stats() to populate. Finally, the statistics reported
> in the buffer are formatted into the sysfs access function's output
> buffer.
>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-papr-pmem |  27 ++++
>  arch/powerpc/platforms/pseries/papr_scm.c     | 139 ++++++++++++++++++
>  2 files changed, 166 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
> index 5b10d036a8d4..c1a67275c43f 100644
> --- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
> +++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
> @@ -25,3 +25,30 @@ Description:
>  				  NVDIMM have been scrubbed.
>  		* "locked"	: Indicating that NVDIMM contents cant
>  				  be modified until next power cycle.
> +
> +What:		/sys/bus/nd/devices/nmemX/papr/perf_stats
> +Date:		May, 2020
> +KernelVersion:	v5.9
> +Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
> +Description:
> +		(RO) Report various performance stats related to papr-scm NVDIMM
> +		device.  Each stat is reported on a new line with each line
> +		composed of a stat-identifier followed by its value. Below are
> +		the currently known dimm performance stats that are reported:
> +
> +		* "CtlResCt" : Controller Reset Count
> +		* "CtlResTm" : Controller Reset Elapsed Time
> +		* "PonSecs " : Power-on Seconds
> +		* "MemLife " : Life Remaining
> +		* "CritRscU" : Critical Resource Utilization
> +		* "HostLCnt" : Host Load Count
> +		* "HostSCnt" : Host Store Count
> +		* "HostSDur" : Host Store Duration
> +		* "HostLDur" : Host Load Duration
> +		* "MedRCnt " : Media Read Count
> +		* "MedWCnt " : Media Write Count
> +		* "MedRDur " : Media Read Duration
> +		* "MedWDur " : Media Write Duration
> +		* "CchRHCnt" : Cache Read Hit Count
> +		* "CchWHCnt" : Cache Write Hit Count
> +		* "FastWCnt" : Fast Write Count
> \ No newline at end of file
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index 9c569078a09f..cb3f9acc325b 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -62,6 +62,24 @@
>  				    PAPR_PMEM_HEALTH_FATAL |	\
>  				    PAPR_PMEM_HEALTH_UNHEALTHY)
>  
> +#define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
> +#define PAPR_SCM_PERF_STATS_VERSION 0x1
> +
> +/* Struct holding a single performance metric */
> +struct papr_scm_perf_stat {
> +	u8 statistic_id[8];
> +	u64 statistic_value;

May be stat_id, stat_val ? 

> +};
> +
> +/* Struct exchanged between kernel and PHYP for fetching drc perf stats */
> +struct papr_scm_perf_stats {
> +	u8 eye_catcher[8];
> +	u32 stats_version;		/* Should be 0x01 */
> +	u32 num_statistics;		/* Number of stats following */
> +	/* zero or more performance metrics */
> +	struct papr_scm_perf_stat scm_statistic[];
> +} __packed;


For Phyp interaction these should be big-endian. I see you do 

	stats[i].statistic_value = be64_to_cpu(stats[i].statistic_value);

Can we avoid that?


> +
>  /* private struct associated with each region */
>  struct papr_scm_priv {
>  	struct platform_device *pdev;
> @@ -89,6 +107,9 @@ struct papr_scm_priv {
>  
>  	/* Health information for the dimm */
>  	u64 health_bitmap;
> +
> +	/* length of the stat buffer as expected by phyp */
> +	size_t len_stat_buffer;

how about stat_buffer_len?

>  };
>  
>  static int drc_pmem_bind(struct papr_scm_priv *p)
> @@ -194,6 +215,75 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
>  	return drc_pmem_bind(p);
>  }
>  
> +/*
> + * Query the DIMM performance stats from PHYP and copy them (if returned) to
> + * the provided struct papr_scm_perf_stats instance 'buff_stats' of 'size'
> + * bytes. The value of R4 is copied to 'out' if the pointer is provided.
> + */
> +static int drc_pmem_query_stats(struct papr_scm_priv *p,
> +				struct papr_scm_perf_stats *buff_stats,
> +				size_t size, unsigned int num_stats,
> +				uint64_t *out)
> +{
> +	unsigned long ret[PLPAR_HCALL_BUFSIZE];
> +	struct papr_scm_perf_stat *stats;
> +	s64 rc, i;
> +
> +	/* Setup the out buffer */
> +	if (buff_stats) {
> +		memcpy(buff_stats->eye_catcher,
> +		       PAPR_SCM_PERF_STATS_EYECATCHER, 8);
> +		buff_stats->stats_version =
> +			cpu_to_be32(PAPR_SCM_PERF_STATS_VERSION);
> +		buff_stats->num_statistics =
> +			cpu_to_be32(num_stats);
> +	} else {
> +		/* In case of no out buffer ignore the size */
> +		size = 0;
> +	}
> +
> +	/*
> +	 * Do the HCALL asking PHYP for info and if R4 was requested
> +	 * return its value in 'out' variable.
> +	 */
> +	rc = plpar_hcall(H_SCM_PERFORMANCE_STATS, ret, p->drc_index,
> +			 virt_to_phys(buff_stats), size);
> +	if (out)
> +		*out =  ret[0];
> +
> +	if (rc == H_PARTIAL) {
> +		dev_err(&p->pdev->dev,
> +			"Unknown performance stats, Err:0x%016lX\n", ret[0]);
> +		return -ENOENT;
> +	} else if (rc != H_SUCCESS) {
> +		dev_err(&p->pdev->dev,
> +			"Failed to query performance stats, Err:%lld\n", rc);
> +		return -ENXIO;

May be just -1? ENXIO is that a suitable error return here?

> +	}
> +
> +	/* Successfully fetched the requested stats from phyp */
> +	if (size != 0) {
> +		buff_stats->num_statistics =
> +			be32_to_cpu(buff_stats->num_statistics);
> +
> +		/* Transform the stats buffer values from BE to cpu native */
> +		for (i = 0, stats = buff_stats->scm_statistic;
> +		     i < buff_stats->num_statistics; ++i) {
> +			stats[i].statistic_value =
> +				be64_to_cpu(stats[i].statistic_value);
> +		}
> +		dev_dbg(&p->pdev->dev,
> +			"Performance stats returned %d stats\n",
> +			buff_stats->num_statistics);
> +	} else {
> +		/* Handle case where stat buffer size was requested */
> +		dev_dbg(&p->pdev->dev,
> +			"Performance stats size %ld\n", ret[0]);
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
>   * health information.
> @@ -631,6 +721,45 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>  	return 0;
>  }
>  
> +static ssize_t perf_stats_show(struct device *dev,
> +			       struct device_attribute *attr, char *buf)
> +{
> +	int index, rc;
> +	struct seq_buf s;
> +	struct papr_scm_perf_stat *stat;
> +	struct papr_scm_perf_stats *stats;
> +	struct nvdimm *dimm = to_nvdimm(dev);
> +	struct papr_scm_priv *p = nvdimm_provider_data(dimm);
> +
> +	if (!p->len_stat_buffer)
> +		return -ENOENT;
> +
> +	/* Allocate the buffer for phyp where stats are written */
> +	stats = kzalloc(p->len_stat_buffer, GFP_KERNEL);
> +	if (!stats)
> +		return -ENOMEM;
> +
> +	/* Ask phyp to return all dimm perf stats */
> +	rc = drc_pmem_query_stats(p, stats, p->len_stat_buffer, 0, NULL);
> +	if (!rc) {
> +		/*
> +		 * Go through the returned output buffer and print stats and
> +		 * values. Since statistic_id is essentially a char string of
> +		 * 8 bytes, simply use the string format specifier to print it.
> +		 */
> +		seq_buf_init(&s, buf, PAGE_SIZE);
> +		for (index = 0, stat = stats->scm_statistic;
> +		     index < stats->num_statistics; ++index, ++stat) {
> +			seq_buf_printf(&s, "%.8s = 0x%016llX\n",
> +				       stat->statistic_id, stat->statistic_value);


That is raw number (statistic_id). Is that useful? Can we map them to user readable
strings? 

> +		}
> +	}
> +
> +	kfree(stats);
> +	return rc ? rc : seq_buf_used(&s);
> +}
> +DEVICE_ATTR_RO(perf_stats);
> +
>  static ssize_t flags_show(struct device *dev,
>  			  struct device_attribute *attr, char *buf)
>  {
> @@ -676,6 +805,7 @@ DEVICE_ATTR_RO(flags);
>  /* papr_scm specific dimm attributes */
>  static struct attribute *papr_nd_attributes[] = {
>  	&dev_attr_flags.attr,
> +	&dev_attr_perf_stats.attr,
>  	NULL,
>  };
>  
> @@ -696,6 +826,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>  	struct nd_region_desc ndr_desc;
>  	unsigned long dimm_flags;
>  	int target_nid, online_nid;
> +	u64 stat_size;
>  
>  	p->bus_desc.ndctl = papr_scm_ndctl;
>  	p->bus_desc.module = THIS_MODULE;
> @@ -759,6 +890,14 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>  		dev_info(dev, "Region registered with target node %d and online node %d",
>  			 target_nid, online_nid);
>  
> +	/* Try retrieving the stat buffer and see if it's supported */
> +	if (!drc_pmem_query_stats(p, NULL, 0, 0, &stat_size)) {
> +		p->len_stat_buffer = (size_t)stat_size;
> +		dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n",
> +			p->len_stat_buffer);
> +	} else {
> +		dev_info(&p->pdev->dev, "Limited dimm stat info available\n");
> +	}
>  	return 0;
>  
>  err:	nvdimm_bus_unregister(p->bus);


* Re: [PATCH 1/2] powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
  2020-06-23  5:42   ` Aneesh Kumar K.V
@ 2020-06-23  5:52     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 10+ messages in thread
From: Aneesh Kumar K.V @ 2020-06-23  5:52 UTC (permalink / raw)
  To: Vaibhav Jain, linuxppc-dev, linux-nvdimm; +Cc: Vaibhav Jain, Michael Ellerman

Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes:

> Vaibhav Jain <vaibhav@linux.ibm.com> writes:
> +		 */
>> +		seq_buf_init(&s, buf, PAGE_SIZE);
>> +		for (index = 0, stat = stats->scm_statistic;
>> +		     index < stats->num_statistics; ++index, ++stat) {
>> +			seq_buf_printf(&s, "%.8s = 0x%016llX\n",
>> +				       stat->statistic_id, stat->statistic_value);
>
>
> That is raw number (statistic_id). Is that useful? Can we map them to user readable
> strings? 

Ok i missed that "%.8s" .

>
>> +		}
>> +	}
>> +
>> +	kfree(stats);
>> +	return rc ? rc : seq_buf_used(&s);
>> +}
>> +DEVICE_ATTR_RO(perf_stats);

-aneesh


* Re: [PATCH 1/2] powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
  2020-06-22  4:24 ` [PATCH 1/2] powerpc/papr_scm: Fetch nvdimm performance stats from PHYP Vaibhav Jain
  2020-06-23  5:42   ` Aneesh Kumar K.V
@ 2020-06-23 19:02   ` Ira Weiny
  2020-06-24 14:58     ` Vaibhav Jain
  1 sibling, 1 reply; 10+ messages in thread
From: Ira Weiny @ 2020-06-23 19:02 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: linuxppc-dev, linux-nvdimm, Aneesh Kumar K . V, Michael Ellerman

On Mon, Jun 22, 2020 at 09:54:50AM +0530, Vaibhav Jain wrote:
> Update papr_scm.c to query dimm performance statistics from PHYP via
> H_SCM_PERFORMANCE_STATS hcall and export them to user-space as PAPR
> specific NVDIMM attribute 'perf_stats' in sysfs. The patch also
> provides sysfs ABI documentation for the stats being reported and
> their meanings.
> 
> During NVDIMM probe in papr_scm_nvdimm_init(), a special variant of
> the H_SCM_PERFORMANCE_STATS hcall is issued to check whether
> collection of performance statistics is supported. If successful,
> PHYP returns the maximum possible buffer length needed to read all
> performance stats. This returned value is stored in the per-nvdimm
> attribute 'len_stat_buffer'.
> 
> The layout of request buffer for reading NVDIMM performance stats from
> PHYP is defined in 'struct papr_scm_perf_stats' and 'struct
> papr_scm_perf_stat'. These structs are used in newly introduced
> drc_pmem_query_stats() that issues the H_SCM_PERFORMANCE_STATS hcall.
> 
> The sysfs access function perf_stats_show() uses the value of
> 'len_stat_buffer' to allocate a buffer large enough to hold all
> possible NVDIMM performance stats and passes it to
> drc_pmem_query_stats() to populate. Finally, the statistics reported
> in the buffer are formatted into the sysfs access function's output
> buffer.
> 
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-papr-pmem |  27 ++++
>  arch/powerpc/platforms/pseries/papr_scm.c     | 139 ++++++++++++++++++
>  2 files changed, 166 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
> index 5b10d036a8d4..c1a67275c43f 100644
> --- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
> +++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
> @@ -25,3 +25,30 @@ Description:
>  				  NVDIMM have been scrubbed.
>  		* "locked"	: Indicating that NVDIMM contents cant
>  				  be modified until next power cycle.
> +
> +What:		/sys/bus/nd/devices/nmemX/papr/perf_stats
> +Date:		May, 2020
> +KernelVersion:	v5.9
> +Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
> +Description:
> +		(RO) Report various performance stats related to papr-scm NVDIMM
> +		device.  Each stat is reported on a new line with each line
> +		composed of a stat-identifier followed by it value. Below are
> +		currently known dimm performance stats which are reported:
> +
> +		* "CtlResCt" : Controller Reset Count
> +		* "CtlResTm" : Controller Reset Elapsed Time
> +		* "PonSecs " : Power-on Seconds
> +		* "MemLife " : Life Remaining
> +		* "CritRscU" : Critical Resource Utilization
> +		* "HostLCnt" : Host Load Count
> +		* "HostSCnt" : Host Store Count
> +		* "HostSDur" : Host Store Duration
> +		* "HostLDur" : Host Load Duration
> +		* "MedRCnt " : Media Read Count
> +		* "MedWCnt " : Media Write Count
> +		* "MedRDur " : Media Read Duration
> +		* "MedWDur " : Media Write Duration
> +		* "CchRHCnt" : Cache Read Hit Count
> +		* "CchWHCnt" : Cache Write Hit Count
> +		* "FastWCnt" : Fast Write Count
> \ No newline at end of file
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index 9c569078a09f..cb3f9acc325b 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -62,6 +62,24 @@
>  				    PAPR_PMEM_HEALTH_FATAL |	\
>  				    PAPR_PMEM_HEALTH_UNHEALTHY)
>  
> +#define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
> +#define PAPR_SCM_PERF_STATS_VERSION 0x1
> +
> +/* Struct holding a single performance metric */
> +struct papr_scm_perf_stat {
> +	u8 statistic_id[8];
> +	u64 statistic_value;
> +};
> +
> +/* Struct exchanged between kernel and PHYP for fetching drc perf stats */
> +struct papr_scm_perf_stats {
> +	u8 eye_catcher[8];
> +	u32 stats_version;		/* Should be 0x01 */
                                                     ^^^^
				     PAPR_SCM_PERF_STATS_VERSION?

> +	u32 num_statistics;		/* Number of stats following */
> +	/* zero or more performance matrics */
> +	struct papr_scm_perf_stat scm_statistic[];
> +} __packed;
> +
>  /* private struct associated with each region */
>  struct papr_scm_priv {
>  	struct platform_device *pdev;
> @@ -89,6 +107,9 @@ struct papr_scm_priv {
>  
>  	/* Health information for the dimm */
>  	u64 health_bitmap;
> +
> +	/* length of the stat buffer as expected by phyp */
> +	size_t len_stat_buffer;
>  };
>  
>  static int drc_pmem_bind(struct papr_scm_priv *p)
> @@ -194,6 +215,75 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
>  	return drc_pmem_bind(p);
>  }
>  
> +/*
> + * Query the Dimm performance stats from PHYP and copy them (if returned) to
> + * provided struct papr_scm_perf_stats instance 'stats' of 'size' in bytes.
> + * The value of R4 is copied to 'out' if the pointer is provided.
> + */
> +static int drc_pmem_query_stats(struct papr_scm_priv *p,
> +				struct papr_scm_perf_stats *buff_stats,
> +				size_t size, unsigned int num_stats,
> +				uint64_t *out)
> +{
> +	unsigned long ret[PLPAR_HCALL_BUFSIZE];
> +	struct papr_scm_perf_stat *stats;
> +	s64 rc, i;
> +
> +	/* Setup the out buffer */
> +	if (buff_stats) {
> +		memcpy(buff_stats->eye_catcher,
> +		       PAPR_SCM_PERF_STATS_EYECATCHER, 8);
> +		buff_stats->stats_version =
> +			cpu_to_be32(PAPR_SCM_PERF_STATS_VERSION);
> +		buff_stats->num_statistics =
> +			cpu_to_be32(num_stats);
> +	} else {
> +		/* In case of no out buffer ignore the size */
> +		size = 0;
> +	}
> +
> +	/*
> +	 * Do the HCALL asking PHYP for info and if R4 was requested
> +	 * return its value in 'out' variable.
> +	 */
> +	rc = plpar_hcall(H_SCM_PERFORMANCE_STATS, ret, p->drc_index,
> +			 virt_to_phys(buff_stats), size);

You are calling virt_to_phys(NULL) here when called from
papr_scm_nvdimm_init()!  That can't be right.

> +	if (out)
> +		*out =  ret[0];
> +
> +	if (rc == H_PARTIAL) {
> +		dev_err(&p->pdev->dev,
> +			"Unknown performance stats, Err:0x%016lX\n", ret[0]);
> +		return -ENOENT;
> +	} else if (rc != H_SUCCESS) {
> +		dev_err(&p->pdev->dev,
> +			"Failed to query performance stats, Err:%lld\n", rc);
> +		return -ENXIO;
> +	}
> +
> +	/* Successfully fetched the requested stats from phyp */
> +	if (size != 0) {
> +		buff_stats->num_statistics =
> +			be32_to_cpu(buff_stats->num_statistics);
> +
> +		/* Transform the stats buffer values from BE to cpu native */
> +		for (i = 0, stats = buff_stats->scm_statistic;
> +		     i < buff_stats->num_statistics; ++i) {
> +			stats[i].statistic_value =
> +				be64_to_cpu(stats[i].statistic_value);
> +		}
> +		dev_dbg(&p->pdev->dev,
> +			"Performance stats returned %d stats\n",
> +			buff_stats->num_statistics);
> +	} else {
> +		/* Handle case where stat buffer size was requested */
> +		dev_dbg(&p->pdev->dev,
> +			"Performance stats size %ld\n", ret[0]);
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
>   * health information.
> @@ -631,6 +721,45 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>  	return 0;
>  }
>  
> +static ssize_t perf_stats_show(struct device *dev,
> +			       struct device_attribute *attr, char *buf)
> +{
> +	int index, rc;
> +	struct seq_buf s;
> +	struct papr_scm_perf_stat *stat;
> +	struct papr_scm_perf_stats *stats;
> +	struct nvdimm *dimm = to_nvdimm(dev);
> +	struct papr_scm_priv *p = nvdimm_provider_data(dimm);
> +
> +	if (!p->len_stat_buffer)
> +		return -ENOENT;
> +
> +	/* Allocate the buffer for phyp where stats are written */
> +	stats = kzalloc(p->len_stat_buffer, GFP_KERNEL);

I'm concerned that this buffer does not seem to have anything to do with the
'num_stats' parameter passed to drc_pmem_query_stats().  Furthermore why is
num_stats always 0 in those calls?

> +	if (!stats)
> +		return -ENOMEM;
> +
> +	/* Ask phyp to return all dimm perf stats */
> +	rc = drc_pmem_query_stats(p, stats, p->len_stat_buffer, 0, NULL);
> +	if (!rc) {
> +		/*
> +		 * Go through the returned output buffer and print stats and
> +		 * values. Since statistic_id is essentially a char string of
> +		 * 8 bytes, simply use the string format specifier to print it.
> +		 */
> +		seq_buf_init(&s, buf, PAGE_SIZE);
> +		for (index = 0, stat = stats->scm_statistic;
> +		     index < stats->num_statistics; ++index, ++stat) {
> +			seq_buf_printf(&s, "%.8s = 0x%016llX\n",
> +				       stat->statistic_id, stat->statistic_value);
> +		}
> +	}
> +
> +	kfree(stats);
> +	return rc ? rc : seq_buf_used(&s);
> +}
> +DEVICE_ATTR_RO(perf_stats);
> +
>  static ssize_t flags_show(struct device *dev,
>  			  struct device_attribute *attr, char *buf)
>  {
> @@ -676,6 +805,7 @@ DEVICE_ATTR_RO(flags);
>  /* papr_scm specific dimm attributes */
>  static struct attribute *papr_nd_attributes[] = {
>  	&dev_attr_flags.attr,
> +	&dev_attr_perf_stats.attr,
>  	NULL,
>  };
>  
> @@ -696,6 +826,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>  	struct nd_region_desc ndr_desc;
>  	unsigned long dimm_flags;
>  	int target_nid, online_nid;
> +	u64 stat_size;
>  
>  	p->bus_desc.ndctl = papr_scm_ndctl;
>  	p->bus_desc.module = THIS_MODULE;
> @@ -759,6 +890,14 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>  		dev_info(dev, "Region registered with target node %d and online node %d",
>  			 target_nid, online_nid);
>  
> +	/* Try retriving the stat buffer and see if its supported */
> +	if (!drc_pmem_query_stats(p, NULL, 0, 0, &stat_size)) {
> +		p->len_stat_buffer = (size_t)stat_size;
> +		dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n",
> +			p->len_stat_buffer);
> +	} else {
> +		dev_info(&p->pdev->dev, "Limited dimm stat info available\n");

Do we really need this print?

Ira

> +	}
>  	return 0;
>  
>  err:	nvdimm_bus_unregister(p->bus);
> -- 
> 2.26.2
> _______________________________________________
> Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
> To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
  2020-06-22  4:24 ` [PATCH 2/2] powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric Vaibhav Jain
@ 2020-06-23 19:14   ` Ira Weiny
  2020-06-24 14:03     ` Vaibhav Jain
  0 siblings, 1 reply; 10+ messages in thread
From: Ira Weiny @ 2020-06-23 19:14 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: linuxppc-dev, linux-nvdimm, Aneesh Kumar K . V, Michael Ellerman

On Mon, Jun 22, 2020 at 09:54:51AM +0530, Vaibhav Jain wrote:
> We add support for reporting 'fuel-gauge' NVDIMM metric via
> PAPR_PDSM_HEALTH pdsm payload. 'fuel-gauge' metric indicates the usage
> life remaining of a papr-scm compatible NVDIMM. PHYP exposes this
> metric via the H_SCM_PERFORMANCE_STATS hcall.
> 
> The metric value is returned from the pdsm by extending the return
> payload 'struct nd_papr_pdsm_health' without breaking the ABI. A new
> field 'dimm_fuel_gauge' to hold the metric value is introduced at the
> end of the payload struct and its presence is indicated by the
> extension flag PDSM_DIMM_HEALTH_RUN_GAUGE_VALID.
> 
> The patch introduces a new function papr_pdsm_fuel_gauge() that is
> called from papr_pdsm_health(). If fetching NVDIMM performance stats
> is supported, 'papr_pdsm_fuel_gauge()' allocates an output buffer
> large enough to hold the performance stat and passes it to
> drc_pmem_query_stats() that issues the HCALL to PHYP. The return value
> of the stat is then populated in the 'struct
> nd_papr_pdsm_health.dimm_fuel_gauge' field with extension flag
> 'PDSM_DIMM_HEALTH_RUN_GAUGE_VALID' set in 'struct
> nd_papr_pdsm_health.extension_flags'.
> 
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
>  arch/powerpc/include/uapi/asm/papr_pdsm.h |  9 +++++
>  arch/powerpc/platforms/pseries/papr_scm.c | 47 +++++++++++++++++++++++
>  2 files changed, 56 insertions(+)
> 
> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> index 9ccecc1d6840..50ef95e2f5b1 100644
> --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> @@ -72,6 +72,11 @@
>  #define PAPR_PDSM_DIMM_CRITICAL      2
>  #define PAPR_PDSM_DIMM_FATAL         3
>  
> +/* struct nd_papr_pdsm_health.extension_flags field flags */
> +
> +/* Indicate that the 'dimm_fuel_gauge' field is valid */
> +#define PDSM_DIMM_HEALTH_RUN_GAUGE_VALID 1
> +
>  /*
>   * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
>   * Various flags indicate the health status of the dimm.
> @@ -84,6 +89,7 @@
>   * dimm_locked		: Contents of the dimm cant be modified until CEC reboot
>   * dimm_encrypted	: Contents of dimm are encrypted.
>   * dimm_health		: Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
> + * dimm_fuel_gauge	: Life remaining of DIMM as a percentage from 0-100
>   */
>  struct nd_papr_pdsm_health {
>  	union {
> @@ -96,6 +102,9 @@ struct nd_papr_pdsm_health {
>  			__u8 dimm_locked;
>  			__u8 dimm_encrypted;
>  			__u16 dimm_health;
> +
> +			/* Extension flag PDSM_DIMM_HEALTH_RUN_GAUGE_VALID */
> +			__u16 dimm_fuel_gauge;
>  		};
>  		__u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
>  	};
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index cb3f9acc325b..39527cd38d9c 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -506,6 +506,45 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>  	return 0;
>  }
>  
> +static int papr_pdsm_fuel_gauge(struct papr_scm_priv *p,
> +				union nd_pdsm_payload *payload)
> +{
> +	int rc, size;
> +	struct papr_scm_perf_stat *stat;
> +	struct papr_scm_perf_stats *stats;
> +
> +	/* Silently fail if fetching performance metrics isn't  supported */
> +	if (!p->len_stat_buffer)
> +		return 0;
> +
> +	/* Allocate request buffer enough to hold single performance stat */
> +	size = sizeof(struct papr_scm_perf_stats) +
> +		sizeof(struct papr_scm_perf_stat);
> +
> +	stats = kzalloc(size, GFP_KERNEL);
> +	if (!stats)
> +		return -ENOMEM;
> +
> +	stat = &stats->scm_statistic[0];
> +	memcpy(&stat->statistic_id, "MemLife ", sizeof(stat->statistic_id));
> +	stat->statistic_value = 0;
> +
> +	/* Fetch the fuel gauge and populate it in payload */
> +	rc = drc_pmem_query_stats(p, stats, size, 1, NULL);
> +	if (!rc) {

Always best to handle the error case first...

	if (rc) {
		... print debugging from below...
		goto free_stats;
	}

> +		dev_dbg(&p->pdev->dev,
> +			"Fetched fuel-gauge %llu", stat->statistic_value);
> +		payload->health.extension_flags |=
> +			PDSM_DIMM_HEALTH_RUN_GAUGE_VALID;
> +		payload->health.dimm_fuel_gauge = stat->statistic_value;
> +
> +		rc = sizeof(struct nd_papr_pdsm_health);
> +	}
> +

free_stats:

> +	kfree(stats);
> +	return rc;
> +}
> +
>  /* Fetch the DIMM health info and populate it in provided package. */
>  static int papr_pdsm_health(struct papr_scm_priv *p,
>  			    union nd_pdsm_payload *payload)
> @@ -546,6 +585,14 @@ static int papr_pdsm_health(struct papr_scm_priv *p,
>  
>  	/* struct populated hence can release the mutex now */
>  	mutex_unlock(&p->health_mutex);
> +
> +	/* Populate the fuel gauge meter in the payload */
> +	rc = papr_pdsm_fuel_gauge(p, payload);
> +
> +	/* Error fetching fuel gauge is not fatal */
> +	if (rc < 0)
> +		dev_dbg(&p->pdev->dev, "Err(%d) fetching fuel gauge\n", rc);

Why even return an error?  Just have *_fuel_gauge() print the debugging and
return void.

> +
>  	rc = sizeof(struct nd_papr_pdsm_health);

You just overwrite rc here anyway...

Ira

>  
>  out:
> -- 
> 2.26.2

* Re: [PATCH 2/2] powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
  2020-06-23 19:14   ` Ira Weiny
@ 2020-06-24 14:03     ` Vaibhav Jain
  0 siblings, 0 replies; 10+ messages in thread
From: Vaibhav Jain @ 2020-06-24 14:03 UTC (permalink / raw)
  To: Ira Weiny
  Cc: linuxppc-dev, linux-nvdimm, Aneesh Kumar K . V, Michael Ellerman

Thanks for reviewing this patch Ira,

My responses below:

Ira Weiny <ira.weiny@intel.com> writes:

[snip]
>> +static int papr_pdsm_fuel_gauge(struct papr_scm_priv *p,
>> +				union nd_pdsm_payload *payload)
>> +{
>> +	int rc, size;
>> +	struct papr_scm_perf_stat *stat;
>> +	struct papr_scm_perf_stats *stats;
>> +
>> +	/* Silently fail if fetching performance metrics isn't  supported */
>> +	if (!p->len_stat_buffer)
>> +		return 0;
>> +
>> +	/* Allocate request buffer enough to hold single performance stat */
>> +	size = sizeof(struct papr_scm_perf_stats) +
>> +		sizeof(struct papr_scm_perf_stat);
>> +
>> +	stats = kzalloc(size, GFP_KERNEL);
>> +	if (!stats)
>> +		return -ENOMEM;
>> +
>> +	stat = &stats->scm_statistic[0];
>> +	memcpy(&stat->statistic_id, "MemLife ", sizeof(stat->statistic_id));
>> +	stat->statistic_value = 0;
>> +
>> +	/* Fetch the fuel gauge and populate it in payload */
>> +	rc = drc_pmem_query_stats(p, stats, size, 1, NULL);
>> +	if (!rc) {
>
> Always best to handle the error case first...
>
> 	if (rc) {
> 		... print debugging from below...
> 		goto free_stats;
> 	}
>
Sure, I don't feel strongly about it. Will update this in v2.

>> +		dev_dbg(&p->pdev->dev,
>> +			"Fetched fuel-gauge %llu", stat->statistic_value);
>> +		payload->health.extension_flags |=
>> +			PDSM_DIMM_HEALTH_RUN_GAUGE_VALID;
>> +		payload->health.dimm_fuel_gauge = stat->statistic_value;
>> +
>> +		rc = sizeof(struct nd_papr_pdsm_health);
>> +	}
>> +
>
> free_stats:
>
>> +	kfree(stats);
>> +	return rc;
>> +}
>> +
>>  /* Fetch the DIMM health info and populate it in provided package. */
>>  static int papr_pdsm_health(struct papr_scm_priv *p,
>>  			    union nd_pdsm_payload *payload)
>> @@ -546,6 +585,14 @@ static int papr_pdsm_health(struct papr_scm_priv *p,
>>  
>>  	/* struct populated hence can release the mutex now */
>>  	mutex_unlock(&p->health_mutex);
>> +
>> +	/* Populate the fuel gauge meter in the payload */
>> +	rc = papr_pdsm_fuel_gauge(p, payload);
>> +
>> +	/* Error fetching fuel gauge is not fatal */
>> +	if (rc < 0)
>> +		dev_dbg(&p->pdev->dev, "Err(%d) fetching fuel gauge\n", rc);
>
> Why even return an error?  Just have *_fuel_gauge() print the debugging and
> return void.
>
papr_pdsm_fuel_gauge uses the same signature as other PDSM service
functions, as described for the pdsm_cmd_desc.service callback, hence the
function signature was designed this way.

>> +
>>  	rc = sizeof(struct nd_papr_pdsm_health);
>
> You just overwrite rc here anyway...
>
> Ira
>
>>  
>>  out:
>> -- 
>> 2.26.2

-- 
Cheers
~ Vaibhav

* Re: [PATCH 1/2] powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
  2020-06-23 19:02   ` Ira Weiny
@ 2020-06-24 14:58     ` Vaibhav Jain
  2020-06-24 17:33       ` Ira Weiny
  0 siblings, 1 reply; 10+ messages in thread
From: Vaibhav Jain @ 2020-06-24 14:58 UTC (permalink / raw)
  To: Ira Weiny; +Cc: Aneesh Kumar K . V, linuxppc-dev, linux-nvdimm

Thanks for reviewing this patch Ira,

My responses below inline.

Ira Weiny <ira.weiny@intel.com> writes:

> On Mon, Jun 22, 2020 at 09:54:50AM +0530, Vaibhav Jain wrote:
>> Update papr_scm.c to query dimm performance statistics from PHYP via
>> H_SCM_PERFORMANCE_STATS hcall and export them to user-space as PAPR
>> specific NVDIMM attribute 'perf_stats' in sysfs. The patch also
>> provide a sysfs ABI documentation for the stats being reported and
>> their meanings.
>> 
>> During NVDIMM probe time in papr_scm_nvdimm_init() a special variant
>> of H_SCM_PERFORMANCE_STATS hcall is issued to check if collection of
>> performance statistics is supported. If successful, PHYP returns the
>> maximum possible buffer length needed to read all
>> performance stats. This returned value is stored in a per-nvdimm
>> attribute 'len_stat_buffer'.
>> 
>> The layout of request buffer for reading NVDIMM performance stats from
>> PHYP is defined in 'struct papr_scm_perf_stats' and 'struct
>> papr_scm_perf_stat'. These structs are used in newly introduced
>> drc_pmem_query_stats() that issues the H_SCM_PERFORMANCE_STATS hcall.
>> 
>> The sysfs access function perf_stats_show() uses the value of
>> 'len_stat_buffer' to allocate a buffer large enough to hold all
>> possible NVDIMM performance stats and passes it to
>> drc_pmem_query_stats() to populate. Finally, the statistics reported in
>> the buffer are formatted into the sysfs output buffer.
>> 
>> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
>> ---
>>  Documentation/ABI/testing/sysfs-bus-papr-pmem |  27 ++++
>>  arch/powerpc/platforms/pseries/papr_scm.c     | 139 ++++++++++++++++++
>>  2 files changed, 166 insertions(+)
>> 
>> diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
>> index 5b10d036a8d4..c1a67275c43f 100644
>> --- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
>> +++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
>> @@ -25,3 +25,30 @@ Description:
>>  				  NVDIMM have been scrubbed.
>>  		* "locked"	: Indicating that NVDIMM contents cant
>>  				  be modified until next power cycle.
>> +
>> +What:		/sys/bus/nd/devices/nmemX/papr/perf_stats
>> +Date:		May, 2020
>> +KernelVersion:	v5.9
>> +Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
>> +Description:
>> +		(RO) Report various performance stats related to papr-scm NVDIMM
>> +		device.  Each stat is reported on a new line with each line
>> +		composed of a stat-identifier followed by it value. Below are
>> +		currently known dimm performance stats which are reported:
>> +
>> +		* "CtlResCt" : Controller Reset Count
>> +		* "CtlResTm" : Controller Reset Elapsed Time
>> +		* "PonSecs " : Power-on Seconds
>> +		* "MemLife " : Life Remaining
>> +		* "CritRscU" : Critical Resource Utilization
>> +		* "HostLCnt" : Host Load Count
>> +		* "HostSCnt" : Host Store Count
>> +		* "HostSDur" : Host Store Duration
>> +		* "HostLDur" : Host Load Duration
>> +		* "MedRCnt " : Media Read Count
>> +		* "MedWCnt " : Media Write Count
>> +		* "MedRDur " : Media Read Duration
>> +		* "MedWDur " : Media Write Duration
>> +		* "CchRHCnt" : Cache Read Hit Count
>> +		* "CchWHCnt" : Cache Write Hit Count
>> +		* "FastWCnt" : Fast Write Count
>> \ No newline at end of file
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> index 9c569078a09f..cb3f9acc325b 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -62,6 +62,24 @@
>>  				    PAPR_PMEM_HEALTH_FATAL |	\
>>  				    PAPR_PMEM_HEALTH_UNHEALTHY)
>>  
>> +#define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
>> +#define PAPR_SCM_PERF_STATS_VERSION 0x1
>> +
>> +/* Struct holding a single performance metric */
>> +struct papr_scm_perf_stat {
>> +	u8 statistic_id[8];
>> +	u64 statistic_value;
>> +};
>> +
>> +/* Struct exchanged between kernel and PHYP for fetching drc perf stats */
>> +struct papr_scm_perf_stats {
>> +	u8 eye_catcher[8];
>> +	u32 stats_version;		/* Should be 0x01 */
>                                                      ^^^^
> 				     PAPR_SCM_PERF_STATS_VERSION?
Sure. Will update in v2

>
>> +	u32 num_statistics;		/* Number of stats following */
>> +	/* zero or more performance matrics */
>> +	struct papr_scm_perf_stat scm_statistic[];
>> +} __packed;
>> +
>>  /* private struct associated with each region */
>>  struct papr_scm_priv {
>>  	struct platform_device *pdev;
>> @@ -89,6 +107,9 @@ struct papr_scm_priv {
>>  
>>  	/* Health information for the dimm */
>>  	u64 health_bitmap;
>> +
>> +	/* length of the stat buffer as expected by phyp */
>> +	size_t len_stat_buffer;
>>  };
>>  
>>  static int drc_pmem_bind(struct papr_scm_priv *p)
>> @@ -194,6 +215,75 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
>>  	return drc_pmem_bind(p);
>>  }
>>  
>> +/*
>> + * Query the Dimm performance stats from PHYP and copy them (if returned) to
>> + * provided struct papr_scm_perf_stats instance 'stats' of 'size' in bytes.
>> + * The value of R4 is copied to 'out' if the pointer is provided.
>> + */
>> +static int drc_pmem_query_stats(struct papr_scm_priv *p,
>> +				struct papr_scm_perf_stats *buff_stats,
>> +				size_t size, unsigned int num_stats,
>> +				uint64_t *out)
>> +{
>> +	unsigned long ret[PLPAR_HCALL_BUFSIZE];
>> +	struct papr_scm_perf_stat *stats;
>> +	s64 rc, i;
>> +
>> +	/* Setup the out buffer */
>> +	if (buff_stats) {
>> +		memcpy(buff_stats->eye_catcher,
>> +		       PAPR_SCM_PERF_STATS_EYECATCHER, 8);
>> +		buff_stats->stats_version =
>> +			cpu_to_be32(PAPR_SCM_PERF_STATS_VERSION);
>> +		buff_stats->num_statistics =
>> +			cpu_to_be32(num_stats);
>> +	} else {
>> +		/* In case of no out buffer ignore the size */
>> +		size = 0;
>> +	}
>> +
>> +	/*
>> +	 * Do the HCALL asking PHYP for info and if R4 was requested
>> +	 * return its value in 'out' variable.
>> +	 */
>> +	rc = plpar_hcall(H_SCM_PERFORMANCE_STATS, ret, p->drc_index,
>> +			 virt_to_phys(buff_stats), size);
>
> You are calling virt_to_phys(NULL) here when called from
> papr_scm_nvdimm_init()!  That can't be right.
Thanks for catching this. However, if 'size' is '0' the 'buff_stats'
address is ignored by the hypervisor, hence this didn't get caught in my
tests. CONFIG_DEBUG_VIRTUAL would have caught it early, though.

>
>> +	if (out)
>> +		*out =  ret[0];
>> +
>> +	if (rc == H_PARTIAL) {
>> +		dev_err(&p->pdev->dev,
>> +			"Unknown performance stats, Err:0x%016lX\n", ret[0]);
>> +		return -ENOENT;
>> +	} else if (rc != H_SUCCESS) {
>> +		dev_err(&p->pdev->dev,
>> +			"Failed to query performance stats, Err:%lld\n", rc);
>> +		return -ENXIO;
>> +	}
>> +
>> +	/* Successfully fetched the requested stats from phyp */
>> +	if (size != 0) {
>> +		buff_stats->num_statistics =
>> +			be32_to_cpu(buff_stats->num_statistics);
>> +
>> +		/* Transform the stats buffer values from BE to cpu native */
>> +		for (i = 0, stats = buff_stats->scm_statistic;
>> +		     i < buff_stats->num_statistics; ++i) {
>> +			stats[i].statistic_value =
>> +				be64_to_cpu(stats[i].statistic_value);
>> +		}
>> +		dev_dbg(&p->pdev->dev,
>> +			"Performance stats returned %d stats\n",
>> +			buff_stats->num_statistics);
>> +	} else {
>> +		/* Handle case where stat buffer size was requested */
>> +		dev_dbg(&p->pdev->dev,
>> +			"Performance stats size %ld\n", ret[0]);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  /*
>>   * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
>>   * health information.
>> @@ -631,6 +721,45 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>>  	return 0;
>>  }
>>  
>> +static ssize_t perf_stats_show(struct device *dev,
>> +			       struct device_attribute *attr, char *buf)
>> +{
>> +	int index, rc;
>> +	struct seq_buf s;
>> +	struct papr_scm_perf_stat *stat;
>> +	struct papr_scm_perf_stats *stats;
>> +	struct nvdimm *dimm = to_nvdimm(dev);
>> +	struct papr_scm_priv *p = nvdimm_provider_data(dimm);
>> +
>> +	if (!p->len_stat_buffer)
>> +		return -ENOENT;
>> +
>> +	/* Allocate the buffer for phyp where stats are written */
>> +	stats = kzalloc(p->len_stat_buffer, GFP_KERNEL);
>
> I'm concerned that this buffer does not seem to have anything to do with the
> 'num_stats' parameter passed to drc_pmem_query_stats().  Furthermore why is
> num_stats always 0 in those calls?
>
'num_stats == 0' is a special case of the hcall where PHYP returns all
the possible stats in the 'stats' buffer.

>> +	if (!stats)
>> +		return -ENOMEM;
>> +
>> +	/* Ask phyp to return all dimm perf stats */
>> +	rc = drc_pmem_query_stats(p, stats, p->len_stat_buffer, 0, NULL);
>> +	if (!rc) {
>> +		/*
>> +		 * Go through the returned output buffer and print stats and
>> +		 * values. Since statistic_id is essentially a char string of
>> +		 * 8 bytes, simply use the string format specifier to print it.
>> +		 */
>> +		seq_buf_init(&s, buf, PAGE_SIZE);
>> +		for (index = 0, stat = stats->scm_statistic;
>> +		     index < stats->num_statistics; ++index, ++stat) {
>> +			seq_buf_printf(&s, "%.8s = 0x%016llX\n",
>> +				       stat->statistic_id, stat->statistic_value);
>> +		}
>> +	}
>> +
>> +	kfree(stats);
>> +	return rc ? rc : seq_buf_used(&s);
>> +}
>> +DEVICE_ATTR_RO(perf_stats);
>> +
>>  static ssize_t flags_show(struct device *dev,
>>  			  struct device_attribute *attr, char *buf)
>>  {
>> @@ -676,6 +805,7 @@ DEVICE_ATTR_RO(flags);
>>  /* papr_scm specific dimm attributes */
>>  static struct attribute *papr_nd_attributes[] = {
>>  	&dev_attr_flags.attr,
>> +	&dev_attr_perf_stats.attr,
>>  	NULL,
>>  };
>>  
>> @@ -696,6 +826,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>>  	struct nd_region_desc ndr_desc;
>>  	unsigned long dimm_flags;
>>  	int target_nid, online_nid;
>> +	u64 stat_size;
>>  
>>  	p->bus_desc.ndctl = papr_scm_ndctl;
>>  	p->bus_desc.module = THIS_MODULE;
>> @@ -759,6 +890,14 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>>  		dev_info(dev, "Region registered with target node %d and online node %d",
>>  			 target_nid, online_nid);
>>  
>> +	/* Try retriving the stat buffer and see if its supported */
>> +	if (!drc_pmem_query_stats(p, NULL, 0, 0, &stat_size)) {
>> +		p->len_stat_buffer = (size_t)stat_size;
>> +		dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n",
>> +			p->len_stat_buffer);
>> +	} else {
>> +		dev_info(&p->pdev->dev, "Limited dimm stat info available\n");
>
> Do we really need this print?
nvdimm performance stats can be selectively turned on/off from the
hypervisor management console, hence this info message is more like a
warning indicating that extended dimm stat info like 'fuel_gauge' is not
available.

>
> Ira
>
>> +	}
>>  	return 0;
>>  
>>  err:	nvdimm_bus_unregister(p->bus);
>> -- 
>> 2.26.2

-- 
Cheers
~ Vaibhav

* Re: [PATCH 1/2] powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
  2020-06-24 14:58     ` Vaibhav Jain
@ 2020-06-24 17:33       ` Ira Weiny
  0 siblings, 0 replies; 10+ messages in thread
From: Ira Weiny @ 2020-06-24 17:33 UTC (permalink / raw)
  To: Vaibhav Jain; +Cc: Aneesh Kumar K . V, linuxppc-dev, linux-nvdimm

On Wed, Jun 24, 2020 at 08:28:57PM +0530, Vaibhav Jain wrote:
> Thanks for reviewing this patch Ira,
> 
> My responses below inline.
> 
> Ira Weiny <ira.weiny@intel.com> writes:
> 
> > On Mon, Jun 22, 2020 at 09:54:50AM +0530, Vaibhav Jain wrote:
> >> Update papr_scm.c to query dimm performance statistics from PHYP via
> >> H_SCM_PERFORMANCE_STATS hcall and export them to user-space as PAPR
> >> specific NVDIMM attribute 'perf_stats' in sysfs. The patch also
> >> provide a sysfs ABI documentation for the stats being reported and
> >> their meanings.
> >> 
> >> During NVDIMM probe time in papr_scm_nvdimm_init() a special variant
> >> of H_SCM_PERFORMANCE_STATS hcall is issued to check if collection of
> >> performance statistics is supported. If successful, PHYP returns the
> >> maximum possible buffer length needed to read all
> >> performance stats. This returned value is stored in a per-nvdimm
> >> attribute 'len_stat_buffer'.
> >> 
> >> The layout of request buffer for reading NVDIMM performance stats from
> >> PHYP is defined in 'struct papr_scm_perf_stats' and 'struct
> >> papr_scm_perf_stat'. These structs are used in newly introduced
> >> drc_pmem_query_stats() that issues the H_SCM_PERFORMANCE_STATS hcall.
> >> 
> >> The sysfs access function perf_stats_show() uses the value of
> >> 'len_stat_buffer' to allocate a buffer large enough to hold all
> >> possible NVDIMM performance stats and passes it to
> >> drc_pmem_query_stats() to populate. Finally, the statistics reported in
> >> the buffer are formatted into the sysfs output buffer.
> >> 
> >> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> >> ---
> >>  Documentation/ABI/testing/sysfs-bus-papr-pmem |  27 ++++
> >>  arch/powerpc/platforms/pseries/papr_scm.c     | 139 ++++++++++++++++++
> >>  2 files changed, 166 insertions(+)
> >> 
> >> diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
> >> index 5b10d036a8d4..c1a67275c43f 100644
> >> --- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
> >> +++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
> >> @@ -25,3 +25,30 @@ Description:
> >>  				  NVDIMM have been scrubbed.
> >>  		* "locked"	: Indicating that NVDIMM contents cant
> >>  				  be modified until next power cycle.
> >> +
> >> +What:		/sys/bus/nd/devices/nmemX/papr/perf_stats
> >> +Date:		May, 2020
> >> +KernelVersion:	v5.9
> >> +Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
> >> +Description:
> >> +		(RO) Report various performance stats related to papr-scm NVDIMM
> >> +		device.  Each stat is reported on a new line with each line
> >> +		composed of a stat-identifier followed by its value. Below are
> >> +		the currently known dimm performance stats that are reported:
> >> +
> >> +		* "CtlResCt" : Controller Reset Count
> >> +		* "CtlResTm" : Controller Reset Elapsed Time
> >> +		* "PonSecs " : Power-on Seconds
> >> +		* "MemLife " : Life Remaining
> >> +		* "CritRscU" : Critical Resource Utilization
> >> +		* "HostLCnt" : Host Load Count
> >> +		* "HostSCnt" : Host Store Count
> >> +		* "HostSDur" : Host Store Duration
> >> +		* "HostLDur" : Host Load Duration
> >> +		* "MedRCnt " : Media Read Count
> >> +		* "MedWCnt " : Media Write Count
> >> +		* "MedRDur " : Media Read Duration
> >> +		* "MedWDur " : Media Write Duration
> >> +		* "CchRHCnt" : Cache Read Hit Count
> >> +		* "CchWHCnt" : Cache Write Hit Count
> >> +		* "FastWCnt" : Fast Write Count
> >> \ No newline at end of file
> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> >> index 9c569078a09f..cb3f9acc325b 100644
> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> >> @@ -62,6 +62,24 @@
> >>  				    PAPR_PMEM_HEALTH_FATAL |	\
> >>  				    PAPR_PMEM_HEALTH_UNHEALTHY)
> >>  
> >> +#define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
> >> +#define PAPR_SCM_PERF_STATS_VERSION 0x1
> >> +
> >> +/* Struct holding a single performance metric */
> >> +struct papr_scm_perf_stat {
> >> +	u8 statistic_id[8];
> >> +	u64 statistic_value;
> >> +};
> >> +
> >> +/* Struct exchanged between kernel and PHYP for fetching drc perf stats */
> >> +struct papr_scm_perf_stats {
> >> +	u8 eye_catcher[8];
> >> +	u32 stats_version;		/* Should be 0x01 */
> >                                                      ^^^^
> > 				     PAPR_SCM_PERF_STATS_VERSION?
> Sure. Will update in v2
> 
> >
> >> +	u32 num_statistics;		/* Number of stats following */
> >> +	/* zero or more performance metrics */
> >> +	struct papr_scm_perf_stat scm_statistic[];
> >> +} __packed;
> >> +
> >>  /* private struct associated with each region */
> >>  struct papr_scm_priv {
> >>  	struct platform_device *pdev;
> >> @@ -89,6 +107,9 @@ struct papr_scm_priv {
> >>  
> >>  	/* Health information for the dimm */
> >>  	u64 health_bitmap;
> >> +
> >> +	/* length of the stat buffer as expected by phyp */
> >> +	size_t len_stat_buffer;
> >>  };
> >>  
> >>  static int drc_pmem_bind(struct papr_scm_priv *p)
> >> @@ -194,6 +215,75 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
> >>  	return drc_pmem_bind(p);
> >>  }
> >>  
> >> +/*
> >> + * Query the Dimm performance stats from PHYP and copy them (if returned) to
> >> + * provided struct papr_scm_perf_stats instance 'stats' of 'size' in bytes.
> >> + * The value of R4 is copied to 'out' if the pointer is provided.
> >> + */
> >> +static int drc_pmem_query_stats(struct papr_scm_priv *p,
> >> +				struct papr_scm_perf_stats *buff_stats,
> >> +				size_t size, unsigned int num_stats,
> >> +				uint64_t *out)
> >> +{
> >> +	unsigned long ret[PLPAR_HCALL_BUFSIZE];
> >> +	struct papr_scm_perf_stat *stats;
> >> +	s64 rc, i;
> >> +
> >> +	/* Setup the out buffer */
> >> +	if (buff_stats) {
> >> +		memcpy(buff_stats->eye_catcher,
> >> +		       PAPR_SCM_PERF_STATS_EYECATCHER, 8);
> >> +		buff_stats->stats_version =
> >> +			cpu_to_be32(PAPR_SCM_PERF_STATS_VERSION);
> >> +		buff_stats->num_statistics =
> >> +			cpu_to_be32(num_stats);
> >> +	} else {
> >> +		/* In case of no out buffer ignore the size */
> >> +		size = 0;
> >> +	}
> >> +
> >> +	/*
> >> +	 * Do the HCALL asking PHYP for info and if R4 was requested
> >> +	 * return its value in 'out' variable.
> >> +	 */
> >> +	rc = plpar_hcall(H_SCM_PERFORMANCE_STATS, ret, p->drc_index,
> >> +			 virt_to_phys(buff_stats), size);
> >
> > You are calling virt_to_phys(NULL) here when called from
> > papr_scm_nvdimm_init()!  That can't be right.
> Thanks for catching this. However, if the 'size' is '0' the 'buff_stats'
> address is ignored by the hypervisor, hence this didn't get caught in my
> tests. Though CONFIG_DEBUG_VIRTUAL would have caught it early.
> 
> >
> >> +	if (out)
> >> +		*out = ret[0];
> >> +
> >> +	if (rc == H_PARTIAL) {
> >> +		dev_err(&p->pdev->dev,
> >> +			"Unknown performance stats, Err:0x%016lX\n", ret[0]);
> >> +		return -ENOENT;
> >> +	} else if (rc != H_SUCCESS) {
> >> +		dev_err(&p->pdev->dev,
> >> +			"Failed to query performance stats, Err:%lld\n", rc);
> >> +		return -ENXIO;
> >> +	}
> >> +
> >> +	/* Successfully fetched the requested stats from phyp */
> >> +	if (size != 0) {
> >> +		buff_stats->num_statistics =
> >> +			be32_to_cpu(buff_stats->num_statistics);
> >> +
> >> +		/* Transform the stats buffer values from BE to cpu native */
> >> +		for (i = 0, stats = buff_stats->scm_statistic;
> >> +		     i < buff_stats->num_statistics; ++i) {
> >> +			stats[i].statistic_value =
> >> +				be64_to_cpu(stats[i].statistic_value);
> >> +		}
> >> +		dev_dbg(&p->pdev->dev,
> >> +			"Performance stats returned %d stats\n",
> >> +			buff_stats->num_statistics);
> >> +	} else {
> >> +		/* Handle case where stat buffer size was requested */
> >> +		dev_dbg(&p->pdev->dev,
> >> +			"Performance stats size %ld\n", ret[0]);
> >> +	}
> >> +
> >> +	return 0;
> >> +}
> >> +
> >>  /*
> >>   * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
> >>   * health information.
> >> @@ -631,6 +721,45 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> >>  	return 0;
> >>  }
> >>  
> >> +static ssize_t perf_stats_show(struct device *dev,
> >> +			       struct device_attribute *attr, char *buf)
> >> +{
> >> +	int index, rc;
> >> +	struct seq_buf s;
> >> +	struct papr_scm_perf_stat *stat;
> >> +	struct papr_scm_perf_stats *stats;
> >> +	struct nvdimm *dimm = to_nvdimm(dev);
> >> +	struct papr_scm_priv *p = nvdimm_provider_data(dimm);
> >> +
> >> +	if (!p->len_stat_buffer)
> >> +		return -ENOENT;
> >> +
> >> +	/* Allocate the buffer for phyp where stats are written */
> >> +	stats = kzalloc(p->len_stat_buffer, GFP_KERNEL);
> >
> > I'm concerned that this buffer does not seem to have anything to do with the
> > 'num_stats' parameter passed to drc_pmem_query_stats().  Furthermore why is
> > num_stats always 0 in those calls?
> >
> 'num_stats == 0' is a special case of the hcall where PHYP returns all
> the possible stats in the 'stats' buffer.

So how does the above allocation ensure that the buffer length is big
enough to cover all possible stats with this special case?

Ok I think I see that len_stat_buffer is set below after a query (presumably to
the hardware).

> 
> >> +	if (!stats)
> >> +		return -ENOMEM;
> >> +
> >> +	/* Ask phyp to return all dimm perf stats */
> >> +	rc = drc_pmem_query_stats(p, stats, p->len_stat_buffer, 0, NULL);
> >> +	if (!rc) {
> >> +		/*
> >> +		 * Go through the returned output buffer and print stats and
> >> +		 * values. Since statistic_id is essentially a char string of
> >> +		 * 8 bytes, simply use the string format specifier to print it.
> >> +		 */
> >> +		seq_buf_init(&s, buf, PAGE_SIZE);
> >> +		for (index = 0, stat = stats->scm_statistic;
> >> +		     index < stats->num_statistics; ++index, ++stat) {
> >> +			seq_buf_printf(&s, "%.8s = 0x%016llX\n",
> >> +				       stat->statistic_id, stat->statistic_value);
> >> +		}
> >> +	}
> >> +
> >> +	kfree(stats);
> >> +	return rc ? rc : seq_buf_used(&s);
> >> +}
> >> +DEVICE_ATTR_RO(perf_stats);
> >> +
> >>  static ssize_t flags_show(struct device *dev,
> >>  			  struct device_attribute *attr, char *buf)
> >>  {
> >> @@ -676,6 +805,7 @@ DEVICE_ATTR_RO(flags);
> >>  /* papr_scm specific dimm attributes */
> >>  static struct attribute *papr_nd_attributes[] = {
> >>  	&dev_attr_flags.attr,
> >> +	&dev_attr_perf_stats.attr,
> >>  	NULL,
> >>  };
> >>  
> >> @@ -696,6 +826,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
> >>  	struct nd_region_desc ndr_desc;
> >>  	unsigned long dimm_flags;
> >>  	int target_nid, online_nid;
> >> +	u64 stat_size;
> >>  
> >>  	p->bus_desc.ndctl = papr_scm_ndctl;
> >>  	p->bus_desc.module = THIS_MODULE;
> >> @@ -759,6 +890,14 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
> >>  		dev_info(dev, "Region registered with target node %d and online node %d",
> >>  			 target_nid, online_nid);
> >>  
> >> +	/* Try retrieving the stat buffer size and see if it's supported */
> >> +	if (!drc_pmem_query_stats(p, NULL, 0, 0, &stat_size)) {
> >> +		p->len_stat_buffer = (size_t)stat_size;
> >> +		dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n",
> >> +			p->len_stat_buffer);
> >> +	} else {
> >> +		dev_info(&p->pdev->dev, "Limited dimm stat info available\n");
> >
> > Do we really need this print?
> nvdimm performance stats can be selectively turned on/off from the
> hypervisor management console hence this info message is more like a
> warning indicating that extended dimm stat info like 'fuel_gauge' is not
> available.

Ah... But this is saying that the stat info _is_ available?  ("info available")

Should this be dev_warn(..., "... info not available\n")?

Ira

> 
> >
> > Ira
> >
> >> +	}
> >>  	return 0;
> >>  
> >>  err:	nvdimm_bus_unregister(p->bus);
> >> -- 
> >> 2.26.2
> >> _______________________________________________
> >> Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
> >> To unsubscribe send an email to linux-nvdimm-leave@lists.01.org
> 
> -- 
> Cheers
> ~ Vaibhav

Thread overview: 10+ messages
2020-06-22  4:24 [PATCH 0/2] powerpc/papr_scm: add support for reporting NVDIMM 'life_used_percentage' metric Vaibhav Jain
2020-06-22  4:24 ` [PATCH 1/2] powerpc/papr_scm: Fetch nvdimm performance stats from PHYP Vaibhav Jain
2020-06-23  5:42   ` Aneesh Kumar K.V
2020-06-23  5:52     ` Aneesh Kumar K.V
2020-06-23 19:02   ` Ira Weiny
2020-06-24 14:58     ` Vaibhav Jain
2020-06-24 17:33       ` Ira Weiny
2020-06-22  4:24 ` [PATCH 2/2] powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric Vaibhav Jain
2020-06-23 19:14   ` Ira Weiny
2020-06-24 14:03     ` Vaibhav Jain
