* [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall
@ 2021-05-15 7:49 ` Vaibhav Jain
0 siblings, 0 replies; 12+ messages in thread
From: Vaibhav Jain @ 2021-05-15 7:37 UTC (permalink / raw)
To: qemu-devel, kvm-ppc, qemu-ppc, david, mst, imammedo, xiaoguangrong.eric
Cc: ehabkost, aneesh.kumar, groug, shivaprasadbhat, bharata, Vaibhav Jain
Add support for H_SCM_PERFORMANCE_STATS described at [1] for
spapr nvdimms. This enables the guest to fetch performance stats[2] like
expected life of an nvdimm ('MemLife ') etc and display them to the
user. Linux kernel support for fetching these performance stats and
exposing them to the user-space was done via [3].
The hcall semantics mandate that each nvdimm performance stat is
uniquely identified by an 8-byte ASCII string encoded as an unsigned
integer (e.g. 'MemLife ' == 0x4D656D4C69666520) and that its value be an
8-byte unsigned integer. These performance-stats are exchanged with
the guest via a guest-allocated buffer called
'requestAndResultBuffer' or rr-buffer for short. This buffer contains
a header described by 'struct papr_scm_perf_stats' followed by an array
of performance-stats described by 'struct papr_scm_perf_stat'. The
hypervisor is expected to validate the rr-buffer header and then based
on the request copy the needed performance-stats to the array of
'struct papr_scm_perf_stat' following the header.
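For illustration, a minimal self-contained sketch (not part of this patch) of
how such an 8-byte ASCII name maps onto the unsigned-integer stat-id, assuming
the big-endian byte order used in the 'MemLife ' example above:

#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only: pack an 8-character ASCII
 * stat name into the big-endian unsigned stat-id used by the hcall. */
static uint64_t scm_stat_name_to_id(const char *name)
{
    uint64_t id = 0;
    int i;

    for (i = 0; i < 8; i++) {
        id = (id << 8) | (uint8_t)name[i];
    }
    return id;
}

int main(void)
{
    /* Prints 4d656d4c69666520, matching the 'MemLife ' example above */
    printf("%016" PRIx64 "\n", scm_stat_name_to_id("MemLife "));
    return 0;
}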
The patch proposes a new function h_scm_performance_stats() that
services the H_SCM_PERFORMANCE_STATS hcall. After verifying the
validity of the rr-buffer header via scm_perf_check_rr_buffer() it
proceeds to fill the rr-buffer with the requested performance-stats. The
value of each individual stat is retrieved from its accessor
function, which is indexed in the array
'nvdimm_perf_stats'.
References:
[1] "Hypercall Op-codes (hcalls)"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n269
[2] Sysfs attribute documentation for papr_scm
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-papr-pmem#n36
[3] "powerpc/papr_scm: Fetch nvdimm performance stats from PHYP"
https://lore.kernel.org/r/20200731064153.182203-2-vaibhav@linux.ibm.com
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog
Since RFC-v2:
* s/SCM_STATS_MAX_OUTPUT_BUFFER/SCM_STATS_MIN_OUTPUT_BUFFER/ that's the
minimum buffer size needed to return all supported performance
stats. Its value is derived from the size of the array 'nvdimm_perf_stats'
* Added SCM_STATS_MAX_OUTPUT_BUFFER to indicate maximum supported
rr-buffer size from a guest. Value for this is derived from hard
limit of SCM_STATS_MAX_STATS.
* Updated scm_perf_check_rr_buffer() to add a check for max size of
rr-buffer. [David]
* Updated scm_perf_check_rr_buffer() to remove a redundant check for
the min size of the rr-buffer [David]
* Updated h_scm_performance_stats() to have a single allocation for
rr-buffer and copy it back to guest memory in a single shot. [David]
Since RFC-v1:
* Removed empty lines from code [ David ]
* Updated struct papr_scm_perf_stat to use uint64_t for
statistic_id.
* Added a hard limit on max number of stats requested to 255 [ David ]
* Updated scm_perf_check_rr_buffer() to check for rr-buffer header
size [ David ]
* Removed a redundant check from nvdimm_stat_getval() [ David ]
* Removed a redundant call to address_space_access_valid() in
scm_perf_check_rr_buffer() [ David ]
* Instead of allocating a minimum size local buffer, allocate a max
possible size local rr-buffer. [ David ]
* Updated nvdimm_stat_getval() to set 'val' to '0' on error. [ David ]
* Updated h_scm_performance_stats() to use a canned-response method
for simplifying num_stats==0 case [ David ].
---
hw/ppc/spapr_nvdimm.c | 222 +++++++++++++++++++++++++++++++++++++++++
include/hw/ppc/spapr.h | 19 +++-
2 files changed, 240 insertions(+), 1 deletion(-)
diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
index 252204e25f..287326b9c0 100644
--- a/hw/ppc/spapr_nvdimm.c
+++ b/hw/ppc/spapr_nvdimm.c
@@ -35,6 +35,19 @@
/* SCM device is unable to persist memory contents */
#define PAPR_PMEM_UNARMED PPC_BIT(0)
+/* Minimum output buffer size needed to return all nvdimm_perf_stats */
+#define SCM_STATS_MIN_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \
+ sizeof(struct papr_scm_perf_stat) * \
+ ARRAY_SIZE(nvdimm_perf_stats))
+
+/* Maximum number of stats that we can return back in a single stat request */
+#define SCM_STATS_MAX_STATS 255
+
+/* Maximum possible output buffer to fulfill a perf-stats request */
+#define SCM_STATS_MAX_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \
+ sizeof(struct papr_scm_perf_stat) * \
+ SCM_STATS_MAX_STATS)
+
bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
uint64_t size, Error **errp)
{
@@ -502,6 +515,214 @@ static target_ulong h_scm_health(PowerPCCPU *cpu, SpaprMachineState *spapr,
return H_SUCCESS;
}
+static int perf_stat_noop(SpaprDrc *drc, uint64_t unused, uint64_t *val)
+{
+ *val = 0;
+ return H_SUCCESS;
+}
+
+static int perf_stat_memlife(SpaprDrc *drc, uint64_t unused, uint64_t *val)
+{
+ /* Assume full life available of an NVDIMM right now */
+ *val = 100;
+ return H_SUCCESS;
+}
+
+/*
+ * Holds all supported performance stats accessors. Each performance-statistic
+ * is uniquely identified by an 8-byte ASCII string, for example 'MemLife ',
+ * which indicates in percentage how much usage life of an nvdimm remains.
+ * 'NoopStat' is primarily used to test support for retrieving performance
+ * stats and also to replace unknown stats present in the rr-buffer.
+ *
+ */
+static const struct {
+ char stat_id[8];
+ int (*stat_getval)(SpaprDrc *drc, uint64_t id, uint64_t *val);
+} nvdimm_perf_stats[] = {
+ { "NoopStat", perf_stat_noop},
+ { "MemLife ", perf_stat_memlife},
+};
+
+/*
+ * Given an nvdimm DRC and a stat-id, return its value. If the given stat-id
+ * isn't supported, return H_PARTIAL.
+ */
+static int nvdimm_stat_getval(SpaprDrc *drc, uint64_t id, uint64_t *val)
+{
+ int index;
+ char stat_id[8];
+
+ /* Copy the id locally since the comparison below is done byte-wise */
+ memcpy(&stat_id[0], &id, 8);
+ *val = 0;
+
+ /* Lookup the stats-id in the nvdimm_perf_stats table */
+ for (index = 0; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) {
+ if (memcmp(nvdimm_perf_stats[index].stat_id, &stat_id[0], 8) == 0) {
+ return nvdimm_perf_stats[index].stat_getval(drc, id, val);
+ }
+ }
+ return H_PARTIAL;
+}
+
+/*
+ * Given a request-and-result buffer header, verify its contents. Also check
+ * that the buffer size and the number of stats requested are within our
+ * expected sane bounds.
+ */
+static int scm_perf_check_rr_buffer(struct papr_scm_perf_stats *header,
+ hwaddr addr, size_t size,
+ uint32_t *num_stats)
+{
+ size_t expected_buffsize;
+
+ /* Verify the header eyecatcher and version */
+ if (memcmp(&header->eye_catcher, SCM_STATS_EYECATCHER,
+ sizeof(header->eye_catcher))) {
+ return H_BAD_DATA;
+ }
+ if (be32_to_cpu(header->stats_version) != 0x1) {
+ return H_NOT_AVAILABLE;
+ }
+
+ /* verify that rr buffer has enough space */
+ *num_stats = be32_to_cpu(header->num_statistics);
+ if (*num_stats == 0) { /* Return all stats */
+ expected_buffsize = SCM_STATS_MIN_OUTPUT_BUFFER;
+ } else if (*num_stats > SCM_STATS_MAX_STATS) {
+ /* Too many stats requested */
+ return H_P3;
+ } else { /* Return a subset of stats */
+ expected_buffsize = sizeof(struct papr_scm_perf_stats) +
+ (*num_stats) * sizeof(struct papr_scm_perf_stat);
+ }
+
+ if (size < expected_buffsize || size > SCM_STATS_MAX_OUTPUT_BUFFER) {
+ return H_P3;
+ }
+
+ return H_SUCCESS;
+}
+
+/*
+ * For a given DRC index (R3) return one or more performance stats of an nvdimm
+ * device in guest allocated Request-and-result buffer (rr-buffer) (R4) of
+ * given 'size' (R5). The rr-buffer consists of a header described by
+ * 'struct papr_scm_perf_stats' that embeds the 'stats_version' and
+ * 'num_statistics' fields. This is followed by an array of
+ * 'struct papr_scm_perf_stat'. Based on the request type, the hcall writes the
+ * performance stats into the array of 'struct papr_scm_perf_stat' embedded
+ * inside the rr-buffer provided by the guest.
+ * Special cases handled are:
+ * 'size' == 0 : Return the maximum possible size of rr-buffer
+ * 'size' != 0 && 'num_statistics == 0' : Return all possible performance stats
+ *
+ * In case there was an error fetching a specific stat (e.g. stat-id unknown or
+ * any other error) then return the stat-id in R4 and also replace its stat
+ * entry in rr-buffer with 'NoopStat'
+ */
+static target_ulong h_scm_performance_stats(PowerPCCPU *cpu,
+ SpaprMachineState *spapr,
+ target_ulong opcode,
+ target_ulong *args)
+{
+ SpaprDrc *drc = spapr_drc_by_index(args[0]);
+ const hwaddr addr = args[1];
+ size_t size = args[2];
+ struct papr_scm_perf_stats *perfstats;
+ struct papr_scm_perf_stat *stats;
+ uint64_t invalid_stat = 0, stat_val;
+ MemTxResult result;
+ uint32_t num_stats;
+ unsigned long rc;
+ int index;
+
+ /* Ensure that the DRC is valid, is a PMEM DIMM and is plugged in */
+ if (!drc || !drc->dev ||
+ spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
+ return H_PARAMETER;
+ }
+
+ /* Guest requested max buffer size for output buffer */
+ if (size == 0) {
+ args[0] = SCM_STATS_MAX_OUTPUT_BUFFER;
+ return H_SUCCESS;
+ }
+
+ /* verify size is enough to hold rr-buffer header */
+ if (size < sizeof(struct papr_scm_perf_stats)) {
+ return H_BAD_DATA;
+ }
+
+ /* allocate enough buffer space locally for holding max stats */
+ perfstats = g_try_malloc0(SCM_STATS_MAX_OUTPUT_BUFFER);
+ if (!perfstats) {
+ return H_NO_MEM;
+ }
+
+ /* Read and verify rr-buffer header */
+ result = address_space_read(&address_space_memory, addr,
+ MEMTXATTRS_UNSPECIFIED, perfstats,
+ sizeof(struct papr_scm_perf_stats));
+ rc = (result == MEMTX_OK) ?
+ scm_perf_check_rr_buffer(perfstats, addr, size, &num_stats) :
+ H_PRIVILEGE;
+ if (rc != H_SUCCESS) {
+ g_free(perfstats);
+ return rc;
+ }
+
+ stats = &perfstats->scm_statistics[0];
+ /* when returning all stats generate a canned response first */
+ if (num_stats == 0) {
+ for (index = 1; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) {
+ memcpy(&stats[index - 1].statistic_id,
+ &nvdimm_perf_stats[index].stat_id, 8);
+ }
+ num_stats = ARRAY_SIZE(nvdimm_perf_stats) - 1;
+ } else {
+ /* copy the rr-buffer from the guest memory */
+ result = address_space_read(&address_space_memory,
+ addr + sizeof(struct papr_scm_perf_stats),
+ MEMTXATTRS_UNSPECIFIED, stats,
+ size - sizeof(struct papr_scm_perf_stats));
+ if (result != MEMTX_OK) {
+ g_free(perfstats);
+ return H_PRIVILEGE;
+ }
+ }
+
+ for (index = 0; index < num_stats; ++index) {
+ rc = nvdimm_stat_getval(drc, stats[index].statistic_id, &stat_val);
+
+ /* On error add noop stat to rr buffer & save last inval stat-id */
+ if (rc != H_SUCCESS) {
+ if (!invalid_stat) {
+ invalid_stat = stats[index].statistic_id;
+ }
+ memcpy(&stats[index].statistic_id, nvdimm_perf_stats[0].stat_id, 8);
+ }
+ /* Caller expects stat values in BE encoding */
+ stats[index].statistic_value = cpu_to_be64(stat_val);
+ }
+
+ /* Update and copy the local rr-buffer back to guest */
+ perfstats->num_statistics = cpu_to_be32(num_stats);
+ g_assert(size <= SCM_STATS_MAX_OUTPUT_BUFFER);
+ result = address_space_write(&address_space_memory, addr,
+ MEMTXATTRS_UNSPECIFIED, perfstats, size);
+
+ /* Cleanup the stats buffer */
+ g_free(perfstats);
+ if (result) {
+ return H_PRIVILEGE;
+ }
+ /* Check if there was a failure in fetching any stat */
+ args[0] = invalid_stat;
+ return invalid_stat ? H_PARTIAL : H_SUCCESS;
+}
+
static void spapr_scm_register_types(void)
{
/* qemu/scm specific hcalls */
@@ -511,6 +732,7 @@ static void spapr_scm_register_types(void)
spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
spapr_register_hypercall(H_SCM_UNBIND_ALL, h_scm_unbind_all);
spapr_register_hypercall(H_SCM_HEALTH, h_scm_health);
+ spapr_register_hypercall(H_SCM_PERFORMANCE_STATS, h_scm_performance_stats);
}
type_init(spapr_scm_register_types)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index d2b5a9bdf9..6f3353b4ea 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -326,6 +326,7 @@ struct SpaprMachineState {
#define H_P8 -61
#define H_P9 -62
#define H_OVERLAP -68
+#define H_BAD_DATA -70
#define H_UNSUPPORTED_FLAG -256
#define H_MULTI_THREADS_ACTIVE -9005
@@ -539,8 +540,9 @@ struct SpaprMachineState {
#define H_SCM_UNBIND_MEM 0x3F0
#define H_SCM_UNBIND_ALL 0x3FC
#define H_SCM_HEALTH 0x400
+#define H_SCM_PERFORMANCE_STATS 0x418
-#define MAX_HCALL_OPCODE H_SCM_HEALTH
+#define MAX_HCALL_OPCODE H_SCM_PERFORMANCE_STATS
/* The hcalls above are standardized in PAPR and implemented by pHyp
* as well.
@@ -787,6 +789,21 @@ OBJECT_DECLARE_SIMPLE_TYPE(SpaprTceTable, SPAPR_TCE_TABLE)
DECLARE_INSTANCE_CHECKER(IOMMUMemoryRegion, SPAPR_IOMMU_MEMORY_REGION,
TYPE_SPAPR_IOMMU_MEMORY_REGION)
+/* Defs and structs exchanged with guest when reporting drc perf stats */
+#define SCM_STATS_EYECATCHER "SCMSTATS"
+
+struct QEMU_PACKED papr_scm_perf_stat {
+ uint64_t statistic_id;
+ uint64_t statistic_value;
+};
+
+struct QEMU_PACKED papr_scm_perf_stats {
+ uint8_t eye_catcher[8]; /* Should be "SCMSTATS" */
+ uint32_t stats_version; /* Should be 0x01 */
+ uint32_t num_statistics; /* Number of stats following */
+ struct papr_scm_perf_stat scm_statistics[]; /* Performance metrics */
+};
+
struct SpaprTceTable {
DeviceState parent;
uint32_t liobn;
--
2.31.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall
2021-05-15 7:49 ` Vaibhav Jain
@ 2021-05-17 6:23 ` David Gibson
-1 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2021-05-17 6:23 UTC (permalink / raw)
To: Vaibhav Jain
Cc: xiaoguangrong.eric, mst, aneesh.kumar, qemu-devel, kvm-ppc,
groug, shivaprasadbhat, qemu-ppc, bharata, imammedo, ehabkost
On Sat, May 15, 2021 at 01:07:59PM +0530, Vaibhav Jain wrote:
> Add support for H_SCM_PERFORMANCE_STATS described at [1] for
> spapr nvdimms. This enables the guest to fetch performance stats[2] like
> expected life of an nvdimm ('MemLife ') etc and display them to the
> user. Linux kernel support for fetching these performance stats and
> exposing them to the user-space was done via [3].
>
> The hcall semantics mandate that each nvdimm performance stat is
> uniquely identified by an 8-byte ASCII string encoded as an unsigned
> integer (e.g. 'MemLife ' == 0x4D656D4C69666520) and that its value be an
> 8-byte unsigned integer. These performance-stats are exchanged with
> the guest via a guest-allocated buffer called
> 'requestAndResultBuffer' or rr-buffer for short. This buffer contains
> a header described by 'struct papr_scm_perf_stats' followed by an array
> of performance-stats described by 'struct papr_scm_perf_stat'. The
> hypervisor is expected to validate the rr-buffer header and then based
> on the request copy the needed performance-stats to the array of
> 'struct papr_scm_perf_stat' following the header.
>
> The patch proposes a new function h_scm_performance_stats() that
> services the H_SCM_PERFORMANCE_STATS hcall. After verifying the
> validity of the rr-buffer header via scm_perf_check_rr_buffer() it
> proceeds to fill the rr-buffer with the requested performance-stats. The
> value of each individual stat is retrieved from its accessor
> function, which is indexed in the array
> 'nvdimm_perf_stats'.
>
> References:
> [1] "Hypercall Op-codes (hcalls)"
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n269
> [2] Sysfs attribute documentation for papr_scm
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-papr-pmem#n36
> [3] "powerpc/papr_scm: Fetch nvdimm performance stats from PHYP"
> https://lore.kernel.org/r/20200731064153.182203-2-vaibhav@linux.ibm.com
>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Changelog
>
> Since RFC-v2:
> * s/SCM_STATS_MAX_OUTPUT_BUFFER/SCM_STATS_MIN_OUTPUT_BUFFER/ that's the
> minimum buffer size needed to return all supported performance
> stats. Its value is derived from the size of the array 'nvdimm_perf_stats'
> * Added SCM_STATS_MAX_OUTPUT_BUFFER to indicate maximum supported
> rr-buffer size from a guest. Value for this is derived from hard
> limit of SCM_STATS_MAX_STATS.
> * Updated scm_perf_check_rr_buffer() to add a check for max size of
> rr-buffer. [David]
> * Updated scm_perf_check_rr_buffer() to remove a redundant check for
> the min size of the rr-buffer [David]
> * Updated h_scm_performance_stats() to have a single allocation for
> rr-buffer and copy it back to guest memory in a single shot. [David]
>
> Since RFC-v1:
> * Removed empty lines from code [ David ]
> * Updated struct papr_scm_perf_stat to use uint64_t for
> statistic_id.
> * Added a hard limit on max number of stats requested to 255 [ David ]
> * Updated scm_perf_check_rr_buffer() to check for rr-buffer header
> size [ David ]
> * Removed a redundant check from nvdimm_stat_getval() [ David ]
> * Removed a redundant call to address_space_access_valid() in
> scm_perf_check_rr_buffer() [ David ]
> * Instead of allocating a minimum size local buffer, allocate a max
> possible size local rr-buffer. [ David ]
> * Updated nvdimm_stat_getval() to set 'val' to '0' on error. [ David ]
> * Updated h_scm_performance_stats() to use a canned-response method
> for simplifying num_stats==0 case [ David ].
> ---
> hw/ppc/spapr_nvdimm.c | 222 +++++++++++++++++++++++++++++++++++++++++
> include/hw/ppc/spapr.h | 19 +++-
> 2 files changed, 240 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
> index 252204e25f..287326b9c0 100644
> --- a/hw/ppc/spapr_nvdimm.c
> +++ b/hw/ppc/spapr_nvdimm.c
> @@ -35,6 +35,19 @@
> /* SCM device is unable to persist memory contents */
> #define PAPR_PMEM_UNARMED PPC_BIT(0)
>
> +/* Minimum output buffer size needed to return all nvdimm_perf_stats */
> +#define SCM_STATS_MIN_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \
> + sizeof(struct papr_scm_perf_stat) * \
> + ARRAY_SIZE(nvdimm_perf_stats))
MIN_OUTPUT_BUFFER is a better name, but still not great. I think we
can get rid of this define completely in a neat way, though. See below.
> +/* Maximum number of stats that we can return back in a single stat request */
> +#define SCM_STATS_MAX_STATS 255
Although it's technically allowed, I'm still not convinced there's
actually any reason to allow the user to request the same stat over
and over.
> +/* Maximum possible output buffer to fulfill a perf-stats request */
> +#define SCM_STATS_MAX_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \
> + sizeof(struct papr_scm_perf_stat) * \
> + SCM_STATS_MAX_STATS)
> +
> bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
> uint64_t size, Error **errp)
> {
> @@ -502,6 +515,214 @@ static target_ulong h_scm_health(PowerPCCPU *cpu, SpaprMachineState *spapr,
> return H_SUCCESS;
> }
>
> +static int perf_stat_noop(SpaprDrc *drc, uint64_t unused, uint64_t *val)
> +{
> + *val = 0;
> + return H_SUCCESS;
> +}
> +
> +static int perf_stat_memlife(SpaprDrc *drc, uint64_t unused, uint64_t *val)
> +{
> + /* Assume full life available of an NVDIMM right now */
> + *val = 100;
> + return H_SUCCESS;
> +}
> +
> +/*
> + * Holds all supported performance stats accessors. Each performance-statistic
> + * is uniquely identified by an 8-byte ASCII string, for example 'MemLife ',
> + * which indicates in percentage how much usage life of an nvdimm remains.
> + * 'NoopStat' is primarily used to test support for retrieving performance
> + * stats and also to replace unknown stats present in the rr-buffer.
> + *
> + */
> +static const struct {
> + char stat_id[8];
So using a char[] here, but a uint64_t in the request structure makes
it pretty hard to follow if you're doing the right thing
w.r.t. endianness, especially since you effectively memcmp() directly
between u64s and char[]s. You really want to use a consistent type
for the ids.
> + int (*stat_getval)(SpaprDrc *drc, uint64_t id, uint64_t *val);
> +} nvdimm_perf_stats[] = {
> + { "NoopStat", perf_stat_noop},
> + { "MemLife ", perf_stat_memlife},
> +};
> +
> +/*
> + * Given an nvdimm DRC and a stat-id, return its value. If the given stat-id
> + * isn't supported, return H_PARTIAL.
> + */
> +static int nvdimm_stat_getval(SpaprDrc *drc, uint64_t id, uint64_t *val)
> +{
> + int index;
> + char stat_id[8];
> +
> + /* Copy the id locally since the comparison below is done byte-wise */
> + memcpy(&stat_id[0], &id, 8);
I don't see why you're making this temporary copy here.
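One way to address both of these points, sketched here purely for illustration
against the patch's surrounding definitions (it assumes QEMU's ARRAY_SIZE() and
be64_to_cpu() helpers and is not a tested replacement), is to keep the ids as
uint64_t constants so the lookup compares like with like and needs no temporary
copy:

/* Sketch only: ids stored as the big-endian numeric value of the name */
static const struct {
    uint64_t stat_id;   /* e.g. 0x4D656D4C69666520 == 'MemLife ' */
    int (*stat_getval)(SpaprDrc *drc, uint64_t id, uint64_t *val);
} nvdimm_perf_stats[] = {
    { 0x4E6F6F7053746174ULL, perf_stat_noop },    /* 'NoopStat' */
    { 0x4D656D4C69666520ULL, perf_stat_memlife }, /* 'MemLife ' */
};

static int nvdimm_stat_getval(SpaprDrc *drc, uint64_t id, uint64_t *val)
{
    int index;

    *val = 0;
    /* 'id' carries the raw big-endian bytes read from the rr-buffer */
    for (index = 0; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) {
        if (nvdimm_perf_stats[index].stat_id == be64_to_cpu(id)) {
            return nvdimm_perf_stats[index].stat_getval(drc, id, val);
        }
    }
    return H_PARTIAL;
}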
> + *val = 0;
> +
> + /* Lookup the stats-id in the nvdimm_perf_stats table */
> + for (index = 0; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) {
> + if (memcmp(nvdimm_perf_stats[index].stat_id, &stat_id[0], 8) == 0) {
> + return nvdimm_perf_stats[index].stat_getval(drc, id, val);
> + }
> + }
> + return H_PARTIAL;
> +}
> +
> +/*
> + * Given a request-and-result buffer header, verify its contents. Also check
> + * that the buffer size and the number of stats requested are within our
> + * expected sane bounds.
> + */
> +static int scm_perf_check_rr_buffer(struct papr_scm_perf_stats *header,
> + hwaddr addr, size_t size,
> + uint32_t *num_stats)
> +{
> + size_t expected_buffsize;
> +
> + /* Verify the header eyecatcher and version */
> + if (memcmp(&header->eye_catcher, SCM_STATS_EYECATCHER,
> + sizeof(header->eye_catcher))) {
> + return H_BAD_DATA;
> + }
> + if (be32_to_cpu(header->stats_version) != 0x1) {
> + return H_NOT_AVAILABLE;
> + }
> +
> + /* verify that rr buffer has enough space */
> + *num_stats = be32_to_cpu(header->num_statistics);
> + if (*num_stats == 0) { /* Return all stats */
> + expected_buffsize = SCM_STATS_MIN_OUTPUT_BUFFER;
> + } else if (*num_stats > SCM_STATS_MAX_STATS) {
> + /* Too many stats requested */
> + return H_P3;
I'd recommend testing and exiting on this error case before handling
the all stats case. Disposing of error cases early is more idiomatic.
You can then combine the all stats and n-stats cases a bit more nicely
with something like:
actual_numstats = (*num_stats) ? (*num_stats) : ARRAY_SIZE(...);
Then use the same logic to compute the expected bufsize (min_bufsize
might be a better name) in both cases.
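A rough sketch of that reordering (illustrative only, reusing the patch's
definitions; it also drops the SCM_STATS_MIN_OUTPUT_BUFFER define and leaves
the upper-bound size check to the caller, as suggested further below):

static int scm_perf_check_rr_buffer(struct papr_scm_perf_stats *header,
                                    hwaddr addr, size_t size,
                                    uint32_t *num_stats)
{
    size_t min_buffsize;
    uint32_t nstats;

    if (memcmp(&header->eye_catcher, SCM_STATS_EYECATCHER,
               sizeof(header->eye_catcher))) {
        return H_BAD_DATA;
    }
    if (be32_to_cpu(header->stats_version) != 0x1) {
        return H_NOT_AVAILABLE;
    }

    /* Dispose of the error case first */
    *num_stats = be32_to_cpu(header->num_statistics);
    if (*num_stats > SCM_STATS_MAX_STATS) {
        return H_P3;
    }

    /* num_stats == 0 means "return all supported stats" */
    nstats = *num_stats ? *num_stats : ARRAY_SIZE(nvdimm_perf_stats);
    min_buffsize = sizeof(struct papr_scm_perf_stats) +
                   nstats * sizeof(struct papr_scm_perf_stat);

    return size < min_buffsize ? H_P3 : H_SUCCESS;
}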
> + } else { /* Return a subset of stats */
> + expected_buffsize = sizeof(struct papr_scm_perf_stats) +
> + (*num_stats) * sizeof(struct papr_scm_perf_stat);
> + }
> +
> + if (size < expected_buffsize || size > SCM_STATS_MAX_OUTPUT_BUFFER) {
> + return H_P3;
I think you can avoid the MAX_OUTPUT_BUFFER check here...
> + }
> +
> + return H_SUCCESS;
> +}
> +
> +/*
> + * For a given DRC index (R3) return one or more performance stats of an nvdimm
> + * device in guest allocated Request-and-result buffer (rr-buffer) (R4) of
> + * given 'size' (R5). The rr-buffer consists of a header described by
> + * 'struct papr_scm_perf_stats' that embeds the 'stats_version' and
> + * 'num_statistics' fields. This is followed by an array of
> + * 'struct papr_scm_perf_stat'. Based on the request type, the hcall writes the
> + * performance stats into the array of 'struct papr_scm_perf_stat' embedded
> + * inside the rr-buffer provided by the guest.
> + * Special cases handled are:
> + * 'size' == 0 : Return the maximum possible size of rr-buffer
> + * 'size' != 0 && 'num_statistics == 0' : Return all possible performance stats
> + *
> + * In case there was an error fetching a specific stat (e.g. stat-id unknown or
> + * any other error) then return the stat-id in R4 and also replace its stat
> + * entry in rr-buffer with 'NoopStat'
> + */
> +static target_ulong h_scm_performance_stats(PowerPCCPU *cpu,
> + SpaprMachineState *spapr,
> + target_ulong opcode,
> + target_ulong *args)
> +{
> + SpaprDrc *drc = spapr_drc_by_index(args[0]);
> + const hwaddr addr = args[1];
> + size_t size = args[2];
> + struct papr_scm_perf_stats *perfstats;
> + struct papr_scm_perf_stat *stats;
> + uint64_t invalid_stat = 0, stat_val;
> + MemTxResult result;
> + uint32_t num_stats;
> + unsigned long rc;
> + int index;
> +
> + /* Ensure that the DRC is valid, is a PMEM DIMM and is plugged in */
> + if (!drc || !drc->dev ||
> + spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> + return H_PARAMETER;
> + }
> +
> + /* Guest requested max buffer size for output buffer */
> + if (size == 0) {
> + args[0] = SCM_STATS_MAX_OUTPUT_BUFFER;
> + return H_SUCCESS;
> + }
> +
> + /* verify size is enough to hold rr-buffer header */
> + if (size < sizeof(struct papr_scm_perf_stats)) {
.. if you put it here instead, then you will have dealt with all
obviously bad buffer sizes early.
> + return H_BAD_DATA;
> + }
> +
> + /* allocate enough buffer space locally for holding max stats */
> + perfstats = g_try_malloc0(SCM_STATS_MAX_OUTPUT_BUFFER);
Then you can safely base this malloc on the given size, rather than
always over-allocating.
> + if (!perfstats) {
> + return H_NO_MEM;
> + }
> +
> + /* Read and verify rr-buffer header */
> + result = address_space_read(&address_space_memory, addr,
> + MEMTXATTRS_UNSPECIFIED, perfstats,
> + sizeof(struct papr_scm_perf_stats));
And you can also read the entire thing with a single memory read here.
> + rc = (result == MEMTX_OK) ?
> + scm_perf_check_rr_buffer(perfstats, addr, size, &num_stats) :
> + H_PRIVILEGE;
This is a bit cryptic. Just deal with the memtx error first, then run
the buffer validation. Actually, you can unify the exit paths for
these and the success case by using a goto label near the end which
has the g_free() and return rc.
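Putting those last few points together, the buffer handling could look roughly
like the sketch below (illustrative only, against the patch's definitions; the
stat-filling loop and the write-back to guest memory are elided):

    /* Reject all obviously bad buffer sizes before allocating anything */
    if (size < sizeof(struct papr_scm_perf_stats)) {
        return H_BAD_DATA;
    }
    if (size > SCM_STATS_MAX_OUTPUT_BUFFER) {
        return H_P3;
    }

    /* Allocate only what the guest handed us and read it in one go */
    perfstats = g_try_malloc0(size);
    if (!perfstats) {
        return H_NO_MEM;
    }

    result = address_space_read(&address_space_memory, addr,
                                MEMTXATTRS_UNSPECIFIED, perfstats, size);
    if (result != MEMTX_OK) {
        rc = H_PRIVILEGE;
        goto out;
    }

    rc = scm_perf_check_rr_buffer(perfstats, addr, size, &num_stats);
    if (rc != H_SUCCESS) {
        goto out;
    }

    /* ... fill in the requested stats, write the buffer back and set rc ... */

out:
    g_free(perfstats);
    return rc;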
> + if (rc != H_SUCCESS) {
> + g_free(perfstats);
> + return rc;
> + }
> +
> + stats = &perfstats->scm_statistics[0];
> + /* when returning all stats generate a canned response first */
> + if (num_stats == 0) {
> + for (index = 1; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) {
> + memcpy(&stats[index - 1].statistic_id,
> + &nvdimm_perf_stats[index].stat_id, 8);
> + }
> + num_stats = ARRAY_SIZE(nvdimm_perf_stats) - 1;
> + } else {
> + /* copy the rr-buffer from the guest memory */
> + result = address_space_read(&address_space_memory,
> + addr + sizeof(struct papr_scm_perf_stats),
> + MEMTXATTRS_UNSPECIFIED, stats,
> + size - sizeof(struct papr_scm_perf_stats));
> + if (result != MEMTX_OK) {
> + g_free(perfstats);
> + return H_PRIVILEGE;
> + }
> + }
> +
> + for (index = 0; index < num_stats; ++index) {
> + rc = nvdimm_stat_getval(drc, stats[index].statistic_id, &stat_val);
> +
> + /* On error add noop stat to rr buffer & save last inval stat-id */
> + if (rc != H_SUCCESS) {
> + if (!invalid_stat) {
> + invalid_stat = stats[index].statistic_id;
> + }
> + memcpy(&stats[index].statistic_id, nvdimm_perf_stats[0].stat_id, 8);
> + }
> + /* Caller expects stat values in BE encoding */
> + stats[index].statistic_value = cpu_to_be64(stat_val);
> + }
> +
> + /* Update and copy the local rr-buffer back to guest */
> + perfstats->num_statistics = cpu_to_be32(num_stats);
> + g_assert(size <= SCM_STATS_MAX_OUTPUT_BUFFER);
> + result = address_space_write(&address_space_memory, addr,
> + MEMTXATTRS_UNSPECIFIED, perfstats, size);
> +
> + /* Cleanup the stats buffer */
> + g_free(perfstats);
> + if (result) {
> + return H_PRIVILEGE;
> + }
> + /* Check if there was a failure in fetching any stat */
> + args[0] = invalid_stat;
> + return invalid_stat ? H_PARTIAL : H_SUCCESS;
> +}
> +
> static void spapr_scm_register_types(void)
> {
> /* qemu/scm specific hcalls */
> @@ -511,6 +732,7 @@ static void spapr_scm_register_types(void)
> spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
> spapr_register_hypercall(H_SCM_UNBIND_ALL, h_scm_unbind_all);
> spapr_register_hypercall(H_SCM_HEALTH, h_scm_health);
> + spapr_register_hypercall(H_SCM_PERFORMANCE_STATS, h_scm_performance_stats);
> }
>
> type_init(spapr_scm_register_types)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index d2b5a9bdf9..6f3353b4ea 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -326,6 +326,7 @@ struct SpaprMachineState {
> #define H_P8 -61
> #define H_P9 -62
> #define H_OVERLAP -68
> +#define H_BAD_DATA -70
> #define H_UNSUPPORTED_FLAG -256
> #define H_MULTI_THREADS_ACTIVE -9005
>
> @@ -539,8 +540,9 @@ struct SpaprMachineState {
> #define H_SCM_UNBIND_MEM 0x3F0
> #define H_SCM_UNBIND_ALL 0x3FC
> #define H_SCM_HEALTH 0x400
> +#define H_SCM_PERFORMANCE_STATS 0x418
>
> -#define MAX_HCALL_OPCODE H_SCM_HEALTH
> +#define MAX_HCALL_OPCODE H_SCM_PERFORMANCE_STATS
>
> /* The hcalls above are standardized in PAPR and implemented by pHyp
> * as well.
> @@ -787,6 +789,21 @@ OBJECT_DECLARE_SIMPLE_TYPE(SpaprTceTable, SPAPR_TCE_TABLE)
> DECLARE_INSTANCE_CHECKER(IOMMUMemoryRegion, SPAPR_IOMMU_MEMORY_REGION,
> TYPE_SPAPR_IOMMU_MEMORY_REGION)
>
> +/* Defs and structs exchanged with guest when reporting drc perf stats */
> +#define SCM_STATS_EYECATCHER "SCMSTATS"
> +
> +struct QEMU_PACKED papr_scm_perf_stat {
> + uint64_t statistic_id;
> + uint64_t statistic_value;
> +};
> +
> +struct QEMU_PACKED papr_scm_perf_stats {
> + uint8_t eye_catcher[8]; /* Should be "SCMSTATS" */
> + uint32_t stats_version; /* Should be 0x01 */
> + uint32_t num_statistics; /* Number of stats following */
> + struct papr_scm_perf_stat scm_statistics[]; /* Performance metrics */
> +};
> +
> struct SpaprTceTable {
> DeviceState parent;
> uint32_t liobn;
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall
@ 2021-05-17 6:23 ` David Gibson
0 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2021-05-17 6:23 UTC (permalink / raw)
To: Vaibhav Jain
Cc: xiaoguangrong.eric, mst, aneesh.kumar, qemu-devel, kvm-ppc,
groug, shivaprasadbhat, qemu-ppc, bharata, imammedo, ehabkost
[-- Attachment #1: Type: text/plain, Size: 17531 bytes --]
On Sat, May 15, 2021 at 01:07:59PM +0530, Vaibhav Jain wrote:
> Add support for H_SCM_PERFORMANCE_STATS described at [1] for
> spapr nvdimms. This enables guest to fetch performance stats[2] like
> expected life of an nvdimm ('MemLife ') etc and display them to the
> user. Linux kernel support for fetching these performance stats and
> exposing them to the user-space was done via [3].
>
> The hcall semantics mandate that each nvdimm performance stats is
> uniquely identied by a 8-byte ascii string encoded as an unsigned
> integer (e.g 'MemLife ' == 0x4D656D4C69666520) and its value be a
> 8-byte unsigned integer. These performance-stats are exchanged with
> the guest in via a guest allocated buffer called
> 'requestAndResultBuffer' or rr-buffer for short. This buffer contains
> a header descibed by 'struct papr_scm_perf_stats' followed by an array
> of performance-stats described by 'struct papr_scm_perf_stat'. The
> hypervisor is expected to validate the rr-buffer header and then based
> on the request copy the needed performance-stats to the array of
> 'struct papr_scm_perf_stat' following the header.
>
> The patch proposes a new function h_scm_performance_stats() that
> services the H_SCM_PERFORMANCE_STATS hcall. After verifying the
> validity of the rr-buffer header via scm_perf_check_rr_buffer() it
> proceeds to fill the rr-buffer with requested performance-stats. The
> value of individual stats is retrived from individual accessor
> function for the stat which are indexed in the array
> 'nvdimm_perf_stats'.
>
> References:
> [1] "Hypercall Op-codes (hcalls)"
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n269
> [2] Sysfs attribute documentation for papr_scm
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-papr-pmem#n36
> [3] "powerpc/papr_scm: Fetch nvdimm performance stats from PHYP"
> https://lore.kernel.org/r/20200731064153.182203-2-vaibhav@linux.ibm.com
>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Changelog
>
> Since RFC-v2:
> * s/SCM_STATS_MAX_OUTPUT_BUFFER/SCM_STATS_MIN_OUTPUT_BUFFER/ thats the
> minimum size buffer needed to return all supported performance
> stats. Value for this is derived from size of array 'nvdimm_perf_stats'
> * Added SCM_STATS_MAX_OUTPUT_BUFFER to indicate maximum supported
> rr-buffer size from a guest. Value for this is derived from hard
> limit of SCM_STATS_MAX_STATS.
> * Updated scm_perf_check_rr_buffer() to add a check for max size of
> rr-buffer. [David]
> * Updated scm_perf_check_rr_buffer() to removed a reduntant check for
> min size of rr-buffer [David]
> * Updated h_scm_performance_stats() to have a single allocation for
> rr-buffer and copy it back to guest memory in a single shot. [David]
>
> Since RFC-v1:
> * Removed empty lines from code [ David ]
> * Updated struct papr_scm_perf_stat to use uint64_t for
> statistic_id.
> * Added a hard limit on max number of stats requested to 255 [ David ]
> * Updated scm_perf_check_rr_buffer() to check for rr-buffer header
> size [ David ]
> * Removed a redundant check from nvdimm_stat_getval() [ David ]
> * Removed a redundant call to address_space_access_valid() in
> scm_perf_check_rr_buffer() [ David ]
> * Instead of allocating a minimum size local buffer, allocate a max
> possible size local rr-buffer. [ David ]
> * Updated nvdimm_stat_getval() to set 'val' to '0' on error. [ David ]
> * Updated h_scm_performance_stats() to use a canned-response method
> for simplifying num_stats==0 case [ David ].
> ---
> hw/ppc/spapr_nvdimm.c | 222 +++++++++++++++++++++++++++++++++++++++++
> include/hw/ppc/spapr.h | 19 +++-
> 2 files changed, 240 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
> index 252204e25f..287326b9c0 100644
> --- a/hw/ppc/spapr_nvdimm.c
> +++ b/hw/ppc/spapr_nvdimm.c
> @@ -35,6 +35,19 @@
> /* SCM device is unable to persist memory contents */
> #define PAPR_PMEM_UNARMED PPC_BIT(0)
>
> +/* Minimum output buffer size needed to return all nvdimm_perf_stats */
> +#define SCM_STATS_MIN_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \
> + sizeof(struct papr_scm_perf_stat) * \
> + ARRAY_SIZE(nvdimm_perf_stats))
MIN_OUTPUT_BUFFER is a better name, but still not great. I think we
can get rid of this define completely in a neat way, though. See below.
> +/* Maximum number of stats that we can return back in a single stat request */
> +#define SCM_STATS_MAX_STATS 255
Although it's technically allowed, I'm still not convinced there's
actually any reason to allow the user to request the same stat over
and over.
> +/* Maximum possible output buffer to fulfill a perf-stats request */
> +#define SCM_STATS_MAX_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \
> + sizeof(struct papr_scm_perf_stat) * \
> + SCM_STATS_MAX_STATS)
> +
> bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
> uint64_t size, Error **errp)
> {
> @@ -502,6 +515,214 @@ static target_ulong h_scm_health(PowerPCCPU *cpu, SpaprMachineState *spapr,
> return H_SUCCESS;
> }
>
> +static int perf_stat_noop(SpaprDrc *drc, uint64_t unused, uint64_t *val)
> +{
> + *val = 0;
> + return H_SUCCESS;
> +}
> +
> +static int perf_stat_memlife(SpaprDrc *drc, uint64_t unused, uint64_t *val)
> +{
> + /* Assume full life available of an NVDIMM right now */
> + *val = 100;
> + return H_SUCCESS;
> +}
> +
> +/*
> + * Holds all supported performance stats accessors. Each performance-statistic
> + * is uniquely identified by a 8-byte ascii string for example: 'MemLife '
> + * which indicate in percentage how much usage life of an nvdimm is remaining.
> + * 'NoopStat' which is primarily used to test support for retriving performance
> + * stats and also to replace unknown stats present in the rr-buffer.
> + *
> + */
> +static const struct {
> + char stat_id[8];
So using a char[] here, but a uint64_t in the request structure makes
it pretty hard to follow if you're doing the right thing
w.r.t. endianness, especially since you effectively memcmp() directly
between u64s and char[]s. You really want to use a consistent type
for the ids.
> + int (*stat_getval)(SpaprDrc *drc, uint64_t id, uint64_t *val);
> +} nvdimm_perf_stats[] = {
> + { "NoopStat", perf_stat_noop},
> + { "MemLife ", perf_stat_memlife},
> +};
> +
> +/*
> + * Given a nvdimm drc and stat-name return its value. In case given stat-name
> + * isnt supported then return H_PARTIAL.
> + */
> +static int nvdimm_stat_getval(SpaprDrc *drc, uint64_t id, uint64_t *val)
> +{
> + int index;
> + char stat_id[8];
> +
> + /* since comparision is done */
> + memcpy(&stat_id[0], &id, 8);
I don't see why you're making this temporary copy here.
> + *val = 0;
> +
> + /* Lookup the stats-id in the nvdimm_perf_stats table */
> + for (index = 0; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) {
> + if (memcmp(nvdimm_perf_stats[index].stat_id, &stat_id[0], 8) == 0) {
> + return nvdimm_perf_stats[index].stat_getval(drc, id, val);
> + }
> + }
> + return H_PARTIAL;
> +}
> +
> +/*
> + * Given a request & result buffer header verify its contents. Also
> + * buffer-size and number of stats requested are within our expected
> + * sane bounds.
> + */
> +static int scm_perf_check_rr_buffer(struct papr_scm_perf_stats *header,
> + hwaddr addr, size_t size,
> + uint32_t *num_stats)
> +{
> + size_t expected_buffsize;
> +
> + /* Verify the header eyecather and version */
> + if (memcmp(&header->eye_catcher, SCM_STATS_EYECATCHER,
> + sizeof(header->eye_catcher))) {
> + return H_BAD_DATA;
> + }
> + if (be32_to_cpu(header->stats_version) != 0x1) {
> + return H_NOT_AVAILABLE;
> + }
> +
> + /* verify that rr buffer has enough space */
> + *num_stats = be32_to_cpu(header->num_statistics);
> + if (*num_stats == 0) { /* Return all stats */
> + expected_buffsize = SCM_STATS_MIN_OUTPUT_BUFFER;
> + } else if (*num_stats > SCM_STATS_MAX_STATS) {
> + /* Too many stats requested */
> + return H_P3;
I'd recommend testing and exiting on this error case before handling
the all stats case. Disposing of error cases early is more idiomatic.
You can then combine the all stats and n-stats cases a bit more nicely
with something like:
actual_numstats = (*num_stats) ? (*num_stats) : ARRAY_SIZE(...);
Then use the same logic to compute the expected bufsize (min_bufsize
might be a better name) in both cases.
> + } else { /* Return a subset of stats */
> + expected_buffsize = sizeof(struct papr_scm_perf_stats) +
> + (*num_stats) * sizeof(struct papr_scm_perf_stat);
> + }
> +
> + if (size < expected_buffsize || size > SCM_STATS_MAX_OUTPUT_BUFFER) {
> + return H_P3;
I think you can avoid the MAX_OUTPUT_BUFFER check here...
> + }
> +
> + return H_SUCCESS;
> +}
> +
> +/*
> + * For a given DRC index (R3) return one ore more performance stats of an nvdimm
> + * device in guest allocated Request-and-result buffer (rr-buffer) (R4) of
> + * given 'size' (R5). The rr-buffer consists of a header described by
> + * 'struct papr_scm_perf_stats' that embeds the 'stats_version' and
> + * 'num_statistics' fields. This is followed by an array of
> + * 'struct papr_scm_perf_stat'. Based on the request type the writes the
> + * performance into the array of 'struct papr_scm_perf_stat' embedded inside
> + * the rr-buffer provided by the guest.
> + * Special cases handled are:
> + * 'size' == 0 : Return the maximum possible size of rr-buffer
> + * 'size' != 0 && 'num_statistics == 0' : Return all possible performance stats
> + *
> + * In case there was an error fetching a specific stats (e.g stat-id unknown or
> + * any other error) then return the stat-id in R4 and also replace its stat
> + * entry in rr-buffer with 'NoopStat'
> + */
> +static target_ulong h_scm_performance_stats(PowerPCCPU *cpu,
> + SpaprMachineState *spapr,
> + target_ulong opcode,
> + target_ulong *args)
> +{
> + SpaprDrc *drc = spapr_drc_by_index(args[0]);
> + const hwaddr addr = args[1];
> + size_t size = args[2];
> + struct papr_scm_perf_stats *perfstats;
> + struct papr_scm_perf_stat *stats;
> + uint64_t invalid_stat = 0, stat_val;
> + MemTxResult result;
> + uint32_t num_stats;
> + unsigned long rc;
> + int index;
> +
> + /* Ensure that the drc is valid & is valid PMEM dimm and is plugged in */
> + if (!drc || !drc->dev ||
> + spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> + return H_PARAMETER;
> + }
> +
> + /* Guest requested max buffer size for output buffer */
> + if (size == 0) {
> + args[0] = SCM_STATS_MAX_OUTPUT_BUFFER;
> + return H_SUCCESS;
> + }
> +
> + /* verify size is enough to hold rr-buffer header */
> + if (size < sizeof(struct papr_scm_perf_stats)) {
.. if you put it here instead, then you will have dealt with all
obviously bad buffer sizes early.
> + return H_BAD_DATA;
> + }
> +
> + /* allocate enough buffer space locally for holding max stats */
> + perfstats = g_try_malloc0(SCM_STATS_MAX_OUTPUT_BUFFER);
Then you can safely base this malloc on the given size, rather than
always over-allocating.
> + if (!perfstats) {
> + return H_NO_MEM;
> + }
> +
> + /* Read and verify rr-buffer header */
> + result = address_space_read(&address_space_memory, addr,
> + MEMTXATTRS_UNSPECIFIED, perfstats,
> + sizeof(struct papr_scm_perf_stats));
And you can also read the entire thing with a single memory read here.
> + rc = (result == MEMTX_OK) ?
> + scm_perf_check_rr_buffer(perfstats, addr, size, &num_stats) :
> + H_PRIVILEGE;
This is a bit cryptic. Just deal with the memtx error first, then run
the buffer validation. Actually, you can unify the exit paths for
these and the success case by using a goto label near the end which
has the g_free() and return rc.
> + if (rc != H_SUCCESS) {
> + g_free(perfstats);
> + return rc;
> + }
> +
> + stats = &perfstats->scm_statistics[0];
> + /* when returning all stats generate a canned response first */
> + if (num_stats == 0) {
> + for (index = 1; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) {
> + memcpy(&stats[index - 1].statistic_id,
> + &nvdimm_perf_stats[index].stat_id, 8);
> + }
> + num_stats = ARRAY_SIZE(nvdimm_perf_stats) - 1;
> + } else {
> + /* copy the rr-buffer from the guest memory */
> + result = address_space_read(&address_space_memory,
> + addr + sizeof(struct papr_scm_perf_stats),
> + MEMTXATTRS_UNSPECIFIED, stats,
> + size - sizeof(struct papr_scm_perf_stats));
> + if (result != MEMTX_OK) {
> + g_free(perfstats);
> + return H_PRIVILEGE;
> + }
> + }
> +
> + for (index = 0; index < num_stats; ++index) {
> + rc = nvdimm_stat_getval(drc, stats[index].statistic_id, &stat_val);
> +
> + /* On error add noop stat to rr buffer & save last inval stat-id */
> + if (rc != H_SUCCESS) {
> + if (!invalid_stat) {
> + invalid_stat = stats[index].statistic_id;
> + }
> + memcpy(&stats[index].statistic_id, nvdimm_perf_stats[0].stat_id, 8);
> + }
> + /* Caller expects stat values in BE encoding */
> + stats[index].statistic_value = cpu_to_be64(stat_val);
> + }
> +
> + /* Update and copy the local rr-buffer back to guest */
> + perfstats->num_statistics = cpu_to_be32(num_stats);
> + g_assert(size <= SCM_STATS_MAX_OUTPUT_BUFFER);
> + result = address_space_write(&address_space_memory, addr,
> + MEMTXATTRS_UNSPECIFIED, perfstats, size);
> +
> + /* Cleanup the stats buffer */
> + g_free(perfstats);
> + if (result) {
> + return H_PRIVILEGE;
> + }
> + /* Check if there was a failure in fetching any stat */
> + args[0] = invalid_stat;
> + return invalid_stat ? H_PARTIAL : H_SUCCESS;
> +}
> +
> static void spapr_scm_register_types(void)
> {
> /* qemu/scm specific hcalls */
> @@ -511,6 +732,7 @@ static void spapr_scm_register_types(void)
> spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
> spapr_register_hypercall(H_SCM_UNBIND_ALL, h_scm_unbind_all);
> spapr_register_hypercall(H_SCM_HEALTH, h_scm_health);
> + spapr_register_hypercall(H_SCM_PERFORMANCE_STATS, h_scm_performance_stats);
> }
>
> type_init(spapr_scm_register_types)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index d2b5a9bdf9..6f3353b4ea 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -326,6 +326,7 @@ struct SpaprMachineState {
> #define H_P8 -61
> #define H_P9 -62
> #define H_OVERLAP -68
> +#define H_BAD_DATA -70
> #define H_UNSUPPORTED_FLAG -256
> #define H_MULTI_THREADS_ACTIVE -9005
>
> @@ -539,8 +540,9 @@ struct SpaprMachineState {
> #define H_SCM_UNBIND_MEM 0x3F0
> #define H_SCM_UNBIND_ALL 0x3FC
> #define H_SCM_HEALTH 0x400
> +#define H_SCM_PERFORMANCE_STATS 0x418
>
> -#define MAX_HCALL_OPCODE H_SCM_HEALTH
> +#define MAX_HCALL_OPCODE H_SCM_PERFORMANCE_STATS
>
> /* The hcalls above are standardized in PAPR and implemented by pHyp
> * as well.
> @@ -787,6 +789,21 @@ OBJECT_DECLARE_SIMPLE_TYPE(SpaprTceTable, SPAPR_TCE_TABLE)
> DECLARE_INSTANCE_CHECKER(IOMMUMemoryRegion, SPAPR_IOMMU_MEMORY_REGION,
> TYPE_SPAPR_IOMMU_MEMORY_REGION)
>
> +/* Defs and structs exchanged with guest when reporting drc perf stats */
> +#define SCM_STATS_EYECATCHER "SCMSTATS"
> +
> +struct QEMU_PACKED papr_scm_perf_stat {
> + uint64_t statistic_id;
> + uint64_t statistic_value;
> +};
> +
> +struct QEMU_PACKED papr_scm_perf_stats {
> + uint8_t eye_catcher[8]; /* Should be "SCMSTATS" */
> + uint32_t stats_version; /* Should be 0x01 */
> + uint32_t num_statistics; /* Number of stats following */
> + struct papr_scm_perf_stat scm_statistics[]; /* Performance metrics */
> +};
> +
> struct SpaprTceTable {
> DeviceState parent;
> uint32_t liobn;
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall
2021-05-17 6:23 ` David Gibson
@ 2021-05-17 7:55 ` Greg Kurz
-1 siblings, 0 replies; 12+ messages in thread
From: Greg Kurz @ 2021-05-17 7:55 UTC (permalink / raw)
To: David Gibson
Cc: xiaoguangrong.eric, mst, aneesh.kumar, bharata, qemu-devel,
kvm-ppc, shivaprasadbhat, qemu-ppc, imammedo, Vaibhav Jain,
ehabkost
On Mon, 17 May 2021 16:23:56 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Sat, May 15, 2021 at 01:07:59PM +0530, Vaibhav Jain wrote:
[...]
> > + rc = (result == MEMTX_OK) ?
> > + scm_perf_check_rr_buffer(perfstats, addr, size, &num_stats) :
> > + H_PRIVILEGE;
>
> This is a bit cryptic. Just deal with the memtx error first, then run
> the buffer validation. Actually, you can unify the exit paths for
> these and the success case by using a goto label near the end which
> has the g_free() and return rc.
>
It seems all the g_free() calls could even be avoided by
converting perfstats to g_autofree.
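For instance (untested, just to show the idea):

    g_autofree struct papr_scm_perf_stats *perfstats = NULL;
    ...
    perfstats = g_try_malloc0(size);
    if (!perfstats) {
        return H_NO_MEM;
    }
    /* every later 'return rc;' then frees perfstats automatically */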
* Re: [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall
2021-05-17 7:55 ` Greg Kurz
@ 2021-05-17 8:16 ` David Gibson
-1 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2021-05-17 8:16 UTC (permalink / raw)
To: Greg Kurz
Cc: xiaoguangrong.eric, mst, aneesh.kumar, bharata, qemu-devel,
kvm-ppc, shivaprasadbhat, qemu-ppc, imammedo, Vaibhav Jain,
ehabkost
On Mon, May 17, 2021 at 09:55:31AM +0200, Greg Kurz wrote:
> On Mon, 17 May 2021 16:23:56 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > On Sat, May 15, 2021 at 01:07:59PM +0530, Vaibhav Jain wrote:
> [...]
> > > + rc = (result == MEMTX_OK) ?
> > > + scm_perf_check_rr_buffer(perfstats, addr, size, &num_stats) :
> > > + H_PRIVILEGE;
> >
> > This is a bit cryptic. Just deal with the memtx error first, then run
> > the buffer validation. Actually, you can unify the exit paths for
> > these and the success case by using a goto label near the end which
> > has the g_free() and return rc.
> >
>
> It seems all the g_free() calls could even be avoided by
> converting perfstats to g_autofree.
That's an even better idea.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall
2021-05-17 6:23 ` David Gibson
@ 2021-05-22 3:43 ` Vaibhav Jain
-1 siblings, 0 replies; 12+ messages in thread
From: Vaibhav Jain @ 2021-05-22 3:31 UTC (permalink / raw)
To: David Gibson
Cc: xiaoguangrong.eric, mst, aneesh.kumar, qemu-devel, kvm-ppc,
groug, shivaprasadbhat, qemu-ppc, bharata, imammedo, ehabkost
Thanks for looking into this patch David and Groug,
David Gibson <david@gibson.dropbear.id.au> writes:
> On Sat, May 15, 2021 at 01:07:59PM +0530, Vaibhav Jain wrote:
>> Add support for H_SCM_PERFORMANCE_STATS described at [1] for
>> spapr nvdimms. This enables guest to fetch performance stats[2] like
>> expected life of an nvdimm ('MemLife ') etc and display them to the
>> user. Linux kernel support for fetching these performance stats and
>> exposing them to the user-space was done via [3].
>>
>> The hcall semantics mandate that each nvdimm performance stats is
>> uniquely identied by a 8-byte ascii string encoded as an unsigned
>> integer (e.g 'MemLife ' == 0x4D656D4C69666520) and its value be a
>> 8-byte unsigned integer. These performance-stats are exchanged with
>> the guest in via a guest allocated buffer called
>> 'requestAndResultBuffer' or rr-buffer for short. This buffer contains
>> a header descibed by 'struct papr_scm_perf_stats' followed by an array
>> of performance-stats described by 'struct papr_scm_perf_stat'. The
>> hypervisor is expected to validate the rr-buffer header and then based
>> on the request copy the needed performance-stats to the array of
>> 'struct papr_scm_perf_stat' following the header.
>>
>> The patch proposes a new function h_scm_performance_stats() that
>> services the H_SCM_PERFORMANCE_STATS hcall. After verifying the
>> validity of the rr-buffer header via scm_perf_check_rr_buffer() it
>> proceeds to fill the rr-buffer with requested performance-stats. The
>> value of individual stats is retrived from individual accessor
>> function for the stat which are indexed in the array
>> 'nvdimm_perf_stats'.
>>
>> References:
>> [1] "Hypercall Op-codes (hcalls)"
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n269
>> [2] Sysfs attribute documentation for papr_scm
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-papr-pmem#n36
>> [3] "powerpc/papr_scm: Fetch nvdimm performance stats from PHYP"
>> https://lore.kernel.org/r/20200731064153.182203-2-vaibhav@linux.ibm.com
>>
>> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
>> ---
>> Changelog
>>
>> Since RFC-v2:
>> * s/SCM_STATS_MAX_OUTPUT_BUFFER/SCM_STATS_MIN_OUTPUT_BUFFER/ thats the
>> minimum size buffer needed to return all supported performance
>> stats. Value for this is derived from size of array 'nvdimm_perf_stats'
>> * Added SCM_STATS_MAX_OUTPUT_BUFFER to indicate maximum supported
>> rr-buffer size from a guest. Value for this is derived from hard
>> limit of SCM_STATS_MAX_STATS.
>> * Updated scm_perf_check_rr_buffer() to add a check for max size of
>> rr-buffer. [David]
>> * Updated scm_perf_check_rr_buffer() to removed a reduntant check for
>> min size of rr-buffer [David]
>> * Updated h_scm_performance_stats() to have a single allocation for
>> rr-buffer and copy it back to guest memory in a single shot. [David]
>>
>> Since RFC-v1:
>> * Removed empty lines from code [ David ]
>> * Updated struct papr_scm_perf_stat to use uint64_t for
>> statistic_id.
>> * Added a hard limit on max number of stats requested to 255 [ David ]
>> * Updated scm_perf_check_rr_buffer() to check for rr-buffer header
>> size [ David ]
>> * Removed a redundant check from nvdimm_stat_getval() [ David ]
>> * Removed a redundant call to address_space_access_valid() in
>> scm_perf_check_rr_buffer() [ David ]
>> * Instead of allocating a minimum size local buffer, allocate a max
>> possible size local rr-buffer. [ David ]
>> * Updated nvdimm_stat_getval() to set 'val' to '0' on error. [ David ]
>> * Updated h_scm_performance_stats() to use a canned-response method
>> for simplifying num_stats==0 case [ David ].
>> ---
>> hw/ppc/spapr_nvdimm.c | 222 +++++++++++++++++++++++++++++++++++++++++
>> include/hw/ppc/spapr.h | 19 +++-
>> 2 files changed, 240 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
>> index 252204e25f..287326b9c0 100644
>> --- a/hw/ppc/spapr_nvdimm.c
>> +++ b/hw/ppc/spapr_nvdimm.c
>> @@ -35,6 +35,19 @@
>> /* SCM device is unable to persist memory contents */
>> #define PAPR_PMEM_UNARMED PPC_BIT(0)
>>
>> +/* Minimum output buffer size needed to return all nvdimm_perf_stats */
>> +#define SCM_STATS_MIN_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \
>> + sizeof(struct papr_scm_perf_stat) * \
>> + ARRAY_SIZE(nvdimm_perf_stats))
>
> MIN_OUTPUT_BUFFER is a better name, but still not great. I think we
> can get rid of this define completely in a neat way, though. See below.
>
>
Not sure how we can get rid of it, since we still need to advertise to
the guest how large an rr-buffer we expect in order to return all
perf-stats. Sorry, but I didn't make out any suggestion below that
could get rid of this define.
>> +/* Maximum number of stats that we can return back in a single stat request */
>> +#define SCM_STATS_MAX_STATS 255
>
> Although it's technically allowed, I'm still not convinced there's
> actually any reason to allow the user to request the same stat over
> and over.
>
Matching the PowerVM behaviour here, which doesn't enforce any limitation
on how many times a single perf-stat can appear in the rr-buffer.
>> +/* Maximum possible output buffer to fulfill a perf-stats request */
>> +#define SCM_STATS_MAX_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \
>> + sizeof(struct papr_scm_perf_stat) * \
>> + SCM_STATS_MAX_STATS)
>> +
>> bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
>> uint64_t size, Error **errp)
>> {
>> @@ -502,6 +515,214 @@ static target_ulong h_scm_health(PowerPCCPU *cpu, SpaprMachineState *spapr,
>> return H_SUCCESS;
>> }
>>
>> +static int perf_stat_noop(SpaprDrc *drc, uint64_t unused, uint64_t *val)
>> +{
>> + *val = 0;
>> + return H_SUCCESS;
>> +}
>> +
>> +static int perf_stat_memlife(SpaprDrc *drc, uint64_t unused, uint64_t *val)
>> +{
>> + /* Assume full life available of an NVDIMM right now */
>> + *val = 100;
>> + return H_SUCCESS;
>> +}
>> +
>> +/*
>> + * Holds all supported performance stats accessors. Each performance-statistic
>> + * is uniquely identified by a 8-byte ascii string for example: 'MemLife '
>> + * which indicate in percentage how much usage life of an nvdimm is remaining.
>> + * 'NoopStat' which is primarily used to test support for retriving performance
>> + * stats and also to replace unknown stats present in the rr-buffer.
>> + *
>> + */
>> +static const struct {
>> + char stat_id[8];
>
> So using a char[] here, but a uint64_t in the request structure makes
> it pretty hard to follow if you're doing the right thing
> w.r.t. endianness, especially since you effectively memcmp() directly
> between u64s and char[]s. You really want to use a consistent type
> for the ids.
>
Though PAPR-SCM defines stat-ids as u64, they are essentially 8-byte
ASCII strings encoded in a u64. Neither the guest kernel nor this
proposed QEMU patch does any math on them that might be affected by
their endianness.
The switch from u64 to char[8] is done only for a more convenient
ASCII representation of stat-ids in nvdimm_perf_stats[].
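To illustrate (hypothetical snippet, not part of the patch):

    uint64_t id;
    memcpy(&id, "MemLife ", 8);
    /*
     * The raw bytes are always 4D 65 6D 4C 69 66 65 20 ('MemLife ');
     * only when read as a big-endian integer do they equal
     * 0x4D656D4C69666520. The lookup relies purely on a memcmp() of
     * those 8 bytes, which is endian-agnostic.
     */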
>> + int (*stat_getval)(SpaprDrc *drc, uint64_t id, uint64_t *val);
>> +} nvdimm_perf_stats[] = {
>> + { "NoopStat", perf_stat_noop},
>> + { "MemLife ", perf_stat_memlife},
>> +};
>> +
>> +/*
>> + * Given a nvdimm drc and stat-name return its value. In case given stat-name
>> + * isnt supported then return H_PARTIAL.
>> + */
>> +static int nvdimm_stat_getval(SpaprDrc *drc, uint64_t id, uint64_t *val)
>> +{
>> + int index;
>> + char stat_id[8];
>> +
>> + /* since comparision is done */
>> + memcpy(&stat_id[0], &id, 8);
>
> I don't see why you're making this temporary copy here.
>
Agree, removed this in next iteration of this patch.
>> + *val = 0;
>> +
>> + /* Lookup the stats-id in the nvdimm_perf_stats table */
>> + for (index = 0; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) {
>> + if (memcmp(nvdimm_perf_stats[index].stat_id, &stat_id[0], 8) == 0) {
>> + return nvdimm_perf_stats[index].stat_getval(drc, id, val);
>> + }
>> + }
>> + return H_PARTIAL;
>> +}
>> +
>> +/*
>> + * Given a request & result buffer header verify its contents. Also
>> + * buffer-size and number of stats requested are within our expected
>> + * sane bounds.
>> + */
>> +static int scm_perf_check_rr_buffer(struct papr_scm_perf_stats *header,
>> + hwaddr addr, size_t size,
>> + uint32_t *num_stats)
>> +{
>> + size_t expected_buffsize;
>> +
>> + /* Verify the header eyecather and version */
>> + if (memcmp(&header->eye_catcher, SCM_STATS_EYECATCHER,
>> + sizeof(header->eye_catcher))) {
>> + return H_BAD_DATA;
>> + }
>> + if (be32_to_cpu(header->stats_version) != 0x1) {
>> + return H_NOT_AVAILABLE;
>> + }
>> +
>> + /* verify that rr buffer has enough space */
>> + *num_stats = be32_to_cpu(header->num_statistics);
>> + if (*num_stats == 0) { /* Return all stats */
>> + expected_buffsize = SCM_STATS_MIN_OUTPUT_BUFFER;
>> + } else if (*num_stats > SCM_STATS_MAX_STATS) {
>> + /* Too many stats requested */
>> + return H_P3;
>
> I'd recommend testing and exiting on this error case before handling
> the all stats case. Disposing of error cases early is more idiomatic.
>
> You can then combine the all stats and n-stats cases a bit more nicely
> with something like:
> actual_numstats = (*num_stats) ? (*num_stats) : ARRAY_SIZE(...);
>
> Then use the same logic to compute the expected bufsize (min_bufsize
> might be a better name) in both cases.
>
>
Agree, I've made the change you suggested in the next iteration.
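i.e. something along these lines ('stat_count' and 'min_bufsize' are just
placeholder names, sketch only):

    *num_stats = be32_to_cpu(header->num_statistics);
    if (*num_stats > SCM_STATS_MAX_STATS) {
        /* Too many stats requested */
        return H_P3;
    }
    /* num_stats == 0 means "return every supported stat" */
    stat_count = *num_stats ? *num_stats : ARRAY_SIZE(nvdimm_perf_stats);
    min_bufsize = sizeof(struct papr_scm_perf_stats) +
                  stat_count * sizeof(struct papr_scm_perf_stat);
    if (size < min_bufsize) {
        return H_P3;
    }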
>> + } else { /* Return a subset of stats */
>> + expected_buffsize = sizeof(struct papr_scm_perf_stats) +
>> + (*num_stats) * sizeof(struct papr_scm_perf_stat);
>> + }
>> +
>> + if (size < expected_buffsize || size > SCM_STATS_MAX_OUTPUT_BUFFER) {
>> + return H_P3;
>
> I think you can avoid the MAX_OUTPUT_BUFFER check here...
>
Yes, moved the check to h_scm_performance_stats() in next iteration.
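i.e. the entry path now rejects all obviously bad buffer sizes up front,
something like (sketch, keeping the existing return codes):

    /* verify size is enough to hold the rr-buffer header ... */
    if (size < sizeof(struct papr_scm_perf_stats)) {
        return H_BAD_DATA;
    }
    /* ... and not beyond what we are ever willing to fill */
    if (size > SCM_STATS_MAX_OUTPUT_BUFFER) {
        return H_P3;
    }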
>> + }
>> +
>> + return H_SUCCESS;
>> +}
>> +
>> +/*
>> + * For a given DRC index (R3) return one or more performance stats of an nvdimm
>> + * device in guest allocated Request-and-result buffer (rr-buffer) (R4) of
>> + * given 'size' (R5). The rr-buffer consists of a header described by
>> + * 'struct papr_scm_perf_stats' that embeds the 'stats_version' and
>> + * 'num_statistics' fields. This is followed by an array of
>> + * 'struct papr_scm_perf_stat'. Based on the request type, the hypervisor writes
>> + * the performance stats into the array of 'struct papr_scm_perf_stat' embedded inside
>> + * the rr-buffer provided by the guest.
>> + * Special cases handled are:
>> + * 'size' == 0 : Return the maximum possible size of rr-buffer
>> + * 'size' != 0 && 'num_statistics == 0' : Return all possible performance stats
>> + *
>> + * In case there was an error fetching a specific stats (e.g stat-id unknown or
>> + * any other error) then return the stat-id in R4 and also replace its stat
>> + * entry in rr-buffer with 'NoopStat'
>> + */
>> +static target_ulong h_scm_performance_stats(PowerPCCPU *cpu,
>> + SpaprMachineState *spapr,
>> + target_ulong opcode,
>> + target_ulong *args)
>> +{
>> + SpaprDrc *drc = spapr_drc_by_index(args[0]);
>> + const hwaddr addr = args[1];
>> + size_t size = args[2];
>> + struct papr_scm_perf_stats *perfstats;
>> + struct papr_scm_perf_stat *stats;
>> + uint64_t invalid_stat = 0, stat_val;
>> + MemTxResult result;
>> + uint32_t num_stats;
>> + unsigned long rc;
>> + int index;
>> +
>> + /* Ensure that the drc is valid, is a PMEM DIMM, and is plugged in */
>> + if (!drc || !drc->dev ||
>> + spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
>> + return H_PARAMETER;
>> + }
>> +
>> + /* Guest requested max buffer size for output buffer */
>> + if (size == 0) {
>> + args[0] = SCM_STATS_MAX_OUTPUT_BUFFER;
>> + return H_SUCCESS;
>> + }
>> +
>> + /* verify size is enough to hold rr-buffer header */
>> + if (size < sizeof(struct papr_scm_perf_stats)) {
>
>
> .. if you put it here instead, then you will have dealt with all
> obviously bad buffer sizes early.
>
Agree
>> + return H_BAD_DATA;
>> + }
>> +
>> + /* allocate enough buffer space locally for holding max stats */
>> + perfstats = g_try_malloc0(SCM_STATS_MAX_OUTPUT_BUFFER);
>
> Then you can safely base this malloc on the given size, rather than
> always over-allocating.
>
Right, have updated this in v4
>> + if (!perfstats) {
>> + return H_NO_MEM;
>> + }
>> +
>> + /* Read and verify rr-buffer header */
>> + result = address_space_read(&address_space_memory, addr,
>> + MEMTXATTRS_UNSPECIFIED, perfstats,
>> + sizeof(struct papr_scm_perf_stats));
>
> And you can also read the entire thing with a single memory read here.
>
Yes agree. Addressed this in v4.
>> + rc = (result == MEMTX_OK) ?
>> + scm_perf_check_rr_buffer(perfstats, addr, size, &num_stats) :
>> + H_PRIVILEGE;
>
> This is a bit cryptic. Just deal with the memtx error first, then run
> the buffer validation. Actually, you can unify the exit paths for
> these and the success case by using a goto label near the end which
> has the g_free() and return rc.
>
Sure, addressed this in v4 by using g_autofree
>> + if (rc != H_SUCCESS) {
>> + g_free(perfstats);
>> + return rc;
>> + }
>> +
>> + stats = &perfstats->scm_statistics[0];
>> + /* when returning all stats generate a canned response first */
>> + if (num_stats == 0) {
>> + for (index = 1; index < ARRAY_SIZE(nvdimm_perf_stats); ++index) {
>> + memcpy(&stats[index - 1].statistic_id,
>> + &nvdimm_perf_stats[index].stat_id, 8);
>> + }
>> + num_stats = ARRAY_SIZE(nvdimm_perf_stats) - 1;
>> + } else {
>> + /* copy the rr-buffer from the guest memory */
>> + result = address_space_read(&address_space_memory,
>> + addr + sizeof(struct papr_scm_perf_stats),
>> + MEMTXATTRS_UNSPECIFIED, stats,
>> + size - sizeof(struct papr_scm_perf_stats));
>> + if (result != MEMTX_OK) {
>> + g_free(perfstats);
>> + return H_PRIVILEGE;
>> + }
>> + }
>> +
>> + for (index = 0; index < num_stats; ++index) {
>> + rc = nvdimm_stat_getval(drc, stats[index].statistic_id, &stat_val);
>> +
>> + /* On error add noop stat to rr buffer & save last inval stat-id */
>> + if (rc != H_SUCCESS) {
>> + if (!invalid_stat) {
>> + invalid_stat = stats[index].statistic_id;
>> + }
>> + memcpy(&stats[index].statistic_id, nvdimm_perf_stats[0].stat_id, 8);
>> + }
>> + /* Caller expects stat values in BE encoding */
>> + stats[index].statistic_value = cpu_to_be64(stat_val);
>> + }
>> +
>> + /* Update and copy the local rr-buffer back to guest */
>> + perfstats->num_statistics = cpu_to_be32(num_stats);
>> + g_assert(size <= SCM_STATS_MAX_OUTPUT_BUFFER);
>> + result = address_space_write(&address_space_memory, addr,
>> + MEMTXATTRS_UNSPECIFIED, perfstats, size);
>> +
>> + /* Cleanup the stats buffer */
>> + g_free(perfstats);
>> + if (result) {
>> + return H_PRIVILEGE;
>> + }
>> + /* Check if there was a failure in fetching any stat */
>> + args[0] = invalid_stat;
>> + return invalid_stat ? H_PARTIAL : H_SUCCESS;
>> +}
>> +
>> static void spapr_scm_register_types(void)
>> {
>> /* qemu/scm specific hcalls */
>> @@ -511,6 +732,7 @@ static void spapr_scm_register_types(void)
>> spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
>> spapr_register_hypercall(H_SCM_UNBIND_ALL, h_scm_unbind_all);
>> spapr_register_hypercall(H_SCM_HEALTH, h_scm_health);
>> + spapr_register_hypercall(H_SCM_PERFORMANCE_STATS, h_scm_performance_stats);
>> }
>>
>> type_init(spapr_scm_register_types)
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index d2b5a9bdf9..6f3353b4ea 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -326,6 +326,7 @@ struct SpaprMachineState {
>> #define H_P8 -61
>> #define H_P9 -62
>> #define H_OVERLAP -68
>> +#define H_BAD_DATA -70
>> #define H_UNSUPPORTED_FLAG -256
>> #define H_MULTI_THREADS_ACTIVE -9005
>>
>> @@ -539,8 +540,9 @@ struct SpaprMachineState {
>> #define H_SCM_UNBIND_MEM 0x3F0
>> #define H_SCM_UNBIND_ALL 0x3FC
>> #define H_SCM_HEALTH 0x400
>> +#define H_SCM_PERFORMANCE_STATS 0x418
>>
>> -#define MAX_HCALL_OPCODE H_SCM_HEALTH
>> +#define MAX_HCALL_OPCODE H_SCM_PERFORMANCE_STATS
>>
>> /* The hcalls above are standardized in PAPR and implemented by pHyp
>> * as well.
>> @@ -787,6 +789,21 @@ OBJECT_DECLARE_SIMPLE_TYPE(SpaprTceTable, SPAPR_TCE_TABLE)
>> DECLARE_INSTANCE_CHECKER(IOMMUMemoryRegion, SPAPR_IOMMU_MEMORY_REGION,
>> TYPE_SPAPR_IOMMU_MEMORY_REGION)
>>
>> +/* Defs and structs exchanged with guest when reporting drc perf stats */
>> +#define SCM_STATS_EYECATCHER "SCMSTATS"
>> +
>> +struct QEMU_PACKED papr_scm_perf_stat {
>> + uint64_t statistic_id;
>> + uint64_t statistic_value;
>> +};
>> +
>> +struct QEMU_PACKED papr_scm_perf_stats {
>> + uint8_t eye_catcher[8]; /* Should be "SCMSTATS" */
>> + uint32_t stats_version; /* Should be 0x01 */
>> + uint32_t num_statistics; /* Number of stats following */
>> + struct papr_scm_perf_stat scm_statistics[]; /* Performance metrics */
>> +};
>> +
>> struct SpaprTceTable {
>> DeviceState parent;
>> uint32_t liobn;
>
> --
> David Gibson | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
> | _way_ _around_!
> http://www.ozlabs.org/~dgibson
--
Cheers
~ Vaibhav
* Re: [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall
2021-05-22 3:43 ` Vaibhav Jain
@ 2021-05-24 7:32 ` David Gibson
-1 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2021-05-24 7:32 UTC (permalink / raw)
To: Vaibhav Jain
Cc: xiaoguangrong.eric, mst, aneesh.kumar, qemu-devel, kvm-ppc,
groug, shivaprasadbhat, qemu-ppc, bharata, imammedo, ehabkost
On Sat, May 22, 2021 at 09:01:26AM +0530, Vaibhav Jain wrote:
> Thanks for looking into this patch David and Groug,
>
> David Gibson <david@gibson.dropbear.id.au> writes:
> > On Sat, May 15, 2021 at 01:07:59PM +0530, Vaibhav Jain wrote:
> >> Add support for H_SCM_PERFORMANCE_STATS described at [1] for
> >> spapr nvdimms. This enables guest to fetch performance stats[2] like
> >> expected life of an nvdimm ('MemLife ') etc and display them to the
> >> user. Linux kernel support for fetching these performance stats and
> >> exposing them to the user-space was done via [3].
> >>
> >> The hcall semantics mandate that each nvdimm performance stats is
> >> uniquely identied by a 8-byte ascii string encoded as an unsigned
> >> integer (e.g 'MemLife ' == 0x4D656D4C69666520) and its value be a
> >> 8-byte unsigned integer. These performance-stats are exchanged with
> >> the guest in via a guest allocated buffer called
> >> 'requestAndResultBuffer' or rr-buffer for short. This buffer contains
> >> a header descibed by 'struct papr_scm_perf_stats' followed by an array
> >> of performance-stats described by 'struct papr_scm_perf_stat'. The
> >> hypervisor is expected to validate the rr-buffer header and then based
> >> on the request copy the needed performance-stats to the array of
> >> 'struct papr_scm_perf_stat' following the header.
> >>
> >> The patch proposes a new function h_scm_performance_stats() that
> >> services the H_SCM_PERFORMANCE_STATS hcall. After verifying the
> >> validity of the rr-buffer header via scm_perf_check_rr_buffer() it
> >> proceeds to fill the rr-buffer with requested performance-stats. The
> >> value of individual stats is retrived from individual accessor
> >> function for the stat which are indexed in the array
> >> 'nvdimm_perf_stats'.
> >>
> >> References:
> >> [1] "Hypercall Op-codes (hcalls)"
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n269
> >> [2] Sysfs attribute documentation for papr_scm
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-papr-pmem#n36
> >> [3] "powerpc/papr_scm: Fetch nvdimm performance stats from PHYP"
> >> https://lore.kernel.org/r/20200731064153.182203-2-vaibhav@linux.ibm.com
> >>
> >> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> >> ---
> >> Changelog
> >>
> >> Since RFC-v2:
> >> * s/SCM_STATS_MAX_OUTPUT_BUFFER/SCM_STATS_MIN_OUTPUT_BUFFER/ thats the
> >> minimum size buffer needed to return all supported performance
> >> stats. Value for this is derived from size of array 'nvdimm_perf_stats'
> >> * Added SCM_STATS_MAX_OUTPUT_BUFFER to indicate maximum supported
> >> rr-buffer size from a guest. Value for this is derived from hard
> >> limit of SCM_STATS_MAX_STATS.
> >> * Updated scm_perf_check_rr_buffer() to add a check for max size of
> >> rr-buffer. [David]
> >> * Updated scm_perf_check_rr_buffer() to removed a reduntant check for
> >> min size of rr-buffer [David]
> >> * Updated h_scm_performance_stats() to have a single allocation for
> >> rr-buffer and copy it back to guest memory in a single shot. [David]
> >>
> >> Since RFC-v1:
> >> * Removed empty lines from code [ David ]
> >> * Updated struct papr_scm_perf_stat to use uint64_t for
> >> statistic_id.
> >> * Added a hard limit on max number of stats requested to 255 [ David ]
> >> * Updated scm_perf_check_rr_buffer() to check for rr-buffer header
> >> size [ David ]
> >> * Removed a redundant check from nvdimm_stat_getval() [ David ]
> >> * Removed a redundant call to address_space_access_valid() in
> >> scm_perf_check_rr_buffer() [ David ]
> >> * Instead of allocating a minimum size local buffer, allocate a max
> >> possible size local rr-buffer. [ David ]
> >> * Updated nvdimm_stat_getval() to set 'val' to '0' on error. [ David ]
> >> * Updated h_scm_performance_stats() to use a canned-response method
> >> for simplifying num_stats==0 case [ David ].
> >> ---
> >> hw/ppc/spapr_nvdimm.c | 222 +++++++++++++++++++++++++++++++++++++++++
> >> include/hw/ppc/spapr.h | 19 +++-
> >> 2 files changed, 240 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
> >> index 252204e25f..287326b9c0 100644
> >> --- a/hw/ppc/spapr_nvdimm.c
> >> +++ b/hw/ppc/spapr_nvdimm.c
> >> @@ -35,6 +35,19 @@
> >> /* SCM device is unable to persist memory contents */
> >> #define PAPR_PMEM_UNARMED PPC_BIT(0)
> >>
> >> +/* Minimum output buffer size needed to return all nvdimm_perf_stats */
> >> +#define SCM_STATS_MIN_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \
> >> + sizeof(struct papr_scm_perf_stat) * \
> >> + ARRAY_SIZE(nvdimm_perf_stats))
> >
> > MIN_OUTPUT_BUFFER is a better name, but still not great. I think we
> > can get rid of this define completely in a neat way, though. See below.
> >
> >
> Not sure how we can get rid of it, since we still need to advertise to
> the guest how large an rr-buffer we expect in order to return all
> perf-stats. Sorry, but I didn't make out any suggestion below that
> could get rid of this define.
>
>
> >> +/* Maximum number of stats that we can return back in a single stat request */
> >> +#define SCM_STATS_MAX_STATS 255
> >
> > Although it's technically allowed, I'm still not convinced there's
> > actually any reason to allow the user to request the same stat over
> > and over.
> >
> Matching the PowerVM behaviour here, which doesn't enforce any limitation
> on how many times a single perf-stat can appear in the rr-buffer.
Hm, I guess matching PowerVM is worthwhile. Still can't imagine any
case where a client would actually want to do so.
>
> >> +/* Maximum possible output buffer to fulfill a perf-stats request */
> >> +#define SCM_STATS_MAX_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \
> >> + sizeof(struct papr_scm_perf_stat) * \
> >> + SCM_STATS_MAX_STATS)
> >> +
> >> bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
> >> uint64_t size, Error **errp)
> >> {
> >> @@ -502,6 +515,214 @@ static target_ulong h_scm_health(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >> return H_SUCCESS;
> >> }
> >>
> >> +static int perf_stat_noop(SpaprDrc *drc, uint64_t unused, uint64_t *val)
> >> +{
> >> + *val = 0;
> >> + return H_SUCCESS;
> >> +}
> >> +
> >> +static int perf_stat_memlife(SpaprDrc *drc, uint64_t unused, uint64_t *val)
> >> +{
> >> + /* Assume full life available of an NVDIMM right now */
> >> + *val = 100;
> >> + return H_SUCCESS;
> >> +}
> >> +
> >> +/*
> >> + * Holds all supported performance stats accessors. Each performance-statistic
> >> + * is uniquely identified by a 8-byte ascii string for example: 'MemLife '
> >> + * which indicate in percentage how much usage life of an nvdimm is remaining.
> >> + * 'NoopStat' which is primarily used to test support for retriving performance
> >> + * stats and also to replace unknown stats present in the rr-buffer.
> >> + *
> >> + */
> >> +static const struct {
> >> + char stat_id[8];
> >
> > So using a char[] here, but a uint64_t in the request structure makes
> > it pretty hard to follow if you're doing the right thing
> > w.r.t. endianness, especially since you effectively memcmp() directly
> > between u64s and char[]s. You really want to use a consistent type
> > for the ids.
> >
> Though PAPR-SCM defines stat-ids as u64, they are essentially 8-byte
> ASCII strings encoded in a u64.
Yes, I got that. The typing should still be consistent.
> Neither the guest kernel nor this proposed QEMU patch does any math
> on them that might be affected by their endianness.
You do however return it in a register in at least one case, so you
need to be careful about how that's loaded or stored.
> The switch from u64 to char[8] is done only for a more convenient
> ASCII representation of stat-ids in nvdimm_perf_stats[].
Sounds like it would make more sense to use char[8] everywhere, then.
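e.g. (untested; 'stat' and 'i' stand for whatever iteration variables end up
being used, and this assumes ldq_be_p() is convenient for the explicit
big-endian load):

    struct QEMU_PACKED papr_scm_perf_stat {
        char statistic_id[8];
        uint64_t statistic_value;
    };

    /* lookup stays a plain byte-wise compare, no endian games */
    if (memcmp(nvdimm_perf_stats[i].stat_id, stat->statistic_id, 8) == 0) {
        ...
    }

    /* when an id has to go back in R4, make the byte order explicit */
    args[0] = ldq_be_p(stat->statistic_id);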
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall
@ 2021-05-24 7:32 ` David Gibson
0 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2021-05-24 7:32 UTC (permalink / raw)
To: Vaibhav Jain
Cc: xiaoguangrong.eric, mst, aneesh.kumar, qemu-devel, kvm-ppc,
groug, shivaprasadbhat, qemu-ppc, bharata, imammedo, ehabkost
[-- Attachment #1: Type: text/plain, Size: 8549 bytes --]
On Sat, May 22, 2021 at 09:01:26AM +0530, Vaibhav Jain wrote:
> Thanks for looking into this patch David and Groug,
>
> David Gibson <david@gibson.dropbear.id.au> writes:
> > On Sat, May 15, 2021 at 01:07:59PM +0530, Vaibhav Jain wrote:
> >> Add support for H_SCM_PERFORMANCE_STATS described at [1] for
> >> spapr nvdimms. This enables guest to fetch performance stats[2] like
> >> expected life of an nvdimm ('MemLife ') etc and display them to the
> >> user. Linux kernel support for fetching these performance stats and
> >> exposing them to the user-space was done via [3].
> >>
> >> The hcall semantics mandate that each nvdimm performance stats is
> >> uniquely identied by a 8-byte ascii string encoded as an unsigned
> >> integer (e.g 'MemLife ' == 0x4D656D4C69666520) and its value be a
> >> 8-byte unsigned integer. These performance-stats are exchanged with
> >> the guest in via a guest allocated buffer called
> >> 'requestAndResultBuffer' or rr-buffer for short. This buffer contains
> >> a header descibed by 'struct papr_scm_perf_stats' followed by an array
> >> of performance-stats described by 'struct papr_scm_perf_stat'. The
> >> hypervisor is expected to validate the rr-buffer header and then based
> >> on the request copy the needed performance-stats to the array of
> >> 'struct papr_scm_perf_stat' following the header.
> >>
> >> The patch proposes a new function h_scm_performance_stats() that
> >> services the H_SCM_PERFORMANCE_STATS hcall. After verifying the
> >> validity of the rr-buffer header via scm_perf_check_rr_buffer() it
> >> proceeds to fill the rr-buffer with requested performance-stats. The
> >> value of individual stats is retrived from individual accessor
> >> function for the stat which are indexed in the array
> >> 'nvdimm_perf_stats'.
> >>
> >> References:
> >> [1] "Hypercall Op-codes (hcalls)"
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n269
> >> [2] Sysfs attribute documentation for papr_scm
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-papr-pmem#n36
> >> [3] "powerpc/papr_scm: Fetch nvdimm performance stats from PHYP"
> >> https://lore.kernel.org/r/20200731064153.182203-2-vaibhav@linux.ibm.com
> >>
> >> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> >> ---
> >> Changelog
> >>
> >> Since RFC-v2:
> >> * s/SCM_STATS_MAX_OUTPUT_BUFFER/SCM_STATS_MIN_OUTPUT_BUFFER/ thats the
> >> minimum size buffer needed to return all supported performance
> >> stats. Value for this is derived from size of array 'nvdimm_perf_stats'
> >> * Added SCM_STATS_MAX_OUTPUT_BUFFER to indicate maximum supported
> >> rr-buffer size from a guest. Value for this is derived from hard
> >> limit of SCM_STATS_MAX_STATS.
> >> * Updated scm_perf_check_rr_buffer() to add a check for max size of
> >> rr-buffer. [David]
> >> * Updated scm_perf_check_rr_buffer() to removed a reduntant check for
> >> min size of rr-buffer [David]
> >> * Updated h_scm_performance_stats() to have a single allocation for
> >> rr-buffer and copy it back to guest memory in a single shot. [David]
> >>
> >> Since RFC-v1:
> >> * Removed empty lines from code [ David ]
> >> * Updated struct papr_scm_perf_stat to use uint64_t for
> >> statistic_id.
> >> * Added a hard limit on max number of stats requested to 255 [ David ]
> >> * Updated scm_perf_check_rr_buffer() to check for rr-buffer header
> >> size [ David ]
> >> * Removed a redundant check from nvdimm_stat_getval() [ David ]
> >> * Removed a redundant call to address_space_access_valid() in
> >> scm_perf_check_rr_buffer() [ David ]
> >> * Instead of allocating a minimum size local buffer, allocate a max
> >> possible size local rr-buffer. [ David ]
> >> * Updated nvdimm_stat_getval() to set 'val' to '0' on error. [ David ]
> >> * Updated h_scm_performance_stats() to use a canned-response method
> >> for simplifying num_stats==0 case [ David ].
> >> ---
> >> hw/ppc/spapr_nvdimm.c | 222 +++++++++++++++++++++++++++++++++++++++++
> >> include/hw/ppc/spapr.h | 19 +++-
> >> 2 files changed, 240 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
> >> index 252204e25f..287326b9c0 100644
> >> --- a/hw/ppc/spapr_nvdimm.c
> >> +++ b/hw/ppc/spapr_nvdimm.c
> >> @@ -35,6 +35,19 @@
> >> /* SCM device is unable to persist memory contents */
> >> #define PAPR_PMEM_UNARMED PPC_BIT(0)
> >>
> >> +/* Minimum output buffer size needed to return all nvdimm_perf_stats */
> >> +#define SCM_STATS_MIN_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \
> >> + sizeof(struct papr_scm_perf_stat) * \
> >> + ARRAY_SIZE(nvdimm_perf_stats))
> >
> > MIN_OUTPUT_BUFFER is a better name, but still not great. I think we
> > can get rid of this define completely in a neat way, though. See below.
> >
> >
> Not sure how we can get rid of it, since we still need to advertise to
> the guest how large an rr-buffer we expect in order to return all
> perf-stats. Sorry, but I couldn't make out any suggestion below that
> would get rid of this define.
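
For reference, a minimal standalone sketch of the size arithmetic behind
the define under discussion; the struct layouts and the supported-stat
count below are simplified stand-ins, not the patch's actual definitions:

#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the rr-buffer header */
struct papr_scm_perf_stats {
    uint8_t  eye_catcher[8];
    uint32_t stats_version;
    uint32_t num_statistics;
};

/* Simplified stand-in for one stat entry following the header */
struct papr_scm_perf_stat {
    uint64_t statistic_id;
    uint64_t statistic_value;
};

#define NUM_SUPPORTED_STATS 2   /* e.g. 'MemLife ' and 'NoopStat' */

int main(void)
{
    /* Header plus one entry per supported stat: the smallest buffer
     * that can hold every stat the hypervisor knows about, and hence
     * the size a guest probing with num_stats == 0 would presumably
     * be told to allocate before re-issuing the hcall. */
    size_t min_buf = sizeof(struct papr_scm_perf_stats) +
                     sizeof(struct papr_scm_perf_stat) * NUM_SUPPORTED_STATS;

    printf("minimum rr-buffer size: %zu bytes\n", min_buf);
    return 0;
}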
>
>
> >> +/* Maximum number of stats that we can return in a single stat request */
> >> +#define SCM_STATS_MAX_STATS 255
> >
> > Although it's technically allowed, I'm still not convinced there's
> > actually any reason to allow the user to request the same stat over
> > and over.
> >
> Matching the PowerVM behaviour here, which doesn't enforce any limit
> on how many times a single perf-stat can appear in the rr-buffer.
Hm, I guess matching PowerVM is worthwhile. Still can't imagine any
case where a client would actually want to do so.
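
As a rough standalone illustration (not the patch's code) of what the
255-stat cap buys regardless of duplicate ids: it bounds both the local
buffer the hypervisor allocates and the rr-buffer size it will accept.
The names, sizes and exact rules below are assumptions made for the
sketch:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCM_STATS_MAX_STATS 255
#define HEADER_SIZE 16   /* stand-in for sizeof(struct papr_scm_perf_stats) */
#define ENTRY_SIZE  16   /* stand-in for sizeof(struct papr_scm_perf_stat) */

static bool rr_buffer_size_ok(uint32_t num_stats, size_t buf_size)
{
    size_t max_buf = HEADER_SIZE + ENTRY_SIZE * (size_t)SCM_STATS_MAX_STATS;

    if (num_stats > SCM_STATS_MAX_STATS) {
        return false;   /* more entries than will ever be filled */
    }
    if (buf_size < HEADER_SIZE + ENTRY_SIZE * (size_t)num_stats) {
        return false;   /* too small for what was requested */
    }
    return buf_size <= max_buf;   /* and never beyond the cap */
}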
>
> >> +/* Maximum possible output buffer to fulfill a perf-stats request */
> >> +#define SCM_STATS_MAX_OUTPUT_BUFFER (sizeof(struct papr_scm_perf_stats) + \
> >> + sizeof(struct papr_scm_perf_stat) * \
> >> + SCM_STATS_MAX_STATS)
> >> +
> >> bool spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
> >> uint64_t size, Error **errp)
> >> {
> >> @@ -502,6 +515,214 @@ static target_ulong h_scm_health(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >> return H_SUCCESS;
> >> }
> >>
> >> +static int perf_stat_noop(SpaprDrc *drc, uint64_t unused, uint64_t *val)
> >> +{
> >> + *val = 0;
> >> + return H_SUCCESS;
> >> +}
> >> +
> >> +static int perf_stat_memlife(SpaprDrc *drc, uint64_t unused, uint64_t *val)
> >> +{
> >> + /* Assume the NVDIMM currently has its full life remaining */
> >> + *val = 100;
> >> + return H_SUCCESS;
> >> +}
> >> +
> >> +/*
> >> + * Holds all supported performance-stat accessors. Each performance stat
> >> + * is uniquely identified by an 8-byte ASCII string, for example 'MemLife ',
> >> + * which indicates as a percentage how much of an nvdimm's usable life remains.
> >> + * 'NoopStat' is primarily used to test support for retrieving performance
> >> + * stats and also to replace unknown stats present in the rr-buffer.
> >> + *
> >> + */
> >> +static const struct {
> >> + char stat_id[8];
> >
> > So using a char[] here, but a uint64_t in the request structure makes
> > it pretty hard to follow if you're doing the right thing
> > w.r.t. endianness, especially since you effectively memcmp() directly
> > between u64s and char[]s. You really want to use a consistent type
> > for the ids.
> >
> Though PAPR-SCM defines stat-ids as u64, they are essentially 8-byte
> ASCII strings encoded in a u64.
Yes, I got that. The typing should still be consistent.
> Neither the guest kernel nor this proposed qemu
> patch does any math operations on them that might be affected by
> their endianness.
You do however return it in a register in at least one case, so you
need to be careful about how that's loaded or stored.
> The switch from u64 to char[8] is done only for a more convenient
> ASCII representation of stat-ids in nvdimm_perf_stats[].
Sounds like it would make more sense to use char[8] everywhere, then.
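
For reference, a standalone sketch of the typing point: an id kept as
char[8] and compared with memcmp() is byte-order agnostic, while the
same eight bytes only equal the documented constant (0x4D656D4C69666520
for 'MemLife ') when packed big-endian, which is what matters once the
id is returned in a register. The helper below is illustrative, not the
patch's code:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Pack an 8-byte ASCII stat id into a u64 with an explicit big-endian
 * byte order, independent of host endianness. */
static uint64_t stat_id_to_be64(const char id[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | (uint8_t)id[i];
    }
    return v;
}

int main(void)
{
    const char memlife[8] = { 'M', 'e', 'm', 'L', 'i', 'f', 'e', ' ' };

    /* Prints 0x4D656D4C69666520 on any host */
    printf("as BE u64: 0x%016llX\n",
           (unsigned long long)stat_id_to_be64(memlife));

    /* char[8] comparison needs no byte-order reasoning at all */
    printf("memcmp match: %d\n", memcmp(memlife, "MemLife ", 8) == 0);
    return 0;
}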
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Thread overview: 12+ messages
2021-05-15 7:37 [RFC PATCH v3] ppc/spapr: Add support for H_SCM_PERFORMANCE_STATS hcall Vaibhav Jain
2021-05-15 7:49 ` Vaibhav Jain
2021-05-17 6:23 ` David Gibson
2021-05-17 6:23 ` David Gibson
2021-05-17 7:55 ` Greg Kurz
2021-05-17 7:55 ` Greg Kurz
2021-05-17 8:16 ` David Gibson
2021-05-17 8:16 ` David Gibson
2021-05-22 3:31 ` Vaibhav Jain
2021-05-22 3:43 ` Vaibhav Jain
2021-05-24 7:32 ` David Gibson
2021-05-24 7:32 ` David Gibson