All of lore.kernel.org
 help / color / mirror / Atom feed
* Feedback requested: Exposing NVDIMM performance statistics in a generic way
@ 2020-05-25  9:00 Vaibhav Jain
  2020-05-27 19:24 ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Vaibhav Jain @ 2020-05-25  9:00 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Aneesh Kumar K.V, Alastair D'Silva

Hello,

I am looking for some community feedback on these two Problem-statements:

1.How to expose NVDIMM performance statistics in an arch or nvdimm vendor
agnostic manner ?

2. Is there a common set of performance statistics for NVDIMMs that all
vendors should provide ?

Problem context
===============
While working on bring up of PAPR SCM based NVDIMMs[1] for arch/powerpc
we want to expose certain dimm performance statistics like "Media
Read/Write Counts", "Power-on Seconds" etc to user-space [2]. These
performance statistics are similar to what ipmctl[3] reports for Intel®
Optane™ persistent memory via the '-show performance' command line
arg. However the reported set of performance stats doesn't cover the
entirety of all performance stats supported by PAPR SCM based NVDimms.

For example here is a subset of performance stats which are specific to
PAPR SCM NVDimms and that not reported by ipmctl:

* Controller Reset Count
* Controller Reset Elapsed Time
* Power-on Seconds
* Cache Read Hit Count
* Cache Write Hit Count

Possibility of updating ipmctl to add support for these performance
statistics is greatly hampered by no support for ACPI on Powerpc
arch. Secondly vendors who dont support ACPI/NFIT command set
similar to Intel® Optane™ (Example MSFT) are also left out in
lurch. Problem-statement#1 points to this specific problem.

Additionally in absence of any pre-agreed set of performance statistics
which all vendors should support, adding support for such a
functionality in ipmctl may not bode well of other nvdimm vendors. For
example if support for reporting "Controller Reset Count" is added to
ipmctl then it may not be applicable to other vendors such as Intel®
Optane™. This issue is what Problem-statement#2 refers to.

Possible Solution for Problem#1
===============================

One possible solution to Problem#1 can to add support for reporting
NVDIMM performance statistics in 'ndtcl'. 'libndctl' already has a layer
that abstracts underlying NVDIMM vendors (via struct ndctl_dimm_ops),
making supporting different NVDIMM vendors fairly easy. Also ndctl is
more widely used compared to 'ipmctl', hence adding such a functionality
to ndctl would make it more widely used.

Above solution was implemented as RFC patch-set[2] that exposes these
performance statistics through a generic abstraction in libndctl and
added a presentation layer for this data in ndctl[4]. It added a new
command line flags '--stat' to ndctl to report *all* nvdimm vendor
reported performance stats. The output is similar to one below:

# ndctl list -D --stats
[
  {
    "dev":"nmem0",
    "stats":{
      "Power-on Seconds":603931,
      "Media Read Count":0,
      "Media Write Count":6313,
    }
  }
]

This was done by adding two new dimm-ops callbacks that were
implemented by the papr_scm implementation within libndctl. These
callbacks are invoked by newly introduce code in 'util/json-smart.c'
that format the returned stats from these new dimm-ops and transform
them into a json-object to later presentation. I would request you to
look at RFC patch-set[2] to understand the implementation details.

Possibled Solution for Problem#2
================================

Solution to Problem-statement#2 is what eludes me though. If there is a
minimal set of performance stats (similar to what ndctl enforces for
health-stats) then implementation of such a functionality in
ndctl/ipmctl would be easy to implement. But is it really possible to
have such a common set of performance stats that NVDIMM vendors can
expose.

Patch-set[2] though tries to bypass this problem by letting the vendor
descide which performance stats to expose. This opens up a possibility
of this functionality to abused by dimm vendors to reports arbirary data
through this flag that may not be performance-stats.

Summing-up
==========

In light of above, requesting your feedback as to how
problem-statements#{1, 2} can be addressed within ndctl subsystem. Also
are these problems even worth solving.

References
==========
[1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/papr_hcalls.rst

[2] "[ndctl RFC-PATCH 0/4] Add support for reporting PAPR NVDIMM
Statistics"
https://lore.kernel.org/linux-nvdimm/20200518110814.145644-1-vaibhav@linux.ibm.com/

[3] https://docs.pmem.io/ipmctl-user-guide/instrumentation/show-device-performance

[4] "[RFC-PATCH 1/4] ndctl,libndctl: Implement new dimm-ops 'new_stats'
and 'get_stat'"
https://lore.kernel.org/linux-nvdimm/20200514225258.508463-2-vaibhav@linux.ibm.com

Thanks,
~ Vaibhav
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Feedback requested: Exposing NVDIMM performance statistics in a generic way
  2020-05-25  9:00 Feedback requested: Exposing NVDIMM performance statistics in a generic way Vaibhav Jain
@ 2020-05-27 19:24 ` Dan Williams
  2020-05-28  0:55   ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2020-05-27 19:24 UTC (permalink / raw)
  To: Vaibhav Jain; +Cc: linux-nvdimm, Aneesh Kumar K.V, Alastair D'Silva

On Mon, May 25, 2020 at 2:01 AM Vaibhav Jain <vaibhav@linux.ibm.com> wrote:
>
> Hello,
>
> I am looking for some community feedback on these two Problem-statements:
>
> 1.How to expose NVDIMM performance statistics in an arch or nvdimm vendor
> agnostic manner ?
>
> 2. Is there a common set of performance statistics for NVDIMMs that all
> vendors should provide ?

It would be nice to try to encourage some common keyword metrics
across vendors, but I suspect that like perf there will always be some
statistics that are architecture specific.

> Problem context
> ===============
> While working on bring up of PAPR SCM based NVDIMMs[1] for arch/powerpc
> we want to expose certain dimm performance statistics like "Media
> Read/Write Counts", "Power-on Seconds" etc to user-space [2]. These
> performance statistics are similar to what ipmctl[3] reports for Intel®
> Optane™ persistent memory via the '-show performance' command line
> arg. However the reported set of performance stats doesn't cover the
> entirety of all performance stats supported by PAPR SCM based NVDimms.
>
> For example here is a subset of performance stats which are specific to
> PAPR SCM NVDimms and that not reported by ipmctl:
>
> * Controller Reset Count
> * Controller Reset Elapsed Time
> * Power-on Seconds
> * Cache Read Hit Count
> * Cache Write Hit Count
>
> Possibility of updating ipmctl to add support for these performance
> statistics is greatly hampered by no support for ACPI on Powerpc
> arch. Secondly vendors who dont support ACPI/NFIT command set
> similar to Intel® Optane™ (Example MSFT) are also left out in
> lurch. Problem-statement#1 points to this specific problem.

ipmctl is vendor specific and OS agnostic.

ndctl is vendor/platform agnostic and Linux specific.

ndctl is built for this abstraction as it depends on libnvdimm of
which ACPI/NFIT is just one of many co-equal bus providers. In short,
I would expect this support to land in ndctl, not ipmctl.

The only thing that has prevented ndctl from adding performance
statistics was the lack of a public specification, otherwise ndctl
aims to abstract and provide a common tool for any publicly specified
persistent memory device.

> Additionally in absence of any pre-agreed set of performance statistics
> which all vendors should support, adding support for such a
> functionality in ipmctl may not bode well of other nvdimm vendors. For
> example if support for reporting "Controller Reset Count" is added to
> ipmctl then it may not be applicable to other vendors such as Intel®
> Optane™. This issue is what Problem-statement#2 refers to.
>
> Possible Solution for Problem#1
> ===============================
>
> One possible solution to Problem#1 can to add support for reporting
> NVDIMM performance statistics in 'ndtcl'. 'libndctl' already has a layer
> that abstracts underlying NVDIMM vendors (via struct ndctl_dimm_ops),
> making supporting different NVDIMM vendors fairly easy. Also ndctl is
> more widely used compared to 'ipmctl', hence adding such a functionality
> to ndctl would make it more widely used.
>
> Above solution was implemented as RFC patch-set[2] that exposes these
> performance statistics through a generic abstraction in libndctl and
> added a presentation layer for this data in ndctl[4]. It added a new
> command line flags '--stat' to ndctl to report *all* nvdimm vendor
> reported performance stats. The output is similar to one below:
>
> # ndctl list -D --stats
> [
>   {
>     "dev":"nmem0",
>     "stats":{
>       "Power-on Seconds":603931,
>       "Media Read Count":0,
>       "Media Write Count":6313,

I wonder if this should be explicit about the platform-specific stats
versus the common ones? At least with the health data implementation
so far it is implementing a common set of keywords across vendors. I
just worry that someone that writes a useful tool for this data needs
to understand that their tool is vendor generic, or tied to a given
implementation. I'm thinking "platform_stats" for the ones that are
tied to the nvdimm-bus-provider vs "stats" that might reasonably show
up on more than one vendor's implementation.

>     }
>   }
> ]
>
> This was done by adding two new dimm-ops callbacks that were
> implemented by the papr_scm implementation within libndctl. These
> callbacks are invoked by newly introduce code in 'util/json-smart.c'
> that format the returned stats from these new dimm-ops and transform
> them into a json-object to later presentation. I would request you to
> look at RFC patch-set[2] to understand the implementation details.

I'm ok to add some stats to ndctl, but I want ndctl to be limited to
general statistics and not performance counters. Performance counters
and performance events should be abstracted through perf where
possible.

> Possibled Solution for Problem#2
> ================================
>
> Solution to Problem-statement#2 is what eludes me though. If there is a
> minimal set of performance stats (similar to what ndctl enforces for
> health-stats) then implementation of such a functionality in
> ndctl/ipmctl would be easy to implement. But is it really possible to
> have such a common set of performance stats that NVDIMM vendors can
> expose.
>
> Patch-set[2] though tries to bypass this problem by letting the vendor
> descide which performance stats to expose. This opens up a possibility
> of this functionality to abused by dimm vendors to reports arbirary data
> through this flag that may not be performance-stats.
>
> Summing-up
> ==========
>
> In light of above, requesting your feedback as to how
> problem-statements#{1, 2} can be addressed within ndctl subsystem. Also
> are these problems even worth solving.

Yes, I think it's worth solving, just the hard part of giving
implementations enough freedom to convey the data they need, but not
enough freedom that we damage ndctl and the kernel's ability to
maintain a common interface across vendors.

>
> References
> ==========
> [1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/papr_hcalls.rst
>
> [2] "[ndctl RFC-PATCH 0/4] Add support for reporting PAPR NVDIMM
> Statistics"
> https://lore.kernel.org/linux-nvdimm/20200518110814.145644-1-vaibhav@linux.ibm.com/
>
> [3] https://docs.pmem.io/ipmctl-user-guide/instrumentation/show-device-performance
>
> [4] "[RFC-PATCH 1/4] ndctl,libndctl: Implement new dimm-ops 'new_stats'
> and 'get_stat'"
> https://lore.kernel.org/linux-nvdimm/20200514225258.508463-2-vaibhav@linux.ibm.com

Appreciate the thorough write-up.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Feedback requested: Exposing NVDIMM performance statistics in a generic way
  2020-05-27 19:24 ` Dan Williams
@ 2020-05-28  0:55   ` Dan Williams
  2020-05-28 18:59     ` Vaibhav Jain
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2020-05-28  0:55 UTC (permalink / raw)
  To: Vaibhav Jain; +Cc: linux-nvdimm, Aneesh Kumar K.V, Alastair D'Silva

On Wed, May 27, 2020 at 12:24 PM Dan Williams <dan.j.williams@intel.com> wrote:
[..]
> > This was done by adding two new dimm-ops callbacks that were
> > implemented by the papr_scm implementation within libndctl. These
> > callbacks are invoked by newly introduce code in 'util/json-smart.c'
> > that format the returned stats from these new dimm-ops and transform
> > them into a json-object to later presentation. I would request you to
> > look at RFC patch-set[2] to understand the implementation details.
>
> I'm ok to add some stats to ndctl, but I want ndctl to be limited to
> general statistics and not performance counters. Performance counters
> and performance events should be abstracted through perf where
> possible.

Another aspect that helps common statistics is to expose them in
sysfs. I'm going to go review your proposed ioctl mechanism, but I
would hope that is reserved for multi-field command payloads that need
to be sent as a unit rather than statistics retrieval that is amenable
to a sysfs interface.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Feedback requested: Exposing NVDIMM performance statistics in a generic way
  2020-05-28  0:55   ` Dan Williams
@ 2020-05-28 18:59     ` Vaibhav Jain
  2020-05-28 22:40       ` Dan Williams
  2020-10-23 17:28       ` Michal Suchánek
  0 siblings, 2 replies; 8+ messages in thread
From: Vaibhav Jain @ 2020-05-28 18:59 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm, Aneesh Kumar K.V, Alastair D'Silva

Thanks for this taking time to look into this Dan,

Agree with the points you have made earlier that I am summarizing below:

* This is better done in ndctl rather than ipmctl.
* Should only expose general performance metrics and not performance
  counters. Performance counter should be exposed via perf
* Vendor specific metrics to be separated from generic performance
  metrics.

One way to split generic and vendor specific metrics might be to report
generic performance metrics together with dimm health metrics such as
"temprature_celsius" or "spares_percentage" that are already reported in
by dimm health output.

Vendor specific performance metrics can be reported as a seperate object
in the json output. Something similar to output below:

# ndctl list -DH --stats --vendor-stats
[
  {
    "dev":"nmem0",
    "health":{
      "health_state":"ok",
      "shutdown_state":"clean",
      "temperature_celsius":48.00,
      "spares_percentage":10,

      /* Generic performance metrics/stats */
      "TotalMediaReads": 18929,
      "TotalMediaWrites": 0,
      ....
    }
    
    /* Vendor specific stats for the dimm */
    "vendor-stats": {
    "Controller Reset Count":10
    "Controller Reset Elapsed Time": 3600
    "Power-on Seconds": 3600
    }
  }
]


Dan Williams <dan.j.williams@intel.com> writes:

> On Wed, May 27, 2020 at 12:24 PM Dan Williams <dan.j.williams@intel.com> wrote:
> [..]
>> > This was done by adding two new dimm-ops callbacks that were
>> > implemented by the papr_scm implementation within libndctl. These
>> > callbacks are invoked by newly introduce code in 'util/json-smart.c'
>> > that format the returned stats from these new dimm-ops and transform
>> > them into a json-object to later presentation. I would request you to
>> > look at RFC patch-set[2] to understand the implementation details.
>>
>> I'm ok to add some stats to ndctl, but I want ndctl to be limited to
>> general statistics and not performance counters. Performance counters
>> and performance events should be abstracted through perf where
>> possible.
>
> Another aspect that helps common statistics is to expose them in
> sysfs. I'm going to go review your proposed ioctl mechanism, but I
> would hope that is reserved for multi-field command payloads that need
> to be sent as a unit rather than statistics retrieval that is amenable
> to a sysfs interface.

The patchset is using a machenism similar to GET_CONFIG_SIZE/DATA to
retrive a struct composed of tuples of (stat-id, stat-value) from
papr_scm and then exposes them to ndctl via some new dimm-ops.

-- 
Cheers
~ Vaibhav
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Feedback requested: Exposing NVDIMM performance statistics in a generic way
  2020-05-28 18:59     ` Vaibhav Jain
@ 2020-05-28 22:40       ` Dan Williams
  2020-10-23 17:28       ` Michal Suchánek
  1 sibling, 0 replies; 8+ messages in thread
From: Dan Williams @ 2020-05-28 22:40 UTC (permalink / raw)
  To: Vaibhav Jain; +Cc: linux-nvdimm, Aneesh Kumar K.V, Alastair D'Silva

On Thu, May 28, 2020 at 11:59 AM Vaibhav Jain <vaibhav@linux.ibm.com> wrote:
>
> Thanks for this taking time to look into this Dan,
>
> Agree with the points you have made earlier that I am summarizing below:
>
> * This is better done in ndctl rather than ipmctl.
> * Should only expose general performance metrics and not performance
>   counters. Performance counter should be exposed via perf
> * Vendor specific metrics to be separated from generic performance
>   metrics.
>
> One way to split generic and vendor specific metrics might be to report
> generic performance metrics together with dimm health metrics such as
> "temprature_celsius" or "spares_percentage" that are already reported in
> by dimm health output.
>
> Vendor specific performance metrics can be reported as a seperate object
> in the json output. Something similar to output below:
>
> # ndctl list -DH --stats --vendor-stats
> [
>   {
>     "dev":"nmem0",
>     "health":{
>       "health_state":"ok",
>       "shutdown_state":"clean",
>       "temperature_celsius":48.00,
>       "spares_percentage":10,
>
>       /* Generic performance metrics/stats */
>       "TotalMediaReads": 18929,
>       "TotalMediaWrites": 0,
>       ....
>     }
>
>     /* Vendor specific stats for the dimm */
>     "vendor-stats": {
>     "Controller Reset Count":10
>     "Controller Reset Elapsed Time": 3600
>     "Power-on Seconds": 3600

Looks reasonable, although I think I want to maintain the
"Linux-style" format for the keys i.e. lowercase + underbars. If only
for consistency, but it also simplifies parsers that have this far
have assumed no whitespace in the key names.

>     }
>   }
> ]
>
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> > On Wed, May 27, 2020 at 12:24 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > [..]
> >> > This was done by adding two new dimm-ops callbacks that were
> >> > implemented by the papr_scm implementation within libndctl. These
> >> > callbacks are invoked by newly introduce code in 'util/json-smart.c'
> >> > that format the returned stats from these new dimm-ops and transform
> >> > them into a json-object to later presentation. I would request you to
> >> > look at RFC patch-set[2] to understand the implementation details.
> >>
> >> I'm ok to add some stats to ndctl, but I want ndctl to be limited to
> >> general statistics and not performance counters. Performance counters
> >> and performance events should be abstracted through perf where
> >> possible.
> >
> > Another aspect that helps common statistics is to expose them in
> > sysfs. I'm going to go review your proposed ioctl mechanism, but I
> > would hope that is reserved for multi-field command payloads that need
> > to be sent as a unit rather than statistics retrieval that is amenable
> > to a sysfs interface.
>
> The patchset is using a machenism similar to GET_CONFIG_SIZE/DATA to
> retrive a struct composed of tuples of (stat-id, stat-value) from
> papr_scm and then exposes them to ndctl via some new dimm-ops.

I think sysfs is a better fit for this. Yes, we could make this work
as you have identified, but I think it was a mistake that I did this
for health properties especially the static ones.

See commit:

     0ead11181fe0 acpi, nfit: Collect shutdown status

That started as data which was only available via ioctl, but It
simplified userspace to have a sysfs attribute. In addition to the
built-in enumeration / capability detection that sysfs affords, it
also allows for the kernel to cache this property once that many
different userspace agents might want to read. Between perf for
dynamic peformance properties, and sysfs for static / health data,
what's left for the ioctl path?
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Feedback requested: Exposing NVDIMM performance statistics in a generic way
  2020-05-28 18:59     ` Vaibhav Jain
  2020-05-28 22:40       ` Dan Williams
@ 2020-10-23 17:28       ` Michal Suchánek
  2020-10-23 19:03         ` Dan Williams
  1 sibling, 1 reply; 8+ messages in thread
From: Michal Suchánek @ 2020-10-23 17:28 UTC (permalink / raw)
  To: vaibhav; +Cc: alastair, aneesh.kumar, linux-nvdimm

Hello,

On Thu, May 28, 2020 at 11:59 AM Vaibhav Jain <vaibhav@linux.ibm.com> wrote:
>
> Thanks for this taking time to look into this Dan,
>
> Agree with the points you have made earlier that I am summarizing below:
>
> * This is better done in ndctl rather than ipmctl.
> * Should only expose general performance metrics and not performance
>   counters. Performance counter should be exposed via perf
> * Vendor specific metrics to be separated from generic performance
>   metrics.
>
> One way to split generic and vendor specific metrics might be to report
> generic performance metrics together with dimm health metrics such as
> "temprature_celsius" or "spares_percentage" that are already reported in
> by dimm health output.
>
> Vendor specific performance metrics can be reported as a seperate object
> in the json output. Something similar to output below:
>
> # ndctl list -DH --stats --vendor-stats
> [
>   {
>     "dev":"nmem0",
>     "health":{
>       "health_state":"ok",
>       "shutdown_state":"clean",
>       "temperature_celsius":48.00,
>       "spares_percentage":10,
>
>       /* Generic performance metrics/stats */
>       "TotalMediaReads": 18929,
>       "TotalMediaWrites": 0,
>       ....
>     }
>
>     /* Vendor specific stats for the dimm */
>     "vendor-stats": {
>     "Controller Reset Count":10
>     "Controller Reset Elapsed Time": 3600
>     "Power-on Seconds": 3600

How do you tell generic from vendor-specific stats, though?

Controller reset count and power-on time may not be reported by some
controllers but sound pretty generic.

Even if you declare that the stats reported by all controllers
available at this moment are generic a later one may not report some of
these 'generic' statistics, or report them in different way/units, or
may simply not report anything at all for some technical reason.

Kernels that do not have this feature will not report anything at all
either.

Thanks

Michal
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Feedback requested: Exposing NVDIMM performance statistics in a generic way
  2020-10-23 17:28       ` Michal Suchánek
@ 2020-10-23 19:03         ` Dan Williams
  2020-11-23  7:21           ` Vaibhav Jain
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2020-10-23 19:03 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Vaibhav Jain, Alastair D'Silva, Aneesh Kumar K.V, linux-nvdimm

On Fri, Oct 23, 2020 at 10:28 AM Michal Suchánek <msuchanek@suse.de> wrote:
>
> Hello,
>
> On Thu, May 28, 2020 at 11:59 AM Vaibhav Jain <vaibhav@linux.ibm.com> wrote:
> >
> > Thanks for this taking time to look into this Dan,
> >
> > Agree with the points you have made earlier that I am summarizing below:
> >
> > * This is better done in ndctl rather than ipmctl.
> > * Should only expose general performance metrics and not performance
> >   counters. Performance counter should be exposed via perf
> > * Vendor specific metrics to be separated from generic performance
> >   metrics.
> >
> > One way to split generic and vendor specific metrics might be to report
> > generic performance metrics together with dimm health metrics such as
> > "temprature_celsius" or "spares_percentage" that are already reported in
> > by dimm health output.
> >
> > Vendor specific performance metrics can be reported as a seperate object
> > in the json output. Something similar to output below:
> >
> > # ndctl list -DH --stats --vendor-stats
> > [
> >   {
> >     "dev":"nmem0",
> >     "health":{
> >       "health_state":"ok",
> >       "shutdown_state":"clean",
> >       "temperature_celsius":48.00,
> >       "spares_percentage":10,
> >
> >       /* Generic performance metrics/stats */
> >       "TotalMediaReads": 18929,
> >       "TotalMediaWrites": 0,
> >       ....
> >     }
> >
> >     /* Vendor specific stats for the dimm */
> >     "vendor-stats": {
> >     "Controller Reset Count":10
> >     "Controller Reset Elapsed Time": 3600
> >     "Power-on Seconds": 3600
>
> How do you tell generic from vendor-specific stats, though?
>
> Controller reset count and power-on time may not be reported by some
> controllers but sound pretty generic.
>
> Even if you declare that the stats reported by all controllers
> available at this moment are generic a later one may not report some of
> these 'generic' statistics, or report them in different way/units, or
> may simply not report anything at all for some technical reason.
>
> Kernels that do not have this feature will not report anything at all
> either.

My expectation is that for a given json attribute name any vendor
backend that supports it must convey it in a compatible way. If a
given attribute does not make sense for a given vendor, or is not yet
implemented then leaving it unpopulated is indeed the expectation.

The goal is to both minimize vendor specific logic in infrastructure
that consumes the ndctl json while at the same time balance vendor
needs. In other words avoid "needless" differentiation as much as
possible with small amount of compat work across vendors.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Feedback requested: Exposing NVDIMM performance statistics in a generic way
  2020-10-23 19:03         ` Dan Williams
@ 2020-11-23  7:21           ` Vaibhav Jain
  0 siblings, 0 replies; 8+ messages in thread
From: Vaibhav Jain @ 2020-11-23  7:21 UTC (permalink / raw)
  To: Dan Williams, Michal Suchánek
  Cc: Alastair D'Silva, Aneesh Kumar K.V, linux-nvdimm

Hi Dan and Michal,

I have posted an RFC patch to implment the kernel side interface for
this in libnvdimm with an implementation in papr-scm driver module at [1]. Can
you please take a look at the patch seried and provide your inputs.

[1] https://lore.kernel.org/linux-nvdimm/20201108211549.122018-1-vaibhav@linux.ibm.com/

Thanks,
~ Vaibhav

Dan Williams <dan.j.williams@intel.com> writes:

> On Fri, Oct 23, 2020 at 10:28 AM Michal Suchánek <msuchanek@suse.de> wrote:
>>
>> Hello,
>>
>> On Thu, May 28, 2020 at 11:59 AM Vaibhav Jain <vaibhav@linux.ibm.com> wrote:
>> >
>> > Thanks for this taking time to look into this Dan,
>> >
>> > Agree with the points you have made earlier that I am summarizing below:
>> >
>> > * This is better done in ndctl rather than ipmctl.
>> > * Should only expose general performance metrics and not performance
>> >   counters. Performance counter should be exposed via perf
>> > * Vendor specific metrics to be separated from generic performance
>> >   metrics.
>> >
>> > One way to split generic and vendor specific metrics might be to report
>> > generic performance metrics together with dimm health metrics such as
>> > "temprature_celsius" or "spares_percentage" that are already reported in
>> > by dimm health output.
>> >
>> > Vendor specific performance metrics can be reported as a seperate object
>> > in the json output. Something similar to output below:
>> >
>> > # ndctl list -DH --stats --vendor-stats
>> > [
>> >   {
>> >     "dev":"nmem0",
>> >     "health":{
>> >       "health_state":"ok",
>> >       "shutdown_state":"clean",
>> >       "temperature_celsius":48.00,
>> >       "spares_percentage":10,
>> >
>> >       /* Generic performance metrics/stats */
>> >       "TotalMediaReads": 18929,
>> >       "TotalMediaWrites": 0,
>> >       ....
>> >     }
>> >
>> >     /* Vendor specific stats for the dimm */
>> >     "vendor-stats": {
>> >     "Controller Reset Count":10
>> >     "Controller Reset Elapsed Time": 3600
>> >     "Power-on Seconds": 3600
>>
>> How do you tell generic from vendor-specific stats, though?
>>
>> Controller reset count and power-on time may not be reported by some
>> controllers but sound pretty generic.
>>
>> Even if you declare that the stats reported by all controllers
>> available at this moment are generic a later one may not report some of
>> these 'generic' statistics, or report them in different way/units, or
>> may simply not report anything at all for some technical reason.
>>
>> Kernels that do not have this feature will not report anything at all
>> either.
>
> My expectation is that for a given json attribute name any vendor
> backend that supports it must convey it in a compatible way. If a
> given attribute does not make sense for a given vendor, or is not yet
> implemented then leaving it unpopulated is indeed the expectation.
>
> The goal is to both minimize vendor specific logic in infrastructure
> that consumes the ndctl json while at the same time balance vendor
> needs. In other words avoid "needless" differentiation as much as
> possible with small amount of compat work across vendors.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-11-23  7:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-25  9:00 Feedback requested: Exposing NVDIMM performance statistics in a generic way Vaibhav Jain
2020-05-27 19:24 ` Dan Williams
2020-05-28  0:55   ` Dan Williams
2020-05-28 18:59     ` Vaibhav Jain
2020-05-28 22:40       ` Dan Williams
2020-10-23 17:28       ` Michal Suchánek
2020-10-23 19:03         ` Dan Williams
2020-11-23  7:21           ` Vaibhav Jain

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.