linux-kernel.vger.kernel.org archive mirror
From: Bjorn Helgaas <helgaas@kernel.org>
To: Rajat Jain <rajatja@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Philippe Ombredanne <pombredanne@nexb.com>,
	Kate Stewart <kstewart@linuxfoundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Frederick Lawler <fred@fredlawl.com>,
	Oza Pawandeep <poza@codeaurora.org>,
	Keith Busch <keith.busch@intel.com>,
	Alexandru Gagniuc <mr.nuke.me@gmail.com>,
	Thomas Tai <thomas.tai@oracle.com>,
	"Steven Rostedt (VMware)" <rostedt@goodmis.org>,
	linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Jes Sorensen <jsorensen@fb.com>,
	Kyle McMartin <jkkm@fb.com>,
	rajatxjain@gmail.com, Tyler Baicar <tbaicar@codeaurora.org>
Subject: Re: [PATCH v5 3/5] PCI/AER: Add sysfs attributes to provide breakdown of AERs
Date: Thu, 21 Jun 2018 13:48:22 -0500
Message-ID: <20180621184822.GB14136@bhelgaas-glaptop.roam.corp.google.com>
In-Reply-To: <20180620234147.48438-3-rajatja@google.com>

[+cc Tyler for AER dmesg decoding]

I really like this idea a lot; thanks for putting it together!

On Wed, Jun 20, 2018 at 04:41:45PM -0700, Rajat Jain wrote:
> Add sysfs attributes to provide breakdown of the AERs seen,
> into different type of correctable or uncorrectable errors:
> 
> dev_breakdown_correctable
> dev_breakdown_uncorrectable

- Can you include a more complete sysfs path here in the commit log,
  as well as a snippet of the contents?  From the doc patch, I think
  it is currently:

    /sys/bus/pci/devices/<dev>/aer_stats/dev_breakdown_correctable
    /sys/bus/pci/devices/<dev>/aer_stats/dev_breakdown_uncorrectable

- I'm not sure it's worth making a new subdirectory.  What if you
  simply added these?

    /sys/bus/pci/devices/<dev>/aer_correctable
    /sys/bus/pci/devices/<dev>/aer_uncorrectable

  or perhaps, since you split the "total" files into
  cor/nonfatal/fatal, these could match?

    /sys/bus/pci/devices/<dev>/aer_correctable
    /sys/bus/pci/devices/<dev>/aer_nonfatal
    /sys/bus/pci/devices/<dev>/aer_fatal

  I think the nonfatal/fatal distinction might be worth exposing
  because the severity of many uncorrectable errors is configurable
  (via the Uncorrectable Error Severity register), and the kernel
  handles fatal and nonfatal errors very differently.  So I think it
  would make this more approachable if the "remove/re-enumerate"
  situations that will be obvious in dmesg logs were clearly
  connected with "aer_fatal" statistics, as opposed to being
  connected to some subset of what's in "aer_uncorrectable".

- Possibly the totals that you currently have in dev_total_cor_errs
  could even be added to the bottom of these?  Not sure what direction
  would be best, and as you say, there's the potential for confusion
  because the individual items won't add up to the totals.  If they
  were in the same file, maybe that could be addressed in the label.

- Can you include the related doc update in the same patch?  That way
  the doc update is more likely to be backported along with the patch.

- I was going to ask whether these should all be in a single file or
  whether they should be split up so there's a separate file for each
  type of error, each containing a single number.  But
  Documentation/filesystems/sysfs.txt says either is OK and
  /sys/devices/system/node/node0/vmstat is an example of a similar
  situation in an existing file, so I think what you did is perfect.

> Signed-off-by: Rajat Jain <rajatja@google.com>
> ---
> v5: Fix the signature
> v4: use "%llu" in place of "%llx"
> v3: Merge everything in aer.c
> 
>  drivers/pci/pcie/aer.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index ce0d675d7bd3..c989bb5bb6f1 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -587,10 +587,38 @@ aer_stats_aggregate_attr(dev_total_cor_errs);
>  aer_stats_aggregate_attr(dev_total_fatal_errs);
>  aer_stats_aggregate_attr(dev_total_nonfatal_errs);
>  
> +#define aer_stats_breakdown_attr(field, stats_array, strings_array)	\
> +	static ssize_t							\
> +	field##_show(struct device *dev, struct device_attribute *attr,	\
> +		     char *buf)						\
> +{									\
> +	unsigned int i;							\
> +	char *str = buf;						\
> +	struct pci_dev *pdev = to_pci_dev(dev);				\
> +	u64 *stats = pdev->aer_stats->stats_array;			\

Nit: add a blank line here.

> +	for (i = 0; i < ARRAY_SIZE(strings_array); i++) {		\
> +		if (strings_array[i])					\
> +			str += sprintf(str, "%s = 0x%llu\n",		\
> +				       strings_array[i], stats[i]);	\
> +		else if (stats[i])					\
> +			str += sprintf(str, #stats_array "bit[%d] = 0x%llu\n",\
> +				       i, stats[i]);			\

- I like the way this uses the same text as used in dmesg
  (aer_correctable_error_string[] and
  aer_uncorrectable_error_string[]).

- I think this incorrectly prints a "0x" prefix for a decimal number
  (probably an artifact of your v4 change from "%llx" to "%llu").

- Tyler posted a patch [1] to update those dmesg strings so they match
  the way lspci decodes them.  I really liked that update, but we
  never quite finished it.  If we're going to do that, it would be
  nice to do it first, so we don't publish new sysfs files, then
  immediately change the labels used in them.

- IIRC, Tyler's patch had the nice property of changing the strings so
  each error name had no spaces, which would make it a little easier
  to parse this sysfs file: each line would be a single identifier
  followed by a single number (I would probably remove the "=" from
  the middle).

[1] https://lkml.kernel.org/r/1518034285-3543-1-git-send-email-tbaicar@codeaurora.org

> +	}								\
> +	return str-buf;							\
> +}									\
> +static DEVICE_ATTR_RO(field)
> +
> +aer_stats_breakdown_attr(dev_breakdown_correctable, dev_cor_errs,
> +			 aer_correctable_error_string);
> +aer_stats_breakdown_attr(dev_breakdown_uncorrectable, dev_uncor_errs,
> +			 aer_uncorrectable_error_string);
> +
>  static struct attribute *aer_stats_attrs[] __ro_after_init = {
>  	&dev_attr_dev_total_cor_errs.attr,
>  	&dev_attr_dev_total_fatal_errs.attr,
>  	&dev_attr_dev_total_nonfatal_errs.attr,
> +	&dev_attr_dev_breakdown_correctable.attr,
> +	&dev_attr_dev_breakdown_uncorrectable.attr,
>  	NULL
>  };
>  
> -- 
> 2.18.0.rc1.244.gcf134e6275-goog
> 
