linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>,
	linux-cxl@vger.kernel.org, dan.j.williams@intel.com,
	ira.weiny@intel.com, vishal.l.verma@intel.com,
	alison.schofield@intel.com, Bjorn Helgaas <bhelgaas@google.com>,
	Stefan Roese <sr@denx.de>,
	Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>
Subject: Re: [PATCH v5] cxl: add RAS status unmasking for CXL
Date: Thu, 29 Dec 2022 11:27:31 -0600	[thread overview]
Message-ID: <20221229172731.GA611562@bhelgaas> (raw)
In-Reply-To: <20221217175204.0000170a@huawei.com>

[+cc Stefan, Kuppuswamy]

On Sat, Dec 17, 2022 at 05:52:04PM +0000, Jonathan Cameron wrote:
> On Fri, 16 Dec 2022 09:10:02 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
> > By default the CXL RAS mask registers bits are defaulted to 1's and
> > suppress all error reporting. If the kernel has negotiated ownership
> > of error handling for CXL then unmask the mask registers by writing 0s.

> +CC Bjorn for a question on the AER control element of this.

Timely question (unlike my response, sorry, I've been on vacation).

> I realized that adding this patch still only enables error because I
> didn't check the PCIe spec when writing the QEMU emulation. I had
> changed the value of  "Correctable Internal Error Mask" to default
> to unmasked.  PCIe 6.0 says it defaults to masked.  For some reason
> I thought these masks were impdef (should have checked ;)

I assume you refer to the AER "Corrected Internal Error Mask" bit
(PCIe r6.0, sec 7.8.4.6), which indeed defaults to 1b (masked) if the
bit is implemented.

> I'll drop that change in QEMU, but upshot is that if we want the
> correctable ones to be delivered, we'll need to clear that bit as
> well as the ones you do in this patch.

I don't think Corrected Internal Error Mask affects reporting of other
Correctable Errors, e.g., Receiver Error, Bad TLP, Bad DLLP, etc.  As
I read sec 6.2.10, those would not be classified as "internal" errors.

> This got me thinking about why I never saw the same issue for UNC
> errors.  Upshot, QEMU never sets the UNCOR mask so defaults to everything on
> (it also doesn't allow anyone to write the register). Curiously it has
> the code that uses the mask even though there was no means to set any
> bits in it. 
> I'll fix that (draft patch below) + we should update lspci to cover more of these AER
> flags as it's a PITA debugging by reading the hex dumps!
> 
> Bjorn, is there any convention on drivers enabling these 'default' masked
> AER errors?  Or is expectation that this is policy and userspace should
> be dealing with it?

I think AER configuration should generally be a system policy
decision, not a driver-level choice (unless we need quirks to work
around device defects, of course).

We now have f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), which turns on error reporting in Device Control for all
devices at enumeration-time when the OS has control of AER.  But this
is only the generic device-level control; it doesn't configure any
*AER* registers.

I'm surprised to learn that the only writes to PCI_ERR_UNCOR_MASK are
some mips and powerpc arch-specific code and a few individual drivers.
It seems like maybe pci_aer_init() should do some more configuration
of the AER mask and severity registers.

Bjorn

  parent reply	other threads:[~2022-12-29 17:30 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-16 16:10 [PATCH v5] cxl: add RAS status unmasking for CXL Dave Jiang
2022-12-17 17:52 ` Jonathan Cameron
2022-12-17 18:05   ` Jonathan Cameron
2023-01-05 16:53     ` Dave Jiang
2023-01-06 11:22       ` Jonathan Cameron
2022-12-29 17:27   ` Bjorn Helgaas [this message]
2023-01-05 16:31     ` Jonathan Cameron
2023-01-05 16:54       ` Bjorn Helgaas
2023-01-06 11:26         ` Jonathan Cameron
2023-02-10  8:07           ` Ira Weiny
2023-02-10 12:29             ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221229172731.GA611562@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=Jonathan.Cameron@Huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=sr@denx.de \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).