All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>,
	linux-cxl@vger.kernel.org, dan.j.williams@intel.com,
	ira.weiny@intel.com, vishal.l.verma@intel.com,
	alison.schofield@intel.com, Bjorn Helgaas <bhelgaas@google.com>,
	Stefan Roese <sr@denx.de>,
	Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>
Subject: Re: [PATCH v5] cxl: add RAS status unmasking for CXL
Date: Thu, 29 Dec 2022 11:27:31 -0600	[thread overview]
Message-ID: <20221229172731.GA611562@bhelgaas> (raw)
In-Reply-To: <20221217175204.0000170a@huawei.com>

[+cc Stefan, Kuppuswamy]

On Sat, Dec 17, 2022 at 05:52:04PM +0000, Jonathan Cameron wrote:
> On Fri, 16 Dec 2022 09:10:02 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
> > By default the CXL RAS mask registers bits are defaulted to 1's and
> > suppress all error reporting. If the kernel has negotiated ownership
> > of error handling for CXL then unmask the mask registers by writing 0s.

> +CC Bjorn for a question on the AER control element of this.

Timely question (unlike my response, sorry, I've been on vacation).

> I realized that adding this patch still only enables error because I
> didn't check the PCIe spec when writing the QEMU emulation. I had
> changed the value of  "Correctable Internal Error Mask" to default
> to unmasked.  PCIe 6.0 says it defaults to masked.  For some reason
> I thought these masks were impdef (should have checked ;)

I assume you refer to the AER "Corrected Internal Error Mask" bit
(PCIe r6.0, sec 7.8.4.6), which indeed defaults to 1b (masked) if the
bit is implemented.

> I'll drop that change in QEMU, but upshot is that if we want the
> correctable ones to be delivered, we'll need to clear that bit as
> well as the ones you do in this patch.

I don't think Corrected Internal Error Mask affects reporting of other
Correctable Errors, e.g., Receiver Error, Bad TLP, Bad DLLP, etc.  As
I read sec 6.2.10, those would not be classified as "internal" errors.

> This got me thinking about why I never saw the same issue for UNC
> errors.  Upshot, QEMU never sets the UNCOR mask so defaults to everything on
> (it also doesn't allow anyone to write the register). Curiously it has
> the code that uses the mask even though there was no means to set any
> bits in it. 
> I'll fix that (draft patch below) + we should update lspci to cover more of these AER
> flags as it's a PITA debugging by reading the hex dumps!
> 
> Bjorn, is there any convention on drivers enabling these 'default' masked
> AER errors?  Or is expectation that this is policy and userspace should
> be dealing with it?

I think AER configuration should generally be a system policy
decision, not a driver-level choice (unless we need quirks to work
around device defects, of course).

We now have f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), which turns on error reporting in Device Control for all
devices at enumeration-time when the OS has control of AER.  But this
is only the generic device-level control; it doesn't configure any
*AER* registers.

I'm surprised to learn that the only writes to PCI_ERR_UNCOR_MASK are
some mips and powerpc arch-specific code and a few individual drivers.
It seems like maybe pci_aer_init() should do some more configuration
of the AER mask and severity registers.

Bjorn

  parent reply	other threads:[~2022-12-29 17:30 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-16 16:10 [PATCH v5] cxl: add RAS status unmasking for CXL Dave Jiang
2022-12-17 17:52 ` Jonathan Cameron
2022-12-17 18:05   ` Jonathan Cameron
2023-01-05 16:53     ` Dave Jiang
2023-01-06 11:22       ` Jonathan Cameron
2022-12-29 17:27   ` Bjorn Helgaas [this message]
2023-01-05 16:31     ` Jonathan Cameron
2023-01-05 16:54       ` Bjorn Helgaas
2023-01-06 11:26         ` Jonathan Cameron
2023-02-10  8:07           ` Ira Weiny
2023-02-10 12:29             ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221229172731.GA611562@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=Jonathan.Cameron@Huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=sr@denx.de \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.