All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Robert Richter <rrichter@amd.com>
Cc: Ira Weiny <ira.weiny@intel.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Terry Bowman <terry.bowman@amd.com>, <alison.schofield@intel.com>,
	<vishal.l.verma@intel.com>, <bwidawsk@kernel.org>,
	<dan.j.williams@intel.com>, <dave.jiang@intel.com>,
	<linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<bhelgaas@google.com>, Oliver O'Halloran <oohall@gmail.com>,
	"Mahesh J Salgaonkar" <mahesh@linux.ibm.com>,
	<linuxppc-dev@lists.ozlabs.org>, <linux-pci@vger.kernel.org>
Subject: Re: [PATCH v3 6/6] PCI/AER: Unmask RCEC internal errors to enable RCH downstream port error handling
Date: Fri, 14 Apr 2023 12:55:43 +0100	[thread overview]
Message-ID: <20230414125543.000021f6@Huawei.com> (raw)
In-Reply-To: <ZDk3QeWZDOP8sr4s@rric.localdomain>

On Fri, 14 Apr 2023 13:21:37 +0200
Robert Richter <rrichter@amd.com> wrote:

> On 13.04.23 15:52:36, Ira Weiny wrote:
> > Jonathan Cameron wrote:  
> > > On Wed, 12 Apr 2023 16:29:01 -0500
> > > Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >   
> > > > On Tue, Apr 11, 2023 at 01:03:02PM -0500, Terry Bowman wrote:  
> > > > > From: Robert Richter <rrichter@amd.com>
> > > > >   
> 
> > > > > +static int __cxl_unmask_internal_errors(struct pci_dev *rcec)
> > > > > +{
> > > > > +	int aer, rc;
> > > > > +	u32 mask;
> > > > > +
> > > > > +	/*
> > > > > +	 * Internal errors are masked by default, unmask RCEC's here
> > > > > +	 * PCI6.0 7.8.4.3 Uncorrectable Error Mask Register (Offset 08h)
> > > > > +	 * PCI6.0 7.8.4.6 Correctable Error Mask Register (Offset 14h)
> > > > > +	 */    
> > > > 
> > > > Unmasking internal errors doesn't have anything specific to do with
> > > > CXL, so I don't think it should have "cxl" in the function name.
> > > > Maybe something like "pci_aer_unmask_internal_errors()".  
> > > 
> > > This reminds me.  Not sure we resolved earlier discussion on changing
> > > the system wide policy to turn these on 
> > > https://lore.kernel.org/linux-cxl/20221229172731.GA611562@bhelgaas/
> > > which needs pretty much the same thing.
> > > 
> > > Ira, I think you were picking this one up?
> > > https://lore.kernel.org/linux-cxl/63e5fb533f304_13244829412@iweiny-mobl.notmuch/  
> > 
> > After this discussion I posted an RFC to enable those errors.
> > 
> > https://lore.kernel.org/all/20230209-cxl-pci-aer-v1-1-f9a817fa4016@intel.com/
> > 

Ah. I'd forgotten that thread. Thanks!

> > Unfortunately the prevailing opinion was that this was unsafe.  And no one
> > piped up with a reason to pursue the alternative of a pci core call to enable
> > them as needed.
> > 
> > So I abandoned the work.
> > 
> > I think the direction things where headed was to have a call like:
> > 
> > int pci_enable_pci_internal_errors(struct pci_dev *dev)
> > {
> > 	int pos_cap_err;
> > 	u32 reg;
> > 
> > 	if (!pcie_aer_is_native(dev))
> > 		return -EIO;
> > 
> > 	pos_cap_err = dev->aer_cap;
> > 
> > 	/* Unmask correctable and uncorrectable (non-fatal) internal errors */
> > 	pci_read_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK, &reg);
> > 	reg &= ~PCI_ERR_COR_INTERNAL;
> > 	pci_write_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK, reg);
> > 	
> > 	pci_read_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_SEVER, &reg);
> > 	reg &= ~PCI_ERR_UNC_INTN;
> > 	pci_write_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_SEVER, reg);
> > 	
> > 	pci_read_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_MASK, &reg);
> > 	reg &= ~PCI_ERR_UNC_INTN;
> > 	pci_write_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_MASK, reg);
> > 
> > 	return 0;
> > }
> > 
> > ... and call this from the cxl code where it is needed.  
> 
> The version I have ready after addressing Bjorn's comments is pretty
> much the same, apart from error checking of the read/writes.
> 
> From your patch proposed you will need it in aer.c too and we do not
> need to export it.

I think for the other components we'll want to call it from cxl_pci_ras_unmask()
so an export needed.

I also wonder if a more generic function would be better as seems likely
similar code will be needed for errors other than this pair.


> 
> This patch only enables it for (CXL) RCECs. You might want to extend
> this for CXL endpoints (and ports?) then.

Definitely.  We have the same limitation you are seeing.  No errors
without turning this on.

Jonathan



> 
> > 
> > Is this an acceptable direction?  Terry is welcome to steal the above from my
> > patch and throw it into the PCI core.
> > 
> > Looking at the current state of things I think cxl_pci_ras_unmask() may
> > actually be broken now without calling something like the above.  For that I
> > dropped the ball.  
> 
> Thanks,
> 
> -Robert
> 
> > 
> > Ira  


WARNING: multiple messages have this Message-ID (diff)
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Robert Richter <rrichter@amd.com>
Cc: alison.schofield@intel.com, dave.jiang@intel.com,
	Terry Bowman <terry.bowman@amd.com>,
	vishal.l.verma@intel.com, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	bhelgaas@google.com, Bjorn Helgaas <helgaas@kernel.org>,
	linux-pci@vger.kernel.org, bwidawsk@kernel.org,
	Oliver O'Halloran <oohall@gmail.com>,
	dan.j.williams@intel.com, Ira Weiny <ira.weiny@intel.com>
Subject: Re: [PATCH v3 6/6] PCI/AER: Unmask RCEC internal errors to enable RCH downstream port error handling
Date: Fri, 14 Apr 2023 12:55:43 +0100	[thread overview]
Message-ID: <20230414125543.000021f6@Huawei.com> (raw)
In-Reply-To: <ZDk3QeWZDOP8sr4s@rric.localdomain>

On Fri, 14 Apr 2023 13:21:37 +0200
Robert Richter <rrichter@amd.com> wrote:

> On 13.04.23 15:52:36, Ira Weiny wrote:
> > Jonathan Cameron wrote:  
> > > On Wed, 12 Apr 2023 16:29:01 -0500
> > > Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >   
> > > > On Tue, Apr 11, 2023 at 01:03:02PM -0500, Terry Bowman wrote:  
> > > > > From: Robert Richter <rrichter@amd.com>
> > > > >   
> 
> > > > > +static int __cxl_unmask_internal_errors(struct pci_dev *rcec)
> > > > > +{
> > > > > +	int aer, rc;
> > > > > +	u32 mask;
> > > > > +
> > > > > +	/*
> > > > > +	 * Internal errors are masked by default, unmask RCEC's here
> > > > > +	 * PCI6.0 7.8.4.3 Uncorrectable Error Mask Register (Offset 08h)
> > > > > +	 * PCI6.0 7.8.4.6 Correctable Error Mask Register (Offset 14h)
> > > > > +	 */    
> > > > 
> > > > Unmasking internal errors doesn't have anything specific to do with
> > > > CXL, so I don't think it should have "cxl" in the function name.
> > > > Maybe something like "pci_aer_unmask_internal_errors()".  
> > > 
> > > This reminds me.  Not sure we resolved earlier discussion on changing
> > > the system wide policy to turn these on 
> > > https://lore.kernel.org/linux-cxl/20221229172731.GA611562@bhelgaas/
> > > which needs pretty much the same thing.
> > > 
> > > Ira, I think you were picking this one up?
> > > https://lore.kernel.org/linux-cxl/63e5fb533f304_13244829412@iweiny-mobl.notmuch/  
> > 
> > After this discussion I posted an RFC to enable those errors.
> > 
> > https://lore.kernel.org/all/20230209-cxl-pci-aer-v1-1-f9a817fa4016@intel.com/
> > 

Ah. I'd forgotten that thread. Thanks!

> > Unfortunately the prevailing opinion was that this was unsafe.  And no one
> > piped up with a reason to pursue the alternative of a pci core call to enable
> > them as needed.
> > 
> > So I abandoned the work.
> > 
> > I think the direction things where headed was to have a call like:
> > 
> > int pci_enable_pci_internal_errors(struct pci_dev *dev)
> > {
> > 	int pos_cap_err;
> > 	u32 reg;
> > 
> > 	if (!pcie_aer_is_native(dev))
> > 		return -EIO;
> > 
> > 	pos_cap_err = dev->aer_cap;
> > 
> > 	/* Unmask correctable and uncorrectable (non-fatal) internal errors */
> > 	pci_read_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK, &reg);
> > 	reg &= ~PCI_ERR_COR_INTERNAL;
> > 	pci_write_config_dword(dev, pos_cap_err + PCI_ERR_COR_MASK, reg);
> > 	
> > 	pci_read_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_SEVER, &reg);
> > 	reg &= ~PCI_ERR_UNC_INTN;
> > 	pci_write_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_SEVER, reg);
> > 	
> > 	pci_read_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_MASK, &reg);
> > 	reg &= ~PCI_ERR_UNC_INTN;
> > 	pci_write_config_dword(dev, pos_cap_err + PCI_ERR_UNCOR_MASK, reg);
> > 
> > 	return 0;
> > }
> > 
> > ... and call this from the cxl code where it is needed.  
> 
> The version I have ready after addressing Bjorn's comments is pretty
> much the same, apart from error checking of the read/writes.
> 
> From your patch proposed you will need it in aer.c too and we do not
> need to export it.

I think for the other components we'll want to call it from cxl_pci_ras_unmask()
so an export needed.

I also wonder if a more generic function would be better as seems likely
similar code will be needed for errors other than this pair.


> 
> This patch only enables it for (CXL) RCECs. You might want to extend
> this for CXL endpoints (and ports?) then.

Definitely.  We have the same limitation you are seeing.  No errors
without turning this on.

Jonathan



> 
> > 
> > Is this an acceptable direction?  Terry is welcome to steal the above from my
> > patch and throw it into the PCI core.
> > 
> > Looking at the current state of things I think cxl_pci_ras_unmask() may
> > actually be broken now without calling something like the above.  For that I
> > dropped the ball.  
> 
> Thanks,
> 
> -Robert
> 
> > 
> > Ira  


  reply	other threads:[~2023-04-14 11:55 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-11 18:02 [PATCH v3 0/6] cxl/pci: Add support for RCH RAS error handling Terry Bowman
2023-04-11 18:02 ` [PATCH v3 1/6] cxl/pci: Add RCH downstream port AER and RAS register discovery Terry Bowman
2023-04-13 15:30   ` Jonathan Cameron
2023-04-13 19:13     ` Terry Bowman
2023-04-14 11:47       ` Jonathan Cameron
2023-04-14 11:51       ` Robert Richter
2023-04-17 23:00   ` Dan Williams
2023-04-18 15:59     ` Terry Bowman
2023-04-27 13:52     ` Robert Richter
2023-04-11 18:02 ` [PATCH v3 2/6] efi/cper: Export cper_mem_err_unpack() for use by modules Terry Bowman
2023-04-12 11:04   ` Ard Biesheuvel
2023-04-13 16:08   ` Jonathan Cameron
2023-04-13 19:40     ` Terry Bowman
2023-04-14 11:48       ` Jonathan Cameron
2023-04-14 12:44         ` Robert Richter
     [not found]         ` <aba5d2ee-f451-145c-81c2-72595129483b@amd.com>
2023-04-14 15:17           ` Terry Bowman
2023-04-17 23:08   ` Dan Williams
2023-04-11 18:02 ` [PATCH v3 3/6] PCI/AER: Export cper_print_aer() " Terry Bowman
2023-04-13 16:13   ` Jonathan Cameron
2023-04-17 23:11   ` Dan Williams
2023-04-11 18:03 ` [PATCH v3 4/6] cxl/pci: Add RCH downstream port error logging Terry Bowman
2023-04-12  1:32   ` kernel test robot
2023-04-12  3:04   ` kernel test robot
2023-04-13 16:50   ` Jonathan Cameron
2023-04-14 16:36     ` Terry Bowman
2023-04-17 16:56       ` Jonathan Cameron
2023-04-18  0:06   ` Dan Williams
2023-04-24 18:39     ` Terry Bowman
2023-04-11 18:03 ` [PATCH v3 5/6] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler Terry Bowman
2023-04-11 18:03   ` Terry Bowman
2023-04-12 22:02   ` Bjorn Helgaas
2023-04-12 22:02     ` Bjorn Helgaas
2023-04-13 11:40     ` Robert Richter
2023-04-13 11:40       ` Robert Richter
2023-04-14 21:32       ` Bjorn Helgaas
2023-04-14 21:32         ` Bjorn Helgaas
2023-04-17 22:00         ` Robert Richter
2023-04-17 22:00           ` Robert Richter
2023-04-19 14:17           ` Robert Richter
2023-04-19 14:17             ` Robert Richter
2023-04-14 12:19   ` Jonathan Cameron
2023-04-14 12:19     ` Jonathan Cameron
2023-04-14 14:35     ` Robert Richter
2023-04-14 14:35       ` Robert Richter
2023-04-17 16:54       ` Jonathan Cameron
2023-04-17 16:54         ` Jonathan Cameron
2023-04-17 20:36         ` Robert Richter
2023-04-17 20:36           ` Robert Richter
2023-04-18  1:01   ` Dan Williams
2023-04-18  1:01     ` Dan Williams
2023-04-19 13:30     ` Robert Richter
2023-04-19 13:30       ` Robert Richter
2023-04-11 18:03 ` [PATCH v3 6/6] PCI/AER: Unmask RCEC internal errors to enable RCH downstream port error handling Terry Bowman
2023-04-11 18:03   ` Terry Bowman
2023-04-12 21:29   ` Bjorn Helgaas
2023-04-12 21:29     ` Bjorn Helgaas
2023-04-13 13:38     ` Robert Richter
2023-04-13 13:38       ` Robert Richter
2023-04-13 17:05       ` Jonathan Cameron
2023-04-13 17:05         ` Jonathan Cameron
2023-04-14 11:58         ` Robert Richter
2023-04-14 11:58           ` Robert Richter
2023-04-14 21:49       ` Bjorn Helgaas
2023-04-14 21:49         ` Bjorn Helgaas
2023-04-13 17:01     ` Jonathan Cameron
2023-04-13 17:01       ` Jonathan Cameron
2023-04-13 22:52       ` Ira Weiny
2023-04-13 22:52         ` Ira Weiny
2023-04-14 11:21         ` Robert Richter
2023-04-14 11:21           ` Robert Richter
2023-04-14 11:55           ` Jonathan Cameron [this message]
2023-04-14 11:55             ` Jonathan Cameron
2023-04-14 14:47             ` Robert Richter
2023-04-14 14:47               ` Robert Richter
2023-04-18  2:37   ` Dan Williams
2023-04-18  2:37     ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230414125543.000021f6@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=bhelgaas@google.com \
    --cc=bwidawsk@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=helgaas@kernel.org \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mahesh@linux.ibm.com \
    --cc=oohall@gmail.com \
    --cc=rrichter@amd.com \
    --cc=terry.bowman@amd.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.