All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4] cxl: add RAS status unmasking for CXL
@ 2022-12-15 18:08 Dave Jiang
  2022-12-16  0:01 ` Dan Williams
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dave Jiang @ 2022-12-15 18:08 UTC (permalink / raw)
  To: linux-cxl
  Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield,
	Jonathan.Cameron

By default the CXL RAS mask registers bits are defaulted to 1's and
suppress all error reporting. If the kernel has negotiated ownership
of error handling for CXL then unmask the mask registers by writing 0s.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>

---

Based on patch posted by Ira [1] to export CXL native error reporting control.

[1]: https://lore.kernel.org/linux-cxl/20221212070627.1372402-2-ira.weiny@intel.com/

v4:
- Fix masking of RAS register. (Jonathan)

v3:
- Remove flex bus port status check. (Jonathan)
- Only unmask known mask bits. (Jonathan)

v2:
- Add definition of PCI_EXP_LNKSTA2_FLIT. (Dan)
- Return error for cxl_pci_ras_unmask(). (Dan)
- Add dev_dbg() for register bits to be cleared. (Dan)
- Check Flex Port DVSEC status. (Dan)
---
 drivers/cxl/cxl.h             |    1 +
 drivers/cxl/pci.c             |   48 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/pci_regs.h |    1 +
 3 files changed, 50 insertions(+)

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1b1cf459ac77..31e795c6d537 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -130,6 +130,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 #define   CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
 #define CXL_RAS_UNCORRECTABLE_MASK_OFFSET 0x4
 #define   CXL_RAS_UNCORRECTABLE_MASK_MASK (GENMASK(16, 14) | GENMASK(11, 0))
+#define   CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK BIT(8)
 #define CXL_RAS_UNCORRECTABLE_SEVERITY_OFFSET 0x8
 #define   CXL_RAS_UNCORRECTABLE_SEVERITY_MASK (GENMASK(16, 14) | GENMASK(11, 0))
 #define CXL_RAS_CORRECTABLE_STATUS_OFFSET 0xC
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 33083a522fd1..64272db6a3fd 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -419,6 +419,53 @@ static void disable_aer(void *pdev)
 	pci_disable_pcie_error_reporting(pdev);
 }
 
+/*
+ * CXL v3.0 6.2.3 Table 6-4
+ * The table indicates that if PCIe Flit Mode is set, then CXL is in 256B flits
+ * mode, otherwise it's 68B flits mode.
+ */
+static bool cxl_pci_flit_256(struct pci_dev *pdev)
+{
+	u32 lnksta2;
+
+	pcie_capability_read_dword(pdev, PCI_EXP_LNKSTA2, &lnksta2);
+	return lnksta2 & PCI_EXP_LNKSTA2_FLIT;
+}
+
+static int cxl_pci_ras_unmask(struct pci_dev *pdev)
+{
+	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
+	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+	void __iomem *addr;
+	u32 val, mask;
+
+	if (!cxlds->regs.ras)
+		return -ENODEV;
+
+	/* BIOS has CXL error control */
+	if (!host_bridge->native_cxl_error)
+		return -EOPNOTSUPP;
+
+	addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET;
+	val = readl(addr);
+	dev_dbg(&pdev->dev, "Uncorrectable RAS Errors Mask: %#x\n", val);
+
+	mask = CXL_RAS_UNCORRECTABLE_MASK_MASK;
+	if (!cxl_pci_flit_256(pdev))
+		mask &= ~CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK;
+	val &= ~mask;
+	writel(val, addr);
+	dev_dbg(&pdev->dev, "Unmasked Uncorrectable RAS Errors Mask: %#x\n", val);
+
+	addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET;
+	val = readl(addr);
+	dev_dbg(&pdev->dev, "Correctable RAS Errors Mask: %#x\n", val);
+	val &= ~CXL_RAS_CORRECTABLE_MASK_MASK;
+	writel(val, addr);
+	dev_dbg(&pdev->dev, "Unmasked Correctable RAS Errors Mask: %#x\n", val);
+	return 0;
+}
+
 static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct cxl_register_map map;
@@ -498,6 +545,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	if (cxlds->regs.ras) {
 		pci_enable_pcie_error_reporting(pdev);
+		cxl_pci_ras_unmask(pdev);
 		rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
 		if (rc)
 			return rc;
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 82a03ea954af..576ee2ec973f 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -693,6 +693,7 @@
 #define  PCI_EXP_LNKCTL2_TX_MARGIN	0x0380 /* Transmit Margin */
 #define  PCI_EXP_LNKCTL2_HASD		0x0020 /* HW Autonomous Speed Disable */
 #define PCI_EXP_LNKSTA2		0x32	/* Link Status 2 */
+#define  PCI_EXP_LNKSTA2_FLIT		BIT(10) /* Flit Mode Status */
 #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	0x32	/* end of v2 EPs w/ link */
 #define PCI_EXP_SLTCAP2		0x34	/* Slot Capabilities 2 */
 #define  PCI_EXP_SLTCAP2_IBPD	0x00000001 /* In-band PD Disable Supported */



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* RE: [PATCH v4] cxl: add RAS status unmasking for CXL
  2022-12-15 18:08 [PATCH v4] cxl: add RAS status unmasking for CXL Dave Jiang
@ 2022-12-16  0:01 ` Dan Williams
  2022-12-16  9:36 ` Jonathan Cameron
  2022-12-16 12:34 ` Jonathan Cameron
  2 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2022-12-16  0:01 UTC (permalink / raw)
  To: Dave Jiang, linux-cxl
  Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield,
	Jonathan.Cameron

Dave Jiang wrote:
> By default the CXL RAS mask registers bits are defaulted to 1's and
> suppress all error reporting. If the kernel has negotiated ownership
> of error handling for CXL then unmask the mask registers by writing 0s.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> 
> ---
> 
> Based on patch posted by Ira [1] to export CXL native error reporting control.
> 
> [1]: https://lore.kernel.org/linux-cxl/20221212070627.1372402-2-ira.weiny@intel.com/
> 
> v4:
> - Fix masking of RAS register. (Jonathan)
> 
> v3:
> - Remove flex bus port status check. (Jonathan)
> - Only unmask known mask bits. (Jonathan)
> 
> v2:
> - Add definition of PCI_EXP_LNKSTA2_FLIT. (Dan)
> - Return error for cxl_pci_ras_unmask(). (Dan)
> - Add dev_dbg() for register bits to be cleared. (Dan)
> - Check Flex Port DVSEC status. (Dan)
> ---
>  drivers/cxl/cxl.h             |    1 +
>  drivers/cxl/pci.c             |   48 +++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/pci_regs.h |    1 +
>  3 files changed, 50 insertions(+)
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1b1cf459ac77..31e795c6d537 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -130,6 +130,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
>  #define   CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>  #define CXL_RAS_UNCORRECTABLE_MASK_OFFSET 0x4
>  #define   CXL_RAS_UNCORRECTABLE_MASK_MASK (GENMASK(16, 14) | GENMASK(11, 0))
> +#define   CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK BIT(8)
>  #define CXL_RAS_UNCORRECTABLE_SEVERITY_OFFSET 0x8
>  #define   CXL_RAS_UNCORRECTABLE_SEVERITY_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>  #define CXL_RAS_CORRECTABLE_STATUS_OFFSET 0xC
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 33083a522fd1..64272db6a3fd 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -419,6 +419,53 @@ static void disable_aer(void *pdev)
>  	pci_disable_pcie_error_reporting(pdev);
>  }
>  
> +/*
> + * CXL v3.0 6.2.3 Table 6-4
> + * The table indicates that if PCIe Flit Mode is set, then CXL is in 256B flits
> + * mode, otherwise it's 68B flits mode.
> + */
> +static bool cxl_pci_flit_256(struct pci_dev *pdev)
> +{
> +	u32 lnksta2;
> +
> +	pcie_capability_read_dword(pdev, PCI_EXP_LNKSTA2, &lnksta2);
> +	return lnksta2 & PCI_EXP_LNKSTA2_FLIT;
> +}
> +
> +static int cxl_pci_ras_unmask(struct pci_dev *pdev)
> +{
> +	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
> +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +	void __iomem *addr;
> +	u32 val, mask;
> +
> +	if (!cxlds->regs.ras)
> +		return -ENODEV;
> +
> +	/* BIOS has CXL error control */
> +	if (!host_bridge->native_cxl_error)
> +		return -EOPNOTSUPP;
> +
> +	addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET;
> +	val = readl(addr);
> +	dev_dbg(&pdev->dev, "Uncorrectable RAS Errors Mask: %#x\n", val);
> +
> +	mask = CXL_RAS_UNCORRECTABLE_MASK_MASK;
> +	if (!cxl_pci_flit_256(pdev))
> +		mask &= ~CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK;
> +	val &= ~mask;
> +	writel(val, addr);
> +	dev_dbg(&pdev->dev, "Unmasked Uncorrectable RAS Errors Mask: %#x\n", val);

I was thinking it would be useful to convey which bits are actually
toggling, otherwise this is just printing
CXL_RAS_UNCORRECTABLE_MASK_MASK which isn't very interesting from a
debug perspective:

Something like:

dev_dbg(&pdev->dev, "Uncorrectable RAS Errors Mask: %#x -> %#x\n", orig, val);

Otherwise, this looks good to me.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4] cxl: add RAS status unmasking for CXL
  2022-12-15 18:08 [PATCH v4] cxl: add RAS status unmasking for CXL Dave Jiang
  2022-12-16  0:01 ` Dan Williams
@ 2022-12-16  9:36 ` Jonathan Cameron
  2022-12-16 15:37   ` Dave Jiang
  2022-12-16 12:34 ` Jonathan Cameron
  2 siblings, 1 reply; 7+ messages in thread
From: Jonathan Cameron @ 2022-12-16  9:36 UTC (permalink / raw)
  To: Dave Jiang
  Cc: linux-cxl, dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield

On Thu, 15 Dec 2022 11:08:03 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> By default the CXL RAS mask registers bits are defaulted to 1's and
> suppress all error reporting. If the kernel has negotiated ownership
> of error handling for CXL then unmask the mask registers by writing 0s.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> 

One trivial comment inline that is up to you.

Dan's suggestion makes sense.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
> 
> Based on patch posted by Ira [1] to export CXL native error reporting control.
> 
> [1]: https://lore.kernel.org/linux-cxl/20221212070627.1372402-2-ira.weiny@intel.com/
> 
> v4:
> - Fix masking of RAS register. (Jonathan)
> 
> v3:
> - Remove flex bus port status check. (Jonathan)
> - Only unmask known mask bits. (Jonathan)
> 
> v2:
> - Add definition of PCI_EXP_LNKSTA2_FLIT. (Dan)
> - Return error for cxl_pci_ras_unmask(). (Dan)
> - Add dev_dbg() for register bits to be cleared. (Dan)
> - Check Flex Port DVSEC status. (Dan)
> ---
>  drivers/cxl/cxl.h             |    1 +
>  drivers/cxl/pci.c             |   48 +++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/pci_regs.h |    1 +
>  3 files changed, 50 insertions(+)
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1b1cf459ac77..31e795c6d537 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -130,6 +130,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
>  #define   CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>  #define CXL_RAS_UNCORRECTABLE_MASK_OFFSET 0x4
>  #define   CXL_RAS_UNCORRECTABLE_MASK_MASK (GENMASK(16, 14) | GENMASK(11, 0))
> +#define   CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK BIT(8)
>  #define CXL_RAS_UNCORRECTABLE_SEVERITY_OFFSET 0x8
>  #define   CXL_RAS_UNCORRECTABLE_SEVERITY_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>  #define CXL_RAS_CORRECTABLE_STATUS_OFFSET 0xC
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 33083a522fd1..64272db6a3fd 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -419,6 +419,53 @@ static void disable_aer(void *pdev)
>  	pci_disable_pcie_error_reporting(pdev);
>  }
>  
> +/*
> + * CXL v3.0 6.2.3 Table 6-4
> + * The table indicates that if PCIe Flit Mode is set, then CXL is in 256B flits
> + * mode, otherwise it's 68B flits mode.
> + */
> +static bool cxl_pci_flit_256(struct pci_dev *pdev)
> +{
> +	u32 lnksta2;
> +
> +	pcie_capability_read_dword(pdev, PCI_EXP_LNKSTA2, &lnksta2);
> +	return lnksta2 & PCI_EXP_LNKSTA2_FLIT;
> +}
> +
> +static int cxl_pci_ras_unmask(struct pci_dev *pdev)
> +{
> +	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
> +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +	void __iomem *addr;
> +	u32 val, mask;
> +
> +	if (!cxlds->regs.ras)
> +		return -ENODEV;
> +
> +	/* BIOS has CXL error control */
> +	if (!host_bridge->native_cxl_error)
> +		return -EOPNOTSUPP;
> +
> +	addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET;
> +	val = readl(addr);
> +	dev_dbg(&pdev->dev, "Uncorrectable RAS Errors Mask: %#x\n", val);
> +
> +	mask = CXL_RAS_UNCORRECTABLE_MASK_MASK;
> +	if (!cxl_pci_flit_256(pdev))
> +		mask &= ~CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK;
> +	val &= ~mask;
> +	writel(val, addr);
> +	dev_dbg(&pdev->dev, "Unmasked Uncorrectable RAS Errors Mask: %#x\n", val);
> +
> +	addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET;
> +	val = readl(addr);
> +	dev_dbg(&pdev->dev, "Correctable RAS Errors Mask: %#x\n", val);
> +	val &= ~CXL_RAS_CORRECTABLE_MASK_MASK;
> +	writel(val, addr);
> +	dev_dbg(&pdev->dev, "Unmasked Correctable RAS Errors Mask: %#x\n", val);

Real nitpick but I'd like a blank line here as the return isn't related to the
previous block of code.

> +	return 0;
> +}
> +
>  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
>  	struct cxl_register_map map;
> @@ -498,6 +545,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  
>  	if (cxlds->regs.ras) {
>  		pci_enable_pcie_error_reporting(pdev);
> +		cxl_pci_ras_unmask(pdev);
>  		rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
>  		if (rc)
>  			return rc;
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 82a03ea954af..576ee2ec973f 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -693,6 +693,7 @@
>  #define  PCI_EXP_LNKCTL2_TX_MARGIN	0x0380 /* Transmit Margin */
>  #define  PCI_EXP_LNKCTL2_HASD		0x0020 /* HW Autonomous Speed Disable */
>  #define PCI_EXP_LNKSTA2		0x32	/* Link Status 2 */
> +#define  PCI_EXP_LNKSTA2_FLIT		BIT(10) /* Flit Mode Status */
>  #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	0x32	/* end of v2 EPs w/ link */
>  #define PCI_EXP_SLTCAP2		0x34	/* Slot Capabilities 2 */
>  #define  PCI_EXP_SLTCAP2_IBPD	0x00000001 /* In-band PD Disable Supported */
> 
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4] cxl: add RAS status unmasking for CXL
  2022-12-15 18:08 [PATCH v4] cxl: add RAS status unmasking for CXL Dave Jiang
  2022-12-16  0:01 ` Dan Williams
  2022-12-16  9:36 ` Jonathan Cameron
@ 2022-12-16 12:34 ` Jonathan Cameron
  2022-12-16 15:36   ` Dave Jiang
  2022-12-16 16:28   ` Dan Williams
  2 siblings, 2 replies; 7+ messages in thread
From: Jonathan Cameron @ 2022-12-16 12:34 UTC (permalink / raw)
  To: Dave Jiang
  Cc: linux-cxl, dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield

On Thu, 15 Dec 2022 11:08:03 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> By default the CXL RAS mask registers bits are defaulted to 1's and
> suppress all error reporting. If the kernel has negotiated ownership
> of error handling for CXL then unmask the mask registers by writing 0s.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> 
> ---
> 
> Based on patch posted by Ira [1] to export CXL native error reporting control.
When you say based upon this, you mean cherry-picking just that patch off the
front of Ira's series?  

That raises the question - are we treating this as a fix? If not, probably better
to pull that change into this series that I think is simpler to land than Ira's one
and get that queued up then rebase Ira's.  If we aren't in a hurry, fine to queue
this one second but then you need to rebase on Ira's full series.

The two series are causing a bunch of fuzz for each other. Sure easy to repair
but it's going to require manual intervention.

Jonathan

> 
> [1]: https://lore.kernel.org/linux-cxl/20221212070627.1372402-2-ira.weiny@intel.com/
> 
> v4:
> - Fix masking of RAS register. (Jonathan)
> 
> v3:
> - Remove flex bus port status check. (Jonathan)
> - Only unmask known mask bits. (Jonathan)
> 
> v2:
> - Add definition of PCI_EXP_LNKSTA2_FLIT. (Dan)
> - Return error for cxl_pci_ras_unmask(). (Dan)
> - Add dev_dbg() for register bits to be cleared. (Dan)
> - Check Flex Port DVSEC status. (Dan)
> ---
>  drivers/cxl/cxl.h             |    1 +
>  drivers/cxl/pci.c             |   48 +++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/pci_regs.h |    1 +
>  3 files changed, 50 insertions(+)
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1b1cf459ac77..31e795c6d537 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -130,6 +130,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
>  #define   CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>  #define CXL_RAS_UNCORRECTABLE_MASK_OFFSET 0x4
>  #define   CXL_RAS_UNCORRECTABLE_MASK_MASK (GENMASK(16, 14) | GENMASK(11, 0))
> +#define   CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK BIT(8)
>  #define CXL_RAS_UNCORRECTABLE_SEVERITY_OFFSET 0x8
>  #define   CXL_RAS_UNCORRECTABLE_SEVERITY_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>  #define CXL_RAS_CORRECTABLE_STATUS_OFFSET 0xC
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 33083a522fd1..64272db6a3fd 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -419,6 +419,53 @@ static void disable_aer(void *pdev)
>  	pci_disable_pcie_error_reporting(pdev);
>  }
>  
> +/*
> + * CXL v3.0 6.2.3 Table 6-4
> + * The table indicates that if PCIe Flit Mode is set, then CXL is in 256B flits
> + * mode, otherwise it's 68B flits mode.
> + */
> +static bool cxl_pci_flit_256(struct pci_dev *pdev)
> +{
> +	u32 lnksta2;
> +
> +	pcie_capability_read_dword(pdev, PCI_EXP_LNKSTA2, &lnksta2);
> +	return lnksta2 & PCI_EXP_LNKSTA2_FLIT;
> +}
> +
> +static int cxl_pci_ras_unmask(struct pci_dev *pdev)
> +{
> +	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
> +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +	void __iomem *addr;
> +	u32 val, mask;
> +
> +	if (!cxlds->regs.ras)
> +		return -ENODEV;
> +
> +	/* BIOS has CXL error control */
> +	if (!host_bridge->native_cxl_error)
> +		return -EOPNOTSUPP;
> +
> +	addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET;
> +	val = readl(addr);
> +	dev_dbg(&pdev->dev, "Uncorrectable RAS Errors Mask: %#x\n", val);
> +
> +	mask = CXL_RAS_UNCORRECTABLE_MASK_MASK;
> +	if (!cxl_pci_flit_256(pdev))
> +		mask &= ~CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK;
> +	val &= ~mask;
> +	writel(val, addr);
> +	dev_dbg(&pdev->dev, "Unmasked Uncorrectable RAS Errors Mask: %#x\n", val);
> +
> +	addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET;
> +	val = readl(addr);
> +	dev_dbg(&pdev->dev, "Correctable RAS Errors Mask: %#x\n", val);
> +	val &= ~CXL_RAS_CORRECTABLE_MASK_MASK;
> +	writel(val, addr);
> +	dev_dbg(&pdev->dev, "Unmasked Correctable RAS Errors Mask: %#x\n", val);
> +	return 0;
> +}
> +
>  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
>  	struct cxl_register_map map;
> @@ -498,6 +545,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  
>  	if (cxlds->regs.ras) {
>  		pci_enable_pcie_error_reporting(pdev);
> +		cxl_pci_ras_unmask(pdev);
>  		rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
>  		if (rc)
>  			return rc;
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 82a03ea954af..576ee2ec973f 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -693,6 +693,7 @@
>  #define  PCI_EXP_LNKCTL2_TX_MARGIN	0x0380 /* Transmit Margin */
>  #define  PCI_EXP_LNKCTL2_HASD		0x0020 /* HW Autonomous Speed Disable */
>  #define PCI_EXP_LNKSTA2		0x32	/* Link Status 2 */
> +#define  PCI_EXP_LNKSTA2_FLIT		BIT(10) /* Flit Mode Status */
>  #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	0x32	/* end of v2 EPs w/ link */
>  #define PCI_EXP_SLTCAP2		0x34	/* Slot Capabilities 2 */
>  #define  PCI_EXP_SLTCAP2_IBPD	0x00000001 /* In-band PD Disable Supported */
> 
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4] cxl: add RAS status unmasking for CXL
  2022-12-16 12:34 ` Jonathan Cameron
@ 2022-12-16 15:36   ` Dave Jiang
  2022-12-16 16:28   ` Dan Williams
  1 sibling, 0 replies; 7+ messages in thread
From: Dave Jiang @ 2022-12-16 15:36 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-cxl, dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield



On 12/16/2022 5:34 AM, Jonathan Cameron wrote:
> On Thu, 15 Dec 2022 11:08:03 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
>> By default the CXL RAS mask registers bits are defaulted to 1's and
>> suppress all error reporting. If the kernel has negotiated ownership
>> of error handling for CXL then unmask the mask registers by writing 0s.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>
>> ---
>>
>> Based on patch posted by Ira [1] to export CXL native error reporting control.
> When you say based upon this, you mean cherry-picking just that patch off the
> front of Ira's series?

Correct.

> 
> That raises the question - are we treating this as a fix? If not, probably better
> to pull that change into this series that I think is simpler to land than Ira's one
> and get that queued up then rebase Ira's.  If we aren't in a hurry, fine to queue
> this one second but then you need to rebase on Ira's full series.
> 
> The two series are causing a bunch of fuzz for each other. Sure easy to repair
> but it's going to require manual intervention.

I'll let Dan make the call on that. I don't have an issue with what you 
suggested.

> 
> Jonathan
> 
>>
>> [1]: https://lore.kernel.org/linux-cxl/20221212070627.1372402-2-ira.weiny@intel.com/
>>
>> v4:
>> - Fix masking of RAS register. (Jonathan)
>>
>> v3:
>> - Remove flex bus port status check. (Jonathan)
>> - Only unmask known mask bits. (Jonathan)
>>
>> v2:
>> - Add definition of PCI_EXP_LNKSTA2_FLIT. (Dan)
>> - Return error for cxl_pci_ras_unmask(). (Dan)
>> - Add dev_dbg() for register bits to be cleared. (Dan)
>> - Check Flex Port DVSEC status. (Dan)
>> ---
>>   drivers/cxl/cxl.h             |    1 +
>>   drivers/cxl/pci.c             |   48 +++++++++++++++++++++++++++++++++++++++++
>>   include/uapi/linux/pci_regs.h |    1 +
>>   3 files changed, 50 insertions(+)
>>
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index 1b1cf459ac77..31e795c6d537 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -130,6 +130,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
>>   #define   CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>>   #define CXL_RAS_UNCORRECTABLE_MASK_OFFSET 0x4
>>   #define   CXL_RAS_UNCORRECTABLE_MASK_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>> +#define   CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK BIT(8)
>>   #define CXL_RAS_UNCORRECTABLE_SEVERITY_OFFSET 0x8
>>   #define   CXL_RAS_UNCORRECTABLE_SEVERITY_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>>   #define CXL_RAS_CORRECTABLE_STATUS_OFFSET 0xC
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index 33083a522fd1..64272db6a3fd 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -419,6 +419,53 @@ static void disable_aer(void *pdev)
>>   	pci_disable_pcie_error_reporting(pdev);
>>   }
>>   
>> +/*
>> + * CXL v3.0 6.2.3 Table 6-4
>> + * The table indicates that if PCIe Flit Mode is set, then CXL is in 256B flits
>> + * mode, otherwise it's 68B flits mode.
>> + */
>> +static bool cxl_pci_flit_256(struct pci_dev *pdev)
>> +{
>> +	u32 lnksta2;
>> +
>> +	pcie_capability_read_dword(pdev, PCI_EXP_LNKSTA2, &lnksta2);
>> +	return lnksta2 & PCI_EXP_LNKSTA2_FLIT;
>> +}
>> +
>> +static int cxl_pci_ras_unmask(struct pci_dev *pdev)
>> +{
>> +	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
>> +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
>> +	void __iomem *addr;
>> +	u32 val, mask;
>> +
>> +	if (!cxlds->regs.ras)
>> +		return -ENODEV;
>> +
>> +	/* BIOS has CXL error control */
>> +	if (!host_bridge->native_cxl_error)
>> +		return -EOPNOTSUPP;
>> +
>> +	addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET;
>> +	val = readl(addr);
>> +	dev_dbg(&pdev->dev, "Uncorrectable RAS Errors Mask: %#x\n", val);
>> +
>> +	mask = CXL_RAS_UNCORRECTABLE_MASK_MASK;
>> +	if (!cxl_pci_flit_256(pdev))
>> +		mask &= ~CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK;
>> +	val &= ~mask;
>> +	writel(val, addr);
>> +	dev_dbg(&pdev->dev, "Unmasked Uncorrectable RAS Errors Mask: %#x\n", val);
>> +
>> +	addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET;
>> +	val = readl(addr);
>> +	dev_dbg(&pdev->dev, "Correctable RAS Errors Mask: %#x\n", val);
>> +	val &= ~CXL_RAS_CORRECTABLE_MASK_MASK;
>> +	writel(val, addr);
>> +	dev_dbg(&pdev->dev, "Unmasked Correctable RAS Errors Mask: %#x\n", val);
>> +	return 0;
>> +}
>> +
>>   static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>   {
>>   	struct cxl_register_map map;
>> @@ -498,6 +545,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>   
>>   	if (cxlds->regs.ras) {
>>   		pci_enable_pcie_error_reporting(pdev);
>> +		cxl_pci_ras_unmask(pdev);
>>   		rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
>>   		if (rc)
>>   			return rc;
>> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
>> index 82a03ea954af..576ee2ec973f 100644
>> --- a/include/uapi/linux/pci_regs.h
>> +++ b/include/uapi/linux/pci_regs.h
>> @@ -693,6 +693,7 @@
>>   #define  PCI_EXP_LNKCTL2_TX_MARGIN	0x0380 /* Transmit Margin */
>>   #define  PCI_EXP_LNKCTL2_HASD		0x0020 /* HW Autonomous Speed Disable */
>>   #define PCI_EXP_LNKSTA2		0x32	/* Link Status 2 */
>> +#define  PCI_EXP_LNKSTA2_FLIT		BIT(10) /* Flit Mode Status */
>>   #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	0x32	/* end of v2 EPs w/ link */
>>   #define PCI_EXP_SLTCAP2		0x34	/* Slot Capabilities 2 */
>>   #define  PCI_EXP_SLTCAP2_IBPD	0x00000001 /* In-band PD Disable Supported */
>>
>>
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4] cxl: add RAS status unmasking for CXL
  2022-12-16  9:36 ` Jonathan Cameron
@ 2022-12-16 15:37   ` Dave Jiang
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Jiang @ 2022-12-16 15:37 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-cxl, dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield



On 12/16/2022 2:36 AM, Jonathan Cameron wrote:
> On Thu, 15 Dec 2022 11:08:03 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
>> By default the CXL RAS mask registers bits are defaulted to 1's and
>> suppress all error reporting. If the kernel has negotiated ownership
>> of error handling for CXL then unmask the mask registers by writing 0s.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>
> 
> One trivial comment inline that is up to you.

Not a problem. I'll roll that in with changes Dan is asking.

> 
> Dan's suggestion makes sense.
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
>> ---
>>
>> Based on patch posted by Ira [1] to export CXL native error reporting control.
>>
>> [1]: https://lore.kernel.org/linux-cxl/20221212070627.1372402-2-ira.weiny@intel.com/
>>
>> v4:
>> - Fix masking of RAS register. (Jonathan)
>>
>> v3:
>> - Remove flex bus port status check. (Jonathan)
>> - Only unmask known mask bits. (Jonathan)
>>
>> v2:
>> - Add definition of PCI_EXP_LNKSTA2_FLIT. (Dan)
>> - Return error for cxl_pci_ras_unmask(). (Dan)
>> - Add dev_dbg() for register bits to be cleared. (Dan)
>> - Check Flex Port DVSEC status. (Dan)
>> ---
>>   drivers/cxl/cxl.h             |    1 +
>>   drivers/cxl/pci.c             |   48 +++++++++++++++++++++++++++++++++++++++++
>>   include/uapi/linux/pci_regs.h |    1 +
>>   3 files changed, 50 insertions(+)
>>
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index 1b1cf459ac77..31e795c6d537 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -130,6 +130,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
>>   #define   CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>>   #define CXL_RAS_UNCORRECTABLE_MASK_OFFSET 0x4
>>   #define   CXL_RAS_UNCORRECTABLE_MASK_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>> +#define   CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK BIT(8)
>>   #define CXL_RAS_UNCORRECTABLE_SEVERITY_OFFSET 0x8
>>   #define   CXL_RAS_UNCORRECTABLE_SEVERITY_MASK (GENMASK(16, 14) | GENMASK(11, 0))
>>   #define CXL_RAS_CORRECTABLE_STATUS_OFFSET 0xC
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index 33083a522fd1..64272db6a3fd 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -419,6 +419,53 @@ static void disable_aer(void *pdev)
>>   	pci_disable_pcie_error_reporting(pdev);
>>   }
>>   
>> +/*
>> + * CXL v3.0 6.2.3 Table 6-4
>> + * The table indicates that if PCIe Flit Mode is set, then CXL is in 256B flits
>> + * mode, otherwise it's 68B flits mode.
>> + */
>> +static bool cxl_pci_flit_256(struct pci_dev *pdev)
>> +{
>> +	u32 lnksta2;
>> +
>> +	pcie_capability_read_dword(pdev, PCI_EXP_LNKSTA2, &lnksta2);
>> +	return lnksta2 & PCI_EXP_LNKSTA2_FLIT;
>> +}
>> +
>> +static int cxl_pci_ras_unmask(struct pci_dev *pdev)
>> +{
>> +	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
>> +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
>> +	void __iomem *addr;
>> +	u32 val, mask;
>> +
>> +	if (!cxlds->regs.ras)
>> +		return -ENODEV;
>> +
>> +	/* BIOS has CXL error control */
>> +	if (!host_bridge->native_cxl_error)
>> +		return -EOPNOTSUPP;
>> +
>> +	addr = cxlds->regs.ras + CXL_RAS_UNCORRECTABLE_MASK_OFFSET;
>> +	val = readl(addr);
>> +	dev_dbg(&pdev->dev, "Uncorrectable RAS Errors Mask: %#x\n", val);
>> +
>> +	mask = CXL_RAS_UNCORRECTABLE_MASK_MASK;
>> +	if (!cxl_pci_flit_256(pdev))
>> +		mask &= ~CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK;
>> +	val &= ~mask;
>> +	writel(val, addr);
>> +	dev_dbg(&pdev->dev, "Unmasked Uncorrectable RAS Errors Mask: %#x\n", val);
>> +
>> +	addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_MASK_OFFSET;
>> +	val = readl(addr);
>> +	dev_dbg(&pdev->dev, "Correctable RAS Errors Mask: %#x\n", val);
>> +	val &= ~CXL_RAS_CORRECTABLE_MASK_MASK;
>> +	writel(val, addr);
>> +	dev_dbg(&pdev->dev, "Unmasked Correctable RAS Errors Mask: %#x\n", val);
> 
> Real nitpick but I'd like a blank line here as the return isn't related to the
> previous block of code.
> 
>> +	return 0;
>> +}
>> +
>>   static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>   {
>>   	struct cxl_register_map map;
>> @@ -498,6 +545,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>   
>>   	if (cxlds->regs.ras) {
>>   		pci_enable_pcie_error_reporting(pdev);
>> +		cxl_pci_ras_unmask(pdev);
>>   		rc = devm_add_action_or_reset(&pdev->dev, disable_aer, pdev);
>>   		if (rc)
>>   			return rc;
>> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
>> index 82a03ea954af..576ee2ec973f 100644
>> --- a/include/uapi/linux/pci_regs.h
>> +++ b/include/uapi/linux/pci_regs.h
>> @@ -693,6 +693,7 @@
>>   #define  PCI_EXP_LNKCTL2_TX_MARGIN	0x0380 /* Transmit Margin */
>>   #define  PCI_EXP_LNKCTL2_HASD		0x0020 /* HW Autonomous Speed Disable */
>>   #define PCI_EXP_LNKSTA2		0x32	/* Link Status 2 */
>> +#define  PCI_EXP_LNKSTA2_FLIT		BIT(10) /* Flit Mode Status */
>>   #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	0x32	/* end of v2 EPs w/ link */
>>   #define PCI_EXP_SLTCAP2		0x34	/* Slot Capabilities 2 */
>>   #define  PCI_EXP_SLTCAP2_IBPD	0x00000001 /* In-band PD Disable Supported */
>>
>>
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4] cxl: add RAS status unmasking for CXL
  2022-12-16 12:34 ` Jonathan Cameron
  2022-12-16 15:36   ` Dave Jiang
@ 2022-12-16 16:28   ` Dan Williams
  1 sibling, 0 replies; 7+ messages in thread
From: Dan Williams @ 2022-12-16 16:28 UTC (permalink / raw)
  To: Jonathan Cameron, Dave Jiang
  Cc: linux-cxl, dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield

Jonathan Cameron wrote:
> On Thu, 15 Dec 2022 11:08:03 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
> > By default the CXL RAS mask registers bits are defaulted to 1's and
> > suppress all error reporting. If the kernel has negotiated ownership
> > of error handling for CXL then unmask the mask registers by writing 0s.
> > 
> > Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> > 
> > ---
> > 
> > Based on patch posted by Ira [1] to export CXL native error reporting control.
> When you say based upon this, you mean cherry-picking just that patch off the
> front of Ira's series?  
> 
> That raises the question - are we treating this as a fix? If not, probably better
> to pull that change into this series that I think is simpler to land than Ira's one
> and get that queued up then rebase Ira's.  If we aren't in a hurry, fine to queue
> this one second but then you need to rebase on Ira's full series.

My expectation is that this is v6.3 material and that linux-cxl
community needs the time to see what this does on real devices before
committing to having all the errors on all devices all the time.

That acpi_pci_root patch needs an ack from Rafael and Bjorn, not sure if
that will come in before the holidays, but once it does I will throw it
into cxl/next.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-12-16 16:28 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-15 18:08 [PATCH v4] cxl: add RAS status unmasking for CXL Dave Jiang
2022-12-16  0:01 ` Dan Williams
2022-12-16  9:36 ` Jonathan Cameron
2022-12-16 15:37   ` Dave Jiang
2022-12-16 12:34 ` Jonathan Cameron
2022-12-16 15:36   ` Dave Jiang
2022-12-16 16:28   ` Dan Williams

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.