* CXL type 3 which doesn't have cxl mem enabled.
@ 2022-04-26 17:08 Jonathan Cameron
  2022-04-26 17:19 ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Cameron @ 2022-04-26 17:08 UTC
  To: Dan Williams, linux-cxl, ben.widawsky, vishal.l.verma, ira.weiny,
	alison.schofield

Hi All,

I ran into this whilst debugging why on the current QEMU code
we now get a probe failure for CXL mem due to the range 1 size being
non 0. 

The conditions for whether we have legacy ranges programmed don't
take into account if Mem_Enable = 1.  That is if the
DVSEC CXL Control Mem_Enable bit is set on the type 3 device.
If it's not then there is no existing user of the CXL memory
set up by firmware or similar so we can switch over to HDM
decoders and it doesn't matter what is in the range registers.
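The condition above can be written as a small predicate. This is a hypothetical helper, not the kernel's code. It assumes the three inputs have already been read from the Device DVSEC Control register (Mem_Enable), the HDM Decoder Global Control register (HDM Decoder Enable) and the Device DVSEC range size registers.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: do the DVSEC range registers describe memory that
 * is already live (e.g. configured and in use by firmware)?
 *
 * mem_enable:        Mem_Enable bit, DVSEC for CXL Device Control register
 * hdm_global_enable: HDM Decoder Enable bit, HDM Decoder Global Control
 * nr_ranges:         number of DVSEC ranges reporting a non-zero size
 */
static bool legacy_ranges_in_use(bool mem_enable, bool hdm_global_enable,
				 int nr_ranges)
{
	/* CXL.mem is off, so nothing decodes through the range registers. */
	if (!mem_enable)
		return false;

	/* With HDM decoders globally enabled, the range registers are ignored. */
	if (hdm_global_enable)
		return false;

	return nr_ranges > 0;
}
```

For example, with Mem_Enable clear, legacy_ranges_in_use(false, false, 1) returns false, so a non-zero range 1 size would not block the switch to HDM decoders.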

Unfortunately the QEMU code was bringing the device up with
Mem_Enabled already set.  So I fixed that.  After all, the default
value of that bit should be 0.

A few problems then showed up.

1. Nothing in the Linux code actually sets Mem_Enabled to 1.
2. Probing fails in mem.c as wait_for_media() checks for
   info->mem_enabled (cached value of this bit).

So, dirty hack was to 
* drop that check in wait_for_media() as media being enabled doesn't
  have much to do with CXL.mem protocol being enabled.
* Make the check in cxl_hdm_decode_init()
   if (info->mem_enabled && !global_enable && info->ranges)
* Immediately after enabling the hdm decoder global enable,
  also set the Mem_Enable bit and update info->mem_enabled.
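A user-space sketch of the ordering in the last bullet, assuming the bit positions used by the kernel's CXL headers (Mem_Enable as bit 2 of the Device DVSEC Control register, HDM Decoder Enable as bit 1 of HDM Decoder Global Control). `struct fake_type3` and `hack_enable_mem()` are hypothetical stand-ins for the real config-space and MMIO accesses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as the kernel's CXL headers define them (assumed here). */
#define DVSEC_CTRL_MEM_ENABLE (1u << 2) /* DVSEC for CXL Device, Control */
#define HDM_GLOBAL_DECODER_EN (1u << 1) /* HDM Decoder Global Control */

/* Hypothetical register snapshot of a type 3 device. */
struct fake_type3 {
	uint16_t dvsec_ctrl;
	uint32_t hdm_global_ctrl;
	bool cached_mem_enabled;	/* stand-in for info->mem_enabled */
};

/* The ordering used by the hack: HDM decoders first, then CXL.mem. */
static void hack_enable_mem(struct fake_type3 *dev)
{
	/* 1. Route decode through the HDM decoders. */
	dev->hdm_global_ctrl |= HDM_GLOBAL_DECODER_EN;
	/* 2. Only now enable CXL.mem, so the range registers never decode. */
	dev->dvsec_ctrl |= DVSEC_CTRL_MEM_ENABLE;
	/* 3. Keep the cached copy in sync with the hardware bit. */
	dev->cached_mem_enabled = true;
}
```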

This seems to work, but I can't help thinking I'm missing something.

Jonathan


* Re: CXL type 3 which doesn't have cxl mem enabled.
  2022-04-26 17:08 CXL type 3 which doesn't have cxl mem enabled Jonathan Cameron
@ 2022-04-26 17:19 ` Dan Williams
  2022-04-26 18:06   ` Jonathan Cameron
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2022-04-26 17:19 UTC
  To: Jonathan Cameron
  Cc: linux-cxl, Ben Widawsky, Vishal L Verma, Weiny, Ira, Schofield, Alison

On Tue, Apr 26, 2022 at 10:09 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> Hi All,
>
> I ran into this whilst debugging why on the current QEMU code
> we now get a probe failure for CXL mem due to the range 1 size being
> non 0.
>
> The conditions for whether we have legacy ranges programmed don't
> take into account if Mem_Enable = 1.  That is if the
> DVSEC CXL Control Mem_Enable bit is set on the type 3 device.
> If it's not then there is no existing user of the CXL memory
> set up by firmware or similar so we can switch over to HDM
> decoders and it doesn't matter what is in the range registers.
>
> Unfortunately the QEMU code was bringing the device up with
> Mem_Enabled already set.  So I fixed that.  After all, the default
> value of that bit should be 0.
>
> A few problems then showed up.
>
> 1. Nothing in the Linux code actually sets Mem_Enabled to 1.

That's because the device is supposed to, I thought, set it of its own
accord as a result of link training. It's an RO field in the spec, so
Linux can't set it:

8.2.1.3.3 DVSEC Flex Bus Port Status (Offset 0Eh)
"Mem_Enabled: When set, indicates that CXL.mem protocol operation has
been enabled as a result of PCIe alternate protocol negotiation for
Flex Bus."

> 2. Probing fails in mem.c as wait_for_media() checks for
>    info->mem_enabled (cached value of this bit).
>
> So, dirty hack was to
> * drop that check in wait_for_media() as media being enabled doesn't
>   have much to do with CXL.mem protocol being enabled.

Per above I think it does.

> * Make the check in cxl_hdm_decode_init()
>    if (info->mem_enabled && !global_enable && info->ranges)
> * Immediately after enabling the hdm decoder global enable,
>   also set the Mem_Enable bit and update info->mem_enabled.
>
> This seems to work, but I can't help thinking I'm missing something.

I think QEMU should just unconditionally init that value to 1.


* Re: CXL type 3 which doesn't have cxl mem enabled.
  2022-04-26 17:19 ` Dan Williams
@ 2022-04-26 18:06   ` Jonathan Cameron
  2022-04-26 19:00     ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Cameron @ 2022-04-26 18:06 UTC
  To: Dan Williams
  Cc: linux-cxl, Ben Widawsky, Vishal L Verma, Weiny, Ira, Schofield, Alison

On Tue, 26 Apr 2022 10:19:55 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> On Tue, Apr 26, 2022 at 10:09 AM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > Hi All,
> >
> > I ran into this whilst debugging why on the current QEMU code
> > we now get a probe failure for CXL mem due to the range 1 size being
> > non 0.
> >
> > The conditions for whether we have legacy ranges programmed don't
> > take into account if Mem_Enable = 1.  That is if the
> > DVSEC CXL Control Mem_Enable bit is set on the type 3 device.
> > If it's not then there is no existing user of the CXL memory
> > set up by firmware or similar so we can switch over to HDM
> > decoders and it doesn't matter what is in the range registers.
> >
> > Unfortunately the QEMU code was bringing the device up with
> > Mem_Enabled already set.  So I fixed that.  After all, the default
> > value of that bit should be 0.
> >
> > A few problems then showed up.
> >
> > 1. Nothing in the Linux code actually sets Mem_Enabled to 1.  

Sorry - my mistake, that should be Mem_Enable. Though that doesn't
actually clarify things much...

> 
> That's because the device is supposed to, I thought, set it of its own
> accord as a result of link training. It's an RO field in the spec, so
> Linux can't set it:
> 
> 8.2.1.3.3 DVSEC Flex Bus Port Status (Offset 0Eh)
> "Mem_Enabled: When set, indicates that CXL.mem protocol operation has
> been enabled as a result of PCIe alternate protocol negotiation for
> Flex Bus."

Agreed with that statement.

Ah. Nothing like confusing register field names that are very similar....
A Mem_Enable is in DVSEC for CXL Device 8.1.3.2 and is RWL.
Just for giggles there is also a Mem_Enable in the Flex Bus Port Control
but the range registers comment isn't about that one (I hope anyway!).

The kernel currently sets the value of info->mem_enabled using
the Mem_Enable field of the DVSEC for CXL Device.
https://elixir.bootlin.com/linux/v5.18-rc4/source/drivers/cxl/pci.c#L501

So I think wrong name and wrong DVSEC for that particular condition.
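To make the distinction concrete, here is a sketch of the read behind info->mem_enabled, done against a fake config-space buffer. Offset 0xC and bit 2 follow the kernel's `CXL_DVSEC_CTRL_OFFSET` and `CXL_DVSEC_MEM_ENABLE` definitions; `cfg_read16()` and the buffer are hypothetical stand-ins for `pci_read_config_word()`. The read never touches the Flex Bus Port DVSEC Status register (offset 0Eh) where Mem_Enabled lives.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets within the DVSEC for CXL Device, mirroring the kernel macros. */
#define CXL_DVSEC_CTRL_OFFSET 0xC
#define CXL_DVSEC_MEM_ENABLE  (1u << 2)

/* Little-endian 16-bit read from a fake config-space buffer. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Device DVSEC Control Mem_Enable (RWL), not Flex Bus Mem_Enabled (RO). */
static bool device_dvsec_mem_enable(const uint8_t *cfg, unsigned int dvsec)
{
	uint16_t ctrl = cfg_read16(cfg, dvsec + CXL_DVSEC_CTRL_OFFSET);

	return ctrl & CXL_DVSEC_MEM_ENABLE;
}
```

With the DVSEC placed at an arbitrary offset such as 0x100, writing 0x04 to byte 0x10C is enough for the helper to report Mem_Enable as set.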

Jonathan


> 
> > 2. Probing fails in mem.c as wait_for_media() checks for
> >    info->mem_enabled (cached value of this bit).
> >
> > So, dirty hack was to
> > * drop that check in wait_for_media() as media being enabled doesn't
> >   have much to do with CXL.mem protocol being enabled.  
> 
> Per above I think it does.
> 
> > * Make the check in cxl_hdm_decode_init()
> >    if (info->mem_enabled && !global_enable && info->ranges)
> > * Immediately after enabling the hdm decoder global enable,
> >   also set the Mem_Enable bit and update info->mem_enabled.
> >
> > This seems to work, but I can't help thinking I'm missing something.  
> 
> I think QEMU should just unconditionally init that value to 1.



* Re: CXL type 3 which doesn't have cxl mem enabled.
  2022-04-26 18:06   ` Jonathan Cameron
@ 2022-04-26 19:00     ` Dan Williams
  2022-04-26 19:38       ` Jonathan Cameron
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2022-04-26 19:00 UTC
  To: Jonathan Cameron
  Cc: linux-cxl, Ben Widawsky, Vishal L Verma, Weiny, Ira, Schofield, Alison

[-- Attachment #1: Type: text/plain, Size: 2581 bytes --]

On Tue, Apr 26, 2022 at 11:06 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Tue, 26 Apr 2022 10:19:55 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > On Tue, Apr 26, 2022 at 10:09 AM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:
> > >
> > > Hi All,
> > >
> > > I ran into this whilst debugging why on the current QEMU code
> > > we now get a probe failure for CXL mem due to the range 1 size being
> > > non 0.
> > >
> > > The conditions for whether we have legacy ranges programmed don't
> > > take into account if Mem_Enable = 1.  That is if the
> > > DVSEC CXL Control Mem_Enable bit is set on the type 3 device.
> > > If it's not then there is no existing user of the CXL memory
> > > set up by firmware or similar so we can switch over to HDM
> > > decoders and it doesn't matter what is in the range registers.
> > >
> > > Unfortunately the QEMU code was bringing the device up with
> > > Mem_Enabled already set.  So I fixed that.  After all, the default
> > > value of that bit should be 0.
> > >
> > > A few problems then showed up.
> > >
> > > 1. Nothing in the Linux code actually sets Mem_Enabled to 1.
>
> Sorry - my mistake, that should be Mem_Enable. Though that doesn't
> actually clarify things much...
>
> >
> > That's because the device is supposed to, I thought, set it of its own
> > accord as a result of link training. It's an RO field in the spec, so
> > Linux can't set it:
> >
> > 8.2.1.3.3 DVSEC Flex Bus Port Status (Offset 0Eh)
> > "Mem_Enabled: When set, indicates that CXL.mem protocol operation has
> > been enabled as a result of PCIe alternate protocol negotiation for
> > Flex Bus."
>
> Agreed with that statement.
>
> Ah. Nothing like confusing register field names that are very similar....
> A Mem_Enable is in DVSEC for CXL Device 8.1.3.2 and is RWL.
> Just for giggles there is also a Mem_Enable in the Flex Bus Port Control
> but the range registers comment isn't about that one (I hope anyway!).

Not sure whether to laugh or cry at that, sorry for the mix up on my part.

> The kernel currently sets the value of info->mem_enabled using
> the Mem_Enable field of the DVSEC for CXL Device.
> https://elixir.bootlin.com/linux/v5.18-rc4/source/drivers/cxl/pci.c#L501
>
> So I think wrong name and wrong DVSEC for that particular condition.

Yeah, I don't even see a need to cache that value, so something like
the attached? Note that the intent was to only have cxl_mem worry
about MMIO mapped register details and not require the 'struct
pci_dev' which makes things easier for cxl_test in the near term.

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 3466 bytes --]

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 7235d2f976e5..ef6950a2a4fd 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -150,12 +150,10 @@ static inline int cxl_mbox_cmd_rc2errno(struct cxl_mbox_cmd *mbox_cmd)
 
 /**
  * struct cxl_endpoint_dvsec_info - Cached DVSEC info
- * @mem_enabled: cached value of mem_enabled in the DVSEC, PCIE_DEVICE
  * @ranges: Number of active HDM ranges this device uses.
  * @dvsec_range: cached attributes of the ranges in the DVSEC, PCIE_DEVICE
  */
 struct cxl_endpoint_dvsec_info {
-	bool mem_enabled;
 	int ranges;
 	struct range dvsec_range[2];
 };
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 401b0fbe21db..c2d9dadf4a2e 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -27,12 +27,8 @@
 static int wait_for_media(struct cxl_memdev *cxlmd)
 {
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	struct cxl_endpoint_dvsec_info *info = &cxlds->info;
 	int rc;
 
-	if (!info->mem_enabled)
-		return -EBUSY;
-
 	rc = cxlds->wait_media_ready(cxlds);
 	if (rc)
 		return rc;
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index e7ab9a34d718..5c8f933bbece 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -463,6 +463,17 @@ static int wait_for_media_ready(struct cxl_dev_state *cxlds)
 	return 0;
 }
 
+static void cxl_disable_mem(void *pdev)
+{
+	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+	int d = cxlds->cxl_dvsec;
+	u16 ctrl;
+
+	pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
+	ctrl &= ~CXL_DVSEC_MEM_ENABLE;
+	pci_write_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, ctrl);
+}
+
 /*
  * Return positive number of non-zero ranges on success and a negative
  * error code on failure. The cxl_mem driver depends on ranges == 0 to
@@ -486,13 +497,26 @@ static int __cxl_dvsec_ranges(struct cxl_dev_state *cxlds,
 	if (rc)
 		return rc;
 
+	if (!(cap & CXL_DVSEC_MEM_CAPABLE)) {
+		dev_dbg(dev, "Not MEM Capable\n");
+		return -ENXIO;
+	}
+
 	rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
 	if (rc)
 		return rc;
 
-	if (!(cap & CXL_DVSEC_MEM_CAPABLE)) {
-		dev_dbg(dev, "Not MEM Capable\n");
-		return -ENXIO;
+	if (!(ctrl & CXL_DVSEC_MEM_ENABLE)) {
+		ctrl |= CXL_DVSEC_MEM_ENABLE;
+		rc = pci_write_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET,
+					   ctrl);
+		if (rc)
+			return rc;
+
+		rc = devm_add_action_or_reset(&pdev->dev, cxl_disable_mem,
+					      pdev);
+		if (rc)
+			return rc;
 	}
 
 	/*
@@ -511,8 +535,6 @@ static int __cxl_dvsec_ranges(struct cxl_dev_state *cxlds,
 		return rc;
 	}
 
-	info->mem_enabled = FIELD_GET(CXL_DVSEC_MEM_ENABLE, ctrl);
-
 	for (i = 0; i < hdm_count; i++) {
 		u64 base, size;
 		u32 temp;
@@ -585,6 +607,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	cxlds = cxl_dev_state_create(&pdev->dev);
 	if (IS_ERR(cxlds))
 		return PTR_ERR(cxlds);
+	pci_set_drvdata(pdev, cxlds);
 
 	cxlds->serial = pci_get_dsn(pdev);
 	cxlds->cxl_dvsec = pci_find_dvsec_capability(
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index b6b726eff3e2..44d01224734a 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -250,10 +250,6 @@ static void label_area_release(void *lsa)
 
 static void mock_validate_dvsec_ranges(struct cxl_dev_state *cxlds)
 {
-	struct cxl_endpoint_dvsec_info *info;
-
-	info = &cxlds->info;
-	info->mem_enabled = true;
 }
 
 static int cxl_mock_mem_probe(struct platform_device *pdev)

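The enable-then-register-undo pattern in the pci.c hunk can be modelled in user space. This is a hypothetical sketch: `struct fake_dev`, `add_action()` and `release_all()` only imitate the devres behaviour of `devm_add_action_or_reset()` (run the action immediately if registration fails, otherwise run it at teardown in reverse order). They are not kernel APIs.

```c
#include <stdint.h>

#define CXL_DVSEC_MEM_ENABLE (1u << 2)
#define MAX_ACTIONS 8

/* Hypothetical stand-in for a device and its devres action list. */
struct fake_dev {
	uint16_t dvsec_ctrl;
	void (*actions[MAX_ACTIONS])(struct fake_dev *);
	int nr_actions;
};

static int add_action(struct fake_dev *dev, void (*fn)(struct fake_dev *))
{
	if (dev->nr_actions == MAX_ACTIONS) {
		fn(dev);	/* "_or_reset": undo immediately on failure */
		return -1;
	}
	dev->actions[dev->nr_actions++] = fn;
	return 0;
}

/* Teardown: run registered actions in reverse order. */
static void release_all(struct fake_dev *dev)
{
	while (dev->nr_actions > 0)
		dev->actions[--dev->nr_actions](dev);
}

static void disable_mem(struct fake_dev *dev)
{
	dev->dvsec_ctrl &= ~CXL_DVSEC_MEM_ENABLE;
}

/* Same shape as the hunk: only set (and later undo) Mem_Enable if clear. */
static int enable_mem(struct fake_dev *dev)
{
	if (dev->dvsec_ctrl & CXL_DVSEC_MEM_ENABLE)
		return 0;
	dev->dvsec_ctrl |= CXL_DVSEC_MEM_ENABLE;
	return add_action(dev, disable_mem);
}
```

Because the undo action is only registered when Linux itself set the bit, a Mem_Enable that firmware had already set is left alone at teardown.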

* Re: CXL type 3 which doesn't have cxl mem enabled.
  2022-04-26 19:00     ` Dan Williams
@ 2022-04-26 19:38       ` Jonathan Cameron
  2022-04-26 20:02         ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Cameron @ 2022-04-26 19:38 UTC
  To: Dan Williams
  Cc: linux-cxl, Ben Widawsky, Vishal L Verma, Weiny, Ira, Schofield, Alison

On Tue, 26 Apr 2022 12:00:41 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> On Tue, Apr 26, 2022 at 11:06 AM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Tue, 26 Apr 2022 10:19:55 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > > On Tue, Apr 26, 2022 at 10:09 AM Jonathan Cameron
> > > <Jonathan.Cameron@huawei.com> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > I ran into this whilst debugging why on the current QEMU code
> > > > we now get a probe failure for CXL mem due to the range 1 size being
> > > > non 0.
> > > >
> > > > The conditions for whether we have legacy ranges programmed don't
> > > > take into account if Mem_Enable = 1.  That is if the
> > > > DVSEC CXL Control Mem_Enable bit is set on the type 3 device.
> > > > If it's not then there is no existing user of the CXL memory
> > > > set up by firmware or similar so we can switch over to HDM
> > > > decoders and it doesn't matter what is in the range registers.
> > > >
> > > > Unfortunately the QEMU code was bringing the device up with
> > > > Mem_Enabled already set.  So I fixed that.  After all, the default
> > > > value of that bit should be 0.
> > > >
> > > > A few problems then showed up.
> > > >
> > > > 1. Nothing in the Linux code actually sets Mem_Enabled to 1.
> >
> > Sorry - my mistake, that should be Mem_Enable. Though that doesn't
> > actually clarify things much...
> >
> > >
> > > That's because the device is supposed to, I thought, set it of its own
> > > accord as a result of link training. It's an RO field in the spec, so
> > > Linux can't set it:
> > >
> > > 8.2.1.3.3 DVSEC Flex Bus Port Status (Offset 0Eh)
> > > "Mem_Enabled: When set, indicates that CXL.mem protocol operation has
> > > been enabled as a result of PCIe alternate protocol negotiation for
> > > Flex Bus."
> >
> > Agreed with that statement.
> >
> > Ah. Nothing like confusing register field names that are very similar....
> > A Mem_Enable is in DVSEC for CXL Device 8.1.3.2 and is RWL.
> > Just for giggles there is also a Mem_Enable in the Flex Bus Port Control
> > but the range registers comment isn't about that one (I hope anyway!).
> 
> Not sure whether to laugh or cry at that, sorry for the mix up on my part.
> 
> > The kernel currently sets the value of info->mem_enabled using
> > the Mem_Enable field of the DVSEC for CXL Device.
> > https://elixir.bootlin.com/linux/v5.18-rc4/source/drivers/cxl/pci.c#L501
> >
> > So I think wrong name and wrong DVSEC for that particular condition.
> 
> Yeah, I don't even see a need to cache that value, so something like
> the attached? Note that the intent was to only have cxl_mem worry
> about MMIO mapped register details and not require the 'struct
> pci_dev' which makes things easier for cxl_test in the near term.
> 
Hi Dan,

That fixes this problem (I'll test tomorrow but it looks right), but...

I think we still run into the problem I was debugging in the
first place which is whether the Device DVSEC Range 1 size is non 0.
https://elixir.bootlin.com/linux/v5.18-rc4/source/drivers/cxl/pci.c#L541
(ranges ends up > 0 and hence we conclude firmware already programmed the
 device and fail the probe).

As far as I can tell a CXL 2.0 type 3 device is allowed to provide
the option to use ranges or HDM decoders (or it can be HDM decoder only).
See the comment at end of 8.1.3.8.4 :

"A CXL.mem capable device that implements CXL HDM Decoder Capability registers
follows the above behavior as long as HDM Decoder Enable bit in CXL HDM Decoder
Global Control register is zero."

As such we can't use the range size alone to check if
the Range registers are in use (it's an RO value, not something previously
configured). We need to perform the full check as described, which
includes checking Mem_Enable (which is in the above behavior that comment
is directing us towards).

As the below patch has already set the Mem_Enable hardware field unconditionally,
we don't have the necessary info by the time we reach the relevant code.
https://elixir.bootlin.com/linux/v5.18-rc4/source/drivers/cxl/mem.c#L107

So you could cache the current value (perhaps with a more meaningful
name than the spec gives it!) of Device DVSEC Mem_Enable
at the point where you have it written in this patch and add a check
on the cached value at the point in the reference above.
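A sketch of that suggestion: snapshot the Device DVSEC Mem_Enable bit before the patch writes it, then test the snapshot (rather than the live bit) in the later check. `struct dvsec_info`, `mem_enabled_by_fw` and both helpers are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define CXL_DVSEC_MEM_ENABLE (1u << 2)

/* Hypothetical cached DVSEC info; the field name is illustrative. */
struct dvsec_info {
	bool mem_enabled_by_fw;	/* Mem_Enable as found before Linux wrote it */
	int ranges;		/* DVSEC ranges with non-zero size */
};

/* Record the pre-existing Mem_Enable state, then return ctrl with it set. */
static uint16_t snapshot_and_enable(uint16_t ctrl, struct dvsec_info *info)
{
	info->mem_enabled_by_fw = ctrl & CXL_DVSEC_MEM_ENABLE;
	return ctrl | CXL_DVSEC_MEM_ENABLE;
}

/* Later check: ranges only count if Mem_Enable was set on arrival. */
static bool fw_owns_ranges(const struct dvsec_info *info,
			   bool hdm_global_enable)
{
	return info->mem_enabled_by_fw && !hdm_global_enable &&
	       info->ranges > 0;
}
```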

That's still a little ugly as ideally we shouldn't transition through
a somewhat invalid state - though it is harmless as no traffic
will be sent by the host (probably - though I suspect hardware folk
would tell me we can't assume it...).
The invalid state being:
- Mem_Enable set
- Range registers in use because global HDM Decoder enable not yet set.
- Range registers were programmed by firmware to something that actually
  works but not enabled for some odd reason. I think they might even be
  technically valid with the defaults even though the base is 0 (imagine
  a very large CXL memory - some of it might overlap with a region being
  routed by the host to the CXL host bridges).

Ideally we wouldn't set that Mem_Enable until we have switched
to the HDM decoders.  To avoid that we probably need another callback
from cxl_mem into cxl_pci.
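A minimal sketch of that callback approach, assuming a hypothetical ops table that cxl_pci (or cxl_test) would fill in, so the cxl_mem side can enforce "HDM decoders first, Mem_Enable second" without touching a `struct pci_dev`. None of these names exist in the driver.

```c
#include <stdbool.h>

struct fake_cxlds;

/* Hypothetical ops provided by cxl_pci (or a cxl_test mock). */
struct fake_cxl_ops {
	int (*enable_hdm)(struct fake_cxlds *cxlds);
	int (*enable_mem)(struct fake_cxlds *cxlds);
};

struct fake_cxlds {
	const struct fake_cxl_ops *ops;
	bool hdm_enabled;
	bool mem_enabled;
};

/* cxl_mem-side flow: switch to HDM decoders, then enable CXL.mem. */
static int fake_cxl_mem_probe(struct fake_cxlds *cxlds)
{
	int rc = cxlds->ops->enable_hdm(cxlds);

	if (rc)
		return rc;
	return cxlds->ops->enable_mem(cxlds);
}

/* Example "cxl_pci" implementations that just track state. */
static int pci_enable_hdm(struct fake_cxlds *cxlds)
{
	cxlds->hdm_enabled = true;
	return 0;
}

static int pci_enable_mem(struct fake_cxlds *cxlds)
{
	/* Refuse while the range registers would still be decoding. */
	if (!cxlds->hdm_enabled)
		return -1;
	cxlds->mem_enabled = true;
	return 0;
}

static const struct fake_cxl_ops pci_ops = {
	.enable_hdm = pci_enable_hdm,
	.enable_mem = pci_enable_mem,
};
```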

Agreed on the laughing or crying. I'm off to find a beer.

Jonathan



> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 7235d2f976e5..ef6950a2a4fd 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -150,12 +150,10 @@ static inline int cxl_mbox_cmd_rc2errno(struct cxl_mbox_cmd *mbox_cmd)
>  
>  /**
>   * struct cxl_endpoint_dvsec_info - Cached DVSEC info
> - * @mem_enabled: cached value of mem_enabled in the DVSEC, PCIE_DEVICE
>   * @ranges: Number of active HDM ranges this device uses.
>   * @dvsec_range: cached attributes of the ranges in the DVSEC, PCIE_DEVICE
>   */
>  struct cxl_endpoint_dvsec_info {
> -	bool mem_enabled;
>  	int ranges;
>  	struct range dvsec_range[2];
>  };
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 401b0fbe21db..c2d9dadf4a2e 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -27,12 +27,8 @@
>  static int wait_for_media(struct cxl_memdev *cxlmd)
>  {
>  	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> -	struct cxl_endpoint_dvsec_info *info = &cxlds->info;
>  	int rc;
>  
> -	if (!info->mem_enabled)
> -		return -EBUSY;
> -
>  	rc = cxlds->wait_media_ready(cxlds);
>  	if (rc)
>  		return rc;
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index e7ab9a34d718..5c8f933bbece 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -463,6 +463,17 @@ static int wait_for_media_ready(struct cxl_dev_state *cxlds)
>  	return 0;
>  }
>  
> +static void cxl_disable_mem(void *pdev)
> +{
> +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +	int d = cxlds->cxl_dvsec;
> +	u16 ctrl;
> +
> +	pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
> +	ctrl &= ~CXL_DVSEC_MEM_ENABLE;
> +	pci_write_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, ctrl);
> +}
> +
>  /*
>   * Return positive number of non-zero ranges on success and a negative
>   * error code on failure. The cxl_mem driver depends on ranges == 0 to
> @@ -486,13 +497,26 @@ static int __cxl_dvsec_ranges(struct cxl_dev_state *cxlds,
>  	if (rc)
>  		return rc;
>  
> +	if (!(cap & CXL_DVSEC_MEM_CAPABLE)) {
> +		dev_dbg(dev, "Not MEM Capable\n");
> +		return -ENXIO;
> +	}
> +
>  	rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
>  	if (rc)
>  		return rc;
>  
> -	if (!(cap & CXL_DVSEC_MEM_CAPABLE)) {
> -		dev_dbg(dev, "Not MEM Capable\n");
> -		return -ENXIO;
> +	if (!(ctrl & CXL_DVSEC_MEM_ENABLE)) {
> +		ctrl |= CXL_DVSEC_MEM_ENABLE;
> +		rc = pci_write_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET,
> +					   ctrl);
> +		if (rc)
> +			return rc;
> +
> +		rc = devm_add_action_or_reset(&pdev->dev, cxl_disable_mem,
> +					      pdev);
> +		if (rc)
> +			return rc;
>  	}
>  
>  	/*
> @@ -511,8 +535,6 @@ static int __cxl_dvsec_ranges(struct cxl_dev_state *cxlds,
>  		return rc;
>  	}
>  
> -	info->mem_enabled = FIELD_GET(CXL_DVSEC_MEM_ENABLE, ctrl);
> -
>  	for (i = 0; i < hdm_count; i++) {
>  		u64 base, size;
>  		u32 temp;
> @@ -585,6 +607,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	cxlds = cxl_dev_state_create(&pdev->dev);
>  	if (IS_ERR(cxlds))
>  		return PTR_ERR(cxlds);
> +	pci_set_drvdata(pdev, cxlds);
>  
>  	cxlds->serial = pci_get_dsn(pdev);
>  	cxlds->cxl_dvsec = pci_find_dvsec_capability(
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index b6b726eff3e2..44d01224734a 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -250,10 +250,6 @@ static void label_area_release(void *lsa)
>  
>  static void mock_validate_dvsec_ranges(struct cxl_dev_state *cxlds)
>  {
> -	struct cxl_endpoint_dvsec_info *info;
> -
> -	info = &cxlds->info;
> -	info->mem_enabled = true;
>  }
>  
>  static int cxl_mock_mem_probe(struct platform_device *pdev)


* Re: CXL type 3 which doesn't have cxl mem enabled.
  2022-04-26 19:38       ` Jonathan Cameron
@ 2022-04-26 20:02         ` Dan Williams
  2022-04-27  8:35           ` Jonathan Cameron
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2022-04-26 20:02 UTC
  To: Jonathan Cameron
  Cc: linux-cxl, Ben Widawsky, Vishal L Verma, Weiny, Ira, Schofield, Alison

On Tue, Apr 26, 2022 at 12:39 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Tue, 26 Apr 2022 12:00:41 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > On Tue, Apr 26, 2022 at 11:06 AM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:
> > >
> > > On Tue, 26 Apr 2022 10:19:55 -0700
> > > Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > > On Tue, Apr 26, 2022 at 10:09 AM Jonathan Cameron
> > > > <Jonathan.Cameron@huawei.com> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > I ran into this whilst debugging why on the current QEMU code
> > > > > we now get a probe failure for CXL mem due to the range 1 size being
> > > > > non 0.
> > > > >
> > > > > The conditions for whether we have legacy ranges programmed don't
> > > > > take into account if Mem_Enable = 1.  That is if the
> > > > > DVSEC CXL Control Mem_Enable bit is set on the type 3 device.
> > > > > If it's not then there is no existing user of the CXL memory
> > > > > set up by firmware or similar so we can switch over to HDM
> > > > > decoders and it doesn't matter what is in the range registers.
> > > > >
> > > > > Unfortunately the QEMU code was bringing the device up with
> > > > > Mem_Enabled already set.  So I fixed that.  After all, the default
> > > > > value of that bit should be 0.
> > > > >
> > > > > A few problems then showed up.
> > > > >
> > > > > 1. Nothing in the Linux code actually sets Mem_Enabled to 1.
> > >
> > > Sorry - my mistake, that should be Mem_Enable. Though that doesn't
> > > actually clarify things much...
> > >
> > > >
> > > > That's because the device is supposed to, I thought, set it of its own
> > > > accord as a result of link training. It's an RO field in the spec, so
> > > > Linux can't set it:
> > > >
> > > > 8.2.1.3.3 DVSEC Flex Bus Port Status (Offset 0Eh)
> > > > "Mem_Enabled: When set, indicates that CXL.mem protocol operation has
> > > > been enabled as a result of PCIe alternate protocol negotiation for
> > > > Flex Bus."
> > >
> > > Agreed with that statement.
> > >
> > > Ah. Nothing like confusing register field names that are very similar....
> > > A Mem_Enable is in DVSEC for CXL Device 8.1.3.2 and is RWL.
> > > Just for giggles there is also a Mem_Enable in the Flex Bus Port Control
> > > but the range registers comment isn't about that one (I hope anyway!).
> >
> > Not sure whether to laugh or cry at that, sorry for the mix up on my part.
> >
> > > The kernel currently sets the value of info->mem_enabled using
> > > the Mem_Enable field of the DVSEC for CXL Device.
> > > https://elixir.bootlin.com/linux/v5.18-rc4/source/drivers/cxl/pci.c#L501
> > >
> > > So I think wrong name and wrong DVSEC for that particular condition.
> >
> > Yeah, I don't even see a need to cache that value, so something like
> > the attached? Note that the intent was to only have cxl_mem worry
> > about MMIO mapped register details and not require the 'struct
> > pci_dev' which makes things easier for cxl_test in the near term.
> >
> Hi Dan,
>
> That fixes this problem (I'll test tomorrow but it looks right), but...
>
> I think we still run into the problem I was debugging in the
> first place which is whether the Device DVSEC Range 1 size is non 0.
> https://elixir.bootlin.com/linux/v5.18-rc4/source/drivers/cxl/pci.c#L541
> (ranges ends up > 0 and hence we conclude firmware already programmed the
>  device and fail the probe).
>
> As far as I can tell a CXL 2.0 type 3 device is allowed to provide
> the option to use ranges or HDM decoders (or it can be HDM decoder only).
> See the comment at end of 8.1.3.8.4 :
>
> "A CXL.mem capable device that implements CXL HDM Decoder Capability registers
> follows the above behavior as long as HDM Decoder Enable bit in CXL HDM Decoder
> Global Control register is zero."
>
> As such we can't use the range size alone to check if
> the Range registers are in use (it's an RO value, not something previously
> configured). We need to perform the full check as described, which
> includes checking Mem_Enable (which is in the above behavior that comment
> is directing us towards).
>
> As the below patch has already set the Mem_Enable hardware field unconditionally,
> we don't have the necessary info by the time we reach the relevant code.
> https://elixir.bootlin.com/linux/v5.18-rc4/source/drivers/cxl/mem.c#L107
>
> So you could cache the current value (perhaps with a more meaningful
> name than the spec gives it!) of Device DVSEC Mem_Enable
> at the point where you have it written in this patch and add a check
> on the cached value at the point in the reference above.
>
> That's still a little ugly as ideally we shouldn't transition through
> a somewhat invalid state - though it is harmless as no traffic
> will be sent by the host (probably - though I suspect hardware folk
> would tell me we can't assume it...).

...and there is also the worry about malicious devices, but I don't
know how to determine the difference between valid config, invalid
config, and malicious device claiming a range that it shouldn't.
Perhaps this needs at a minimum cross validation with the CFMWS?
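One possible shape for that cross-check, as a sketch only: treat each CFMWS entry from the ACPI CEDT as an inclusive host physical address window and require every DVSEC range to fit entirely inside one of them. CEDT parsing is omitted, and `struct addr_range` and `range_in_windows()` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, end] range, like the kernel's struct range. */
struct addr_range {
	uint64_t start;
	uint64_t end;
};

/* True if @r lies entirely inside one of the @n host windows. */
static bool range_in_windows(struct addr_range r,
			     const struct addr_range *win, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (r.start >= win[i].start && r.end <= win[i].end)
			return true;
	return false;
}
```

Under such a check, a range at base 0 with size 256MB would be rejected unless some CFMWS window actually covers it.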

> The invalid state being:
> - Mem_Enable set
> - Range registers in use because global HDM Decoder enable not yet set.
> - Range registers were programmed by firmware to something that actually
>   works but not enabled for some odd reason. I think they might even be
>   technically valid with the defaults even though the base is 0 (imagine
>   a very large CXL memory - some of it might overlap with a region being
>   routed by the host to the CXL host bridges).

Oh yuck, yes the default init state of any device is that it will
decode the first 256MB of memory as long as mem_enabled is set, and
devices are "trusted" to not decode anything that they shouldn't.
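The 256MB figure follows from how the DVSEC range registers encode addresses: only bits 31:28 of the Size Low and Base Low registers carry address bits, so sizes and bases are multiples of 2^28 bytes. The lower bits of Size Low hold status/attribute fields (such as Memory_Info_Valid and Memory_Active) and are reserved in Base Low. A minimal sketch mirroring the masking the kernel applies with `CXL_DVSEC_MEM_SIZE_LOW_MASK` (GENMASK(31, 28)); the helper name is hypothetical.

```c
#include <stdint.h>

/* Address bits 31:28 of a Range Size Low / Range Base Low register. */
#define RANGE_LOW_ADDR_MASK 0xF0000000u

/* Combine the High and Low registers into a 64-bit size or base. */
static uint64_t dvsec_range_decode(uint32_t high, uint32_t low)
{
	return ((uint64_t)high << 32) | (low & RANGE_LOW_ADDR_MASK);
}
```

For example, dvsec_range_decode(0, 0x10000003) yields 0x10000000 (256MB) with the two low flag bits masked off.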

I'm wondering if Linux should take a more draconian approach and
mandate that all devices that advertise the CXL 2.0 Class Code
capability must boot in HDM decoder enabled mode, or Mem_Enable=0
mode. Any CXL 2.0 device found to have Mem_Enable=1 without HDM
decoders enabled will have HDM decoder operation forced upon them so
that Linux can trust that nothing is being decoded by accident, if
that breaks the system, that's a BIOS bug, not a Linux bug. Of course,
we could have a module parameter to override that policy while the BIOS
update is in-flight to the target system.
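The proposed policy, sketched as a decision function. `enum range_policy`, `probe_policy()` and the `allow_fw_ranges` flag (standing in for the suggested module parameter) are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical probe-time outcome for a CXL 2.0 type 3 device. */
enum range_policy {
	POLICY_OK,		/* Mem_Enable clear or HDM already enabled */
	POLICY_FORCE_HDM,	/* take over: force HDM decoder operation */
	POLICY_TRUST_FW,	/* override set: keep firmware range setup */
};

static enum range_policy probe_policy(bool mem_enable, bool hdm_enabled,
				      bool allow_fw_ranges)
{
	/* Nothing can be decoding by accident in these states. */
	if (!mem_enable || hdm_enabled)
		return POLICY_OK;
	/* Mem_Enable=1 with range registers live: BIOS bug unless overridden. */
	return allow_fw_ranges ? POLICY_TRUST_FW : POLICY_FORCE_HDM;
}
```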

> Ideally we wouldn't set that Mem_Enable until we have switched
> to the HDM decoders.  To avoid that we probably need another callback
> from cxl_mem into cxl_pci.

Either another callback, or move more validation out of cxl_mem and
into cxl_pci. I am leaning towards the latter.

> Agreed on the laughing or crying. I'm off to find a beer.

Cheers!


* Re: CXL type 3 which doesn't have cxl mem enabled.
  2022-04-26 20:02         ` Dan Williams
@ 2022-04-27  8:35           ` Jonathan Cameron
  2022-04-28 21:10             ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Cameron @ 2022-04-27  8:35 UTC
  To: Dan Williams
  Cc: linux-cxl, Ben Widawsky, Vishal L Verma, Weiny, Ira, Schofield, Alison

On Tue, 26 Apr 2022 13:02:09 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> On Tue, Apr 26, 2022 at 12:39 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Tue, 26 Apr 2022 12:00:41 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >  
> > > On Tue, Apr 26, 2022 at 11:06 AM Jonathan Cameron
> > > <Jonathan.Cameron@huawei.com> wrote:  
> > > >
> > > > On Tue, 26 Apr 2022 10:19:55 -0700
> > > > Dan Williams <dan.j.williams@intel.com> wrote:
> > > >  
> > > > > On Tue, Apr 26, 2022 at 10:09 AM Jonathan Cameron
> > > > > <Jonathan.Cameron@huawei.com> wrote:  
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I ran into this whilst debugging why on the current QEMU code
> > > > > > we now get a probe failure for CXL mem due to the range 1 size being
> > > > > > non 0.
> > > > > >
> > > > > > The conditions for whether we have legacy ranges programmed don't
> > > > > > take into account if Mem_Enable = 1.  That is if the
> > > > > > DVSEC CXL Control Mem_Enable bit is set on the type 3 device.
> > > > > > If it's not then there is no existing user of the CXL memory
> > > > > > setup by firmware or similar so we can switch over to HDM
> > > > > > decoders and it doesn't matter what is in the range registers.
> > > > > >
> > > > > > Unfortunately the QEMU code was bringing the device up with
> > > > > > Mem_Enabled already set.  So I fixed that.  After all default
> > > > > > value of that bit should be 0.
> > > > > >
> > > > > > A few problems then showed up.
> > > > > >
> > > > > > 1. Nothing in the Linux code actually sets Mem_Enabled to 1.  
> > > >
> > > > Sorry - my mistake, that should be Mem_Enable. Though that doesn't
> > > > actually clarify things much...
> > > >  
> > > > >
> > > > > That's because the device is supposed to, I thought, set it of its own
> > > > > accord as a result of link training. It's an RO field in the spec, so
> > > > > Linux can't set it:
> > > > >
> > > > > 8.2.1.3.3 DVSEC Flex Bus Port Status (Offset 0Eh)
> > > > > "Mem_Enabled: When set, indicates that CXL.mem protocol operation has
> > > > > been enabled as a result of PCIe alternate protocol negotiation for
> > > > > Flex Bus."  
> > > >
> > > > Agreed with that statement.
> > > >
> > > > Ah. Nothing like confusing register field names that are very similar....
> > > > A Mem_Enable is in DVSEC for CXL Device 8.1.3.2 and is RWL.
> > > > Just for giggles there is also a Mem_Enable in the Flex Bus Port Control
> > > > but the range registers comment isn't about that one (I hope anyway!).  
> > >
> > > Not sure whether to laugh or cry at that, sorry for the mix up on my part.
> > >  
> > > > The kernel currently sets the value of info->mem_enabled using
> > > > the Mem_Enable field of the DVSEC for CXL Device.
> > > > https://elixir.bootlin.com/linux/v5.18-rc4/source/drivers/cxl/pci.c#L501
> > > >
> > > > So I think wrong name and wrong DVSEC for that particular condition.  
> > >
> > > Yeah, I don't even see a need to cache that value, so something like
> > > the attached? Note that the intent was to only have cxl_mem worry
> > > about MMIO mapped register details and not require the 'struct
> > > pci_dev' which makes things easier for cxl_test in the near term.
> > >  
> > Hi Dan,
> >
> > That fixes this problem (I'll test tomorrow but it looks right), but...
> >
> > I think we still run into the problem I was debugging in the
> > first place which is whether the Device DVSEC Range 1 size is non 0.
> > https://elixir.bootlin.com/linux/v5.18-rc4/source/drivers/cxl/pci.c#L541
> > (ranges ends up > 0 and hence we conclude firmware already programmed the
> >  device and fail the probe).
> >
> > As far as I can tell a CXL 2.0 type 3 device is allowed to provide
> > the option to use ranges or HDM decoders (or it can be HDM decoder only).
> > See the comment at end of 8.1.3.8.4 :
> >
> > "A CXL.mem capable device that implements CXL HDM Decoder Capability registers
> > follows the above behavior as long as HDM Decoder Enable bit in CXL HDM Decoder
> > Global Control register is zero."
> >
> > As such we can't use the range size alone to check if
> > the Range registers are in use (it's a RO value, not something previously
> > configured). We need to perform the full check as described, which
> > includes checking Mem_Enable (which is in the above behavior that comment
> > is directing us towards).
> >
> > As the below patch has already set Mem_Enable hardware field unconditionally
> > we don't have the necessary info by the time we reach the relevant code.
> > https://elixir.bootlin.com/linux/v5.18-rc4/source/drivers/cxl/mem.c#L107
> >
> > So you could cache the current value (perhaps with a more meaningful
> > name than the spec gives it!) of Device DVSEC Mem_Enable
> > at the point where you have it written in this patch and add a check
> > on the cached value at the point in the reference above.
> >
> > That's still a little ugly as ideally we shouldn't transition through
> > a somewhat invalid state - though it is harmless as no traffic
> > will be sent by the host (probably - though I suspect hardware folk
> > would tell me we can't assume it...).  
> 
> ...and there is also the worry about malicious devices, but I don't
> know how to determine the difference between valid config, invalid
> config, and malicious device claiming a range that it shouldn't.
> Perhaps this needs at a minimum cross validation with the CFMWS?
> 
> > The invalid state being:
> > - Mem_Enable set
> > - Range registers in use because global HDM Decoder enable not yet set.
> > - Range registers were programmed by firmware to something that actually
> >   works but not enabled for some odd reason. I think they might even be
> >   technically valid with the defaults even though the base is 0 (imagine
> >   a very large CXL memory - some of it might overlap with a region being
> >   routed by the host to the CXL host bridges).  
> 
> Oh yuck, yes the default init state of any device is that it will
> decode the first 256MB of memory as long as mem_enabled is set, and
> devices are "trusted" to not decode anything that they shouldn't.
> 
> I'm wondering if Linux should take a more draconian approach and
> mandate that all devices that advertise the CXL 2.0 Class Code
> capability must boot in HDM decoder enabled mode, or Mem_enable=0
> mode. Any CXL 2.0 device found to have Mem_enable=1 without HDM
> decoders enabled will have HDM decoder operation forced upon them so
> that Linux can trust that nothing is being decoded by accident, if
> that breaks the system, that's a BIOS bug, not a Linux bug. Of course,
> could have a module parameter to override that policy while the BIOS
> update is in-flight to the target system.

I'm not sure it is technically a BIOS bug. Using the range approach with
everything fully set up (e.g. with the memory also described in the EFI memory map
and other appropriate places) is fine.  We should just ignore those
devices.  Hopefully they also have the lock set.
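To make the "range approach with everything fully set up" case concrete, here is a sketch that decodes one DVSEC range and classifies it. It assumes the CXL 2.0 layout in which the low size/base registers carry address bits 31:28 (256 MB granularity) and bit 0 of Range Size Low is Memory_Info_Valid; `config_lock` stands for the DVSEC CONFIG_LOCK bit. The type, enum, and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed CXL 2.0 DVSEC Range Size Low / Base Low layout. */
#define RANGE_INFO_VALID (1u << 0)   /* Memory_Info_Valid */
#define RANGE_LOW_MASK   0xF0000000u /* address bits 31:28, 256 MB units */

struct dvsec_range_regs {
	uint32_t size_hi, size_lo;
	uint32_t base_hi, base_lo;
};

static uint64_t range_size(const struct dvsec_range_regs *r)
{
	return ((uint64_t)r->size_hi << 32) | (r->size_lo & RANGE_LOW_MASK);
}

static uint64_t range_base(const struct dvsec_range_regs *r)
{
	return ((uint64_t)r->base_hi << 32) | (r->base_lo & RANGE_LOW_MASK);
}

enum legacy_verdict {
	LEGACY_NOT_READY,       /* size not reported yet */
	LEGACY_UNUSED,          /* safe to move to HDM decoders */
	LEGACY_IN_USE_LOCKED,   /* firmware owns it: ignore the device */
	LEGACY_IN_USE_UNLOCKED, /* firmware owns it, but nothing is locked */
};

static enum legacy_verdict classify_range(const struct dvsec_range_regs *r,
					  bool mem_enable, bool hdm_enable,
					  bool config_lock)
{
	if (!(r->size_lo & RANGE_INFO_VALID))
		return LEGACY_NOT_READY;
	/* Ranges only decode with Mem_Enable=1 and HDM decoders disabled. */
	if (!mem_enable || hdm_enable || range_size(r) == 0)
		return LEGACY_UNUSED;
	return config_lock ? LEGACY_IN_USE_LOCKED : LEGACY_IN_USE_UNLOCKED;
}
```

A device left at base 0 with a 256 MB size and Mem_Enable set lands in one of the "in use" verdicts, which is exactly the ambiguous case raised above.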

We could make what you suggest a Linux 'boot standard' though...
We'd want to communicate that strongly to various BIOS teams though.
Even better if we can get other OSVs on side.

I'll check with our BIOS team whether they'd mind this restriction.

> 
> > Ideally we wouldn't set that Mem_Enable until we have switched
> > to the HDM decoders.  To avoid that we probably need another callback
> > from cxl_mem into cxl_pci.  
> 
> Either another callback, or move more validation out of cxl_mem and
> into cxl_pci. I am leaning towards the latter.
Sure. That should work.

Jonathan
> 
> > Agreed on the laughing or crying. I'm off to find a beer.  
> 
> Cheers!



* Re: CXL type 3 which doesn't have cxl mem enabled.
  2022-04-27  8:35           ` Jonathan Cameron
@ 2022-04-28 21:10             ` Dan Williams
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2022-04-28 21:10 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-cxl, Ben Widawsky, Vishal L Verma, Weiny, Ira, Schofield, Alison

On Wed, Apr 27, 2022 at 1:36 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
> >
> > Oh yuck, yes the default init state of any device is that it will
> > decode the first 256MB of memory as long as mem_enabled is set, and
> > devices are "trusted" to not decode anything that they shouldn't.
> >
> > I'm wondering if Linux should take a more draconian approach and
> > mandate that all devices that advertise the CXL 2.0 Class Code
> > capability must boot in HDM decoder enabled mode, or Mem_enable=0
> > mode. Any CXL 2.0 device found to have Mem_enable=1 without HDM
> > decoders enabled will have HDM decoder operation forced upon them so
> > that Linux can trust that nothing is being decoded by accident, if
> > that breaks the system, that's a BIOS bug, not a Linux bug. Of course,
> > could have a module parameter to override that policy while the BIOS
> > update is in-flight to the target system.
>
> I'm not sure it is technically a BIOS bug. Using the range approach with
> everything fully set up (e.g. with the memory also described in the EFI memory map
> and other appropriate places) is fine.  We should just ignore those
> devices.  Hopefully they also have the lock set.

The problem is that having the range in the EFI memory map is
ambiguous as to whether the BIOS programmed the CXL device to claim
that range, or some other (untrusted/unexpected) entity programmed the
device to claim those ranges. The only positive indication that the
platform BIOS expects CXL devices to claim EFI memory map ranges is if
the entry appears in the CFMWS with the "Fixed Device Configuration"
bit set.
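The criterion above can be sketched as a containment check: a device range counts as BIOS-expected only if it lies wholly inside a CFMWS window whose Fixed Device Configuration restriction is set. The bit position (bit 4 of Window Restrictions) is my reading of the CEDT definition and should be checked against the spec; `cfmws_window` is a simplified hypothetical, not the ACPICA structure layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: bit 4 of CFMWS "Window Restrictions" = Fixed Device Configuration. */
#define CFMWS_RESTRICT_FIXED (1u << 4)

struct cfmws_window {
	uint64_t base;
	uint64_t size;
	uint16_t restrictions;
};

/* True only if [base, base + size) sits entirely in a window flagged fixed. */
static bool range_blessed_by_bios(const struct cfmws_window *w, size_t n,
				  uint64_t base, uint64_t size)
{
	if (size == 0 || base + size < base) /* empty or wrapping range */
		return false;
	for (size_t i = 0; i < n; i++) {
		if (!(w[i].restrictions & CFMWS_RESTRICT_FIXED))
			continue;
		if (base >= w[i].base && base + size <= w[i].base + w[i].size)
			return true;
	}
	return false;
}
```

A range found only in the EFI memory map, with no matching fixed window, would be treated as unexpected under this check.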

> We could make what you suggest a Linux 'boot standard' though...
> We'd want to communicate that strongly to various BIOS teams though.
> Even better if we can get other OSVs on side.

Similar to the existing standard that the driver sets with respect to
vendor-specific commands, I expect this is a kernel policy that can be
set first and discussed later when / if it starts to conflict with
reality. In the meantime it lets the driver have some reasonable
boundaries about the configurations it needs to be prepared to accept.
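The proposed policy, reduced to a decision function as a sketch. The `override` argument stands in for a hypothetical module parameter like the one suggested earlier in the thread; no such parameter is claimed to exist in the driver. Devices without the CXL 2.0 memory device class code are treated as outside the policy, in line with the CXL 1.1 remark.

```c
#include <stdbool.h>

enum cxl_boot_action {
	CXL_NOT_COVERED,  /* no CXL 2.0 memory class code: policy not applied */
	CXL_ACCEPT,       /* HDM decoders already on, or CXL.mem still off */
	CXL_FORCE_HDM,    /* Mem_Enable=1 with range decode: take it over */
	CXL_KEEP_RANGES,  /* same, but the administrator asked to keep it */
};

/*
 * 'override' models a hypothetical module parameter that keeps legacy
 * range decode alive while a BIOS update is pending.
 */
static enum cxl_boot_action cxl2_boot_policy(bool cxl2_class_code,
					     bool mem_enable, bool hdm_enable,
					     bool override)
{
	if (!cxl2_class_code)
		return CXL_NOT_COVERED;
	if (hdm_enable || !mem_enable)
		return CXL_ACCEPT;
	return override ? CXL_KEEP_RANGES : CXL_FORCE_HDM;
}
```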

> I'll check with our BIOS team whether they'd mind this restriction.

Especially in the case of legacy CXL 1.1 devices that pre-date the
definition of the CXL 2.0 memory device class code, I would be
surprised if this policy stance impacted existing or in-flight product
plans, but yes, please speak up now.


end of thread

Thread overview: 8+ messages
2022-04-26 17:08 CXL type 3 which doesn't have cxl mem enabled Jonathan Cameron
2022-04-26 17:19 ` Dan Williams
2022-04-26 18:06   ` Jonathan Cameron
2022-04-26 19:00     ` Dan Williams
2022-04-26 19:38       ` Jonathan Cameron
2022-04-26 20:02         ` Dan Williams
2022-04-27  8:35           ` Jonathan Cameron
2022-04-28 21:10             ` Dan Williams
