linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-03-29 21:35 [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration Dan Williams
@ 2023-03-29 16:04 ` Gregory Price
  2023-03-30  4:21   ` Dan Williams
  2023-03-30  0:06 ` Dave Jiang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Gregory Price @ 2023-03-29 16:04 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Dave Jiang

On Wed, Mar 29, 2023 at 02:35:55PM -0700, Dan Williams wrote:
> One motivation for mapping range registers to decoder objects is
> to use those settings for region autodiscovery.
> 
> The need to map a region for devices programmed to use range registers
> is especially urgent now that the kernel no longer routes "Soft
> Reserved" ranges in the memory map to device-dax by default. The CXL
> memory range loses all access mechanisms.
> 
> Complete the implementation by filling out ways and granularity, marking
> the DPA reservation, and setting the endpoint-decoder state to signal
> autodiscovery.
> 
> Fixes: 09d09e04d2fc ("cxl/dax: Create dax devices for CXL RAM regions")
> Tested-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/hdm.c |   30 ++++++++++++++++++++++++------
>  1 file changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 9884b6d4d930..5339c0719177 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -738,20 +738,26 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
>  	return 0;
>  }
>  
> -static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
> -					    struct cxl_decoder *cxld, int which,
> -					    struct cxl_endpoint_dvsec_info *info)
> +static int cxl_setup_hdm_decoder_from_dvsec(
> +	struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
> +	int which, struct cxl_endpoint_dvsec_info *info)
>  {
> +	struct cxl_endpoint_decoder *cxled;
> +	struct range *range;
> +	int rc;
> +
>  	if (!is_cxl_endpoint(port))
>  		return -EOPNOTSUPP;
>  
> -	if (!range_len(&info->dvsec_range[which]))
> +	cxled = to_cxl_endpoint_decoder(&cxld->dev);
> +	range = &info->dvsec_range[which];
> +	if (!range_len(range))
>  		return -ENOENT;
>  
>  	cxld->target_type = CXL_DECODER_EXPANDER;
>  	cxld->commit = NULL;
>  	cxld->reset = NULL;
> -	cxld->hpa_range = info->dvsec_range[which];
> +	cxld->hpa_range = *range;
>  
>  	/*
>  	 * Set the emulated decoder as locked pending additional support to
> @@ -760,6 +766,17 @@ static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
>  	cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
>  	port->commit_end = cxld->id;
>  
> +	rc = devm_cxl_dpa_reserve(cxled, *dpa_base, range_len(range), 0);

I have an example host where *dpa_base ends up being 0x0 here, and as a
result later down the line the region fails to register with this:

[   21.974117] cxl_pci 0000:3f:00.0: mem0:decoder1.0 no CXL window for range 0x0:0x1fffffffff

this traces back to here:

int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm,
                                struct cxl_endpoint_dvsec_info *info)
{
        u64 dpa_base = 0;    <- dpa_base is set to 0 and never updated
...
        for (i = 0; i < cxlhdm->decoder_count; i++) {
...
                rc = init_hdm_decoder(port, cxld, target_map, hdm, i,
                                      &dpa_base, info);
}


static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
                            int *target_map, void __iomem *hdm, int which,
                            u64 *dpa_base, struct cxl_endpoint_dvsec_info *info)
{
... passed directly into here
        if (should_emulate_decoders(info))
                return cxl_setup_hdm_decoder_from_dvsec(port, cxld, dpa_base,
                                                        which, info);
...
}

static int cxl_setup_hdm_decoder_from_dvsec(
        struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
        int which, struct cxl_endpoint_dvsec_info *info)
{
        rc = devm_cxl_dpa_reserve(cxled, *dpa_base, range_len(range), 0);
        if (rc) {
                dev_err(&port->dev,
                        "decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)",
                        port->id, cxld->id, *dpa_base,
                        *dpa_base + range_len(range) - 1, rc);
                return rc;
        }
        *dpa_base += range_len(range);
        cxled->state = CXL_DECODER_STATE_AUTO;

        return 0;
}


(full cxl log with bonus prints i added)
[   21.607034] cxl_pci 0000:3f:00.0: No component registers (-19)
[   21.641831] cxl_pci 0000:3f:00.0: DOE: [d80] failed to cache protocols : -5
[   21.642866] cxl_pci 0000:3f:00.0: Failed to create MB object for MB @ d80
[   21.643686] cxl_pci 0000:3f:00.0: Failed to request region 0x0000000000001fff-0x000000000010201e
[   21.644371] cxl add_dpa_res: (0, 1fffffffff)
[   21.645540] cxl add_dpa_res: (2000000000, 1fffffffff)
[   21.965692] cxl hdm dvsec range: (0, 1fffffffff)
[   21.967045] cxl emulating decoders: dpa_base(0)   <-  *dpa_base
[   21.967778] cxl_add_to_region: searching for root decoder with address range(0, 1fffffffff)
[   21.972824] cxl match decoder: found root decoder, r1(1050000000, 304fffffff) r2(0, 1fffffffff)
[   21.974117] cxl_pci 0000:3f:00.0: mem0:decoder1.0 no CXL window for range 0x0:0x1fffffffff
[   21.974905] cxl discover_region: failed to add to region: 0x0-0x1fffffffff



Ultimately having trouble deciding if this is something broken with
bios, the setup_hdm_decoder code, or the discover_region code.  I'm not
PCI guru, but we should expect the rdm dvsec range to be in the range of
the root decoder window / CFMW:  r1(1050000000, 304fffffff)


confirming this is the issue, i *forced* the dvsec range register to read
out base+0x1050000000


diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 7328a2552411..6fc6df0f7b5a 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -340,12 +344,16 @@ int cxl_dvsec_rr_decode(struct device *dev, int d,
                if (rc)
                        return rc;

                base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;

                info->dvsec_range[i] = (struct range) {
-                       .start = base,
-                       .end = base + size - 1
+                       .start = 0x1050000000 + base,
+                       .end = 0x1050000000 + base + size - 1
                };


And this resulted in everything working "as one would expect"

[user@host0 ~]# ls /sys/bus/cxl/devices/
dax_region0  decoder0.0  decoder1.0  endpoint1  mem0  region0  root0

[user@host0 ~]# numactl --hardware
available: 2 nodes (0-1)
node 1 cpus:
node 1 size: 0 MB
node 1 free: 0 MB
node distances:
node   0   1
  0:  10  50
  1:  255  10

[user@host0 ~]# echo online_movable > /sys/bus/node/devices/node1/memory33/state
[user@host0 ~]# numactl --hardware
available: 2 nodes (0-1)
node 1 size: 2048 MB
node 1 free: 2048 MB
node distances:
node   0   1
  0:  10  50
  1:  255  10


Basically the question is:

Is the DVSEC Range Register expected to be programmed by bios, and are
not being programmed correctly?

Or is there something else missing here to correct for the CMFW base?



If it's the former, then this patch set is gtg and i'm happy to add my
Tested-by tag.  If it's the latter, can we hotfix it before release?



> +	if (rc) {
> +		dev_err(&port->dev,
> +			"decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)",
> +			port->id, cxld->id, *dpa_base,
> +			*dpa_base + range_len(range) - 1, rc);
> +		return rc;
> +	}
> +	*dpa_base += range_len(range);
> +	cxled->state = CXL_DECODER_STATE_AUTO;
> +
>  	return 0;
>  }
>  
> @@ -779,7 +796,8 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  	} target_list;
>  
>  	if (should_emulate_decoders(info))
> -		return cxl_setup_hdm_decoder_from_dvsec(port, cxld, which, info);
> +		return cxl_setup_hdm_decoder_from_dvsec(port, cxld, dpa_base,
> +							which, info);
>  
>  	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(which));
>  	base = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(which));
> 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-03-30  4:21   ` Dan Williams
@ 2023-03-29 17:20     ` Gregory Price
  2023-05-16  6:43       ` Gregory Price
  2023-03-29 17:21     ` Gregory Price
  1 sibling, 1 reply; 19+ messages in thread
From: Gregory Price @ 2023-03-29 17:20 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Dave Jiang

On Wed, Mar 29, 2023 at 09:21:25PM -0700, Dan Williams wrote:
> Gregory Price wrote:
> [..]
> > I have an example host where *dpa_base ends up being 0x0 here, and as a
> > result later down the line the region fails to register with this:
> 
> First off, thanks for testing.
> 
> > [   21.974117] cxl_pci 0000:3f:00.0: mem0:decoder1.0 no CXL window for range 0x0:0x1fffffffff
> 
> So, this debug message is a statement about *HPA*, DPA is the device
> local offset, it just so happens that in this case the HPA base and
> DPA base are aligned.
> 
> [..]
> > (full cxl log with bonus prints i added)
> > [   21.607034] cxl_pci 0000:3f:00.0: No component registers (-19)
> > [   21.641831] cxl_pci 0000:3f:00.0: DOE: [d80] failed to cache protocols : -5
> > [   21.642866] cxl_pci 0000:3f:00.0: Failed to create MB object for MB @ d80
> > [   21.643686] cxl_pci 0000:3f:00.0: Failed to request region 0x0000000000001fff-0x000000000010201e
> 
> I assume that this is the attempt to map the RAS component registers
> given the driver load did not fail. Would be interested to see this
> platform's /proc/iomem, but this is separate from the DPA mapping issue.
> Perhaps this device tries to define the RAS capability with another
> block that was already mapped?
> 

I'll snag more information on this tomorrow when I can do a bit more.

> > [   21.644371] cxl add_dpa_res: (0, 1fffffffff)
> > [   21.645540] cxl add_dpa_res: (2000000000, 1fffffffff)
> 
> Ok, this device has a 128G ram region and no pmem.
> 
> > [   21.965692] cxl hdm dvsec range: (0, 1fffffffff)
> > [   21.967045] cxl emulating decoders: dpa_base(0)   <-  *dpa_base
> 
> That's ok...
> 
> > [   21.967778] cxl_add_to_region: searching for root decoder with address range(0, 1fffffffff)
> 
> Now it switches to search for a CFMWS for the HPA that just to be
> identity mapped to DPA by accident.
> 
> > [   21.972824] cxl match decoder: found root decoder, r1(1050000000, 304fffffff) r2(0, 1fffffffff)
> 
> I am assuming that this means the only CXL window defined for this
> platform is 1050000000-304fffffff?

Yes sorry, r1() is the root decoder which maps to the CFMW, r2 is the
device dvsec defined range.  I actually wrote this out while i was
working towards the second half of the experiment, just left this here
for completeness incase i got something wrong.

> 
> > [   21.974117] cxl_pci 0000:3f:00.0: mem0:decoder1.0 no CXL window for range 0x0:0x1fffffffff
> > [   21.974905] cxl discover_region: failed to add to region: 0x0-0x1fffffffff
> 
> Driver is giving up for cause because either the BIOS failed to program
> the range registers of the device correctly, or it failed to publish a
> window for where the device is decoding.
> 
> > Ultimately having trouble deciding if this is something broken with
> > bios, the setup_hdm_decoder code, or the discover_region code.  I'm not
> > PCI guru, but we should expect the rdm dvsec range to be in the range of
> > the root decoder window / CFMW:  r1(1050000000, 304fffffff)
> 
> Yes, either that or a CFWMS for 0-1fffffffff
>

Given this system is single-socket, and has only a single CXL memory
expander connected to it, and that a CFMWS for 0-1fff.. seems
irrational, I'm going to go with the Range Register is wrong.

> > confirming this is the issue, i *forced* the dvsec range register to read
> > out base+0x1050000000
> 
> Nice!
> 
> > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> > index 7328a2552411..6fc6df0f7b5a 100644
> > --- a/drivers/cxl/core/pci.c
> > +++ b/drivers/cxl/core/pci.c
> > @@ -340,12 +344,16 @@ int cxl_dvsec_rr_decode(struct device *dev, int d,
> >                 if (rc)
> >                         return rc;
> > 
> >                 base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;
> > 
> >                 info->dvsec_range[i] = (struct range) {
> > -                       .start = base,
> > -                       .end = base + size - 1
> > +                       .start = 0x1050000000 + base,
> > +                       .end = 0x1050000000 + base + size - 1
> >                 };
> > 
> > 
> > And this resulted in everything working "as one would expect"
> > 
> > [user@host0 ~]# ls /sys/bus/cxl/devices/
> > dax_region0  decoder0.0  decoder1.0  endpoint1  mem0  region0  root0
> > 
> > [user@host0 ~]# numactl --hardware
> > available: 2 nodes (0-1)
> > node 1 cpus:
> > node 1 size: 0 MB
> > node 1 free: 0 MB
> > node distances:
> > node   0   1
> >   0:  10  50
> >   1:  255  10
> > 
> > [user@host0 ~]# echo online_movable > /sys/bus/node/devices/node1/memory33/state
> > [user@host0 ~]# numactl --hardware
> > available: 2 nodes (0-1)
> > node 1 size: 2048 MB
> > node 1 free: 2048 MB
> > node distances:
> > node   0   1
> >   0:  10  50
> >   1:  255  10
> > 
> > 
> > Basically the question is:
> > 
> > Is the DVSEC Range Register expected to be programmed by bios, and are
> > not being programmed correctly?
> 
> This debug experiment makes me think perhaps the *device* is at fault,
> not the BIOS. Perhaps the device accepts writes to CXL_DVSEC_RANGE_BASE
> to set up the decode as expected, but reads return 0? That's the only
> way that I can see that forcing that offset results in successfully
> talking to memory.
> 

Noteably, this is the same system in which i discovered a bios bug with
the CEDT.CMFW base.  I also have a patch in the kernel that corrects the
CFMW interleave target indexing.

So it's entirely possible that the bios is not programming things
correctly and simply leaving the device indexed at 0x0, while the device
sets up most everything else.

But I will investigate a bit further.

> > Or is there something else missing here to correct for the CMFW base?
> [..]
> > If it's the former, then this patch set is gtg and i'm happy to add my
> > Tested-by tag.  If it's the latter, can we hotfix it before release?
> 
> Yeah, FWIW, I don't see anything wrong from the kernel side, and would
> be curious what the BIOS reports if it reads back what it wrote to the
> CXL DVSEC range registers.

At this point I concur, you can add my tested-by to each of the patches
on this line.  I will reach out to the platform and device vendors to
investigate further.  Thanks for the input.

Tested-by: Gregory Price <gregory.price@memverge.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-03-30  4:21   ` Dan Williams
  2023-03-29 17:20     ` Gregory Price
@ 2023-03-29 17:21     ` Gregory Price
  2023-03-30  6:33       ` Dan Williams
  1 sibling, 1 reply; 19+ messages in thread
From: Gregory Price @ 2023-03-29 17:21 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Dave Jiang

On Wed, Mar 29, 2023 at 09:21:25PM -0700, Dan Williams wrote:
> > Is the DVSEC Range Register expected to be programmed by bios, and are
> > not being programmed correctly?
> 
> This debug experiment makes me think perhaps the *device* is at fault,
> not the BIOS. Perhaps the device accepts writes to CXL_DVSEC_RANGE_BASE
> to set up the decode as expected, but reads return 0? That's the only
> way that I can see that forcing that offset results in successfully
> talking to memory.
> 

Oh, i meant to add that i tested whether the memory is accessible via
numactl --membind=1 with both memhog and a python prompt, and things
worked just fine.  So memory works.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
@ 2023-03-29 21:35 Dan Williams
  2023-03-29 16:04 ` Gregory Price
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Dan Williams @ 2023-03-29 21:35 UTC (permalink / raw)
  To: linux-cxl; +Cc: Dave Jiang

One motivation for mapping range registers to decoder objects is
to use those settings for region autodiscovery.

The need to map a region for devices programmed to use range registers
is especially urgent now that the kernel no longer routes "Soft
Reserved" ranges in the memory map to device-dax by default. The CXL
memory range loses all access mechanisms.

Complete the implementation by filling out ways and granularity, marking
the DPA reservation, and setting the endpoint-decoder state to signal
autodiscovery.

Fixes: 09d09e04d2fc ("cxl/dax: Create dax devices for CXL RAM regions")
Tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/hdm.c |   30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 9884b6d4d930..5339c0719177 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -738,20 +738,26 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
 	return 0;
 }
 
-static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
-					    struct cxl_decoder *cxld, int which,
-					    struct cxl_endpoint_dvsec_info *info)
+static int cxl_setup_hdm_decoder_from_dvsec(
+	struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
+	int which, struct cxl_endpoint_dvsec_info *info)
 {
+	struct cxl_endpoint_decoder *cxled;
+	struct range *range;
+	int rc;
+
 	if (!is_cxl_endpoint(port))
 		return -EOPNOTSUPP;
 
-	if (!range_len(&info->dvsec_range[which]))
+	cxled = to_cxl_endpoint_decoder(&cxld->dev);
+	range = &info->dvsec_range[which];
+	if (!range_len(range))
 		return -ENOENT;
 
 	cxld->target_type = CXL_DECODER_EXPANDER;
 	cxld->commit = NULL;
 	cxld->reset = NULL;
-	cxld->hpa_range = info->dvsec_range[which];
+	cxld->hpa_range = *range;
 
 	/*
 	 * Set the emulated decoder as locked pending additional support to
@@ -760,6 +766,17 @@ static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
 	cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
 	port->commit_end = cxld->id;
 
+	rc = devm_cxl_dpa_reserve(cxled, *dpa_base, range_len(range), 0);
+	if (rc) {
+		dev_err(&port->dev,
+			"decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)",
+			port->id, cxld->id, *dpa_base,
+			*dpa_base + range_len(range) - 1, rc);
+		return rc;
+	}
+	*dpa_base += range_len(range);
+	cxled->state = CXL_DECODER_STATE_AUTO;
+
 	return 0;
 }
 
@@ -779,7 +796,8 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 	} target_list;
 
 	if (should_emulate_decoders(info))
-		return cxl_setup_hdm_decoder_from_dvsec(port, cxld, which, info);
+		return cxl_setup_hdm_decoder_from_dvsec(port, cxld, dpa_base,
+							which, info);
 
 	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(which));
 	base = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(which));


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-03-29 21:35 [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration Dan Williams
  2023-03-29 16:04 ` Gregory Price
@ 2023-03-30  0:06 ` Dave Jiang
  2023-03-30 17:00 ` Jonathan Cameron
  2023-04-03 23:06 ` [PATCH v2] " Dan Williams
  3 siblings, 0 replies; 19+ messages in thread
From: Dave Jiang @ 2023-03-30  0:06 UTC (permalink / raw)
  To: Dan Williams, linux-cxl



On 3/29/23 2:35 PM, Dan Williams wrote:
> One motivation for mapping range registers to decoder objects is
> to use those settings for region autodiscovery.
> 
> The need to map a region for devices programmed to use range registers
> is especially urgent now that the kernel no longer routes "Soft
> Reserved" ranges in the memory map to device-dax by default. The CXL
> memory range loses all access mechanisms.
> 
> Complete the implementation by filling out ways and granularity, marking

Where is the ways and granularity set done in code?

DJ

> the DPA reservation, and setting the endpoint-decoder state to signal
> autodiscovery.
> 
> Fixes: 09d09e04d2fc ("cxl/dax: Create dax devices for CXL RAM regions")
> Tested-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>   drivers/cxl/core/hdm.c |   30 ++++++++++++++++++++++++------
>   1 file changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 9884b6d4d930..5339c0719177 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -738,20 +738,26 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
>   	return 0;
>   }
>   
> -static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
> -					    struct cxl_decoder *cxld, int which,
> -					    struct cxl_endpoint_dvsec_info *info)
> +static int cxl_setup_hdm_decoder_from_dvsec(
> +	struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
> +	int which, struct cxl_endpoint_dvsec_info *info)
>   {
> +	struct cxl_endpoint_decoder *cxled;
> +	struct range *range;
> +	int rc;
> +
>   	if (!is_cxl_endpoint(port))
>   		return -EOPNOTSUPP;
>   
> -	if (!range_len(&info->dvsec_range[which]))
> +	cxled = to_cxl_endpoint_decoder(&cxld->dev);
> +	range = &info->dvsec_range[which];
> +	if (!range_len(range))
>   		return -ENOENT;
>   
>   	cxld->target_type = CXL_DECODER_EXPANDER;
>   	cxld->commit = NULL;
>   	cxld->reset = NULL;
> -	cxld->hpa_range = info->dvsec_range[which];
> +	cxld->hpa_range = *range;
>   
>   	/*
>   	 * Set the emulated decoder as locked pending additional support to
> @@ -760,6 +766,17 @@ static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
>   	cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
>   	port->commit_end = cxld->id;
>   
> +	rc = devm_cxl_dpa_reserve(cxled, *dpa_base, range_len(range), 0);
> +	if (rc) {
> +		dev_err(&port->dev,
> +			"decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)",
> +			port->id, cxld->id, *dpa_base,
> +			*dpa_base + range_len(range) - 1, rc);
> +		return rc;
> +	}
> +	*dpa_base += range_len(range);
> +	cxled->state = CXL_DECODER_STATE_AUTO;
> +
>   	return 0;
>   }
>   
> @@ -779,7 +796,8 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>   	} target_list;
>   
>   	if (should_emulate_decoders(info))
> -		return cxl_setup_hdm_decoder_from_dvsec(port, cxld, which, info);
> +		return cxl_setup_hdm_decoder_from_dvsec(port, cxld, dpa_base,
> +							which, info);
>   
>   	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(which));
>   	base = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(which));
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-03-29 16:04 ` Gregory Price
@ 2023-03-30  4:21   ` Dan Williams
  2023-03-29 17:20     ` Gregory Price
  2023-03-29 17:21     ` Gregory Price
  0 siblings, 2 replies; 19+ messages in thread
From: Dan Williams @ 2023-03-30  4:21 UTC (permalink / raw)
  To: Gregory Price, Dan Williams; +Cc: linux-cxl, Dave Jiang

Gregory Price wrote:
[..]
> I have an example host where *dpa_base ends up being 0x0 here, and as a
> result later down the line the region fails to register with this:

First off, thanks for testing.

> [   21.974117] cxl_pci 0000:3f:00.0: mem0:decoder1.0 no CXL window for range 0x0:0x1fffffffff

So, this debug message is a statement about *HPA*, DPA is the device
local offset, it just so happens that in this case the HPA base and
DPA base are aligned.

[..]
> (full cxl log with bonus prints i added)
> [   21.607034] cxl_pci 0000:3f:00.0: No component registers (-19)
> [   21.641831] cxl_pci 0000:3f:00.0: DOE: [d80] failed to cache protocols : -5
> [   21.642866] cxl_pci 0000:3f:00.0: Failed to create MB object for MB @ d80
> [   21.643686] cxl_pci 0000:3f:00.0: Failed to request region 0x0000000000001fff-0x000000000010201e

I assume that this is the attempt to map the RAS component registers
given the driver load did not fail. Would be interested to see this
platform's /proc/iomem, but this is separate from the DPA mapping issue.
Perhaps this device tries to define the RAS capability with another
block that was already mapped?

> [   21.644371] cxl add_dpa_res: (0, 1fffffffff)
> [   21.645540] cxl add_dpa_res: (2000000000, 1fffffffff)

Ok, this device has a 128G ram region and no pmem.

> [   21.965692] cxl hdm dvsec range: (0, 1fffffffff)
> [   21.967045] cxl emulating decoders: dpa_base(0)   <-  *dpa_base

That's ok...

> [   21.967778] cxl_add_to_region: searching for root decoder with address range(0, 1fffffffff)

Now it switches to search for a CFMWS for the HPA that just to be
identity mapped to DPA by accident.

> [   21.972824] cxl match decoder: found root decoder, r1(1050000000, 304fffffff) r2(0, 1fffffffff)

I am assuming that this means the only CXL window defined for this
platform is 1050000000-304fffffff?

> [   21.974117] cxl_pci 0000:3f:00.0: mem0:decoder1.0 no CXL window for range 0x0:0x1fffffffff
> [   21.974905] cxl discover_region: failed to add to region: 0x0-0x1fffffffff

Driver is giving up for cause because either the BIOS failed to program
the range registers of the device correctly, or it failed to publish a
window for where the device is decoding.

> Ultimately having trouble deciding if this is something broken with
> bios, the setup_hdm_decoder code, or the discover_region code.  I'm not
> PCI guru, but we should expect the rdm dvsec range to be in the range of
> the root decoder window / CFMW:  r1(1050000000, 304fffffff)

Yes, either that or a CFWMS for 0-1fffffffff

> confirming this is the issue, i *forced* the dvsec range register to read
> out base+0x1050000000

Nice!

> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 7328a2552411..6fc6df0f7b5a 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -340,12 +344,16 @@ int cxl_dvsec_rr_decode(struct device *dev, int d,
>                 if (rc)
>                         return rc;
> 
>                 base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;
> 
>                 info->dvsec_range[i] = (struct range) {
> -                       .start = base,
> -                       .end = base + size - 1
> +                       .start = 0x1050000000 + base,
> +                       .end = 0x1050000000 + base + size - 1
>                 };
> 
> 
> And this resulted in everything working "as one would expect"
> 
> [user@host0 ~]# ls /sys/bus/cxl/devices/
> dax_region0  decoder0.0  decoder1.0  endpoint1  mem0  region0  root0
> 
> [user@host0 ~]# numactl --hardware
> available: 2 nodes (0-1)
> node 1 cpus:
> node 1 size: 0 MB
> node 1 free: 0 MB
> node distances:
> node   0   1
>   0:  10  50
>   1:  255  10
> 
> [user@host0 ~]# echo online_movable > /sys/bus/node/devices/node1/memory33/state
> [user@host0 ~]# numactl --hardware
> available: 2 nodes (0-1)
> node 1 size: 2048 MB
> node 1 free: 2048 MB
> node distances:
> node   0   1
>   0:  10  50
>   1:  255  10
> 
> 
> Basically the question is:
> 
> Is the DVSEC Range Register expected to be programmed by bios, and are
> not being programmed correctly?

This debug experiment makes me think perhaps the *device* is at fault,
not the BIOS. Perhaps the device accepts writes to CXL_DVSEC_RANGE_BASE
to set up the decode as expected, but reads return 0? That's the only
way that I can see that forcing that offset results in successfully
talking to memory.

> Or is there something else missing here to correct for the CMFW base?
[..]
> If it's the former, then this patch set is gtg and i'm happy to add my
> Tested-by tag.  If it's the latter, can we hotfix it before release?

Yeah, FWIW, I don't see anything wrong from the kernel side, and would
be curious what the BIOS reports if it reads back what it wrote to the
CXL DVSEC range registers.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-03-30  6:33       ` Dan Williams
@ 2023-03-30  4:27         ` Gregory Price
  2023-04-16  4:05           ` Gregory Price
  0 siblings, 1 reply; 19+ messages in thread
From: Gregory Price @ 2023-03-30  4:27 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Dave Jiang

On Wed, Mar 29, 2023 at 11:33:05PM -0700, Dan Williams wrote:
> Gregory Price wrote:
> > On Wed, Mar 29, 2023 at 09:21:25PM -0700, Dan Williams wrote:
> > > > Is the DVSEC Range Register expected to be programmed by bios, and are
> > > > not being programmed correctly?
> > > 
> > > This debug experiment makes me think perhaps the *device* is at fault,
> > > not the BIOS. Perhaps the device accepts writes to CXL_DVSEC_RANGE_BASE
> > > to set up the decode as expected, but reads return 0? That's the only
> > > way that I can see that forcing that offset results in successfully
> > > talking to memory.
> > > 
> > 
> > Oh, i meant to add that i tested whether the memory is accessible via
> > numactl --membind=1 with both memhog and a python prompt, and things
> > worked just fine.  So memory works.
> 
> One other theory is that the device is correct, but the platform CXL
> window accepts transactions at an offset and then removes that offset
> when transmitting the address down the CXL port. So device thinks its
> decoding 0x0 and never sees the offset removed by the host bridge.

Wouldn't that be against the spec?  I thought the device intended to
receive HPA and do its decode accordingly.

Otherwise you could have multiple devices programmed capable of decoding
0x0, which is already the device address, so there's nothing to
"decode".

I'll follow up as I learn more, this is concerning.  Certainly explains
why every time I switch hardware nothing seems to work quite right.

~Gregory

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-03-29 17:21     ` Gregory Price
@ 2023-03-30  6:33       ` Dan Williams
  2023-03-30  4:27         ` Gregory Price
  0 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2023-03-30  6:33 UTC (permalink / raw)
  To: Gregory Price, Dan Williams; +Cc: linux-cxl, Dave Jiang

Gregory Price wrote:
> On Wed, Mar 29, 2023 at 09:21:25PM -0700, Dan Williams wrote:
> > > Is the DVSEC Range Register expected to be programmed by bios, and are
> > > not being programmed correctly?
> > 
> > This debug experiment makes me think perhaps the *device* is at fault,
> > not the BIOS. Perhaps the device accepts writes to CXL_DVSEC_RANGE_BASE
> > to set up the decode as expected, but reads return 0? That's the only
> > way that I can see that forcing that offset results in successfully
> > talking to memory.
> > 
> 
> Oh, i meant to add that i tested whether the memory is accessible via
> numactl --membind=1 with both memhog and a python prompt, and things
> worked just fine.  So memory works.

One other theory is that the device is correct, but the platform CXL
window accepts transactions at an offset and then removes that offset
when transmitting the address down the CXL port. So device thinks its
decoding 0x0 and never sees the offset removed by the host bridge.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-03-29 21:35 [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration Dan Williams
  2023-03-29 16:04 ` Gregory Price
  2023-03-30  0:06 ` Dave Jiang
@ 2023-03-30 17:00 ` Jonathan Cameron
  2023-03-30 18:24   ` Dan Williams
  2023-04-03 23:06 ` [PATCH v2] " Dan Williams
  3 siblings, 1 reply; 19+ messages in thread
From: Jonathan Cameron @ 2023-03-30 17:00 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Dave Jiang

On Wed, 29 Mar 2023 14:35:55 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> One motivation for mapping range registers to decoder objects is
> to use those settings for region autodiscovery.
> 
> The need to map a region for devices programmed to use range registers
> is especially urgent now that the kernel no longer routes "Soft
> Reserved" ranges in the memory map to device-dax by default. The CXL
> memory range loses all access mechanisms.
> 
> Complete the implementation by filling out ways and granularity, marking
> the DPA reservation, and setting the endpoint-decoder state to signal
> autodiscovery.
> 
> Fixes: 09d09e04d2fc ("cxl/dax: Create dax devices for CXL RAM regions")
> Tested-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/hdm.c |   30 ++++++++++++++++++++++++------
>  1 file changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 9884b6d4d930..5339c0719177 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -738,20 +738,26 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
>  	return 0;
>  }
>  
> -static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
> -					    struct cxl_decoder *cxld, int which,
> -					    struct cxl_endpoint_dvsec_info *info)
> +static int cxl_setup_hdm_decoder_from_dvsec(
> +	struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
> +	int which, struct cxl_endpoint_dvsec_info *info)
>  {
> +	struct cxl_endpoint_decoder *cxled;
> +	struct range *range;
> +	int rc;
> +
>  	if (!is_cxl_endpoint(port))
>  		return -EOPNOTSUPP;
>  
> -	if (!range_len(&info->dvsec_range[which]))
> +	cxled = to_cxl_endpoint_decoder(&cxld->dev);
> +	range = &info->dvsec_range[which];
> +	if (!range_len(range))

You call range_len(range) quite a few times in here.
I'd be tempted to suggest a more minor diff of

	u64 len;
	...

	len = range_len(&info->dvsec_range[which]);

then use that throughout and don't bother with
the local range variable.

range is a bit odd to keep in a local variable as
it's not immediately obvious it's an hpa range - obviously you
could call it hpa_range which would help, but feels like
the length is the important bit except in one place.

>  		return -ENOENT;
>  
>  	cxld->target_type = CXL_DECODER_EXPANDER;
>  	cxld->commit = NULL;
>  	cxld->reset = NULL;
> -	cxld->hpa_range = info->dvsec_range[which];
> +	cxld->hpa_range = *range;
>  
>  	/*
>  	 * Set the emulated decoder as locked pending additional support to
> @@ -760,6 +766,17 @@ static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
>  	cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
>  	port->commit_end = cxld->id;
>  
> +	rc = devm_cxl_dpa_reserve(cxled, *dpa_base, range_len(range), 0);
> +	if (rc) {
> +		dev_err(&port->dev,
> +			"decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)",
> +			port->id, cxld->id, *dpa_base,
> +			*dpa_base + range_len(range) - 1, rc);
> +		return rc;
> +	}
> +	*dpa_base += range_len(range);
> +	cxled->state = CXL_DECODER_STATE_AUTO;
> +
>  	return 0;
>  }
>  
> @@ -779,7 +796,8 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  	} target_list;
>  
>  	if (should_emulate_decoders(info))
> -		return cxl_setup_hdm_decoder_from_dvsec(port, cxld, which, info);
> +		return cxl_setup_hdm_decoder_from_dvsec(port, cxld, dpa_base,
> +							which, info);
>  
>  	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(which));
>  	base = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(which));
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-03-30 17:00 ` Jonathan Cameron
@ 2023-03-30 18:24   ` Dan Williams
  0 siblings, 0 replies; 19+ messages in thread
From: Dan Williams @ 2023-03-30 18:24 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams; +Cc: linux-cxl, Dave Jiang

Jonathan Cameron wrote:
> On Wed, 29 Mar 2023 14:35:55 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > One motivation for mapping range registers to decoder objects is
> > to use those settings for region autodiscovery.
> > 
> > The need to map a region for devices programmed to use range registers
> > is especially urgent now that the kernel no longer routes "Soft
> > Reserved" ranges in the memory map to device-dax by default. The CXL
> > memory range loses all access mechanisms.
> > 
> > Complete the implementation by filling out ways and granularity, marking
> > the DPA reservation, and setting the endpoint-decoder state to signal
> > autodiscovery.
> > 
> > Fixes: 09d09e04d2fc ("cxl/dax: Create dax devices for CXL RAM regions")
> > Tested-by: Dave Jiang <dave.jiang@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/cxl/core/hdm.c |   30 ++++++++++++++++++++++++------
> >  1 file changed, 24 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 9884b6d4d930..5339c0719177 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -738,20 +738,26 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
> >  	return 0;
> >  }
> >  
> > -static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
> > -					    struct cxl_decoder *cxld, int which,
> > -					    struct cxl_endpoint_dvsec_info *info)
> > +static int cxl_setup_hdm_decoder_from_dvsec(
> > +	struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
> > +	int which, struct cxl_endpoint_dvsec_info *info)
> >  {
> > +	struct cxl_endpoint_decoder *cxled;
> > +	struct range *range;
> > +	int rc;
> > +
> >  	if (!is_cxl_endpoint(port))
> >  		return -EOPNOTSUPP;
> >  
> > -	if (!range_len(&info->dvsec_range[which]))
> > +	cxled = to_cxl_endpoint_decoder(&cxld->dev);
> > +	range = &info->dvsec_range[which];
> > +	if (!range_len(range))
> 
> You call range_len(range) quite a few times in here.
> I'd be tempted to suggest a more minor diff of
> 
> 	u64 len;
> 	...
> 
> 	len = range_len(&info->dvsec_range[which]);
> 
> then use that throughout and don't bother with
> the local range variable.
> 
> range is a bit odd to keep in a local variable as
> it's not immediately obvious it's an hpa range - obviously you
> could call it hpa_range which would help, but feels like
> the length is the important bit except in one place.

Done.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-03-29 21:35 [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration Dan Williams
                   ` (2 preceding siblings ...)
  2023-03-30 17:00 ` Jonathan Cameron
@ 2023-04-03 23:06 ` Dan Williams
  2023-04-03 23:44   ` Dave Jiang
  2023-04-04  9:19   ` Jonathan Cameron
  3 siblings, 2 replies; 19+ messages in thread
From: Dan Williams @ 2023-04-03 23:06 UTC (permalink / raw)
  To: linux-cxl; +Cc: Dave Jiang, Gregory Price

One motivation for mapping range registers to decoder objects is
to use those settings for region autodiscovery.

The need to map a region for devices programmed to use range registers
is especially urgent now that the kernel no longer routes "Soft
Reserved" ranges in the memory map to device-dax by default. The CXL
memory range loses all access mechanisms.

Complete the implementation by filling out ways and granularity, marking
the DPA reservation, and setting the endpoint-decoder state to signal
autodiscovery.

Fixes: 09d09e04d2fc ("cxl/dax: Create dax devices for CXL RAM regions")
Tested-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Gregory Price <gregory.price@memverge.com>
Link: https://lore.kernel.org/r/168012575521.221280.14177293493678527326.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Changes since v1:
- Swap out the local @range variable which was mostly only used for the
  mapping length with a @len variable for that purpose (Jonathan)

 drivers/cxl/core/hdm.c |   27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 9884b6d4d930..02cc2c38b44b 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -738,14 +738,20 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
 	return 0;
 }
 
-static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
-					    struct cxl_decoder *cxld, int which,
-					    struct cxl_endpoint_dvsec_info *info)
+static int cxl_setup_hdm_decoder_from_dvsec(
+	struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
+	int which, struct cxl_endpoint_dvsec_info *info)
 {
+	struct cxl_endpoint_decoder *cxled;
+	u64 len;
+	int rc;
+
 	if (!is_cxl_endpoint(port))
 		return -EOPNOTSUPP;
 
-	if (!range_len(&info->dvsec_range[which]))
+	cxled = to_cxl_endpoint_decoder(&cxld->dev);
+	len = range_len(&info->dvsec_range[which]);
+	if (!len)
 		return -ENOENT;
 
 	cxld->target_type = CXL_DECODER_EXPANDER;
@@ -760,6 +766,16 @@ static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
 	cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
 	port->commit_end = cxld->id;
 
+	rc = devm_cxl_dpa_reserve(cxled, *dpa_base, len, 0);
+	if (rc) {
+		dev_err(&port->dev,
+			"decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)",
+			port->id, cxld->id, *dpa_base, *dpa_base + len - 1, rc);
+		return rc;
+	}
+	*dpa_base += len;
+	cxled->state = CXL_DECODER_STATE_AUTO;
+
 	return 0;
 }
 
@@ -779,7 +795,8 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 	} target_list;
 
 	if (should_emulate_decoders(info))
-		return cxl_setup_hdm_decoder_from_dvsec(port, cxld, which, info);
+		return cxl_setup_hdm_decoder_from_dvsec(port, cxld, dpa_base,
+							which, info);
 
 	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(which));
 	base = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(which));


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-04-03 23:06 ` [PATCH v2] " Dan Williams
@ 2023-04-03 23:44   ` Dave Jiang
  2023-04-04  0:08     ` Dan Williams
  2023-04-04  9:19   ` Jonathan Cameron
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Jiang @ 2023-04-03 23:44 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: Gregory Price



On 4/3/23 4:06 PM, Dan Williams wrote:
> One motivation for mapping range registers to decoder objects is
> to use those settings for region autodiscovery.
> 
> The need to map a region for devices programmed to use range registers
> is especially urgent now that the kernel no longer routes "Soft
> Reserved" ranges in the memory map to device-dax by default. The CXL
> memory range loses all access mechanisms.
> 
> Complete the implementation by filling out ways and granularity, marking
> the DPA reservation, and setting the endpoint-decoder state to signal
> autodiscovery.

If you don't mind making a note that the default values of ways and 
granularity are coming from cxl_decode_init(). Thanks.

> 
> Fixes: 09d09e04d2fc ("cxl/dax: Create dax devices for CXL RAM regions")
> Tested-by: Dave Jiang <dave.jiang@intel.com>
> Tested-by: Gregory Price <gregory.price@memverge.com>
> Link: https://lore.kernel.org/r/168012575521.221280.14177293493678527326.stgit@dwillia2-xfh.jf.intel.com
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
> Changes since v1:
> - Swap out the local @range variable which was mostly only used for the
>    mapping length with a @len variable for that purpose (Jonathan)
> 
>   drivers/cxl/core/hdm.c |   27 ++++++++++++++++++++++-----
>   1 file changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 9884b6d4d930..02cc2c38b44b 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -738,14 +738,20 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
>   	return 0;
>   }
>   
> -static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
> -					    struct cxl_decoder *cxld, int which,
> -					    struct cxl_endpoint_dvsec_info *info)
> +static int cxl_setup_hdm_decoder_from_dvsec(
> +	struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
> +	int which, struct cxl_endpoint_dvsec_info *info)
>   {
> +	struct cxl_endpoint_decoder *cxled;
> +	u64 len;
> +	int rc;
> +
>   	if (!is_cxl_endpoint(port))
>   		return -EOPNOTSUPP;
>   
> -	if (!range_len(&info->dvsec_range[which]))
> +	cxled = to_cxl_endpoint_decoder(&cxld->dev);
> +	len = range_len(&info->dvsec_range[which]);
> +	if (!len)
>   		return -ENOENT;
>   
>   	cxld->target_type = CXL_DECODER_EXPANDER;
> @@ -760,6 +766,16 @@ static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
>   	cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
>   	port->commit_end = cxld->id;
>   
> +	rc = devm_cxl_dpa_reserve(cxled, *dpa_base, len, 0);
> +	if (rc) {
> +		dev_err(&port->dev,
> +			"decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)",
> +			port->id, cxld->id, *dpa_base, *dpa_base + len - 1, rc);
> +		return rc;
> +	}
> +	*dpa_base += len;
> +	cxled->state = CXL_DECODER_STATE_AUTO;
> +
>   	return 0;
>   }
>   
> @@ -779,7 +795,8 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>   	} target_list;
>   
>   	if (should_emulate_decoders(info))
> -		return cxl_setup_hdm_decoder_from_dvsec(port, cxld, which, info);
> +		return cxl_setup_hdm_decoder_from_dvsec(port, cxld, dpa_base,
> +							which, info);
>   
>   	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(which));
>   	base = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(which));
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-04-03 23:44   ` Dave Jiang
@ 2023-04-04  0:08     ` Dan Williams
  2023-04-04  0:16       ` Dave Jiang
  0 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2023-04-04  0:08 UTC (permalink / raw)
  To: Dave Jiang, Dan Williams, linux-cxl; +Cc: Gregory Price

Dave Jiang wrote:
> 
> 
> On 4/3/23 4:06 PM, Dan Williams wrote:
> > One motivation for mapping range registers to decoder objects is
> > to use those settings for region autodiscovery.
> > 
> > The need to map a region for devices programmed to use range registers
> > is especially urgent now that the kernel no longer routes "Soft
> > Reserved" ranges in the memory map to device-dax by default. The CXL
> > memory range loses all access mechanisms.
> > 
> > Complete the implementation by filling out ways and granularity, marking
> > the DPA reservation, and setting the endpoint-decoder state to signal
> > autodiscovery.
> 
> If you don't mind making a note that the default values of ways and 
> granularity are coming from cxl_decode_init(). Thanks.

Changed this paragraph to:

Complete the implementation by marking the DPA reservation and setting
the endpoint-decoder state to signal autodiscovery. Note that the
default settings of ways=1 and granularity=4096 set in cxl_decode_init()
do not need to be updated.

> Reviewed-by: Dave Jiang <dave.jiang@intel.com>

Thanks!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-04-04  0:08     ` Dan Williams
@ 2023-04-04  0:16       ` Dave Jiang
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Jiang @ 2023-04-04  0:16 UTC (permalink / raw)
  To: Dan Williams, linux-cxl; +Cc: Gregory Price



On 4/3/23 5:08 PM, Dan Williams wrote:
> Dave Jiang wrote:
>>
>>
>> On 4/3/23 4:06 PM, Dan Williams wrote:
>>> One motivation for mapping range registers to decoder objects is
>>> to use those settings for region autodiscovery.
>>>
>>> The need to map a region for devices programmed to use range registers
>>> is especially urgent now that the kernel no longer routes "Soft
>>> Reserved" ranges in the memory map to device-dax by default. The CXL
>>> memory range loses all access mechanisms.
>>>
>>> Complete the implementation by filling out ways and granularity, marking
>>> the DPA reservation, and setting the endpoint-decoder state to signal
>>> autodiscovery.
>>
>> If you don't mind making a note that the default values of ways and
>> granularity are coming from cxl_decode_init(). Thanks.
> 
> Changed this paragraph to:
> 
> Complete the implementation by marking the DPA reservation and setting
> the endpoint-decoder state to signal autodiscovery. Note that the
> default settings of ways=1 and granularity=4096 set in cxl_decode_init()
> do not need to be updated.

LGTM

> 
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> 
> Thanks!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-04-03 23:06 ` [PATCH v2] " Dan Williams
  2023-04-03 23:44   ` Dave Jiang
@ 2023-04-04  9:19   ` Jonathan Cameron
  1 sibling, 0 replies; 19+ messages in thread
From: Jonathan Cameron @ 2023-04-04  9:19 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Dave Jiang, Gregory Price

On Mon, 03 Apr 2023 16:06:14 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> One motivation for mapping range registers to decoder objects is
> to use those settings for region autodiscovery.
> 
> The need to map a region for devices programmed to use range registers
> is especially urgent now that the kernel no longer routes "Soft
> Reserved" ranges in the memory map to device-dax by default. The CXL
> memory range loses all access mechanisms.
> 
> Complete the implementation by filling out ways and granularity, marking
> the DPA reservation, and setting the endpoint-decoder state to signal
> autodiscovery.
> 
> Fixes: 09d09e04d2fc ("cxl/dax: Create dax devices for CXL RAM regions")
> Tested-by: Dave Jiang <dave.jiang@intel.com>
> Tested-by: Gregory Price <gregory.price@memverge.com>
> Link: https://lore.kernel.org/r/168012575521.221280.14177293493678527326.stgit@dwillia2-xfh.jf.intel.com
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
LGTM

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
> Changes since v1:
> - Swap out the local @range variable which was mostly only used for the
>   mapping length with a @len variable for that purpose (Jonathan)
> 
>  drivers/cxl/core/hdm.c |   27 ++++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 9884b6d4d930..02cc2c38b44b 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -738,14 +738,20 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
>  	return 0;
>  }
>  
> -static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
> -					    struct cxl_decoder *cxld, int which,
> -					    struct cxl_endpoint_dvsec_info *info)
> +static int cxl_setup_hdm_decoder_from_dvsec(
> +	struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
> +	int which, struct cxl_endpoint_dvsec_info *info)
>  {
> +	struct cxl_endpoint_decoder *cxled;
> +	u64 len;
> +	int rc;
> +
>  	if (!is_cxl_endpoint(port))
>  		return -EOPNOTSUPP;
>  
> -	if (!range_len(&info->dvsec_range[which]))
> +	cxled = to_cxl_endpoint_decoder(&cxld->dev);
> +	len = range_len(&info->dvsec_range[which]);
> +	if (!len)
>  		return -ENOENT;
>  
>  	cxld->target_type = CXL_DECODER_EXPANDER;
> @@ -760,6 +766,16 @@ static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
>  	cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
>  	port->commit_end = cxld->id;
>  
> +	rc = devm_cxl_dpa_reserve(cxled, *dpa_base, len, 0);
> +	if (rc) {
> +		dev_err(&port->dev,
> +			"decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)",
> +			port->id, cxld->id, *dpa_base, *dpa_base + len - 1, rc);
> +		return rc;
> +	}
> +	*dpa_base += len;
> +	cxled->state = CXL_DECODER_STATE_AUTO;
> +
>  	return 0;
>  }
>  
> @@ -779,7 +795,8 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>  	} target_list;
>  
>  	if (should_emulate_decoders(info))
> -		return cxl_setup_hdm_decoder_from_dvsec(port, cxld, which, info);
> +		return cxl_setup_hdm_decoder_from_dvsec(port, cxld, dpa_base,
> +							which, info);
>  
>  	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(which));
>  	base = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(which));
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-03-30  4:27         ` Gregory Price
@ 2023-04-16  4:05           ` Gregory Price
  2023-04-18  5:51             ` Dan Williams
  0 siblings, 1 reply; 19+ messages in thread
From: Gregory Price @ 2023-04-16  4:05 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Dave Jiang

On Thu, Mar 30, 2023 at 12:27:02AM -0400, Gregory Price wrote:
> On Wed, Mar 29, 2023 at 11:33:05PM -0700, Dan Williams wrote:
> > Gregory Price wrote:
> > > On Wed, Mar 29, 2023 at 09:21:25PM -0700, Dan Williams wrote:
> > > > > Is the DVSEC Range Register expected to be programmed by bios, and are
> > > > > not being programmed correctly?
> > > > 
> > > > This debug experiment makes me think perhaps the *device* is at fault,
> > > > not the BIOS. Perhaps the device accepts writes to CXL_DVSEC_RANGE_BASE
> > > > to set up the decode as expected, but reads return 0? That's the only
> > > > way that I can see that forcing that offset results in successfully
> > > > talking to memory.
> > > > 
> > > 
> > > Oh, i meant to add that i tested whether the memory is accessible via
> > > numactl --membind=1 with both memhog and a python prompt, and things
> > > worked just fine.  So memory works.
> > 
> > One other theory is that the device is correct, but the platform CXL
> > window accepts transactions at an offset and then removes that offset
> > when transmitting the address down the CXL port. So device thinks its
> > decoding 0x0 and never sees the offset removed by the host bridge.
> 
> Wouldn't that be against the spec?  I thought the device intended to
> receive HPA and do its decode accordingly.
> 
> Otherwise you could have multiple devices programmed capable of decoding
> 0x0, which is already the device address, so there's nothing to
> "decode".
> 
> I'll follow up as I learn more, this is concerning.  Certainly explains
> why every time I switch hardware nothing seems to work quite right.
> 
> ~Gregory


Follow up on this:  I attempted to actually write Memory_Base_Lo/Hi with
the appriopriate values, validated the write went through, and then
utilize the memory.

This test lead to a total system lockup, implying a major bus error
and/or device failure.  No stack trace or any indication of exactly what
went wrong - but it's pretty obvious there's a device issue.

if you remember, my initial test was to simply shift the base to the
base of the CXL Fixed Memory Window:

@@ -342,6 +344,7 @@ int cxl_dvsec_rr_decode(struct device *dev, int d,

                base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;

+               base += 0x4050000000;
                info->dvsec_range[i] = (struct range) {
                        .start = base,
                        .end = base + size - 1


To this I added some writebacks to the registers to set them what they
"should be" (i.e. what the BIOS should have done).


The other thing i noticed was a combination of DVSEC flags that seemed
interesting:

DVSEC CXL Capability: c0de
Mem_HwInit_mode == 1  (harware + firmware on device do init)

DVSEC CXL Range 1 Size Low: 804
memory_info_valid == 0
memory_active == 0

So this appears to have been BOTH a BIOS *and* a device issue, but we
should also be checking these bits before using the info from these
registers.



Something to note from the Spec: RCD Discovery (9.11.6)

4. If Mem_HwInit_Mode=1
... snip ...

- Each HDM range is later exposed to the OS as a separate, memory-only
  NUMA node via ACPI SRAT.

— System Firmware obtains CDAT from the UEFI device driver or directly
  from the device via Table Access DOE (see Section 8.1.11) and then
  uses this information during construction of the memory map, ACPI
  SRAT, and ACPI HMAT. See ACPI Specification, CDAT Specification, and
  UEFI Specification for further details.

It also sounds like trying to place this device in dax-device mode isn't
really the intended use-case, and if that's the case maybe that should
be disallowed?



All this considered:  It's still head-scratching why shifting the HPA
read from the range register successfully produces a "working device",
but I suppose that's the definition of "undefined behavior" :]

~Gregory

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-04-16  4:05           ` Gregory Price
@ 2023-04-18  5:51             ` Dan Williams
  2023-04-20  0:50               ` Gregory Price
  0 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2023-04-18  5:51 UTC (permalink / raw)
  To: Gregory Price, Dan Williams; +Cc: linux-cxl, Dave Jiang

Gregory Price wrote:
> On Thu, Mar 30, 2023 at 12:27:02AM -0400, Gregory Price wrote:
> > On Wed, Mar 29, 2023 at 11:33:05PM -0700, Dan Williams wrote:
> > > Gregory Price wrote:
> > > > On Wed, Mar 29, 2023 at 09:21:25PM -0700, Dan Williams wrote:
> > > > > > Is the DVSEC Range Register expected to be programmed by bios, and are
> > > > > > not being programmed correctly?
> > > > > 
> > > > > This debug experiment makes me think perhaps the *device* is at fault,
> > > > > not the BIOS. Perhaps the device accepts writes to CXL_DVSEC_RANGE_BASE
> > > > > to set up the decode as expected, but reads return 0? That's the only
> > > > > way that I can see that forcing that offset results in successfully
> > > > > talking to memory.
> > > > > 
> > > > 
> > > > Oh, i meant to add that i tested whether the memory is accessible via
> > > > numactl --membind=1 with both memhog and a python prompt, and things
> > > > worked just fine.  So memory works.
> > > 
> > > One other theory is that the device is correct, but the platform CXL
> > > window accepts transactions at an offset and then removes that offset
> > > when transmitting the address down the CXL port. So device thinks its
> > > decoding 0x0 and never sees the offset removed by the host bridge.
> > 
> > Wouldn't that be against the spec?  I thought the device intended to
> > receive HPA and do its decode accordingly.
> > 
> > Otherwise you could have multiple devices programmed capable of decoding
> > 0x0, which is already the device address, so there's nothing to
> > "decode".
> > 
> > I'll follow up as I learn more, this is concerning.  Certainly explains
> > why every time I switch hardware nothing seems to work quite right.
> > 
> > ~Gregory
> 
> 
> Follow up on this:  I attempted to actually write Memory_Base_Lo/Hi with
> the appriopriate values, validated the write went through, and then
> utilize the memory.
> 
> This test lead to a total system lockup, implying a major bus error
> and/or device failure.  No stack trace or any indication of exactly what
> went wrong - but it's pretty obvious there's a device issue.
> 
> if you remember, my initial test was to simply shift the base to the
> base of the CXL Fixed Memory Window:
> 
> @@ -342,6 +344,7 @@ int cxl_dvsec_rr_decode(struct device *dev, int d,
> 
>                 base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;
> 
> +               base += 0x4050000000;
>                 info->dvsec_range[i] = (struct range) {
>                         .start = base,
>                         .end = base + size - 1
> 
> 
> To this I added some writebacks to the registers to set them what they
> "should be" (i.e. what the BIOS should have done).
> 
> 
> The other thing i noticed was a combination of DVSEC flags that seemed
> interesting:
> 
> DVSEC CXL Capability: c0de
> Mem_HwInit_mode == 1  (harware + firmware on device do init)
> 
> DVSEC CXL Range 1 Size Low: 804
> memory_info_valid == 0
> memory_active == 0

Is this CXL_DVSEC_RANGE_SIZE_LOW(1) or literally "DVSEC CXL Range 1 Size
Low" which Linux calls the offset CXL_DVSEC_RANGE_SIZE_LOW(0)? If its
the former, it's valid for the second range to be disabled. Linux stack
should have failed cxl_await_media_ready() otherwise if this is
referring to CXL_DVSEC_RANGE_SIZE_LOW(0).

> 
> So this appears to have been BOTH a BIOS *and* a device issue, but we
> should also be checking these bits before using the info from these
> registers.
> 
> 
> 
> Something to note from the Spec: RCD Discovery (9.11.6)
> 
> 4. If Mem_HwInit_Mode=1
> ... snip ...
> 
> - Each HDM range is later exposed to the OS as a separate, memory-only
>   NUMA node via ACPI SRAT.
> 
> — System Firmware obtains CDAT from the UEFI device driver or directly
>   from the device via Table Access DOE (see Section 8.1.11) and then
>   uses this information during construction of the memory map, ACPI
>   SRAT, and ACPI HMAT. See ACPI Specification, CDAT Specification, and
>   UEFI Specification for further details.
> 
> It also sounds like trying to place this device in dax-device mode isn't
> really the intended use-case, and if that's the case maybe that should
> be disallowed?

I don't follow. Any CXL.mem in the end is just a physical address range,
any physical address range can be passed to dax or the core-mm.

> All this considered:  It's still head-scratching why shifting the HPA
> read from the range register successfully produces a "working device",
> but I suppose that's the definition of "undefined behavior" :]

The needs the platform vendor to weigh in, but that may not be possible
if this is evaluation hardware.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-04-18  5:51             ` Dan Williams
@ 2023-04-20  0:50               ` Gregory Price
  0 siblings, 0 replies; 19+ messages in thread
From: Gregory Price @ 2023-04-20  0:50 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Dave Jiang

On Mon, Apr 17, 2023 at 10:51:41PM -0700, Dan Williams wrote:
> > DVSEC CXL Range 1 Size Low: 804
> > memory_info_valid == 0
> > memory_active == 0
> 
> Is this CXL_DVSEC_RANGE_SIZE_LOW(1) or literally "DVSEC CXL Range 1 Size
> Low" which Linux calls the offset CXL_DVSEC_RANGE_SIZE_LOW(0)? If its
> the former, it's valid for the second range to be disabled. Linux stack
> should have failed cxl_await_media_ready() otherwise if this is
> referring to CXL_DVSEC_RANGE_SIZE_LOW(0).
>

disregard, i was looking at size_low(0) but i misread the output, the
real value was 804b, so the lower 2 bits were set (which is what is
expected).

> > All this considered:  It's still head-scratching why shifting the HPA
> > read from the range register successfully produces a "working device",
> > but I suppose that's the definition of "undefined behavior" :]
> 
> The needs the platform vendor to weigh in, but that may not be possible
> if this is evaluation hardware.

Mostly i'm scratching my head at the fact that the device is programmed
with base:0x0 size:2gb where the HPA is almost certainly not 0.  I
originally attempted to set the value of the register to the expected
HPA and that caused what i presume was a machine check (haven't
investigated further yet).

What's confusing about this is one assumes that the CPU puts the HPA on
the bus (0x1080000000 in this case) but the base register has 0x0 -
which should produce a failure.  Instead it works.

Just a real head-scratcher, but i'm going to close this as "hardware's
jacked" at this point.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration
  2023-03-29 17:20     ` Gregory Price
@ 2023-05-16  6:43       ` Gregory Price
  0 siblings, 0 replies; 19+ messages in thread
From: Gregory Price @ 2023-05-16  6:43 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Dave Jiang

On Wed, Mar 29, 2023 at 01:20:03PM -0400, Gregory Price wrote:
> 
> Given this system is single-socket, and has only a single CXL memory
> expander connected to it, and that a CFMWS for 0-1fff.. seems
> irrational, I'm going to go with the Range Register is wrong.
> 
> > > confirming this is the issue, i *forced* the dvsec range register to read
> > > out base+0x1050000000
> > 
> > Nice!
> > 
> > > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> > > index 7328a2552411..6fc6df0f7b5a 100644
> > > --- a/drivers/cxl/core/pci.c
> > > +++ b/drivers/cxl/core/pci.c
> > > @@ -340,12 +344,16 @@ int cxl_dvsec_rr_decode(struct device *dev, int d,
> > >                 if (rc)
> > >                         return rc;
> > > 
> > >                 base |= temp & CXL_DVSEC_MEM_BASE_LOW_MASK;
> > > 
> > >                 info->dvsec_range[i] = (struct range) {
> > > -                       .start = base,
> > > -                       .end = base + size - 1
> > > +                       .start = 0x1050000000 + base,
> > > +                       .end = 0x1050000000 + base + size - 1
> > >                 };
> > > 
> > > 
> > > And this resulted in everything working "as one would expect"
> > > 
> > > [user@host0 ~]# ls /sys/bus/cxl/devices/
> > > dax_region0  decoder0.0  decoder1.0  endpoint1  mem0  region0  root0
> > > 
> > > [user@host0 ~]# numactl --hardware
> > > available: 2 nodes (0-1)
> > > node 1 cpus:
> > > node 1 size: 0 MB
> > > node 1 free: 0 MB
> > > node distances:
> > > node   0   1
> > >   0:  10  50
> > >   1:  255  10
> > > 
> > > [user@host0 ~]# echo online_movable > /sys/bus/node/devices/node1/memory33/state
> > > [user@host0 ~]# numactl --hardware
> > > available: 2 nodes (0-1)
> > > node 1 size: 2048 MB
> > > node 1 free: 2048 MB
> > > node distances:
> > > node   0   1
> > >   0:  10  50
> > >   1:  255  10
> > > 
> > > 
> > > Basically the question is:
> > > 
> > > Is the DVSEC Range Register expected to be programmed by bios, and are
> > > not being programmed correctly?
> > 
> > This debug experiment makes me think perhaps the *device* is at fault,
> > not the BIOS. Perhaps the device accepts writes to CXL_DVSEC_RANGE_BASE
> > to set up the decode as expected, but reads return 0? That's the only
> > way that I can see that forcing that offset results in successfully
> > talking to memory.
> > 
> 

Wanted to follow this up with some additional information.

The reason this patch / debug test worked is because the platform

1) Does explicitly programs the range register base with 0x0.

and

2) Does address translation on the root complex and puts 0-indexed
   addresses on the wire to the target devices.

So under this platform, one would expect all of the RCD's to have their
range register filled in with a 0x0 base.

This is frustrating and this won't work with the existing driver code,
because the driver code expects the device's to have the correct range
register value set.

The only thing I can think of is adding code here to check whether this
situation is occurring, and then attempting to read information from the
RCH topology to fill in the info->dvsec_range[*] with the correct value.

I'm not really sure what that would look like, nor whether that's a
defensible patch given that the spec wants RR encoded with the HPA, not
a 0-indexed address.

Just food for thought, my frustration is palpable at the moment with
this problem.

~Gregory

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-05-16 15:29 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-29 21:35 [PATCH] cxl/hdm: Extend DVSEC range register emulation for region enumeration Dan Williams
2023-03-29 16:04 ` Gregory Price
2023-03-30  4:21   ` Dan Williams
2023-03-29 17:20     ` Gregory Price
2023-05-16  6:43       ` Gregory Price
2023-03-29 17:21     ` Gregory Price
2023-03-30  6:33       ` Dan Williams
2023-03-30  4:27         ` Gregory Price
2023-04-16  4:05           ` Gregory Price
2023-04-18  5:51             ` Dan Williams
2023-04-20  0:50               ` Gregory Price
2023-03-30  0:06 ` Dave Jiang
2023-03-30 17:00 ` Jonathan Cameron
2023-03-30 18:24   ` Dan Williams
2023-04-03 23:06 ` [PATCH v2] " Dan Williams
2023-04-03 23:44   ` Dave Jiang
2023-04-04  0:08     ` Dan Williams
2023-04-04  0:16       ` Dave Jiang
2023-04-04  9:19   ` Jonathan Cameron

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).