* Re: Hot ADD using CXL1.1 host
@ 2023-01-31 15:19 Shesha Sreenivasamurthy
  2023-01-31 17:07 ` Dan Williams
  0 siblings, 1 reply; 5+ messages in thread
From: Shesha Sreenivasamurthy @ 2023-01-31 15:19 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-kernel, linux-cxl



On Mon, Jan 30, 2023 at 2:00 PM Dan Williams <dan.j.williams@intel.com> wrote:
> 
> Hi Shesha, Linux email expectations are to not top post, i.e. respond
> inline, like below:
> 
> Shesha Sreenivasamurthy wrote:
>> The re-configuration does not reset the device. It does re-program the PCIe
>> DVSEC for CXL Device register (Section 8.1.3, CXL 2.0 spec, p. 258;
>> DVSEC vendor ID 0x1E98, DVSEC ID 0x0).
>> “So you need to dynamically recreate the region, especially if your step 10
>> above resets the device.”
>> Do you mean the DAX region ?
> 
> No, I mean the CXL region.
> 
>> If so, I can, if the system stays up. After a few seconds the system
>> crashes. Could the crash be caused by a mismatch between the DVSEC
>> information and what the BIOS told the kernel during boot (some
>> ACPI tables?)
> 
> My concern is that the platform memory decode configuration is not
> prepared for the CXL device to claim more than what was originally
> programmed in the CXL DVSEC range registers. One of the platform
> firmware updates for CXL 2.0 was the creation of the CFMWS (CXL Fixed
> Memory Window Structure) in the ACPI CEDT (CXL Early Discovery Table).
> That structure indicates which platform address ranges decode to which
> CXL host bridges. Those windows are defined in platform specific
> registers (not enumerated to the OS). If the window is only 8GB then
> the endpoint device cannot decode more. You would need to reboot to get
> the BIOS to allocate more host address space for CXL.
> 
> The expectation for newer platforms is that platform firmware defines
> CFMWS such that there is spare capacity in the address map for the OS to
> dynamically map more CXL memory.

There seems to be some instability in using DAX. When the system is
given all the device memory using efi=nosoftreserve, stressapptest
(https://github.com/stressapptest/stressapptest) runs for an extended
period of time. However, when the system is booted without
efi=nosoftreserve and the special-purpose memory is assigned to
system-ram using daxctl, the system crashes after some time (20-30
minutes). Are there any known instabilities when using DAX?


* Re: Hot ADD using CXL1.1 host
  2023-01-31 15:19 Hot ADD using CXL1.1 host Shesha Sreenivasamurthy
@ 2023-01-31 17:07 ` Dan Williams
  0 siblings, 0 replies; 5+ messages in thread
From: Dan Williams @ 2023-01-31 17:07 UTC (permalink / raw)
  To: Shesha Sreenivasamurthy, Dan Williams; +Cc: linux-kernel, linux-cxl

Shesha Sreenivasamurthy wrote:
[..]
> There seems to be some instability in using DAX. When the system is
> given all the device memory using efi=nosoftreserve, stressapptest
> (https://github.com/stressapptest/stressapptest) runs for an extended
> period of time. However, when the system is booted without
> efi=nosoftreserve and the special-purpose memory is assigned to
> system-ram using daxctl, the system crashes after some time (20-30
> minutes). Are there any known instabilities when using DAX?

One difference with late binding of memory is where kernel data
structures are allocated. So the stress profile can change based on
kernel activity. Otherwise there is no known instability with delaying
the online of memory.
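
The two boot configurations compared above differ in whether
efi=nosoftreserve is on the kernel command line. A minimal sketch for
checking which one a host booted with, assuming a POSIX shell; the
function name has_nosoftreserve is hypothetical, and it takes the command
line as a string so that it can be fed the contents of /proc/cmdline:

```shell
# Return 0 if the given kernel command line contains efi=nosoftreserve,
# either on its own or inside a comma-separated efi= option list.
has_nosoftreserve() {
    # Put each space-separated parameter on its own line, then look for
    # an efi= parameter whose option list includes "nosoftreserve".
    printf '%s\n' "$1" | tr ' ' '\n' |
        grep -Eq '^efi=([^,]*,)*nosoftreserve(,.*)?$'
}

# Usage (not run here):
#   has_nosoftreserve "$(cat /proc/cmdline)" && echo "efi=nosoftreserve is set"
```

With the option set, the kernel treats EFI_MEMORY_SP ranges as ordinary
System RAM at boot; without it, the range is left for device-dax and only
becomes system-ram after the daxctl reconfiguration.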


* Re: Hot ADD using CXL1.1 host
       [not found]     ` <CABL7MgHENj0FcT+whjdskOmqPfNa0zo8gQobpZ0mrW3a3NY_pA@mail.gmail.com>
@ 2023-01-30 22:00       ` Dan Williams
  0 siblings, 0 replies; 5+ messages in thread
From: Dan Williams @ 2023-01-30 22:00 UTC (permalink / raw)
  To: Shesha Sreenivasamurthy, Dan Williams; +Cc: linux-kernel, linux-cxl

Hi Shesha, Linux email expectations are to not top post, i.e. respond
inline, like below:

Shesha Sreenivasamurthy wrote:
> The re-configuration does not reset the device. It does re-program the PCIe
> DVSEC for CXL Device register (Section 8.1.3, CXL 2.0 spec, p. 258;
> DVSEC vendor ID 0x1E98, DVSEC ID 0x0).
> 
> 
> 
> “So you need to dynamically recreate the region, especially if your step 10
> above resets the device.”
> 
> 
> 
> Do you mean the DAX region ?

No, I mean the CXL region.

> If so, I can, if the system stays up. After a few seconds the system
> crashes. Could the crash be caused by a mismatch between the DVSEC
> information and what the BIOS told the kernel during boot (some
> ACPI tables?)

My concern is that the platform memory decode configuration is not
prepared for the CXL device to claim more than what was originally
programmed in the CXL DVSEC range registers. One of the platform
firmware updates for CXL 2.0 was the creation of the CFMWS (CXL Fixed
Memory Window Structure) in the ACPI CEDT (CXL Early Discovery Table).
That structure indicates which platform address ranges decode to which
CXL host bridges. Those windows are defined in platform specific
registers (not enumerated to the OS). If the window is only 8GB then
the endpoint device cannot decode more. You would need to reboot to get
the BIOS to allocate more host address space for CXL.

The expectation for newer platforms is that platform firmware defines
CFMWS such that there is spare capacity in the address map for the OS to
dynamically map more CXL memory.
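
The window constraint described above comes down to simple arithmetic: a
new region only fits if the CFMWS window still has enough unmapped
capacity. A minimal sketch, assuming sizes in GiB supplied by hand; the
function name fits_in_window and its arguments are hypothetical, and
nothing here reads the actual CEDT:

```shell
# Return 0 if a region of region_gib fits in the unused part of a window.
# All three values are illustrative inputs given in GiB; a real check
# would have to take the window size from the platform's CFMWS entries.
fits_in_window() {
    window_gib=$1   # total size of the CFMWS window
    used_gib=$2     # capacity already mapped inside that window
    region_gib=$3   # size of the region to be created
    [ $((window_gib - used_gib)) -ge "$region_gib" ]
}

# Usage (not run here):
#   fits_in_window 8 0 16 || echo "window too small, reboot needed"
```

With an 8G window, `fits_in_window 8 0 16` fails even after the old 8G
region has been torn down, which is the case where a reboot is needed so
that the BIOS can allocate more host address space.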


* Re: Hot ADD using CXL1.1 host
  2023-01-30 20:31   ` Dan Williams
@ 2023-01-30 21:19     ` Shesha Sreenivasamurthy
       [not found]     ` <CABL7MgHENj0FcT+whjdskOmqPfNa0zo8gQobpZ0mrW3a3NY_pA@mail.gmail.com>
  1 sibling, 0 replies; 5+ messages in thread
From: Shesha Sreenivasamurthy @ 2023-01-30 21:19 UTC (permalink / raw)
  To: linux-cxl; +Cc: linux-kernel

The re-configuration does not reset the device. It does re-program the
PCIe DVSEC for CXL Device register (Section 8.1.3, CXL 2.0 spec, p. 258;
DVSEC vendor ID 0x1E98, DVSEC ID 0x0).

“So you need to dynamically recreate the region, especially if your
step 10 above resets the device.”

Do you mean the DAX region? If so, I can, if the system stays up.
After a few seconds the system crashes. Could the crash be caused by a
mismatch between the DVSEC information and what the BIOS told the
kernel during boot (some ACPI tables?)


Thanks,
Shesha.


On Mon, Jan 30, 2023 at 12:51 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> [ add linux-cxl@vger.kernel.org ]
>
> Hi Shesha, I missed this earlier because it does not appear in my
> "linux-cxl" filter. In general, mail to linux-kernel does not get great
> response from domain-specific experts, so I recommend going to the
> domain-specific list, like linux-cxl@ in this case. Comments below:
>
> Shesha Bhushan Sreenivasamurthy wrote:
> > From: Shesha Bhushan Sreenivasamurthy <sheshas@marvell.com>
> > Date: Thursday, January 26, 2023 at 6:05 PM
> > To: linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
> > Subject: Hot ADD using CXL1.1 host
> > Hi All,
> >
> > In our setup, the host is a CXL 1.1 system running Fedora with a 6.1
> > kernel. It is connected to a Marvell CXL 2.0 Type-3 memory pooling
> > device. My goal is to dynamically change the memory configuration
> > without rebooting the host or the memory device.
> >
> > The approach that I am currently taking is to use dax. I configured
> > the memory device to export 8G and the host sees 8G. I can
> > successfully convert the memory from ‘devdax’ to ‘system-ram’ mode so
> > that general applications can use it. At this point, I reconfigure
> > our memory device to export 16G and the host crashes within a few
> > minutes. The steps I followed are the following:
> >
> >
> >   1.  Configure my memory device to export 8G
> >   2.  Boot host. BIOS populates SRAT table with size 8G.
> >   3.  daxctl list --regions --devices -u // Shows 8G
> >   4.  sudo daxctl reconfigure-device --mode=system-ram dax0.0 -f
> >   5.  Use memory in my application
>
> Ok up to this point, no interaction with the CXL enabling. This is just
> the default kernel behavior with a BIOS that applies the EFI_MEMORY_SP
> attribute to an address range.
>
> >   6.  ---- RECONFIGURATION PART ----
> >   7.  sudo daxctl offline-memory dax0.0
> >   8.  sudo daxctl destroy-device  dax0.0 -f // All numa node memory mappings are gone
> >   9.  sudo sh -c "echo 1 > /sys/bus/pci/devices/0000\:38\:00.0/remove"
>
> Note that this only takes care of the software side, the CXL hardware /
> decoder side is not touched.
>
> >   10. Reconfigure memory device to be 16G
>
> Does this reset the device?
>
> >   11. sudo sh -c "echo 1 > /sys/bus/pci/rescan"
> >      *   CXL DVSEC (Cap ID 0x23, DVSEC VendorID 0x1E98, DVSEC-ID: 0x0) shows size to be 16G 😊
> >   12. daxctl list --regions --devices -u
> >      *   This still shows 8G ☹
>
> Yes, because there is currently no hookup between the CXL subsystem and
> device-dax, but I am working on that:
>
> https://lore.kernel.org/linux-cxl/63d21ce66e5c_ea22229446@dwillia2-xfh.jf.intel.com.notmuch/
>
> >   13. System crashes
> >
> > There is a mismatch between what DAX is seeing and what the PCI DVSEC
> > is saying. It looks like I am missing a step that removes the old 8G
> > information from the system. Can someone advise?
>
> So you need to dynamically recreate the region, especially if your step
> 10 above resets the device.
>
> > Now, I can try the following
> >
> >   1.  Power off memory device
> >   2.  Power on and boot my host
> >   3.  Power on memory device
> >   4.  Configure the memory device to have 8G
> >   5.  Follow the above 5-12 commands
> >
> > With this, the question I have is – will the host recognize the PCI
> > device as CXL device and run cxl.mem protocol or will it just see it
> > as PCIe device ? Note that the host is CXL1.1.
>
> Does your device support the HDM decoder capability? As it stands the
> driver expects to use HDM decoders for region creation rather than CXL
> DVSEC range registers.
>
> My expectation is that once the ram-region creation work is done you
> should be able to do something like:
>
> cxl disable-region $region
> cxl disable-memdev $memdev
> modprobe -r cxl_pci
> <reconfigure device>
> modprobe cxl_pci
> cxl create-region ...
>
> ...and be back up and running with a new region with the updated size.


* Re: Hot ADD using CXL1.1 host
       [not found] ` <DM6PR18MB2844505042F7EDCF69CE08DBAFD39@DM6PR18MB2844.namprd18.prod.outlook.com>
@ 2023-01-30 20:31   ` Dan Williams
  2023-01-30 21:19     ` Shesha Sreenivasamurthy
       [not found]     ` <CABL7MgHENj0FcT+whjdskOmqPfNa0zo8gQobpZ0mrW3a3NY_pA@mail.gmail.com>
  0 siblings, 2 replies; 5+ messages in thread
From: Dan Williams @ 2023-01-30 20:31 UTC (permalink / raw)
  To: Shesha Bhushan Sreenivasamurthy, linux-kernel, linux-cxl; +Cc: Dan Williams

[ add linux-cxl@vger.kernel.org ]

Hi Shesha, I missed this earlier because it does not appear in my
"linux-cxl" filter. In general, mail to linux-kernel does not get great
response from domain-specific experts, so I recommend going to the
domain-specific list, like linux-cxl@ in this case. Comments below:

Shesha Bhushan Sreenivasamurthy wrote:
> From: Shesha Bhushan Sreenivasamurthy <sheshas@marvell.com>
> Date: Thursday, January 26, 2023 at 6:05 PM
> To: linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
> Subject: Hot ADD using CXL1.1 host
> Hi All,
> 
> In our setup, the host is a CXL 1.1 system running Fedora with a 6.1
> kernel. It is connected to a Marvell CXL 2.0 Type-3 memory pooling
> device. My goal is to dynamically change the memory configuration
> without rebooting the host or the memory device.
> 
> The approach that I am currently taking is to use dax. I configured
> the memory device to export 8G and the host sees 8G. I can
> successfully convert the memory from ‘devdax’ to ‘system-ram’ mode so
> that general applications can use it. At this point, I reconfigure
> our memory device to export 16G and the host crashes within a few
> minutes. The steps I followed are the following:
> 
> 
>   1.  Configure my memory device to export 8G
>   2.  Boot host. BIOS populates SRAT table with size 8G.
>   3.  daxctl list --regions --devices -u // Shows 8G
>   4.  sudo daxctl reconfigure-device --mode=system-ram dax0.0 -f
>   5.  Use memory in my application

Ok up to this point, no interaction with the CXL enabling. This is just
the default kernel behavior with a BIOS that applies the EFI_MEMORY_SP
attribute to an address range.

>   6.  ---- RECONFIGURATION PART ----
>   7.  sudo daxctl offline-memory dax0.0
>   8.  sudo daxctl destroy-device  dax0.0 -f // All numa node memory mappings are gone
>   9.  sudo sh -c "echo 1 > /sys/bus/pci/devices/0000\:38\:00.0/remove"

Note that this only takes care of the software side, the CXL hardware /
decoder side is not touched.

>   10. Reconfigure memory device to be 16G

Does this reset the device?

>   11. sudo sh -c "echo 1 > /sys/bus/pci/rescan"
>      *   CXL DVSEC (Cap ID 0x23, DVSEC VendorID 0x1E98, DVSEC-ID: 0x0) shows size to be 16G 😊
>   12. daxctl list --regions --devices -u
>      *   This still shows 8G ☹

Yes, because there is currently no hookup between the CXL subsystem and
device-dax, but I am working on that:

https://lore.kernel.org/linux-cxl/63d21ce66e5c_ea22229446@dwillia2-xfh.jf.intel.com.notmuch/

>   13. System crashes
> 
> There is a mismatch between what DAX is seeing and what the PCI DVSEC
> is saying. It looks like I am missing a step that removes the old 8G
> information from the system. Can someone advise?

So you need to dynamically recreate the region, especially if your step
10 above resets the device.

> Now, I can try the following
> 
>   1.  Power off memory device
>   2.  Power on and boot my host
>   3.  Power on memory device
>   4.  Configure the memory device to have 8G
>   5.  Follow the above 5-12 commands
> 
> With this, the question I have is – will the host recognize the PCI
> device as CXL device and run cxl.mem protocol or will it just see it
> as PCIe device ? Note that the host is CXL1.1.

Does your device support the HDM decoder capability? As it stands the
driver expects to use HDM decoders for region creation rather than CXL
DVSEC range registers.

My expectation is that once the ram-region creation work is done you
should be able to do something like:

cxl disable-region $region
cxl disable-memdev $memdev
modprobe -r cxl_pci
<reconfigure device>
modprobe cxl_pci
cxl create-region ...

...and be back up and running with a new region with the updated size.
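
The sequence above could be wrapped in a small helper that prints the
commands by default instead of running them. This is only a sketch under
assumptions: the names run and recreate_region, and the example values
region0, mem0 and ./reconfigure.sh, are hypothetical, and the arguments
to "cxl create-region" are left to the caller because the thread does not
specify them:

```shell
# Print each command (default), or execute it when DRY_RUN=0.
# Executing requires root and the cxl tool from the ndctl project.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: region name, $2: memdev name, $3: caller-provided command that
# reconfigures the device; any remaining arguments are passed through
# to "cxl create-region" unchanged.
recreate_region() {
    region=$1; memdev=$2; reconfigure_cmd=$3
    shift 3
    # Stop at the first failing step so later steps never run on a
    # half-torn-down configuration.
    run cxl disable-region "$region" &&
    run cxl disable-memdev "$memdev" &&
    run modprobe -r cxl_pci &&
    run "$reconfigure_cmd" &&
    run modprobe cxl_pci &&
    run cxl create-region "$@"
}

# Usage (dry run, prints the six steps):
#   recreate_region region0 mem0 ./reconfigure.sh
```

Defaulting to a dry run keeps the helper safe to try on a machine where
the region is in use, since nothing is unloaded until DRY_RUN=0 is set.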

