linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v1] cxl/pci: Fix debug message in cxl_probe_regs()
  2021-08-11 19:47 [PATCH v1] cxl/pci: Fix debug message in cxl_probe_regs() Li Qiang (Johnny Li)
@ 2021-08-11 15:38 ` Dan Williams
  2021-08-11 16:35   ` CXL Hot and Warm Reset Testing Chris Browy
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2021-08-11 15:38 UTC (permalink / raw)
  To: Li Qiang (Johnny Li)
  Cc: linux-cxl, Weiny, Ira, Ben Widawsky, Jonathan Cameron

On Wed, Aug 11, 2021 at 12:54 AM Li Qiang (Johnny Li)
<johnny.li@montage-tech.com> wrote:
>
> Indicator string for mbox and memdev register set to status
> incorrectly in error message.
>
> Signed-off-by: Li Qiang (Johnny Li) <johnny.li@montage-tech.com>

Looks good to me, applied.

> ---
>  drivers/cxl/pci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 47315bb2db10..70e80237865c 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1032,8 +1032,8 @@ static int cxl_probe_regs(struct cxl_mem *cxlm, void __iomem *base,
>                     !dev_map->memdev.valid) {
>                         dev_err(dev, "registers not found: %s%s%s\n",
>                                 !dev_map->status.valid ? "status " : "",
> -                               !dev_map->mbox.valid ? "status " : "",
> -                               !dev_map->memdev.valid ? "status " : "");
> +                               !dev_map->mbox.valid ? "mbox " : "",
> +                               !dev_map->memdev.valid ? "memdev " : "");
>                         return -ENXIO;
>                 }
>
> --
> 2.17.1
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* CXL Hot and Warm Reset Testing
  2021-08-11 15:38 ` Dan Williams
@ 2021-08-11 16:35   ` Chris Browy
  2021-08-12  1:08     ` Dan Williams
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Browy @ 2021-08-11 16:35 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-cxl, Ben Widawsky, Jonathan Cameron

Hi Dan,

Wondering if you could shed some light on OS orchestrated reset, Sec 9.3 of CXL 2.0 
spec.  How is initiated at OS-level?  I tried some conventional approach or using sysfs 
remove command on RC and then setting RC bridge control secondary bus reset (using 
setpci from Ben's pciutils) and can drive the system into PCIe host reset flow (LTSSM 
Recovery for hot reset) but the CXL host is supposed follow steps in 9.3 including to 
send CXL Power Management.Message RESETPREP REQUEST to CXL Device 
before all this.  None of this occurs.

I’m testing on pure QEMU emulation as well the the QEMU co-sim to real DUT.  This 
work also serves the needs for the CXLCV test software.

What is the proper procedure to test this from OS or user perspective?  Are there sysfs 
entries or other commands that are necessary for CXL compared to PCIe?

Is there something missing in QEMU CXL host bridge hardware point of view 
to automatically generate the VDM?

BTW we want to work through all the reset and power mode testing so hot reset is just the start.

For us the QEMU co-sim is a good approach for pre-silicon verification (if we can get the host 
part right).

Best Regards,
Chris

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v1] cxl/pci: Fix debug message in cxl_probe_regs()
@ 2021-08-11 19:47 Li Qiang (Johnny Li)
  2021-08-11 15:38 ` Dan Williams
  0 siblings, 1 reply; 10+ messages in thread
From: Li Qiang (Johnny Li) @ 2021-08-11 19:47 UTC (permalink / raw)
  To: linux-cxl
  Cc: ira.weiny, dan.j.williams, ben.widawsky, Jonathan.Cameron,
	Li Qiang (Johnny Li)

Indicator string for mbox and memdev register set to status
incorrectly in error message.

Signed-off-by: Li Qiang (Johnny Li) <johnny.li@montage-tech.com>
---
 drivers/cxl/pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 47315bb2db10..70e80237865c 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1032,8 +1032,8 @@ static int cxl_probe_regs(struct cxl_mem *cxlm, void __iomem *base,
 		    !dev_map->memdev.valid) {
 			dev_err(dev, "registers not found: %s%s%s\n",
 				!dev_map->status.valid ? "status " : "",
-				!dev_map->mbox.valid ? "status " : "",
-				!dev_map->memdev.valid ? "status " : "");
+				!dev_map->mbox.valid ? "mbox " : "",
+				!dev_map->memdev.valid ? "memdev " : "");
 			return -ENXIO;
 		}
 
-- 
2.17.1




^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: CXL Hot and Warm Reset Testing
  2021-08-11 16:35   ` CXL Hot and Warm Reset Testing Chris Browy
@ 2021-08-12  1:08     ` Dan Williams
  2021-08-13 17:01       ` Vikram Sethi
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2021-08-12  1:08 UTC (permalink / raw)
  To: Chris Browy; +Cc: linux-cxl, Ben Widawsky, Jonathan Cameron

On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com> wrote:
>
> Hi Dan,
>
> Wondering if you could shed some light on OS orchestrated reset, Sec 9.3 of CXL 2.0
> spec.  How is initiated at OS-level?

I have no idea what the spec means by "OS orchestrated reset flow".
Linux PCI core has no idea about special CXL device reset
requirements, it just attempts typical PCI reset methods starting with
FLR, escalating to D3 D0 toggle, and finally attempting secondary bus
reset if any errors were reported on the previous attempts.

> I tried some conventional approach or using sysfs
> remove command on RC and then setting RC bridge control secondary bus reset (using
> setpci from Ben's pciutils) and can drive the system into PCIe host reset flow (LTSSM
> Recovery for hot reset) but the CXL host is supposed follow steps in 9.3 including to
> send CXL Power Management.Message RESETPREP REQUEST to CXL Device
> before all this.  None of this occurs.
>
> I’m testing on pure QEMU emulation as well the the QEMU co-sim to real DUT.  This
> work also serves the needs for the CXLCV test software.

It's not clear to me how a QEMU behavioral model could send things
like the CXL PM VDM.

> What is the proper procedure to test this from OS or user perspective?  Are there sysfs
> entries or other commands that are necessary for CXL compared to PCIe?

/sys/bus/pci/devices/$device/reset is a method to trigger PCI device
reset, but I do not expect that will ever gain CXL specific knowledge.

>
> Is there something missing in QEMU CXL host bridge hardware point of view
> to automatically generate the VDM?

Certainly QEMU is only a behavioral model, and the VDM is a part of a
hardware functional protocol.

> BTW we want to work through all the reset and power mode testing so hot reset is just the start.
>
> For us the QEMU co-sim is a good approach for pre-silicon verification (if we can get the host
> part right).

It sounds promising, but it also sounds out of scope for what the
Linux driver can affect.  As far as I can see, the CXL specification
must assume that OS software treats CXL.io as typical PCIE for the
purposes of issuing resets.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: CXL Hot and Warm Reset Testing
  2021-08-12  1:08     ` Dan Williams
@ 2021-08-13 17:01       ` Vikram Sethi
  2021-08-13 17:14         ` Bjorn Helgaas
  0 siblings, 1 reply; 10+ messages in thread
From: Vikram Sethi @ 2021-08-13 17:01 UTC (permalink / raw)
  To: Dan Williams, Chris Browy
  Cc: linux-cxl, Ben Widawsky, Jonathan Cameron, Alex Williamson,
	Bjorn Helgaas, Shanker Donthineni

Hi Dan, 

> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> 
> On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> wrote:
> >
> 
> /sys/bus/pci/devices/$device/reset is a method to trigger PCI device reset,
> but I do not expect that will ever gain CXL specific knowledge.
> 
CXL reset may need some thought, specially for devices that don't expose FLR but do expose CXL reset (while former does not affect CXL.cache/mem, the latter wipes out CXL.cache/mem state in the device and there is discoverability as to whether or not memory contents can be cleared as part of CXL reset). We may need a way of triggering CXL reset from userspace, and if the existing /sys/bus/pci/devices/$device/reset won't have knowledge of CXL reset, there still should be a prioritized order in the kernel in which CXL reset is attempted before more drastic resets like SBR. IIRC CXL reset can also impact all functions that use CXL.cache/mem, but not legacy PCIe functions on the device which do not use CXL.cache/mem (there is discoverability as to which functions are not impacted by CXL reset). 

Thanks,
Vikram

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CXL Hot and Warm Reset Testing
  2021-08-13 17:01       ` Vikram Sethi
@ 2021-08-13 17:14         ` Bjorn Helgaas
  2021-08-13 21:27           ` Dan Williams
  2021-08-14 11:16           ` Amey Narkhede
  0 siblings, 2 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2021-08-13 17:14 UTC (permalink / raw)
  To: Vikram Sethi
  Cc: Dan Williams, Chris Browy, linux-cxl, Ben Widawsky,
	Jonathan Cameron, Alex Williamson, Bjorn Helgaas,
	Shanker Donthineni, Amey Narkhede, linux-pci

[+cc Amey (working on PCI resets), linux-pci]

On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> Hi Dan, 
> 
> > -----Original Message-----
> > From: Dan Williams <dan.j.williams@intel.com>
> > 
> > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > wrote:
> > 
> > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > device reset, but I do not expect that will ever gain CXL specific
> > knowledge.
> > 
> CXL reset may need some thought, specially for devices that don't
> expose FLR but do expose CXL reset (while former does not affect
> CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> device and there is discoverability as to whether or not memory
> contents can be cleared as part of CXL reset). We may need a way of
> triggering CXL reset from userspace, and if the existing
> /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> reset, there still should be a prioritized order in the kernel in
> which CXL reset is attempted before more drastic resets like SBR.
> IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> but not legacy PCIe functions on the device which do not use
> CXL.cache/mem (there is discoverability as to which functions are
> not impacted by CXL reset). 
> 
> Thanks,
> Vikram

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CXL Hot and Warm Reset Testing
  2021-08-13 17:14         ` Bjorn Helgaas
@ 2021-08-13 21:27           ` Dan Williams
  2021-08-17  3:03             ` Vikram Sethi
  2021-08-14 11:16           ` Amey Narkhede
  1 sibling, 1 reply; 10+ messages in thread
From: Dan Williams @ 2021-08-13 21:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Vikram Sethi, Chris Browy, linux-cxl, Ben Widawsky,
	Jonathan Cameron, Alex Williamson, Bjorn Helgaas,
	Shanker Donthineni, Amey Narkhede, Linux PCI

On Fri, Aug 13, 2021 at 10:14 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Amey (working on PCI resets), linux-pci]
>
> On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > Hi Dan,
> >
> > > -----Original Message-----
> > > From: Dan Williams <dan.j.williams@intel.com>
> > >
> > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > > wrote:
> > >
> > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > device reset, but I do not expect that will ever gain CXL specific
> > > knowledge.
> > >
> > CXL reset may need some thought, specially for devices that don't
> > expose FLR but do expose CXL reset (while former does not affect
> > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > device and there is discoverability as to whether or not memory
> > contents can be cleared as part of CXL reset). We may need a way of
> > triggering CXL reset from userspace, and if the existing
> > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > reset, there still should be a prioritized order in the kernel in
> > which CXL reset is attempted before more drastic resets like SBR.
> > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > but not legacy PCIe functions on the device which do not use
> > CXL.cache/mem (there is discoverability as to which functions are
> > not impacted by CXL reset).

What's the Linux use case for supporting CXL reset for a CXL memory
expander? PCI reset is useful for device assignment, and CXL reset
might be useful for similarly assigning an accelerator. CXL.mem on the
other hand can be directly assigned at a per-page level without also
needing to assign the device. How could a VM reliably program HDM
decoders when it cannot perceive the host physical address space? I
understand the utility of CXL reset for device bring-up and test
software that knows what it is doing can write config space directly,
but that software would assume all responsibility.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CXL Hot and Warm Reset Testing
  2021-08-13 17:14         ` Bjorn Helgaas
  2021-08-13 21:27           ` Dan Williams
@ 2021-08-14 11:16           ` Amey Narkhede
  2021-08-14 19:47             ` Dan Williams
  1 sibling, 1 reply; 10+ messages in thread
From: Amey Narkhede @ 2021-08-14 11:16 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Vikram Sethi, Chris Browy, linux-cxl, Ben Widawsky,
	Jonathan Cameron, Alex Williamson, Shanker Donthineni,
	Amey Narkhede, Linux PCI

On 21/08/13 12:14PM, Bjorn Helgaas wrote:
> [+cc Amey (working on PCI resets), linux-pci]
>
> On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > Hi Dan,
> >
> > > -----Original Message-----
> > > From: Dan Williams <dan.j.williams@intel.com>
> > >
> > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > > wrote:
> > >
> > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > device reset, but I do not expect that will ever gain CXL specific
> > > knowledge.
> > >
> > CXL reset may need some thought, specially for devices that don't
> > expose FLR but do expose CXL reset (while former does not affect
> > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > device and there is discoverability as to whether or not memory
> > contents can be cleared as part of CXL reset). We may need a way of
> > triggering CXL reset from userspace, and if the existing
> > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > reset, there still should be a prioritized order in the kernel in
> > which CXL reset is attempted before more drastic resets like SBR.
> > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > but not legacy PCIe functions on the device which do not use
> > CXL.cache/mem (there is discoverability as to which functions are
> > not impacted by CXL reset).
> >
> > Thanks,
> > Vikram

We can add new reset method and expose it to userspace via new 'reset_method'
sysfs attribute introduced in this series
https://lore.kernel.org/linux-pci/20210805162917.3989-1-ameynarkhede03@gmail.com/

Thanks,
Amey

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CXL Hot and Warm Reset Testing
  2021-08-14 11:16           ` Amey Narkhede
@ 2021-08-14 19:47             ` Dan Williams
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Williams @ 2021-08-14 19:47 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: Bjorn Helgaas, Vikram Sethi, Chris Browy, linux-cxl,
	Ben Widawsky, Jonathan Cameron, Alex Williamson,
	Shanker Donthineni, Linux PCI

On Sat, Aug 14, 2021 at 4:16 AM Amey Narkhede <ameynarkhede03@gmail.com> wrote:
>
> On 21/08/13 12:14PM, Bjorn Helgaas wrote:
> > [+cc Amey (working on PCI resets), linux-pci]
> >
> > On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > > Hi Dan,
> > >
> > > > -----Original Message-----
> > > > From: Dan Williams <dan.j.williams@intel.com>
> > > >
> > > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > > > wrote:
> > > >
> > > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > > device reset, but I do not expect that will ever gain CXL specific
> > > > knowledge.
> > > >
> > > CXL reset may need some thought, specially for devices that don't
> > > expose FLR but do expose CXL reset (while former does not affect
> > > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > > device and there is discoverability as to whether or not memory
> > > contents can be cleared as part of CXL reset). We may need a way of
> > > triggering CXL reset from userspace, and if the existing
> > > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > > reset, there still should be a prioritized order in the kernel in
> > > which CXL reset is attempted before more drastic resets like SBR.
> > > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > > but not legacy PCIe functions on the device which do not use
> > > CXL.cache/mem (there is discoverability as to which functions are
> > > not impacted by CXL reset).
> > >
> > > Thanks,
> > > Vikram
>
> We can add new reset method and expose it to userspace via new 'reset_method'
> sysfs attribute introduced in this series
> https://lore.kernel.org/linux-pci/20210805162917.3989-1-ameynarkhede03@gmail.com/

It's not clear to me that's a suitable place for CXL reset though. CXL
reset wants to coordinate with the device's participation in a
potential interleave-set across multiple devices. So something like
/sys/bus/cxl/devices/memX/reset might be a better location for
coordinated CXL reset if needed. Again though, the primary use case
for userspace triggered reset is device assignment, and there are
better mechanisms to assign CXL.mem resources to a guest.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: CXL Hot and Warm Reset Testing
  2021-08-13 21:27           ` Dan Williams
@ 2021-08-17  3:03             ` Vikram Sethi
  0 siblings, 0 replies; 10+ messages in thread
From: Vikram Sethi @ 2021-08-17  3:03 UTC (permalink / raw)
  To: Dan Williams, Bjorn Helgaas
  Cc: Chris Browy, linux-cxl, Ben Widawsky, Jonathan Cameron,
	Alex Williamson, Bjorn Helgaas, Shanker Donthineni,
	Amey Narkhede, Linux PCI


> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> On Fri, Aug 13, 2021 at 10:14 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > [+cc Amey (working on PCI resets), linux-pci]
> >
> > On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > > Hi Dan,
> > >
> > > > -----Original Message-----
> > > > From: Dan Williams <dan.j.williams@intel.com>
> > > >
> > > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy
> > > > <cbrowy@avery-design.com>
> > > > wrote:
> > > >
> > > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > > device reset, but I do not expect that will ever gain CXL specific
> > > > knowledge.
> > > >
> > > CXL reset may need some thought, specially for devices that don't
> > > expose FLR but do expose CXL reset (while former does not affect
> > > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > > device and there is discoverability as to whether or not memory
> > > contents can be cleared as part of CXL reset). We may need a way of
> > > triggering CXL reset from userspace, and if the existing
> > > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > > reset, there still should be a prioritized order in the kernel in
> > > which CXL reset is attempted before more drastic resets like SBR.
> > > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > > but not legacy PCIe functions on the device which do not use
> > > CXL.cache/mem (there is discoverability as to which functions are
> > > not impacted by CXL reset).
> 
> What's the Linux use case for supporting CXL reset for a CXL memory
> expander? PCI reset is useful for device assignment, and CXL reset might be
> useful for similarly assigning an accelerator. CXL.mem on the other hand can
> be directly assigned at a per-page level without also needing to assign the
> device. How could a VM reliably program HDM decoders when it cannot
> perceive the host physical address space? I understand the utility of CXL
> reset for device bring-up and test software that knows what it is doing can
> write config space directly, but that software would assume all responsibility.

Agree that CXL reset will be needed for type1/2 CXL devices (accelerators) 
which will need a sysfs interface for userspace to use CXL reset. 
 


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-08-17  3:03 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-11 19:47 [PATCH v1] cxl/pci: Fix debug message in cxl_probe_regs() Li Qiang (Johnny Li)
2021-08-11 15:38 ` Dan Williams
2021-08-11 16:35   ` CXL Hot and Warm Reset Testing Chris Browy
2021-08-12  1:08     ` Dan Williams
2021-08-13 17:01       ` Vikram Sethi
2021-08-13 17:14         ` Bjorn Helgaas
2021-08-13 21:27           ` Dan Williams
2021-08-17  3:03             ` Vikram Sethi
2021-08-14 11:16           ` Amey Narkhede
2021-08-14 19:47             ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).