linux-pci.vger.kernel.org archive mirror
* [LSF/MM/BPF TOPIC] CXL Development Discussions
@ 2024-05-06 19:27 ` Adam Manzanares
  2024-05-06 20:28   ` Dave Jiang
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Adam Manzanares @ 2024-05-06 19:27 UTC (permalink / raw)
  To: lsf-pc, dan.j.williams, jonathan.cameron, dave, Fan Ni,
	dave.jiang, ira.weiny, alison.schofield, vishal.l.verma,
	gourry.memverge, wj28.lee, rientjes, ruansy.fnst, shradha.t,
	mcgrof, Jim Harris, mhocko
  Cc: linux-mm, linux-cxl, linux-pci

Hello all,

I would like to have a discussion with the CXL development community about
current outstanding issues and also invite developers interested in RAS and
memory tiering to participate.

The first topic I believe we should discuss is how we can ensure as a group
that we are prioritizing upstream work. On a recent upstream CXL development
discussion call there was a call to review more work. I apologize for not
grabbing the link, but I believe Dave Jiang is leveraging patchwork and this
link should be shared with others so we can help get more reviews where needed.

The second topic I would like to discuss is how we integrate RAS features that
have similar equivalents in the kernel. A CXL device can provide info about 
memory media errors in a similar fashion to memory controllers that have EDAC
support. Discussions have been put on the list and I would like to hear thoughts
from the community about where this should go [1]. On the same topic, CXL has
port-level RAS features, and the PCIe DW series touched on this issue [2].

The third topic I would like to discuss is how we can get a set of common
benchmarks for memory tiering evaluations. Our team has done some initial
work in this space, but we want to hear more from end users about their 
workloads of concern. There was a proposal related to this topic, but from what 
I understand no meeting has been held [3]. 

The last topic that I believe is worth discussing is how we come up with
a baseline for testing. I am aware of three efforts that could be used: cxl_test,
qemu, and the uunit testing framework [4].

Apologies for getting this out late, and please include anyone that may be
interested in joining a discussion.

[1] https://lore.kernel.org/linux-cxl/20240417075053.3273543-1-ruansy.fnst@fujitsu.com/
[2] https://lore.kernel.org/lkml/20231130115044.53512-1-shradha.t@samsung.com/
[3] https://lore.kernel.org/all/2b29dd3d-bb2c-6a8c-94d2-d5c2e035516a@google.com
[4] https://lore.kernel.org/linux-cxl/170795677066.3697776.12587812713093908173.stgit@ubuntu/


* Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
  2024-05-06 19:27 ` [LSF/MM/BPF TOPIC] CXL Development Discussions Adam Manzanares
@ 2024-05-06 20:28   ` Dave Jiang
  2024-05-06 22:58     ` Dan Williams
  2024-05-08 18:08     ` Adam Manzanares
  2024-05-06 23:47   ` Dan Williams
  2024-05-07 11:48   ` Michal Hocko
  2 siblings, 2 replies; 15+ messages in thread
From: Dave Jiang @ 2024-05-06 20:28 UTC (permalink / raw)
  To: Adam Manzanares, lsf-pc, dan.j.williams, jonathan.cameron, dave,
	Fan Ni, ira.weiny, alison.schofield, vishal.l.verma,
	gourry.memverge, wj28.lee, rientjes, ruansy.fnst, shradha.t,
	mcgrof, Jim Harris, mhocko
  Cc: linux-mm, linux-cxl, linux-pci



On 5/6/24 12:27 PM, Adam Manzanares wrote:
> Hello all,
> 
> I would like to have a discussion with the CXL development community about
> current outstanding issues and also invite developers interested in RAS and
> memory tiering to participate.
> 
> The first topic I believe we should discuss is how we can ensure as a group
> that we are prioritizing upstream work. On a recent upstream CXL development
> discussion call there was a call to review more work. I apologize for not
> grabbing the link, but I believe Dave Jiang is leveraging patchwork and this
> link should be shared with others so we can help get more reviews where needed.

Bundle for the potential fixes
https://patchwork.kernel.org/bundle/cxllinux/cxl-fixes/

Bundle for the next merge window
https://patchwork.kernel.org/bundle/cxllinux/cxl-next/

Just be aware patchwork only takes patches, so the bundles are registered with
the first patch of a series. The listing does display the origin series.

DJ

> 
> The second topic I would like to discuss is how we integrate RAS features that
> have similar equivalents in the kernel. A CXL device can provide info about 
> memory media errors in a similar fashion to memory controllers that have EDAC
> support. Discussions have been put on the list and I would like to hear thoughts
> from the community about where this should go [1]. On the same topic CXL has 
> port level RAS features and the PCIe DW series touched on this issue  [2]
> 
> The third topic I would like to discuss is how we can get a set of common
> benchmarks for memory tiering evaluations. Our team has done some initial
> work in this space, but we want to hear more from end users about their 
> workloads of concern. There was a proposal related to this topic, but from what 
> I understand no meeting has been held [3]. 
> 
> The last topic that I believe is worth discussion is how do we come up with
> a baseline for testing. I am aware of 3 efforts that could be used cxl_test, 
> qemu, and uunit testing framework [4].
> 
> Apologies for getting this out late, and please include anyone that may be
> interested in joining a discussion.
> 
> [1] https://lore.kernel.org/linux-cxl/20240417075053.3273543-1-ruansy.fnst@fujitsu.com/
> [2] https://lore.kernel.org/lkml/20231130115044.53512-1-shradha.t@samsung.com/
> [3] https://lore.kernel.org/all/2b29dd3d-bb2c-6a8c-94d2-d5c2e035516a@google.com
> [4] https://lore.kernel.org/linux-cxl/170795677066.3697776.12587812713093908173.stgit@ubuntu/


* Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
  2024-05-06 20:28   ` Dave Jiang
@ 2024-05-06 22:58     ` Dan Williams
  2024-05-08 18:08     ` Adam Manzanares
  1 sibling, 0 replies; 15+ messages in thread
From: Dan Williams @ 2024-05-06 22:58 UTC (permalink / raw)
  To: Dave Jiang, Adam Manzanares, lsf-pc, dan.j.williams,
	jonathan.cameron, dave, Fan Ni, ira.weiny, alison.schofield,
	vishal.l.verma, gourry.memverge, wj28.lee, rientjes, ruansy.fnst,
	shradha.t, mcgrof, Jim Harris, mhocko
  Cc: linux-mm, linux-cxl, linux-pci

Dave Jiang wrote:
> 
> 
> On 5/6/24 12:27 PM, Adam Manzanares wrote:
> > Hello all,
> > 
> > I would like to have a discussion with the CXL development community about
> > current outstanding issues and also invite developers interested in RAS and
> > memory tiering to participate.
> > 
> > The first topic I believe we should discuss is how we can ensure as a group
> > that we are prioritizing upstream work. On a recent upstream CXL development
> > discussion call there was a call to review more work. I apologize for not
> > grabbing the link, but I believe Dave Jiang is leveraging patchwork and this
> > link should be shared with others so we can help get more reviews where needed.
> 
> Bundle for the potential fixes
> https://patchwork.kernel.org/bundle/cxllinux/cxl-fixes/
> 
> Bundle for the next merge window
> https://patchwork.kernel.org/bundle/cxllinux/cxl-next/
> 
> Just be aware patchwork only takes patches, so the bundle are
> registered with the first patch of a series. The listing does display
> the origin series.

This seems solvable with a bit of scripting.

I went ahead and fixed up one of the series (CXL 1.1 link status) to
include all the patches in the bundle. I think it is useful to have a
one-stop shop for the queue and the state of review / testing.

Let me know if that makes things easier.
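
A minimal sketch of the kind of scripting this could involve, assuming the
bundle contents and series membership have already been fetched into plain
dictionaries. The function name, the dictionary shapes, and the patch IDs are
hypothetical and chosen for illustration only; they are not the patchwork API.

```python
def expand_bundle(bundle_patch_ids, patches_by_id, series_by_id):
    """Return the bundle's patch IDs with each patch's full series filled in.

    bundle_patch_ids: patch IDs currently in the bundle (often only the
                      first patch of each series).
    patches_by_id:    hypothetical map of patch ID -> {"series": [series IDs]}.
    series_by_id:     hypothetical map of series ID -> ordered patch IDs.
    """
    expanded = {}  # dict keys keep insertion order and drop duplicates
    for patch_id in bundle_patch_ids:
        series_ids = patches_by_id[patch_id].get("series", [])
        if not series_ids:
            # Standalone patch: keep it as-is.
            expanded[patch_id] = None
            continue
        for series_id in series_ids:
            for member_id in series_by_id[series_id]:
                expanded[member_id] = None
    return list(expanded)


if __name__ == "__main__":
    patches = {101: {"series": [7]}, 205: {"series": []}}
    series = {7: [101, 102, 103]}
    print(expand_bundle([101, 205], patches, series))
```

The result could then be used to update the bundle so that it lists every
patch of each series rather than only the first one.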


* Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
  2024-05-06 19:27 ` [LSF/MM/BPF TOPIC] CXL Development Discussions Adam Manzanares
  2024-05-06 20:28   ` Dave Jiang
@ 2024-05-06 23:47   ` Dan Williams
  2024-05-07 18:50     ` Luis Chamberlain
  2024-05-08 18:26     ` Adam Manzanares
  2024-05-07 11:48   ` Michal Hocko
  2 siblings, 2 replies; 15+ messages in thread
From: Dan Williams @ 2024-05-06 23:47 UTC (permalink / raw)
  To: Adam Manzanares, lsf-pc, dan.j.williams, jonathan.cameron, dave,
	Fan Ni, dave.jiang, ira.weiny, alison.schofield, vishal.l.verma,
	gourry.memverge, wj28.lee, rientjes, ruansy.fnst, shradha.t,
	mcgrof, Jim Harris, mhocko
  Cc: linux-mm, linux-cxl, linux-pci

Adam Manzanares wrote:
> Hello all,
> 
> I would like to have a discussion with the CXL development community about
> current outstanding issues and also invite developers interested in RAS and
> memory tiering to participate.

Thanks for putting this together Adam!

> The first topic I believe we should discuss is how we can ensure as a group
> that we are prioritizing upstream work. On a recent upstream CXL development
> discussion call there was a call to review more work. I apologize for not
> grabbing the link, but I believe Dave Jiang is leveraging patchwork and this
> link should be shared with others so we can help get more reviews where needed.

Dave already replied here, but one thing I will add is a request for help
keeping an eye out for things that should be in the queue. Likely a good way
to do that is to send a note along with a review so both get reflected in the
tracking.

> The second topic I would like to discuss is how we integrate RAS features that
> have similar equivalents in the kernel. A CXL device can provide info about 
> memory media errors in a similar fashion to memory controllers that have EDAC
> support. Discussions have been put on the list and I would like to hear thoughts
> from the community about where this should go [1]. On the same topic CXL has 
> port level RAS features and the PCIe DW series touched on this issue  [2]

If I could uplevel this a bit there are multiple efforts in memory RAS
that likely want to figure out a cohesive story, or at least make
conscious decisions about implementation divergence. Some related work
that caught my eye:

* AMD MI300-specific poison handling that sounds similar to CXL List
  Poison facility:
  http://lore.kernel.org/r/20240214033516.1344948-3-yazen.ghannam@amd.com

* Scrub subsystem that has both ACPI and CXL intercepts:
  http://lore.kernel.org/r/20240419164720.1765-1-shiju.jose@huawei.com

* Inconsistencies between firmware reported fatal errors and native
  error handling, compare:

  ghes_proc()::
        if (ghes_severity(estatus->error_severity) >= GHES_SEV_PANIC)
                __ghes_panic(ghes, estatus, buf_paddr, FIX_APEI_GHES_IRQ);

  ...vs:

  pcie_do_recovery()::
        /* TODO: Should kernel panic here? */
        pci_info(bridge, "device recovery failed\n");

  Also the inconsistencies between EXTLOG, GHES, BERT, and native error
  reporting.

> The third topic I would like to discuss is how we can get a set of common
> benchmarks for memory tiering evaluations. Our team has done some initial
> work in this space, but we want to hear more from end users about their 
> workloads of concern. There was a proposal related to this topic, but from what 
> I understand no meeting has been held [3]. 
> 
> The last topic that I believe is worth discussion is how do we come up with
> a baseline for testing. I am aware of 3 efforts that could be used cxl_test, 
> qemu, and uunit testing framework [4].

I think benchmarking for memory-tiering is orthogonal to patch
unit, function, and integration testing.

For testing I think it is an "all of the above plus hardware testing if
possible" situation. My hope is to get to a point where CXL patchwork
lights up "S/W/F" columns with backend tests similar to NETDEV
patchwork:

https://patchwork.kernel.org/project/netdevbpf/list/

There are some initial discussions about how to do this; likely we can
grab some folks to discuss more.
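
To make the "S/W/F" idea concrete, here is a minimal sketch of how per-patch
test results could be folded into success / warning / fail counts. It assumes
each backend test reports a (context, state) pair, that the latest report per
context wins, and that the state names are "pending", "success", "warning",
and "fail". The function name and the sample contexts are hypothetical.

```python
from collections import Counter


def swf_columns(checks):
    """Fold (context, state) reports into (success, warning, fail) counts.

    checks: iterable of (context, state) tuples in chronological order.
    Only the most recent state per context is counted; states other than
    success/warning/fail (for example "pending") are ignored.
    """
    latest = {}
    for context, state in checks:
        latest[context] = state  # a later report replaces an earlier one
    counts = Counter(latest.values())
    return counts["success"], counts["warning"], counts["fail"]


if __name__ == "__main__":
    reports = [
        ("build", "pending"),
        ("build", "success"),
        ("checkpatch", "warning"),
        ("selftest", "fail"),
        ("kdoc", "success"),
    ]
    print("S/W/F: %d/%d/%d" % swf_columns(reports))
```

The three counts map onto the three columns, so a reviewer can see at a
glance whether a series still has failing or warning checks.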

I think Paul and Song would be useful to have for this discussion. Can
you recommend others that would be useful for this or other CXL
topics to help with timeslot conflict resolution?


* Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
  2024-05-06 19:27 ` [LSF/MM/BPF TOPIC] CXL Development Discussions Adam Manzanares
  2024-05-06 20:28   ` Dave Jiang
  2024-05-06 23:47   ` Dan Williams
@ 2024-05-07 11:48   ` Michal Hocko
  2024-05-08 18:35     ` Adam Manzanares
  2 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2024-05-07 11:48 UTC (permalink / raw)
  To: Adam Manzanares
  Cc: lsf-pc, dan.j.williams, jonathan.cameron, dave, Fan Ni,
	dave.jiang, ira.weiny, alison.schofield, vishal.l.verma,
	gourry.memverge, wj28.lee, rientjes, ruansy.fnst, shradha.t,
	mcgrof, Jim Harris, linux-mm, linux-cxl, linux-pci

On Mon 06-05-24 19:27:10, Adam Manzanares wrote:
> Hello all,
> 
> I would like to have a discussion with the CXL development community about
> current outstanding issues and also invite developers interested in RAS and
> memory tiering to participate.
> 
> The first topic I believe we should discuss is how we can ensure as a group
> that we are prioritizing upstream work. On a recent upstream CXL development
> discussion call there was a call to review more work. I apologize for not
> grabbing the link, but I believe Dave Jiang is leveraging patchwork and this
> link should be shared with others so we can help get more reviews where needed.
> 
> The second topic I would like to discuss is how we integrate RAS features that
> have similar equivalents in the kernel. A CXL device can provide info about 
> memory media errors in a similar fashion to memory controllers that have EDAC
> support. Discussions have been put on the list and I would like to hear thoughts
> from the community about where this should go [1]. On the same topic CXL has 
> port level RAS features and the PCIe DW series touched on this issue  [2]
> 
> The third topic I would like to discuss is how we can get a set of common
> benchmarks for memory tiering evaluations. Our team has done some initial
> work in this space, but we want to hear more from end users about their 
> workloads of concern. There was a proposal related to this topic, but from what 
> I understand no meeting has been held [3]. 
> 
> The last topic that I believe is worth discussion is how do we come up with
> a baseline for testing. I am aware of 3 efforts that could be used cxl_test, 
> qemu, and uunit testing framework [4].

This seems to be quite a lot for a single time slot. I think it would
make sense to split that into more slots. WDYT?
-- 
Michal Hocko
SUSE Labs


* Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
  2024-05-06 23:47   ` Dan Williams
@ 2024-05-07 18:50     ` Luis Chamberlain
  2024-05-08 18:38       ` Adam Manzanares
  2024-05-09  4:19       ` Dan Williams
  2024-05-08 18:26     ` Adam Manzanares
  1 sibling, 2 replies; 15+ messages in thread
From: Luis Chamberlain @ 2024-05-07 18:50 UTC (permalink / raw)
  To: Dan Williams
  Cc: Adam Manzanares, lsf-pc, jonathan.cameron, dave, Fan Ni,
	dave.jiang, ira.weiny, alison.schofield, vishal.l.verma,
	gourry.memverge, wj28.lee, rientjes, ruansy.fnst, shradha.t,
	Jim Harris, mhocko, linux-mm, linux-cxl, linux-pci

On Mon, May 06, 2024 at 04:47:37PM -0700, Dan Williams wrote:
> For testing I think it is an "all of the above plus hardware testing if
> possible" situation. My hope is to get to a point where CXL patchwork
> lights up "S/W/F" columns with backend tests similar to NETDEV
> patchwork:
> 
> https://patchwork.kernel.org/project/netdevbpf/list/
> 
> There are some initial discussions about how to do this likely we can
> grab some folks to discuss more.
> 
> I think Paul and Song would be useful to have for this discussion.

I think everyone and their aunt wants this to happen for their subsystem,
so a separate session to hear about how to get there would be nice.

  Luis


* Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
  2024-05-06 20:28   ` Dave Jiang
  2024-05-06 22:58     ` Dan Williams
@ 2024-05-08 18:08     ` Adam Manzanares
  1 sibling, 0 replies; 15+ messages in thread
From: Adam Manzanares @ 2024-05-08 18:08 UTC (permalink / raw)
  To: Dave Jiang
  Cc: lsf-pc, dan.j.williams, jonathan.cameron, dave, Fan Ni,
	ira.weiny, alison.schofield, vishal.l.verma, gourry.memverge,
	wj28.lee, rientjes, ruansy.fnst, shradha.t, mcgrof, Jim Harris,
	mhocko, linux-mm, linux-cxl, linux-pci

On Mon, May 06, 2024 at 01:28:21PM -0700, Dave Jiang wrote:
> 
> 
> On 5/6/24 12:27 PM, Adam Manzanares wrote:
> > Hello all,
> > 
> > I would like to have a discussion with the CXL development community about
> > current outstanding issues and also invite developers interested in RAS and
> > memory tiering to participate.
> > 
> > The first topic I believe we should discuss is how we can ensure as a group
> > that we are prioritizing upstream work. On a recent upstream CXL development
> > discussion call there was a call to review more work. I apologize for not
> > grabbing the link, but I believe Dave Jiang is leveraging patchwork and this
> > link should be shared with others so we can help get more reviews where needed.
> 
> Bundle for the potential fixes
> https://patchwork.kernel.org/bundle/cxllinux/cxl-fixes/
> 
> Bundle for the next merge window
> https://patchwork.kernel.org/bundle/cxllinux/cxl-next/
> 
> Just be aware patchwork only takes patches, so the bundle are registered with the first patch of a series. The listing does display the origin series.

Thanks Dave, much appreciated. We will leverage this and I will spread the
message about how important it is to review.

> 
> DJ
> 
> > 
> > The second topic I would like to discuss is how we integrate RAS features that
> > have similar equivalents in the kernel. A CXL device can provide info about 
> > memory media errors in a similar fashion to memory controllers that have EDAC
> > support. Discussions have been put on the list and I would like to hear thoughts
> > from the community about where this should go [1]. On the same topic CXL has 
> > port level RAS features and the PCIe DW series touched on this issue  [2]
> > 
> > The third topic I would like to discuss is how we can get a set of common
> > benchmarks for memory tiering evaluations. Our team has done some initial
> > work in this space, but we want to hear more from end users about their 
> > workloads of concern. There was a proposal related to this topic, but from what 
> > I understand no meeting has been held [3]. 
> > 
> > The last topic that I believe is worth discussion is how do we come up with
> > a baseline for testing. I am aware of 3 efforts that could be used cxl_test, 
> > qemu, and uunit testing framework [4].
> > 
> > Apologies for getting this out late, and please include anyone that may be
> > interested in joining a discussion.
> > 
> > [1] https://lore.kernel.org/linux-cxl/20240417075053.3273543-1-ruansy.fnst@fujitsu.com/
> > [2] https://lore.kernel.org/lkml/20231130115044.53512-1-shradha.t@samsung.com/
> > [3] https://lore.kernel.org/all/2b29dd3d-bb2c-6a8c-94d2-d5c2e035516a@google.com
> > [4] https://lore.kernel.org/linux-cxl/170795677066.3697776.12587812713093908173.stgit@ubuntu/
> 


* Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
  2024-05-06 23:47   ` Dan Williams
  2024-05-07 18:50     ` Luis Chamberlain
@ 2024-05-08 18:26     ` Adam Manzanares
  1 sibling, 0 replies; 15+ messages in thread
From: Adam Manzanares @ 2024-05-08 18:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: lsf-pc, jonathan.cameron, dave, Fan Ni, dave.jiang, ira.weiny,
	alison.schofield, vishal.l.verma, gourry.memverge, wj28.lee,
	rientjes, ruansy.fnst, shradha.t, mcgrof, Jim Harris, mhocko,
	linux-mm, linux-cxl, linux-pci

On Mon, May 06, 2024 at 04:47:37PM -0700, Dan Williams wrote:
> Adam Manzanares wrote:
> > Hello all,
> > 
> > I would like to have a discussion with the CXL development community about
> > current outstanding issues and also invite developers interested in RAS and
> > memory tiering to participate.
> 
> Thanks for putting this together Adam!

NP, it's been great working together in the community.

> 
> > The first topic I believe we should discuss is how we can ensure as a group
> > that we are prioritizing upstream work. On a recent upstream CXL development
> > discussion call there was a call to review more work. I apologize for not
> > grabbing the link, but I believe Dave Jiang is leveraging patchwork and this
> > link should be shared with others so we can help get more reviews where needed.
> 
> Dave already replied here but one thing I will add is help keeping an
> eye out for things that should be in queue. Likely a good way to
> do that is send a note along with a review so both get reflected in the
> tracking.
> 

Noted.

> > The second topic I would like to discuss is how we integrate RAS features that
> > have similar equivalents in the kernel. A CXL device can provide info about 
> > memory media errors in a similar fashion to memory controllers that have EDAC
> > support. Discussions have been put on the list and I would like to hear thoughts
> > from the community about where this should go [1]. On the same topic CXL has 
> > port level RAS features and the PCIe DW series touched on this issue  [2]
> 
> If I could uplevel this a bit there are multiple efforts in memory RAS
> that likely want to figure out a cohesive story, or at least make
> conscious decisions about implementation divergence. Some related work
> that caught my eye:
> 
> * AMD MI300-specific poison handling that sounds similar to CXL List
>   Poison facility:
>   http://lore.kernel.org/r/20240214033516.1344948-3-yazen.ghannam@amd.com
> 
> * Scrub subsystem that has both ACPI and CXL intercepts:
>   http://lore.kernel.org/r/20240419164720.1765-1-shiju.jose@huawei.com
> 
> * Inconsistencies between firmware reported fatal errors and native
>   error handling, compare:
> 
>   ghes_proc()::
>         if (ghes_severity(estatus->error_severity) >= GHES_SEV_PANIC)
>                 __ghes_panic(ghes, estatus, buf_paddr, FIX_APEI_GHES_IRQ);
> 
>   ...vs:
> 
>   pcie_do_recovery()::
>         /* TODO: Should kernel panic here? */
>         pci_info(bridge, "device recovery failed\n");
> 
>   Also the inconsistencies between EXTLOG, GHES, BERT, and native error
>   reporting.
> 

Thanks for pointing these out. I will try to put all of these references
in context for discussion.

> > The third topic I would like to discuss is how we can get a set of common
> > benchmarks for memory tiering evaluations. Our team has done some initial
> > work in this space, but we want to hear more from end users about their 
> > workloads of concern. There was a proposal related to this topic, but from what 
> > I understand no meeting has been held [3]. 
> > 
> > The last topic that I believe is worth discussion is how do we come up with
> > a baseline for testing. I am aware of 3 efforts that could be used cxl_test, 
> > qemu, and uunit testing framework [4].
> 
> I think benchmarking for memory-tiering is orthogonal to patch
> unit, function, and integration testing.
> 

Agreed. 

> For testing I think it is an "all of the above plus hardware testing if
> possible" situation. My hope is to get to a point where CXL patchwork
> lights up "S/W/F" columns with backend tests similar to NETDEV
> patchwork:
> 
> https://patchwork.kernel.org/project/netdevbpf/list/
> 
> There are some initial discussions about how to do this likely we can
> grab some folks to discuss more.
> 
> I think Paul and Song would be useful to have for this discussion. Can
> you recommend others that would be useful for this or other CXL
> topics to help with timeslot conflict resolution?
> 

Luis already chimed in and he is definitely our expert in terms of
establishing baselines for new functionalities. 


* Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
  2024-05-07 11:48   ` Michal Hocko
@ 2024-05-08 18:35     ` Adam Manzanares
  2024-05-12 13:07       ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Adam Manzanares @ 2024-05-08 18:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: lsf-pc, dan.j.williams, jonathan.cameron, dave, Fan Ni,
	dave.jiang, ira.weiny, alison.schofield, vishal.l.verma,
	gourry.memverge, wj28.lee, rientjes, ruansy.fnst, shradha.t,
	mcgrof, Jim Harris, linux-mm, linux-cxl, linux-pci

On Tue, May 07, 2024 at 01:48:46PM +0200, Michal Hocko wrote:
> On Mon 06-05-24 19:27:10, Adam Manzanares wrote:
> > Hello all,
> > 
> > I would like to have a discussion with the CXL development community about
> > current outstanding issues and also invite developers interested in RAS and
> > memory tiering to participate.
> > 
> > The first topic I believe we should discuss is how we can ensure as a group
> > that we are prioritizing upstream work. On a recent upstream CXL development
> > discussion call there was a call to review more work. I apologize for not
> > grabbing the link, but I believe Dave Jiang is leveraging patchwork and this
> > link should be shared with others so we can help get more reviews where needed.
> > 
> > The second topic I would like to discuss is how we integrate RAS features that
> > have similar equivalents in the kernel. A CXL device can provide info about 
> > memory media errors in a similar fashion to memory controllers that have EDAC
> > support. Discussions have been put on the list and I would like to hear thoughts
> > from the community about where this should go [1]. On the same topic CXL has 
> > port level RAS features and the PCIe DW series touched on this issue  [2]
> > 
> > The third topic I would like to discuss is how we can get a set of common
> > benchmarks for memory tiering evaluations. Our team has done some initial
> > work in this space, but we want to hear more from end users about their 
> > workloads of concern. There was a proposal related to this topic, but from what 
> > I understand no meeting has been held [3]. 
> > 
> > The last topic that I believe is worth discussion is how do we come up with
> > a baseline for testing. I am aware of 3 efforts that could be used cxl_test, 
> > qemu, and uunit testing framework [4].
> 
> This seems to be quite a lot for a single time slot. I think it would
> make sense to split that into more slots. WDYT?

+1. I think the performance implications of CXL memory and how it relates
to existing memory management code tackling performance-differentiated memory
would be nice to split into a separate slot. I think Davidlohr would be a
great candidate to lead this discussion.

> -- 
> Michal Hocko
> SUSE Labs
> 


* Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
  2024-05-07 18:50     ` Luis Chamberlain
@ 2024-05-08 18:38       ` Adam Manzanares
  2024-05-08 19:30         ` Luis Chamberlain
  2024-05-09  4:19       ` Dan Williams
  1 sibling, 1 reply; 15+ messages in thread
From: Adam Manzanares @ 2024-05-08 18:38 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Dan Williams, lsf-pc, jonathan.cameron, dave, Fan Ni, dave.jiang,
	ira.weiny, alison.schofield, vishal.l.verma, gourry.memverge,
	wj28.lee, rientjes, ruansy.fnst, shradha.t, Jim Harris, mhocko,
	linux-mm, linux-cxl, linux-pci

On Tue, May 07, 2024 at 11:50:54AM -0700, Luis Chamberlain wrote:
> On Mon, May 06, 2024 at 04:47:37PM -0700, Dan Williams wrote:
> > For testing I think it is an "all of the above plus hardware testing if
> > possible" situation. My hope is to get to a point where CXL patchwork
> > lights up "S/W/F" columns with backend tests similar to NETDEV
> > patchwork:
> > 
> > https://patchwork.kernel.org/project/netdevbpf/list/
> > 
> > There are some initial discussions about how to do this likely we can
> > grab some folks to discuss more.
> > 
> > I think Paul and Song would be useful to have for this discussion.
> 
> I think everyone and their aunt wants this to happen for their subsystem,
> so a separate session to hear about how to get there would be nice.

+1

> 
>   Luis
> 


* Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
  2024-05-08 18:38       ` Adam Manzanares
@ 2024-05-08 19:30         ` Luis Chamberlain
  2024-05-09 18:14           ` Song Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Luis Chamberlain @ 2024-05-08 19:30 UTC (permalink / raw)
  To: Adam Manzanares, Song Liu
  Cc: Dan Williams, lsf-pc, jonathan.cameron, dave, Fan Ni, dave.jiang,
	ira.weiny, alison.schofield, vishal.l.verma, gourry.memverge,
	wj28.lee, rientjes, ruansy.fnst, shradha.t, Jim Harris, mhocko,
	linux-mm, linux-cxl, linux-pci

On Wed, May 08, 2024 at 06:38:36PM +0000, Adam Manzanares wrote:
> On Tue, May 07, 2024 at 11:50:54AM -0700, Luis Chamberlain wrote:
> > On Mon, May 06, 2024 at 04:47:37PM -0700, Dan Williams wrote:
> > > For testing I think it is an "all of the above plus hardware testing if
> > > possible" situation. My hope is to get to a point where CXL patchwork
> > > lights up "S/W/F" columns with backend tests similar to NETDEV
> > > patchwork:
> > > 
> > > https://patchwork.kernel.org/project/netdevbpf/list/
> > > 
> > > There are some initial discussions about how to do this likely we can
> > > grab some folks to discuss more.
> > > 
> > > I think Paul and Song would be useful to have for this discussion.
> > 
> > I think everyone and their aunt wants this to happen for their subsystem,
> > so a separate session to hear about how to get there would be nice.
> 
> +1

Song, at last year's LSFMM you had mentioned the above work by eBPF folks
with patchwork integration. While it is great, I am not sure if folks
realize the amount of work required to get the above up and running and
then to maintain it. So I was wondering if perhaps at this year's LSFMM
we can have a lightning talk or BoF to review just that and give
people clarity about the effort required to get this going and
maintain it. It's clear that not only CXL folks would be interested, but
also filesystems and likely block layer folks. Would you be up to help
review that with folks with a lightning talk or BoF session? Would there
be anyone else who can talk about that?

  Luis


* Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
  2024-05-07 18:50     ` Luis Chamberlain
  2024-05-08 18:38       ` Adam Manzanares
@ 2024-05-09  4:19       ` Dan Williams
  1 sibling, 0 replies; 15+ messages in thread
From: Dan Williams @ 2024-05-09  4:19 UTC (permalink / raw)
  To: Luis Chamberlain, Dan Williams
  Cc: Adam Manzanares, lsf-pc, jonathan.cameron, dave, Fan Ni,
	dave.jiang, ira.weiny, alison.schofield, vishal.l.verma,
	gourry.memverge, wj28.lee, rientjes, ruansy.fnst, shradha.t,
	Jim Harris, mhocko, linux-mm, linux-cxl, linux-pci

Luis Chamberlain wrote:
> On Mon, May 06, 2024 at 04:47:37PM -0700, Dan Williams wrote:
> > For testing I think it is an "all of the above plus hardware testing if
> > possible" situation. My hope is to get to a point where CXL patchwork
> > lights up "S/W/F" columns with backend tests similar to NETDEV
> > patchwork:
> > 
> > https://patchwork.kernel.org/project/netdevbpf/list/
> > 
> > There are some initial discussions about how to do this likely we can
> > grab some folks to discuss more.
> > 
> > I think Paul and Song would be useful to have for this discussion.
> 
> I think everyone and their aunt wants this to happen for their subsystem,
> so a separate session to hear about how to get there would be nice.

So much so that it sounds like BPF folks are already working on a way to get
this integrated with KernelCI?

Below is an excerpt from the Q&A portion of an OSSNA KernelCI talk that
alluded to this, but I am not sure who was making the comment:

https://www.youtube.com/watch?v=dxeaPCmkiXc&t=2077s

Hopefully folks who have more details about that BPF work will be around
next week.


* Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
  2024-05-08 19:30         ` Luis Chamberlain
@ 2024-05-09 18:14           ` Song Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Song Liu @ 2024-05-09 18:14 UTC (permalink / raw)
  To: Luis Chamberlain, Paul E Luse
  Cc: Adam Manzanares, Dan Williams, lsf-pc, jonathan.cameron, dave,
	Fan Ni, dave.jiang, ira.weiny, alison.schofield, vishal.l.verma,
	gourry.memverge, wj28.lee, rientjes, ruansy.fnst, shradha.t,
	Jim Harris, mhocko, linux-mm, linux-cxl, linux-pci

On Wed, May 8, 2024 at 12:30 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Wed, May 08, 2024 at 06:38:36PM +0000, Adam Manzanares wrote:
> > On Tue, May 07, 2024 at 11:50:54AM -0700, Luis Chamberlain wrote:
> > > On Mon, May 06, 2024 at 04:47:37PM -0700, Dan Williams wrote:
> > > > For testing I think it is an "all of the above plus hardware testing if
> > > > possible" situation. My hope is to get to a point where CXL patchwork
> > > > lights up "S/W/F" columns with backend tests similar to NETDEV
> > > > patchwork:
> > > >
> > > > https://patchwork.kernel.org/project/netdevbpf/list/
> > > >
> > > > There are some initial discussions about how to do this likely we can
> > > > grab some folks to discuss more.
> > > >
> > > > I think Paul and Song would be useful to have for this discussion.
> > >
> > > I think everyone and their aunt wants this to happen for their subsystem,
> > > so a separate session to hear about how to get there would be nice.
> >
> > +1
>
> Song, at last year's LSFMM you had mentioned the above work by ebpf folks
> with patchwork integration. While it is great, I am not sure if folks
> realize the amount of work required to get the above up and running and
> then to maintain it. So I was wondering if perhaps at this year's LSFMM
> we can have a lightning talk or BoF to review just that and give
> people clarity about the effort required to get this going and to
> maintain it. It's clear not only CXL folks would be interested, but
> also filesystems and likely block layer folks. Would you be up to help
> review that with folks with a lightning talk or BoF session? Would there
> be anyone else who can talk about that?

Paul and I have worked on using the CI framework used by the BPF
subsystem with md/raid patches. We have a session at LSFMMBPF to
talk about it (Tuesday 11:30).

Thanks,
Song


* Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
  2024-05-08 18:35     ` Adam Manzanares
@ 2024-05-12 13:07       ` Michal Hocko
  2024-05-13 12:12         ` Davidlohr Bueso
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2024-05-12 13:07 UTC (permalink / raw)
  To: Adam Manzanares, Davidlohr Bueso
  Cc: lsf-pc, dan.j.williams, jonathan.cameron, dave, Fan Ni,
	dave.jiang, ira.weiny, alison.schofield, vishal.l.verma,
	gourry.memverge, wj28.lee, rientjes, ruansy.fnst, shradha.t,
	mcgrof, Jim Harris, linux-mm, linux-cxl, linux-pci

[add Davidlohr]

On Wed 08-05-24 18:35:50, Adam Manzanares wrote:
> On Tue, May 07, 2024 at 01:48:46PM +0200, Michal Hocko wrote:
> > On Mon 06-05-24 19:27:10, Adam Manzanares wrote:
> > > Hello all,
> > > 
> > > I would like to have a discussion with the CXL development community about
> > > current outstanding issues and also invite developers interested in RAS and
> > > memory tiering to participate.
> > > 
> > > The first topic I believe we should discuss is how we can ensure as a group
> > > that we are prioritizing upstream work. On a recent upstream CXL development
> > > discussion call there was a call to review more work. I apologize for not
> > > grabbing the link, but I believe Dave Jiang is leveraging patchwork and this
> > > link should be shared with others so we can help get more reviews where needed.
> > > 
> > > The second topic I would like to discuss is how we integrate RAS features that
> > > have similar equivalents in the kernel. A CXL device can provide info about 
> > > memory media errors in a similar fashion to memory controllers that have EDAC
> > > support. Discussions have been put on the list and I would like to hear thoughts
> > > from the community about where this should go [1]. On the same topic CXL has 
> > > port level RAS features and the PCIe DW series touched on this issue  [2]
> > > 
> > > The third topic I would like to discuss is how we can get a set of common
> > > benchmarks for memory tiering evaluations. Our team has done some initial
> > > work in this space, but we want to hear more from end users about their 
> > > workloads of concern. There was a proposal related to this topic, but from what 
> > > I understand no meeting has been held [3]. 
> > > 
> > > The last topic that I believe is worth discussion is how do we come up with
> > > a baseline for testing. I am aware of 3 efforts that could be used cxl_test, 
> > > qemu, and uunit testing framework [4].
> > 
> > This seems to be quite a lot for a single time slot. I think it would
> > make sense to split that into more slots. WDYT?
> 
> +1. I think the performance implications of CXL memory and how it relates
> to existing memory management code tackling performance differentiated memory 
> would be nice to separate. I think Davidlohr would be a great candidate to 
> lead this discussion.

WDYT Davidlohr?
-- 
Michal Hocko
SUSE Labs


* Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
  2024-05-12 13:07       ` Michal Hocko
@ 2024-05-13 12:12         ` Davidlohr Bueso
  0 siblings, 0 replies; 15+ messages in thread
From: Davidlohr Bueso @ 2024-05-13 12:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Adam Manzanares, lsf-pc, dan.j.williams, jonathan.cameron,
	Fan Ni, dave.jiang, ira.weiny, alison.schofield, vishal.l.verma,
	gourry.memverge, wj28.lee, rientjes, ruansy.fnst, shradha.t,
	mcgrof, Jim Harris, linux-mm, linux-cxl, linux-pci

On Sun, 12 May 2024, Michal Hocko wrote:

>> +1. I think the performance implications of CXL memory and how it relates
>> to existing memory management code tackling performance differentiated memory
>> would be nice to separate. I think Davidlohr would be a great candidate to
>> lead this discussion.
>
>WDYT Davidlohr?

I think that the relevant performance discussions will happen in the tiering
session from David R.

Thanks,
Davidlohr


end of thread, other threads:[~2024-05-13 12:31 UTC | newest]

Thread overview: 15+ messages
     [not found] <CGME20240506192712uscas1p225316f79bb69f979b647d2a06a00a25f@uscas1p2.samsung.com>
2024-05-06 19:27 ` [LSF/MM/BPF TOPIC] CXL Development Discussions Adam Manzanares
2024-05-06 20:28   ` Dave Jiang
2024-05-06 22:58     ` Dan Williams
2024-05-08 18:08     ` Adam Manzanares
2024-05-06 23:47   ` Dan Williams
2024-05-07 18:50     ` Luis Chamberlain
2024-05-08 18:38       ` Adam Manzanares
2024-05-08 19:30         ` Luis Chamberlain
2024-05-09 18:14           ` Song Liu
2024-05-09  4:19       ` Dan Williams
2024-05-08 18:26     ` Adam Manzanares
2024-05-07 11:48   ` Michal Hocko
2024-05-08 18:35     ` Adam Manzanares
2024-05-12 13:07       ` Michal Hocko
2024-05-13 12:12         ` Davidlohr Bueso
