From: Dan Williams <dan.j.williams@intel.com>
To: Adam Manzanares <a.manzanares@samsung.com>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	"jonathan.cameron@huawei.com" <jonathan.cameron@huawei.com>,
	"dave@stgolabs.net" <dave@stgolabs.net>,
	Fan Ni <fan.ni@samsung.com>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>,
	"alison.schofield@intel.com" <alison.schofield@intel.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"gourry.memverge@gmail.com" <gourry.memverge@gmail.com>,
	"wj28.lee@gmail.com" <wj28.lee@gmail.com>,
	"rientjes@google.com" <rientjes@google.com>,
	"ruansy.fnst@fujitsu.com" <ruansy.fnst@fujitsu.com>,
	"shradha.t@samsung.com" <shradha.t@samsung.com>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>,
	Jim Harris <jim.harris@samsung.com>,
	"mhocko@suse.com" <mhocko@suse.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
Date: Mon, 6 May 2024 16:47:37 -0700
Message-ID: <66396c1938726_2f63a29443@dwillia2-mobl3.amr.corp.intel.com.notmuch>
In-Reply-To: <9bf86b97-319f-4f58-b658-1fe3ed0b1993@nmtadam.samsung>

Adam Manzanares wrote:
> Hello all,
> 
> I would like to have a discussion with the CXL development community about
> current outstanding issues and also invite developers interested in RAS and
> memory tiering to participate.

Thanks for putting this together Adam!

> The first topic I believe we should discuss is how we can ensure as a group
> that we are prioritizing upstream work. On a recent upstream CXL development
> discussion call there was a call to review more work. I apologize for not
> grabbing the link, but I believe Dave Jiang is leveraging patchwork and this
> link should be shared with others so we can help get more reviews where needed.

Dave already replied here, but one thing I will add is a request for
help keeping an eye out for things that should be in the queue. Likely
a good way to do that is to send a note along with a review so both get
reflected in the tracking.

> The second topic I would like to discuss is how we integrate RAS features that
> have similar equivalents in the kernel. A CXL device can provide info about 
> memory media errors in a similar fashion to memory controllers that have EDAC
> support. Discussions have been put on the list and I would like to hear thoughts
> from the community about where this should go [1]. On the same topic CXL has 
> port level RAS features and the PCIe DW series touched on this issue  [2]

If I could uplevel this a bit, there are multiple efforts in memory RAS
that likely want to figure out a cohesive story, or at least make
conscious decisions about implementation divergence. Some related work
that caught my eye:

* AMD MI300-specific poison handling that sounds similar to the CXL List
  Poison facility:
  http://lore.kernel.org/r/20240214033516.1344948-3-yazen.ghannam@amd.com

* Scrub subsystem that has both ACPI and CXL intercepts:
  http://lore.kernel.org/r/20240419164720.1765-1-shiju.jose@huawei.com

* Inconsistencies between firmware reported fatal errors and native
  error handling, compare:

  ghes_proc()::
        if (ghes_severity(estatus->error_severity) >= GHES_SEV_PANIC)
                __ghes_panic(ghes, estatus, buf_paddr, FIX_APEI_GHES_IRQ);

  ...vs:

  pcie_do_recovery()::
        /* TODO: Should kernel panic here? */
        pci_info(bridge, "device recovery failed\n");

  There are also inconsistencies among EXTLOG, GHES, BERT, and native
  error reporting.
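
To make the contrast in the last item concrete, here is a minimal
userspace sketch. It is not kernel code: the enum values and function
names are hypothetical stand-ins, and the severity ordering is an
assumption loosely modeled on the two excerpts above. It only encodes
what those excerpts show, namely that the firmware-first path escalates
at or above a panic threshold while the native path just logs when
recovery fails.

```c
#include <stdbool.h>

/* Hypothetical severity scale; the ordering is an assumption. */
enum sev { SEV_NONE, SEV_CORRECTED, SEV_RECOVERABLE, SEV_PANIC };

/* Hypothetical outcome of handling one error report. */
enum action { ACT_LOG, ACT_PANIC };

/* Models the ghes_proc() excerpt: panic at or above the threshold. */
static enum action firmware_first_action(enum sev s)
{
	return s >= SEV_PANIC ? ACT_PANIC : ACT_LOG;
}

/* Models the pcie_do_recovery() excerpt: a failed recovery is only logged. */
static enum action native_action(bool recovery_failed)
{
	(void)recovery_failed;	/* the outcome is the same either way */
	return ACT_LOG;
}

/* True when the two modeled paths would treat the same event differently. */
static bool policies_diverge(enum sev s, bool recovery_failed)
{
	return firmware_first_action(s) != native_action(recovery_failed);
}
```

In this model, policies_diverge(SEV_PANIC, true) is true: the same
fatal event panics when reported by firmware but is only logged when
handled natively.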

> The third topic I would like to discuss is how we can get a set of common
> benchmarks for memory tiering evaluations. Our team has done some initial
> work in this space, but we want to hear more from end users about their 
> workloads of concern. There was a proposal related to this topic, but from what 
> I understand no meeting has been held [3]. 
> 
> The last topic that I believe is worth discussion is how do we come up with
> a baseline for testing. I am aware of 3 efforts that could be used cxl_test, 
> qemu, and uunit testing framework [4].

I think benchmarking for memory-tiering is orthogonal to per-patch
unit, functional, and integration testing.

For testing I think it is an "all of the above plus hardware testing if
possible" situation. My hope is to get to a point where CXL patchwork
lights up "S/W/F" columns with backend tests similar to NETDEV
patchwork:

https://patchwork.kernel.org/project/netdevbpf/list/

There are some initial discussions about how to do this; likely we can
grab some folks to discuss more.

I think Paul and Song would be useful to have for this discussion. Can
you recommend others that would be useful for this or other CXL
topics to help with timeslot conflict resolution?
