From: Dave Jiang <dave.jiang@intel.com>
To: Dan Williams <dan.j.williams@intel.com>, linux-cxl@vger.kernel.org
Cc: vishal.l.verma@intel.com, ira.weiny@intel.com,
alison.schofield@intel.com, Jonathan.Cameron@huawei.com
Subject: Re: [PATCH] cxl/region: Fix region setup/teardown for RCDs
Date: Wed, 29 Mar 2023 10:39:04 -0700 [thread overview]
Message-ID: <55a56196-688e-cdaa-b796-657f1da684fe@intel.com> (raw)
In-Reply-To: <168002858268.50647.728091521032131326.stgit@dwillia2-xfh.jf.intel.com>
On 3/28/23 11:36 AM, Dan Williams wrote:
> RCDs (CXL memory devices that link train without VH capability and show
> up as root complex integrated endpoints), hide the presence of the link
> between the endpoint and the host-bridge. The CXL region setup/teardown
> paths assume that a link hop is present and go looking for at least one
> 'struct cxl_port' instance between the CXL root port-object and an
> endpoint port-object leading to crashes of the form:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000008
> [..]
> RIP: 0010:cxl_region_setup_targets+0x3e9/0xae0 [cxl_core]
> [..]
> Call Trace:
> <TASK>
> cxl_region_attach+0x46c/0x7a0 [cxl_core]
> cxl_create_region+0x20b/0x270 [cxl_core]
> cxl_mock_mem_probe+0x641/0x800 [cxl_mock_mem]
> platform_probe+0x5b/0xb0
>
> Detect RCDs explicitly and skip walking the non-existent port hierarchy
> between root and endpoint in that case.
>
> While this has been a problem since:
>
> commit 0a19bfc8de93 ("cxl/port: Add RCD endpoint port enumeration")
>
> ...it becomes a more reliable crash scenario with the new autodiscovery
> implementation.
>
> Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery")
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/region.c | 28 +++++++++++++++++++++++++++-
> 1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 808f23ec4e2b..52bbf6268d5f 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -134,9 +134,13 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
> struct cxl_endpoint_decoder *cxled = p->targets[i];
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct cxl_port *iter = cxled_to_port(cxled);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct cxl_ep *ep;
> int rc = 0;
>
> + if (cxlds->rcd)
> + goto endpoint_reset;
> +
> while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
> iter = to_cxl_port(iter->dev.parent);
>
> @@ -153,6 +157,7 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
> return rc;
> }
>
> +endpoint_reset:
> rc = cxled->cxld.reset(&cxled->cxld);
> if (rc)
> return rc;
> @@ -1199,6 +1204,7 @@ static void cxl_region_teardown_targets(struct cxl_region *cxlr)
> {
> struct cxl_region_params *p = &cxlr->params;
> struct cxl_endpoint_decoder *cxled;
> + struct cxl_dev_state *cxlds;
> struct cxl_memdev *cxlmd;
> struct cxl_port *iter;
> struct cxl_ep *ep;
> @@ -1214,6 +1220,10 @@ static void cxl_region_teardown_targets(struct cxl_region *cxlr)
> for (i = 0; i < p->nr_targets; i++) {
> cxled = p->targets[i];
> cxlmd = cxled_to_memdev(cxled);
> + cxlds = cxlmd->cxlds;
> +
> + if (cxlds->rcd)
> + continue;
>
> iter = cxled_to_port(cxled);
> while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
> @@ -1229,14 +1239,24 @@ static int cxl_region_setup_targets(struct cxl_region *cxlr)
> {
> struct cxl_region_params *p = &cxlr->params;
> struct cxl_endpoint_decoder *cxled;
> + struct cxl_dev_state *cxlds;
> + int i, rc, rch = 0, vh = 0;
> struct cxl_memdev *cxlmd;
> struct cxl_port *iter;
> struct cxl_ep *ep;
> - int i, rc;
>
> for (i = 0; i < p->nr_targets; i++) {
> cxled = p->targets[i];
> cxlmd = cxled_to_memdev(cxled);
> + cxlds = cxlmd->cxlds;
> +
> + /* validate that all targets agree on topology */
> + if (!cxlds->rcd) {
> + vh++;
> + } else {
> + rch++;
> + continue;
> + }
>
> iter = cxled_to_port(cxled);
> while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
> @@ -1256,6 +1276,12 @@ static int cxl_region_setup_targets(struct cxl_region *cxlr)
> }
> }
>
> + if (rch && vh) {
> + dev_err(&cxlr->dev, "mismatched CXL topologies detected\n");
> + cxl_region_teardown_targets(cxlr);
> + return -ENXIO;
> + }
> +
> return 0;
> }
>
>
next prev parent reply other threads:[~2023-03-29 17:39 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-28 18:36 [PATCH] cxl/region: Fix region setup/teardown for RCDs Dan Williams
2023-03-29 17:39 ` Dave Jiang [this message]
2023-03-31 5:12 ` Ira Weiny
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55a56196-688e-cdaa-b796-657f1da684fe@intel.com \
--to=dave.jiang@intel.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=alison.schofield@intel.com \
--cc=dan.j.williams@intel.com \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).