All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alison Schofield <alison.schofield@intel.com>
To: Wonjae Lee <wj28.lee@samsung.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	KyungSan Kim <ks0204.kim@samsung.com>,
	Hojin Nam <hj96.nam@samsung.com>
Subject: Re: [RFC PATCH] cxl/region: Allow out of order assembly of autodiscovered regions
Date: Mon, 29 Jan 2024 20:37:24 -0800	[thread overview]
Message-ID: <Zbh9BMbpjr/qtVjE@aschofie-mobl2> (raw)
In-Reply-To: <20240126085108epcms2p8bd88403c60851f769dd7b3ccb0755f05@epcms2p8>

On Fri, Jan 26, 2024 at 05:51:08PM +0900, Wonjae Lee wrote:
> Thank you for your quick response and sending the patch!
> 
> I'm very sorry for the delay in replying to your mail. It took some time to
> reproduce test and check the issue.
> 
> On Fri, Jan 12, 2024 at 09:04:21PM -0800, alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> >
> > Wonjae Lee,
> >
> > Here is the RFC Patch I mentioned in this thread:
> > https://lore.kernel.org/linux-cxl/20240112092201epcms2p577b3c979bdc694a370e5952edc091f68@epcms2p5/
> >
> > This passes the cxl-test suite, so hoping no regression, but it needs
> > to be tested with the required config: 2 memdevs connected to the same
> > port, each memdev belongs to a different auto-region. Repeated attempts
> > should hit the test case and emit this debug message upon success:
> >  dev_dbg(&cxlr_a->dev, "allow out of order region ref alloc\n");
> >
> > Failure will be the HPA order violation message.
> >
> > All reviewers & testers welcomed.
> 
> After applying your patch to base-commit, I ran the same test I mentioned in
> the previous thread, and during the boot process, a null pointer dereference
> occurs as shown below:
> 
> thread: https://lore.kernel.org/linux-cxl/ZaFn0dfaJ8iOi1UW@aschofie-mobl2/T/#t
> base-commit: 3601311593eb44d34f142b993cb6f38f9a7863b3
> dmesg log:
> [] cxl region0: region sort successful
> [] cxl region0: mem1:endpoint6 decoder6.0 add: mem1:decoder6.0 @ 0 next: none nr_eps: 1 nr_targets: 1
> [] cxl region0: pci0000:3d:port1 decoder1.0 add: mem1:decoder6.0 @ 0 next: mem1 nr_eps: 1 nr_targets: 1
> [] BUG: kernel NULL pointer dereference, address: 00000000000002e8
> ...
> [] Call Trace:
> ...
> []  ? auto_order_ok+0x5f/0xb0 [cxl_core]
> []  cxl_region_attach_position+0x379/0x860 [cxl_core]
> []  cxl_region_attach+0x466/0x8b0 [cxl_core]
>  
>  
> To be specific, cxld_a in auto_order_ok() is NULL:
>     cxld_a = cxl_region_find_decoder(port, cxlr_a); /* cxld_a == NULL */
>  
> I've noticed that if I pass cxled as a function argument appropriately, and get
> cxld_a as shown below, the region initialization succeeds with the log "cxl
> region0: allow out of order region ref alloc" as you mention.
>  
> @@ -756,8 +756,8 @@ static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
>         return to_cxl_decoder(dev);
>  }
> -bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
> -                  struct cxl_region *cxlr_b)
> +bool auto_order_ok(struct cxl_port *port, struct cxl_endpoint_decoder *cxled,
> +                  struct cxl_region *cxlr_a, struct cxl_region *cxlr_b)
>  {
>         struct cxl_region_ref *cxl_rr;
>         struct cxl_decoder *cxld_a, *cxld_b;
> @@ -771,7 +771,10 @@ bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
>         if (!test_bit(CXL_REGION_F_AUTO, &cxlr_a->flags))
>                 return false;
> 
> -       cxld_a = cxl_region_find_decoder(port, cxlr_a);
> +       if (port == cxled_to_port(cxled))
> +               cxld_a = &cxled->cxld;
> +       else
> +               cxld_a = cxl_region_find_decoder(port, cxlr_a);
> 
> I hope this helps.

Yes, this is very helpful! Glad to have you trying it out.

I think at this point we can call alloc_region_ref() with a known cxled
of this port and simply pass it on. I'm going to skip this if...else pattern,
and simply get cxld_a = &cxled->cxld;

I think there may be opportunity for refinements in cxl_region_find_decoder()
and it's descendants, but it's not directly tied to this patch.

Thanks,
Alison

> 
> Thanks,
> Wonjae
> 
> > Begin commit log:
> > Autodiscovered regions can fail to assemble if they are not discovered
> > in HPA decode order. The user will see failure messages like:
> >
> > [] cxl region0: endpoint5: HPA order violation region1
> > [] cxl region0: endpoint5: failed to allocate region reference
> >
> > The check that is causing the failure helps the CXL driver enforce
> > a CXL spec mandate that decoders be committed in HPA order. The
> > check is needless for autodiscovered regions since their decoders
> > are already programmed. Trying to enforce order in the assembly of
> > these regions is useless because they are assembled once all their
> > member endpoints arrive, and there is no guarantee on the order in
> > which endpoints are discovered during probe.
> >
> > Keep the existing check, but for autodiscovered regions, allow the
> > out of order assembly after a sanity check that the lowered numbered
> > decoder has the lower HPA starting address.
> >
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  drivers/cxl/core/region.c 34 +++++++++++++++++++++++++++++++++-
> >  1 file changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 0f05692bfec3..8770ebcae05d 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -753,6 +753,37 @@ static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
> >  return to_cxl_decoder(dev);
> >  }
> >
> > +bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
> > +        struct cxl_region *cxlr_b)
> > +{
> > + struct cxl_region_ref *cxl_rr;
> > + struct cxl_decoder *cxld_a, *cxld_b;
> > +
> > + /*
> > +  * Allow the out of order assembly of auto-discovered regions as
> > +  * long as correct decoder programming order can be verified.
> > +  *
> > +  * Per CXL Spec 3.1 8.2.4.20.12 Committing Decoder Programming,
> > +  * software must commit decoders in HPA order. Therefore it is
> > +  * sufficient to sanity check that the lowered number decoder
> > +  * has the lower HPA starting address.
> > +  */
> > + if (!test_bit(CXL_REGION_F_AUTO, &cxlr_a->flags))
> > +    return false;
> > +
> > + cxld_a = cxl_region_find_decoder(port, cxlr_a);
> > + cxl_rr = cxl_rr_load(port, cxlr_b);
> > + cxld_b = cxl_rr->decoder;
> > +
> > + if (cxld_b->id > cxld_a->id) {
> > +    dev_dbg(&cxlr_a->dev,
> > +        "allow out of order region ref alloc\n");
> > +    return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> >  static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
> >                          struct cxl_region *cxlr)
> >  {
> > @@ -767,7 +798,8 @@ static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
> >      if (!ip->res)
> >          continue;
> >
> > -    if (ip->res->start > p->res->start) {
> > +    if (ip->res->start > p->res->start &&
> > +        (!auto_order_ok(port, cxlr, iter->region))) {
> >          dev_dbg(&cxlr->dev,
> >              "%s: HPA order violation %s:%pr vs %pr\n",
> >              dev_name(&port->dev),
> >
> > base-commit: 3601311593eb44d34f142b993cb6f38f9a7863b3
> > --
> > 2.37.3
> >

  reply	other threads:[~2024-01-30  4:37 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-13  5:04 [RFC PATCH] cxl/region: Allow out of order assembly of autodiscovered regions alison.schofield
2024-01-14 23:38 ` kernel test robot
2024-01-15  0:38 ` kernel test robot
2024-01-15  2:33 ` kernel test robot
2024-01-18 19:46 ` Dan Williams
2024-02-08 20:52   ` Alison Schofield
2024-02-08 22:57     ` Dan Williams
2024-02-09  0:23       ` Alison Schofield
2024-02-09  5:25         ` Dan Williams
     [not found] ` <CGME20240113050447epcas2p2097a5c49c1f0f9318ec4202843e169b8@epcms2p8>
2024-01-26  8:51   ` Wonjae Lee
2024-01-30  4:37     ` Alison Schofield [this message]
2024-01-31  1:02   ` Wonjae Lee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zbh9BMbpjr/qtVjE@aschofie-mobl2 \
    --to=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=hj96.nam@samsung.com \
    --cc=ira.weiny@intel.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=ks0204.kim@samsung.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=vishal.l.verma@intel.com \
    --cc=wj28.lee@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.