All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Ben Widawsky <ben.widawsky@intel.com>
Cc: linux-cxl@vger.kernel.org, patches@lists.linux.dev,
	 kernel test robot <lkp@intel.com>,
	Alison Schofield <alison.schofield@intel.com>,
	 Ira Weiny <ira.weiny@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	 Vishal Verma <vishal.l.verma@intel.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	 Linux NVDIMM <nvdimm@lists.linux.dev>,
	Linux PCI <linux-pci@vger.kernel.org>
Subject: Re: [PATCH v3 02/14] cxl/region: Introduce concept of region configuration
Date: Thu, 17 Feb 2022 13:12:07 -0800	[thread overview]
Message-ID: <CAPcyv4hOFpVRs=D=ppMWv668dX0deaOVYTLG8QxAp45t3ctDfw@mail.gmail.com> (raw)
In-Reply-To: <20220217202024.x6rmx4ypvxi66tek@intel.com>

On Thu, Feb 17, 2022 at 12:20 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
[..]
> > > > > +static bool is_region_active(struct cxl_region *cxlr)
> > > > > +{
> > > > > +       /* TODO: Regions can't be activated yet. */
> > > > > +       return false;
> > > >
> > > > This function seems redundant with just checking "cxlr->dev.driver !=
> > > > NULL"? The benefit of that is there is no need to carry a TODO in the
> > > > series.
> > > >
> > >
> > > The idea behind this was to give the reviewer somewhat of a bigger picture as to
> > > how things should work in the code rather than in a commit message. I will
> > > remove this.
> >
> > They look premature to me.
> >
>
> Given that you don't want me to reference the DWG, it is. The steps outlined
> with TODOs were all based on the DWG's overall flow.

Right, the DWG is a good bootstrap, but the Linux implementation is
going to go beyond it so might as well couch all the language in terms
of base spec references and Linux documentation and not have this
indirection to a 3rd document.

[..]
> > > > ...to shutdown configuration writes once the region is active. Might
> > > > also need a region-wide seqlock like target_list_show. So that region
> > > > probe drains  all active sysfs writers before assuming the
> > > > configuration is stable.
> > >
> > > Initially my thought here is that this is a problem for userspace to deal with.
> > > If userspace can't figure out how to synchronously configure and bind the
> > > region, that's not a kernel problem.
> >
> > The kernel always needs to protect itself. Userspace is free to race
> > itself, but it can not be allowed to trigger a kernel race. So there
> > needs to be protection against userspace writing interleave_ways and
> > the kernel being able to trust that interleave_ways is now static for
> > the life of the region.
>
> Yeah - originally I was relying on the device_lock for this, but that now
> doesn't work. seqlock is fine. I could also copy all the config information at
> the beginning of probe and simply use that.
>
> If we're going the route of making interleave_ways write-once, why not make all
> attributes the same?

Sure. It could always be relaxed later if there was a convincing need
to modify an existing region without tearing it down first. In fact,
that reconfigure flexibility was a source of bugs in NVDIMM sysfs ABI
that the tooling did not leverage because "ndctl create-namespace
--reconfigure" internally did: read namespace attributes, destroy
namepsace, create new namespace with saved attributes.

[..]
> > > > > +static size_t show_targetN(struct cxl_region *cxlr, char *buf, int n)
> > > > > +{
> > > > > +       int ret;
> > > > > +
> > > > > +       device_lock(&cxlr->dev);
> > > > > +       if (!cxlr->config.targets[n])
> > > > > +               ret = sysfs_emit(buf, "\n");
> > > > > +       else
> > > > > +               ret = sysfs_emit(buf, "%s\n",
> > > > > +                                dev_name(&cxlr->config.targets[n]->dev));
> > > > > +       device_unlock(&cxlr->dev);
> > > >
> > > > The component contribution of a memdev to a region is a DPA-span, not
> > > > the whole memdev. I would expect something like dax_mapping_attributes
> > > > or REGION_MAPPING() from drivers/nvdimm/region_devs.c. A tuple of
> > > > information about the component contribution of a memdev to a region.
> > > >
> > >
> > > I think show_target should just return the chosen decoder and then the decoder
> > > attributes will tell the rest, wouldn't they?
> >
> > Given the conflicts that can arise between HDM decoders needing to map
> > increasing DPA values and other conflicts that there will be
> > situations where the kernel auto-picking a decoder will get in the
> > way. Exposing the decoder selection to userspace also gives one more
> > place to do leaf validation. I.e. at decoder-to-region assignment time
> > the kernel can validate that the DPA is available and can be mapped by
> > the given decoder given the state of other decoders on that device.
> >
>
> Okay, but per below, these are associated with setting the target. The attribute
> show does only need to provide the decoder, then userspace can look at the
> decoder to find if it's active/DPAs/etc.

Yes.

>
> > >
> > > > > +
> > > > > +       return ret;
> > > > > +}
> > > > > +
> > > > > +static size_t set_targetN(struct cxl_region *cxlr, const char *buf, int n,
> > > > > +                         size_t len)
> > > > > +{
> > > > > +       struct device *memdev_dev;
> > > > > +       struct cxl_memdev *cxlmd;
> > > > > +
> > > > > +       device_lock(&cxlr->dev);
> > > > > +
> > > > > +       if (len == 1 || cxlr->config.targets[n])
> > > > > +               remove_target(cxlr, n);
> > > > > +
> > > > > +       /* Remove target special case */
> > > > > +       if (len == 1) {
> > > > > +               device_unlock(&cxlr->dev);
> > > > > +               return len;
> > > > > +       }
> > > > > +
> > > > > +       memdev_dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf);
> > > >
> > > > I think this wants to be an endpoint decoder, not a memdev. Because
> > > > it's the decoder that joins a memdev to a region, or at least a
> > > > decoder should be picked when the memdev is assigned so that the DPA
> > > > mapping can be registered. If all the decoders are allocated then fail
> > > > here.
> > > >
> > >
> > > Per above, I think making this decoders makes sense. I could make it flexible
> > > for ease of use, like if you specify memX, the kernel will pick a decoder for
> > > you however I suspect you won't like that.
> >
> > Right, put the user friendliness in the tooling, not sysfs ABI.
> >
>
> Okay.
>
> > >
> > > > > +       if (!memdev_dev) {
> > > > > +               device_unlock(&cxlr->dev);
> > > > > +               return -ENOENT;
> > > > > +       }
> > > > > +
> > > > > +       /* reference to memdev held until target is unset or region goes away */
> > > > > +
> > > > > +       cxlmd = to_cxl_memdev(memdev_dev);
> > > > > +       cxlr->config.targets[n] = cxlmd;
> > > > > +
> > > > > +       device_unlock(&cxlr->dev);
> > > > > +
> > > > > +       return len;
> > > > > +}
> > > > > +
> > > > > +#define TARGET_ATTR_RW(n)                                                      \
> > > > > +       static ssize_t target##n##_show(                                       \
> > > > > +               struct device *dev, struct device_attribute *attr, char *buf)  \
> > > > > +       {                                                                      \
> > > > > +               return show_targetN(to_cxl_region(dev), buf, (n));             \
> > > > > +       }                                                                      \
> > > > > +       static ssize_t target##n##_store(struct device *dev,                   \
> > > > > +                                        struct device_attribute *attr,        \
> > > > > +                                        const char *buf, size_t len)          \
> > > > > +       {                                                                      \
> > > > > +               return set_targetN(to_cxl_region(dev), buf, (n), len);         \
> > > > > +       }                                                                      \
> > > > > +       static DEVICE_ATTR_RW(target##n)
> > > > > +
> > > > > +TARGET_ATTR_RW(0);
> > > > > +TARGET_ATTR_RW(1);
> > > > > +TARGET_ATTR_RW(2);
> > > > > +TARGET_ATTR_RW(3);
> > > > > +TARGET_ATTR_RW(4);
> > > > > +TARGET_ATTR_RW(5);
> > > > > +TARGET_ATTR_RW(6);
> > > > > +TARGET_ATTR_RW(7);
> > > > > +TARGET_ATTR_RW(8);
> > > > > +TARGET_ATTR_RW(9);
> > > > > +TARGET_ATTR_RW(10);
> > > > > +TARGET_ATTR_RW(11);
> > > > > +TARGET_ATTR_RW(12);
> > > > > +TARGET_ATTR_RW(13);
> > > > > +TARGET_ATTR_RW(14);
> > > > > +TARGET_ATTR_RW(15);
> > > > > +
> > > > > +static struct attribute *interleave_attrs[] = {
> > > > > +       &dev_attr_target0.attr,
> > > > > +       &dev_attr_target1.attr,
> > > > > +       &dev_attr_target2.attr,
> > > > > +       &dev_attr_target3.attr,
> > > > > +       &dev_attr_target4.attr,
> > > > > +       &dev_attr_target5.attr,
> > > > > +       &dev_attr_target6.attr,
> > > > > +       &dev_attr_target7.attr,
> > > > > +       &dev_attr_target8.attr,
> > > > > +       &dev_attr_target9.attr,
> > > > > +       &dev_attr_target10.attr,
> > > > > +       &dev_attr_target11.attr,
> > > > > +       &dev_attr_target12.attr,
> > > > > +       &dev_attr_target13.attr,
> > > > > +       &dev_attr_target14.attr,
> > > > > +       &dev_attr_target15.attr,
> > > > > +       NULL,
> > > > > +};
> > > > > +
> > > > > +static umode_t visible_targets(struct kobject *kobj, struct attribute *a, int n)
> > > > > +{
> > > > > +       struct device *dev = container_of(kobj, struct device, kobj);
> > > > > +       struct cxl_region *cxlr = to_cxl_region(dev);
> > > > > +
> > > > > +       if (n < cxlr->config.interleave_ways)
> > > > > +               return a->mode;
> > > > > +       return 0;
> > > > > +}
> > > > > +
> > > > > +static const struct attribute_group region_interleave_group = {
> > > > > +       .attrs = interleave_attrs,
> > > > > +       .is_visible = visible_targets,
> > > > > +};
> > > > > +
> > > > > +static const struct attribute_group *region_groups[] = {
> > > > > +       &region_group,
> > > > > +       &region_interleave_group,
> > > > > +       NULL,
> > > > > +};
> > > > > +
> > > > >  static void cxl_region_release(struct device *dev);
> > > > >
> > > > >  static const struct device_type cxl_region_type = {
> > > > >         .name = "cxl_region",
> > > > >         .release = cxl_region_release,
> > > > > +       .groups = region_groups
> > > > >  };
> > > > >
> > > > >  static ssize_t create_region_show(struct device *dev,
> > > > > @@ -108,8 +405,11 @@ static void cxl_region_release(struct device *dev)
> > > > >  {
> > > > >         struct cxl_decoder *cxld = to_cxl_decoder(dev->parent);
> > > > >         struct cxl_region *cxlr = to_cxl_region(dev);
> > > > > +       int i;
> > > > >
> > > > >         ida_free(&cxld->region_ida, cxlr->id);
> > > > > +       for (i = 0; i < cxlr->config.interleave_ways; i++)
> > > > > +               remove_target(cxlr, i);
> > > >
> > > > Like the last patch this feels too late. I expect whatever unregisters
> > > > the region should have already handled removing the targets.
> > > >
> > >
> > > Would remove() be more appropriate?
> >
> > ->remove() does not seem a good fit since it may be the case that
> > someone wants do "echo $region >
> > /sys/bus/cxl/drivers/cxl_region/unbind; echo $region >
> > /sys/bus/cxl/drivers/cxl_region/bind;" without needing to go
> > reconfigure the targets. I am suggesting that before
> > device_unregister(&cxlr->dev) the targets are released.
>
> Okay.
>
> Why would one want to do this? I acknowledge someone *may* do that. I'd like to
> know what value you see there.

There are several debug and error handling scenarios that say "quiesce
CXL.mem". Seems reasonable to map those to "cxl disable-region", and
seems unreasonable that "cxl disable-region" requires "cxl
create-region" to get back to operational state.

  reply	other threads:[~2022-02-17 21:12 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-28  0:26 [PATCH v3 00/14] CXL Region driver Ben Widawsky
2022-01-28  0:26 ` [PATCH v3 01/14] cxl/region: Add region creation ABI Ben Widawsky
2022-01-28 18:14   ` Dan Williams
2022-01-28 18:59     ` Dan Williams
2022-02-02 18:26       ` Ben Widawsky
2022-02-02 18:28         ` Ben Widawsky
2022-02-02 18:48           ` Ben Widawsky
2022-02-02 19:00             ` Dan Williams
2022-02-02 19:02               ` Ben Widawsky
2022-02-02 19:15                 ` Dan Williams
2022-02-01 22:42     ` Ben Widawsky
2022-02-01 15:53   ` Jonathan Cameron
2022-02-17 17:10   ` [PATCH v4 " Ben Widawsky
2022-02-17 17:19     ` [PATCH v5 01/15] " Ben Widawsky
2022-02-17 17:33       ` Ben Widawsky
2022-02-17 17:58       ` Dan Williams
2022-02-17 18:58         ` Ben Widawsky
2022-02-17 20:26           ` Dan Williams
2022-02-17 22:22         ` Ben Widawsky
2022-02-17 23:32           ` Dan Williams
2022-02-18 16:41             ` Ben Widawsky
2022-01-28  0:26 ` [PATCH v3 02/14] cxl/region: Introduce concept of region configuration Ben Widawsky
2022-01-29  0:25   ` Dan Williams
2022-02-01 14:59     ` Ben Widawsky
2022-02-03  5:06       ` Dan Williams
2022-02-01 23:11     ` Ben Widawsky
2022-02-03 17:48       ` Dan Williams
2022-02-03 22:23         ` Ben Widawsky
2022-02-03 23:27           ` Dan Williams
2022-02-04  0:19             ` Ben Widawsky
2022-02-04  2:45               ` Dan Williams
2022-02-17 18:36     ` Ben Widawsky
2022-02-17 19:57       ` Dan Williams
2022-02-17 20:20         ` Ben Widawsky
2022-02-17 21:12           ` Dan Williams [this message]
2022-02-23 21:49         ` Ben Widawsky
2022-02-23 22:24           ` Dan Williams
2022-02-23 22:31             ` Ben Widawsky
2022-02-23 22:42               ` Dan Williams
2022-01-28  0:26 ` [PATCH v3 03/14] cxl/mem: Cache port created by the mem dev Ben Widawsky
2022-02-17  1:20   ` Dan Williams
2022-01-28  0:26 ` [PATCH v3 04/14] cxl/region: Introduce a cxl_region driver Ben Widawsky
2022-02-01 16:21   ` Jonathan Cameron
2022-02-17  6:04   ` Dan Williams
2022-01-28  0:26 ` [PATCH v3 05/14] cxl/acpi: Handle address space allocation Ben Widawsky
2022-02-18 19:17   ` Dan Williams
2022-01-28  0:26 ` [PATCH v3 06/14] cxl/region: Address " Ben Widawsky
2022-02-18 19:51   ` Dan Williams
2022-01-28  0:27 ` [PATCH v3 07/14] cxl/region: Implement XHB verification Ben Widawsky
2022-02-18 20:23   ` Dan Williams
2022-01-28  0:27 ` [PATCH v3 08/14] cxl/region: HB port config verification Ben Widawsky
2022-02-14 16:20   ` Jonathan Cameron
2022-02-14 17:51     ` Ben Widawsky
2022-02-14 18:09       ` Jonathan Cameron
2022-02-15 16:35   ` Jonathan Cameron
2022-02-18 21:04   ` Dan Williams
2022-01-28  0:27 ` [PATCH v3 09/14] cxl/region: Add infrastructure for decoder programming Ben Widawsky
2022-02-01 18:16   ` Jonathan Cameron
2022-02-18 21:53   ` Dan Williams
2022-01-28  0:27 ` [PATCH v3 10/14] cxl/region: Collect host bridge decoders Ben Widawsky
2022-02-01 18:21   ` Jonathan Cameron
2022-02-18 23:42   ` Dan Williams
2022-01-28  0:27 ` [PATCH v3 11/14] cxl/region: Add support for single switch level Ben Widawsky
2022-02-01 18:26   ` Jonathan Cameron
2022-02-15 16:10   ` Jonathan Cameron
2022-02-18 18:23     ` Jonathan Cameron
2022-01-28  0:27 ` [PATCH v3 12/14] cxl: Program decoders for regions Ben Widawsky
2022-02-24  0:08   ` Dan Williams
2022-01-28  0:27 ` [PATCH v3 13/14] cxl/pmem: Convert nvdimm bridge API to use dev Ben Widawsky
2022-01-28  0:27 ` [PATCH v3 14/14] cxl/region: Create an nd_region Ben Widawsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPcyv4hOFpVRs=D=ppMWv668dX0deaOVYTLG8QxAp45t3ctDfw@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=ben.widawsky@intel.com \
    --cc=helgaas@kernel.org \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=patches@lists.linux.dev \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.