All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Ben Widawsky <ben.widawsky@intel.com>
Cc: linux-cxl@vger.kernel.org, patches@lists.linux.dev,
	 kernel test robot <lkp@intel.com>,
	Alison Schofield <alison.schofield@intel.com>,
	 Ira Weiny <ira.weiny@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	 Vishal Verma <vishal.l.verma@intel.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	 Linux NVDIMM <nvdimm@lists.linux.dev>,
	Linux PCI <linux-pci@vger.kernel.org>
Subject: Re: [PATCH v3 02/14] cxl/region: Introduce concept of region configuration
Date: Thu, 3 Feb 2022 15:27:02 -0800	[thread overview]
Message-ID: <CAPcyv4icq8_9E+ziU5KKYrAepUtNP32Qf6wYGYpcUU2K1P4mAA@mail.gmail.com> (raw)
In-Reply-To: <20220203222300.gf4st36yoqjxq5q6@intel.com>

On Thu, Feb 3, 2022 at 2:23 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> On 22-02-03 09:48:49, Dan Williams wrote:
> > On Tue, Feb 1, 2022 at 3:11 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > >
> > > On 22-01-28 16:25:34, Dan Williams wrote:
> > > > On Thu, Jan 27, 2022 at 4:27 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > > > >
> > > > > The region creation APIs create a vacant region. Configuring the region
> > > > > works in the same way as similar subsystems such as devdax. Sysfs attrs
> > > > > will be provided to allow userspace to configure the region.  Finally
> > > > > once all configuration is complete, userspace may activate the region.
> > > > >
> > > > > Introduced here are the most basic attributes needed to configure a
> > > > > region. Details of these attribute are described in the ABI
> > > >
> > > > s/attribute/attributes/
> > > >
> > > > > Documentation. Sanity checking of configuration parameters are done at
> > > > > region binding time. This consolidates all such logic in one place,
> > > > > rather than being strewn across multiple places.
> > > >
> > > > I think that's too late for some of the validation. The complex
> > > > validation that the region driver does throughout the topology is
> > > > different from the basic input validation that can  be done at the
> > > > sysfs write time. For example ,this patch allows negative
> > > > interleave_granularity values to specified, just return -EINVAL. I
> > > > agree that sysfs should not validate everything, I disagree with
> > > > pushing all validation to cxl_region_probe().
> > > >
> > >
> > > Two points:c
> > > 1. How do we distinguish "basic input validation". It'd be good if we could
> > >    define "basic input validation". For instance, when I first wrote these
> > >    patches, x3 would have been EINVAL, but today it's allowed. Can you help
> > >    enumerate what you consider basic.
> >
> > I internalized this kernel design principle from Dave Miller many
> > years ago paraphrasing "push decision making out to leaf code as much
> > as possible", and centralizing all validation in cxl_region_probe()
> > violates. The software that makes the mistake does not know it made a
> > mistake until much later and "probe failed" is less descriptive than
> > "EINVAL writing interleave_ways" . I wish I could find the thread
> > because it also talked about his iteration process.
>
> It would definitely be interesting to understand why pushing decision making
> into the leaf code is a violation. Was it primary around the descriptiveness of
> the error?

You mean the other way round, why is it a violation to move decision
making into the core? It was a comment about the inflexibility of the
core logic vs leaf logic, in the case of CXL it's about the
observability of errors at the right granularity which the core can
not do because the core is disconnected from the transaction that
injected the error.

> > Basic input validation to me is things like:
> >
> > - Don't allow writes while the region is active
> > - Check that values are in bound. So yes, the interleave-ways value of
> > 3 would fail until the kernel supports it, and granularity values >
> > 16K would also fail.
> > - Check that memdevs are actually downstream targets of the given decoder
> > - Check that the region uuid is unique
>
> These are obviously easy and informative at attr store time (in fact, active was
> meant to be checked already for many cases). So if we agree to codify this at
> probe via WARN, and add it to kdoc, I've no problem with it.

Why is WARN needed? Either the sysfs validation does it job correctly
or it doesn't. Also if sysfs didn't WARN when the bad input is
specified why would the core do anything higher than dev_err()?
Basically I think the bar for WARN is obvious kernel programming error
where only a kernel-developer will see it vs EINVAL at runtime
scenarios. I have seen Greg raise the bar for WARN in his reviews
given how many deployments turn on 'panic_on_warn'.

> > - Check that decoder has capacity
> > - Check that the memdev has capacity
> > - Check that the decoder to map the DPA is actually available given
> > decoders must be programmed in increasing DPA order
> >
> > Essentially any validation short of walking the topology to program
> > upstream decoders since those errors are only resolved by racing
> > region probes that try to grab upstream decoder resources.
> >
>
> I intentionally avoided doing a lot of these until probe because it seemed like
> not a great policy to deny regions from being populated if another region
> utilizing those resources hasn't been bound yes. For a simple example, if x1
> region A is created and utilizes all of memdev ɑ's capacity you block out any
> other region setup using memdev ɑ, even if region A wasn't bound. There's a
> similar problem with specifying decoders as part of configuration.
>
> I'll infer from your comment that you are fine with this tradeoff, or you have
> some other way to manage this in mind.

It comes back to observability if threadA allocates all the DPA then
yes all other threads should see -ENOSPC. No different than if 3 fdisk
threads all tried to create a partition, the first one to the kernel
wins. If threadA does not end up activating that regionA's capacity
that's userspace's fault, and the admin needs to make sure that
configuration does not race itself. The kernel allocating DPA
immediately lets those races be found early such that threadB finds
all the DPA gone and stops trying to create the region.

> I really see any validation which requires removal of resources from the system
> to be more fit for bind time. I suppose if the proposal is to move the region
> attributes to be DEVICE_ATTR_ADMIN, that pushes the problem onto the system
> administrator. It just seemed like most of the interface could be non-root.

None of the sysfs entries for CXL are writable by non-root.

DEVICE_ATTR_RW() is 0644
DEVICE_ATTR_ADMIN_RW() is 0600

Yes, pushing the problem onto the sysadmin is the only option. Only
CAP_SYS_ADMIN can be trusted to muck with the physical address layout
of the system. Even then CONFIG_LOCKDOWN_KERNEL wants to limit what
CAP_SYS_ADMIN can to do the memory configuration, so I don't see any
room for non-root to be considered in this ABI.

>
> > >
> > > 2. I like the idea that all validation takes place in one place. Obviously you
> > >    do not. So, see #1 and I will rework.
> >
> > The validation helpers need to be written once, where they are called
> > does not much matter, does it?
> >
>
> Somewhat addressed above too...
>
> I think that depends on whether the full list is established as mentioned. If in
> the region driver we can put several assertions that a variety of things don't
> need [re]validation, then it doesn't matter. Without this, when trying to debug
> or add code you need to figure out which place is doing the validation and which
> place should do it.

All I can say it has not been a problem in practice for NVDIMM debug
scenarios which does validation at probe for pre-existing namespaces
and validation at sysfs write for namespace creation.

> At the very least I think the plan should be established in a kdoc.

Sure, a "CXL Region: Theory of Operation" would be a good document to
lead into the patch series as a follow-on to "CXL Bus: Theory of
Operation".

  reply	other threads:[~2022-02-03 23:27 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-28  0:26 [PATCH v3 00/14] CXL Region driver Ben Widawsky
2022-01-28  0:26 ` [PATCH v3 01/14] cxl/region: Add region creation ABI Ben Widawsky
2022-01-28 18:14   ` Dan Williams
2022-01-28 18:59     ` Dan Williams
2022-02-02 18:26       ` Ben Widawsky
2022-02-02 18:28         ` Ben Widawsky
2022-02-02 18:48           ` Ben Widawsky
2022-02-02 19:00             ` Dan Williams
2022-02-02 19:02               ` Ben Widawsky
2022-02-02 19:15                 ` Dan Williams
2022-02-01 22:42     ` Ben Widawsky
2022-02-01 15:53   ` Jonathan Cameron
2022-02-17 17:10   ` [PATCH v4 " Ben Widawsky
2022-02-17 17:19     ` [PATCH v5 01/15] " Ben Widawsky
2022-02-17 17:33       ` Ben Widawsky
2022-02-17 17:58       ` Dan Williams
2022-02-17 18:58         ` Ben Widawsky
2022-02-17 20:26           ` Dan Williams
2022-02-17 22:22         ` Ben Widawsky
2022-02-17 23:32           ` Dan Williams
2022-02-18 16:41             ` Ben Widawsky
2022-01-28  0:26 ` [PATCH v3 02/14] cxl/region: Introduce concept of region configuration Ben Widawsky
2022-01-29  0:25   ` Dan Williams
2022-02-01 14:59     ` Ben Widawsky
2022-02-03  5:06       ` Dan Williams
2022-02-01 23:11     ` Ben Widawsky
2022-02-03 17:48       ` Dan Williams
2022-02-03 22:23         ` Ben Widawsky
2022-02-03 23:27           ` Dan Williams [this message]
2022-02-04  0:19             ` Ben Widawsky
2022-02-04  2:45               ` Dan Williams
2022-02-17 18:36     ` Ben Widawsky
2022-02-17 19:57       ` Dan Williams
2022-02-17 20:20         ` Ben Widawsky
2022-02-17 21:12           ` Dan Williams
2022-02-23 21:49         ` Ben Widawsky
2022-02-23 22:24           ` Dan Williams
2022-02-23 22:31             ` Ben Widawsky
2022-02-23 22:42               ` Dan Williams
2022-01-28  0:26 ` [PATCH v3 03/14] cxl/mem: Cache port created by the mem dev Ben Widawsky
2022-02-17  1:20   ` Dan Williams
2022-01-28  0:26 ` [PATCH v3 04/14] cxl/region: Introduce a cxl_region driver Ben Widawsky
2022-02-01 16:21   ` Jonathan Cameron
2022-02-17  6:04   ` Dan Williams
2022-01-28  0:26 ` [PATCH v3 05/14] cxl/acpi: Handle address space allocation Ben Widawsky
2022-02-18 19:17   ` Dan Williams
2022-01-28  0:26 ` [PATCH v3 06/14] cxl/region: Address " Ben Widawsky
2022-02-18 19:51   ` Dan Williams
2022-01-28  0:27 ` [PATCH v3 07/14] cxl/region: Implement XHB verification Ben Widawsky
2022-02-18 20:23   ` Dan Williams
2022-01-28  0:27 ` [PATCH v3 08/14] cxl/region: HB port config verification Ben Widawsky
2022-02-14 16:20   ` Jonathan Cameron
2022-02-14 17:51     ` Ben Widawsky
2022-02-14 18:09       ` Jonathan Cameron
2022-02-15 16:35   ` Jonathan Cameron
2022-02-18 21:04   ` Dan Williams
2022-01-28  0:27 ` [PATCH v3 09/14] cxl/region: Add infrastructure for decoder programming Ben Widawsky
2022-02-01 18:16   ` Jonathan Cameron
2022-02-18 21:53   ` Dan Williams
2022-01-28  0:27 ` [PATCH v3 10/14] cxl/region: Collect host bridge decoders Ben Widawsky
2022-02-01 18:21   ` Jonathan Cameron
2022-02-18 23:42   ` Dan Williams
2022-01-28  0:27 ` [PATCH v3 11/14] cxl/region: Add support for single switch level Ben Widawsky
2022-02-01 18:26   ` Jonathan Cameron
2022-02-15 16:10   ` Jonathan Cameron
2022-02-18 18:23     ` Jonathan Cameron
2022-01-28  0:27 ` [PATCH v3 12/14] cxl: Program decoders for regions Ben Widawsky
2022-02-24  0:08   ` Dan Williams
2022-01-28  0:27 ` [PATCH v3 13/14] cxl/pmem: Convert nvdimm bridge API to use dev Ben Widawsky
2022-01-28  0:27 ` [PATCH v3 14/14] cxl/region: Create an nd_region Ben Widawsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPcyv4icq8_9E+ziU5KKYrAepUtNP32Qf6wYGYpcUU2K1P4mAA@mail.gmail.com \
    --to=dan.j.williams@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=ben.widawsky@intel.com \
    --cc=helgaas@kernel.org \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=patches@lists.linux.dev \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.