All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>, <linux-cxl@vger.kernel.org>,
	Linux PCI <linux-pci@vger.kernel.org>,
	Bjorn Helgaas <helgaas@kernel.org>,
	"Lorenzo Pieralisi" <lorenzo.pieralisi@arm.com>,
	Ben Widawsky <ben.widawsky@intel.com>,
	Chris Browy <cbrowy@avery-design.com>,
	Linux ACPI <linux-acpi@vger.kernel.org>,
	"Schofield, Alison" <alison.schofield@intel.com>,
	Vishal L Verma <vishal.l.verma@intel.com>,
	Linuxarm <linuxarm@huawei.com>, Fangjian <f.fangjian@huawei.com>,
	Greg KH <gregkh@linuxfoundation.org>
Subject: Re: [RFC PATCH v3 2/4] PCI/doe: Add Data Object Exchange support
Date: Wed, 19 May 2021 18:00:57 +0100	[thread overview]
Message-ID: <20210519180057.00002ac3@Huawei.com> (raw)
In-Reply-To: <CAPcyv4gUy0nNh-3y2wWVwM4AtO4F8OOJCtWz_ZH7Eu0H=oymuw@mail.gmail.com>

On Wed, 19 May 2021 09:53:33 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> On Wed, May 19, 2021 at 9:35 AM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Wed, 19 May 2021 17:20:52 +0100
> > Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> >  
> > > On Wed, 19 May 2021 08:29:58 -0700
> > > Dan Williams <dan.j.williams@intel.com> wrote:
> > >  
> > > > On Wed, May 19, 2021 at 8:14 AM Jonathan Cameron
> > > > <Jonathan.Cameron@huawei.com> wrote:  
> > > > >
> > > > > On Wed, 19 May 2021 07:18:28 -0700
> > > > > Dan Williams <dan.j.williams@intel.com> wrote:
> > > > >  
> > > > > > On Tue, May 18, 2021 at 3:06 AM Jonathan Cameron
> > > > > > <Jonathan.Cameron@huawei.com> wrote:  
> > > > > > >
> > > > > > > On Mon, 17 May 2021 10:21:14 -0700
> > > > > > > Dan Williams <dan.j.williams@intel.com> wrote:
> > > > > > >  
> > > > > > > > On Mon, May 17, 2021 at 1:42 AM Jonathan Cameron
> > > > > > > > <Jonathan.Cameron@huawei.com> wrote:  
> > > > > > > > >
> > > > > > > > > On Fri, 14 May 2021 11:37:12 -0700
> > > > > > > > > Dan Williams <dan.j.williams@intel.com> wrote:
> > > > > > > > >  
> > > > > > > > > > On Fri, May 14, 2021 at 1:50 AM Jonathan Cameron
> > > > > > > > > > <Jonathan.Cameron@huawei.com> wrote:
> > > > > > > > > > [..]  
> > > > > > > > > > > > If it simplifies the kernel implementation to assume single
> > > > > > > > > > > > kernel-initiator then I think that's more than enough reason to block
> > > > > > > > > > > > out userspace, and/or provide userspace a method to get into the
> > > > > > > > > > > > kernel's queue for service.  
> > > > > > > > > > >
> > > > > > > > > > > This last suggestion makes sense to me. Let's provide a 'right' way
> > > > > > > > > > > to access the DOE from user space. I like the idea if it being possible
> > > > > > > > > > > to run CXL compliance tests from userspace whilst the driver is loaded.  
> > > > > > > > > >
> > > > > > > > > > Ah, and I like your observation that once the kernel provides a
> > > > > > > > > > "right" way to access DOE then userspace direct-access of DOE is
> > > > > > > > > > indeed a "you get to keep the pieces" event like any other unwanted
> > > > > > > > > > userspace config-write.
> > > > > > > > > >  
> > > > > > > > > > > Bjorn, given this would be a generic PCI thing, any preference for what
> > > > > > > > > > > this interface might look like?   /dev/pcidoe[xxxxxx].i with ioctls similar
> > > > > > > > > > > to those for the BAR based CXL mailboxes?  
> > > > > > > > > >
> > > > > > > > > > (warning, anti-ioctl bias incoming...)  
> > > > > > > > >
> > > > > > > > > I feel very similar about ioctls - my immediate thought was to shove this in
> > > > > > > > > debugfs, but that feels the wrong choice if we are trying to persuade people
> > > > > > > > > to use it instead of writing code that directly accesses the config space.
> > > > > > > > >  
> > > > > > > > > >
> > > > > > > > > > Hmm, DOE has an enumeration capability, could the DOE driver use a
> > > > > > > > > > scheme to have a sysfs bin_attr per discovered object type? This would
> > > > > > > > > > make it simliar to the pci-vpd sysfs interface.  
> > > > > > > > >
> > > > > > > > > We can discover the protocols, but anything beyond that is protocol
> > > > > > > > > specific.  I don't think there is a enough info available by any standards
> > > > > > > > > defined method. Also part of the reason to allow a safe userspace interface
> > > > > > > > > would be to provide a generic interface for vendor protocols and things like
> > > > > > > > > CXL compliance tests where we will almost certainly never provide a more
> > > > > > > > > specific kernel interface.
> > > > > > > > >
> > > > > > > > > Whilst sysfs would work for CDAT, some protocols are challenge response rather
> > > > > > > > > than simple read back and that really doesn't fit well for sysfs model.
> > > > > > > > > If we get other protocols that are simple data read back, then I would
> > > > > > > > > advocate giving them a simple sysfs interface much like proposed for CDAT
> > > > > > > > > as it will always be simpler to use + self describing.
> > > > > > > > >
> > > > > > > > > On a lesser note it might be helpful to provide sysfs attrs for
> > > > > > > > > what protocols are supported.  The alternative is to let userspace run
> > > > > > > > > the discovery protocol. Perhaps we can do this as a later phase.
> > > > > > > > >  
> > > > > > > > > >
> > > > > > > > > > Then the kernel could cache objects like CDAT that don't change
> > > > > > > > > > outside of some invalidation event.  
> > > > > > > > >
> > > > > > > > > It's been a while since I last saw any conversation on sysfs bin_attrs
> > > > > > > > > but mostly I thought the feeling was pretty strongly against them for anything
> > > > > > > > > but a few niche usecases.
> > > > > > > > >
> > > > > > > > > Feels to me like it would break most of the usual rules in a way vpd does
> > > > > > > > > not (IIRC VPD is supposed to be a simple in the sense that if you write a value
> > > > > > > > > to a writable part, you will read back the same value).
> > > > > > > > >
> > > > > > > > > +CC Greg who is a fount of knowledge in this area (and regularly + correctly
> > > > > > > > > screams at the ways I try to abuse sysfs :)  Note I don't think Dan was
> > > > > > > > > suggesting implementing response / request directly, but I think that is
> > > > > > > > > all we could do given DOE protocols can be vendor specific and the standard
> > > > > > > > > discovery protocol doesn't let us know the fine grained support (what commands
> > > > > > > > > within a given protocol).  
> > > > > > > >
> > > > > > > > I'm not all that interested in supporting vendor defined DOE
> > > > > > > > shenanigans. There's more than enough published DOE protocols that the
> > > > > > > > kernel could limit its support to the known set. This is similar to
> > > > > > > > how ACPI DSMs are not generically supported, but when they appear in a
> > > > > > > > published specification the kernel may then grow the support. The
> > > > > > > > supported protocols could be limited to: CDAT, PCIe IDE, CXL
> > > > > > > > Compliance, etc...
> > > > > > > >
> > > > > > > > Vendor specific DOE is in the same class as unfettered /dev/mem
> > > > > > > > access, first you need to disable the kernel's integrity and
> > > > > > > > confidentiality protections, and then you can do whatever you want. If
> > > > > > > > a vendor wants a DOE protocol supported in the "trusted" set they can
> > > > > > > > simply publish the specification and send the proper support patches.  
> > > > > > >
> > > > > > > Fair enough, though the interface should be root only, so a vendor shooting
> > > > > > > themselves in the foot this way would be no different to using pcitools
> > > > > > > to access the device directly (we are just providing safety from concurrency
> > > > > > > point of view).
> > > > > > >
> > > > > > > Anyway, I can see two options for how to do this.
> > > > > > >
> > > > > > > 1) Per protocol interface. Would not be generic, as these work in entirely
> > > > > > >    different ways (some are simple read back of tables, some require complex
> > > > > > >    cycles of operations in the right order with data flowing in both directions)
> > > > > > > 2) White list those protocols we are going to let through a generic interface
> > > > > > >    Not including CXL compliance for instance as that has nasty side effects!
> > > > > > >
> > > > > > > If we want to enable userspace DOE access, I prefer option 2.
> > > > > > >
> > > > > > > Note that I wasn't that keen on a userspace interface in the first place as
> > > > > > > in my view these should all be handled in kernel.
> > > > > > > Ultimately we should have case 1 if userspace access make sense.
> > > > > > > However, if we do this we shouldn't pretend we are providing userspace
> > > > > > > access to the DOE at all.  We are providing interfaces to things that just
> > > > > > > happen to be implemented using DOE under the hood.
> > > > > > >
> > > > > > > I have a prototype of a trivial ioctl based interface. I'll send it out
> > > > > > > as an RFC later this week.  Might add a white list, depending on where
> > > > > > > this discussion goes.
> > > > > > >  
> > > > > >
> > > > > > I'd say let's do this in typical Linux fashion and not solve future
> > > > > > problems before they need to be solved. I.e. start small and build
> > > > > > incrementally. To me that looks like a sysfs interface to convey a
> > > > > > cached copy of a CDAT with an internal interface for a driver to
> > > > > > trigger invalidations and re-reads on the next access. This would
> > > > > > assume that userspace may have left the DOE in an indeterminate state
> > > > > > and an abort cycle may be needed. A 1 second delay for the rare case
> > > > > > where a collision is detected seems reasonable for just CDAT
> > > > > > retrieval.  
> > > > >
> > > > > The problem is you can not detect a collision.  
> > > >
> > > > This discussion started because Ira questioned the handling of the
> > > > busy status. If the DOE is busy and the kernel did not make it busy
> > > > then there was a collision, no?  
> > >
> > > True, but not complete. Not having busy set does not mean there
> > > wasn't a collision. Busy is an indication that the EP can't recieve
> > > a new request (no space in buffer or similar), not that there is a response
> > > still to be sent back.  We have no way to tell if there is a response
> > > going to come back in the future. There is no 'exchange in flight' flag.  
> >
> > Perhaps useful to add a quote from the DOE ECN here.
> > This is in an implementation note on Page 6.
> >
> > "The DOE Busy bit can be used to indicate that the DOE responder is
> >  temporarily unable to accept a data object. It is necessary for a
> >  DOE requester to ensure that individual data object transfers are
> >  completed, and that a request/response contract is completed, for
> >  example using a mutex mechanism to block other conflicting traffic
> >  for cases where such conflicts are possible."  
> 
> I read that as the specification mandating my proposal to disallow
> multi-initiator access. My only mistake was making the exclusion apply
> to reads and not limiting it to the minimum of config write exclusion.

Key thing is even that isn't enough.   The mutex isn't about stopping
temporary access, it's about ensuring "request/response contract is completed".
So you would need userspace to be able to take a lock to stop the kernel
from using the DOE whilst it completes it's request/response pair and
userspace to guarantee it doesn't do anything stupid.

Easiest way to do that is provide proper interfaces that allows the
kernel to fully mediate the access + don't support direct userspace access
for normal operation. (treat it the same as an other config space write)

Jonathan

  reply	other threads:[~2021-05-19 17:02 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-19 16:54 [RFC PATCH v3 0/4] PCI Data Object Exchange support + CXL CDAT Jonathan Cameron
2021-04-19 16:54 ` [RFC PATCH v3 1/4] PCI: Add vendor ID for the PCI SIG Jonathan Cameron
2021-04-19 16:54 ` [RFC PATCH v3 2/4] PCI/doe: Add Data Object Exchange support Jonathan Cameron
2021-05-06 21:59   ` Ira Weiny
2021-05-11 16:50     ` Jonathan Cameron
2021-05-13 21:20       ` Dan Williams
2021-05-14  8:47         ` Jonathan Cameron
2021-05-14 11:15           ` Lorenzo Pieralisi
2021-05-14 12:39             ` Jonathan Cameron
2021-05-14 18:37           ` Dan Williams
2021-05-17  8:40             ` Jonathan Cameron
2021-05-17  8:51               ` Greg KH
2021-05-17 17:21               ` Dan Williams
2021-05-18 10:04                 ` Jonathan Cameron
2021-05-19 14:18                   ` Dan Williams
2021-05-19 15:11                     ` Jonathan Cameron
2021-05-19 15:29                       ` Dan Williams
2021-05-19 16:20                         ` Jonathan Cameron
2021-05-19 16:33                           ` Jonathan Cameron
2021-05-19 16:53                             ` Dan Williams
2021-05-19 17:00                               ` Jonathan Cameron [this message]
2021-05-19 19:20                                 ` Dan Williams
2021-05-19 20:18                                   ` Jonathan Cameron
2021-05-19 23:51                                     ` Dan Williams
2021-05-20  0:16                                       ` Dan Williams
2021-05-20  8:22                                       ` Jonathan Cameron
2021-05-07  9:36   ` Jonathan Cameron
2021-05-07 23:10   ` Bjorn Helgaas
2021-05-12 12:44     ` Jonathan Cameron
2021-04-19 16:54 ` [RFC PATCH v3 3/4] cxl/mem: Add CDAT table reading from DOE Jonathan Cameron
2021-04-19 16:54 ` [RFC PATCH v3 4/4] cxl/mem: Add a debug parser for CDAT commands Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210519180057.00002ac3@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=ben.widawsky@intel.com \
    --cc=cbrowy@avery-design.com \
    --cc=dan.j.williams@intel.com \
    --cc=f.fangjian@huawei.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=helgaas@kernel.org \
    --cc=ira.weiny@intel.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.