Linux-ACPI Archive on lore.kernel.org
 help / color / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>, <linux-cxl@vger.kernel.org>,
	Linux PCI <linux-pci@vger.kernel.org>,
	Bjorn Helgaas <helgaas@kernel.org>,
	"Lorenzo Pieralisi" <lorenzo.pieralisi@arm.com>,
	Ben Widawsky <ben.widawsky@intel.com>,
	Chris Browy <cbrowy@avery-design.com>,
	Linux ACPI <linux-acpi@vger.kernel.org>,
	"Schofield, Alison" <alison.schofield@intel.com>,
	Vishal L Verma <vishal.l.verma@intel.com>,
	Linuxarm <linuxarm@huawei.com>, Fangjian <f.fangjian@huawei.com>,
	Greg KH <gregkh@linuxfoundation.org>
Subject: Re: [RFC PATCH v3 2/4] PCI/doe: Add Data Object Exchange support
Date: Wed, 19 May 2021 17:33:52 +0100
Message-ID: <20210519173352.000026fe@Huawei.com> (raw)
In-Reply-To: <20210519172052.00002124@Huawei.com>

On Wed, 19 May 2021 17:20:52 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Wed, 19 May 2021 08:29:58 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > On Wed, May 19, 2021 at 8:14 AM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:  
> > >
> > > On Wed, 19 May 2021 07:18:28 -0700
> > > Dan Williams <dan.j.williams@intel.com> wrote:
> > >    
> > > > On Tue, May 18, 2021 at 3:06 AM Jonathan Cameron
> > > > <Jonathan.Cameron@huawei.com> wrote:    
> > > > >
> > > > > On Mon, 17 May 2021 10:21:14 -0700
> > > > > Dan Williams <dan.j.williams@intel.com> wrote:
> > > > >    
> > > > > > On Mon, May 17, 2021 at 1:42 AM Jonathan Cameron
> > > > > > <Jonathan.Cameron@huawei.com> wrote:    
> > > > > > >
> > > > > > > On Fri, 14 May 2021 11:37:12 -0700
> > > > > > > Dan Williams <dan.j.williams@intel.com> wrote:
> > > > > > >    
> > > > > > > > On Fri, May 14, 2021 at 1:50 AM Jonathan Cameron
> > > > > > > > <Jonathan.Cameron@huawei.com> wrote:
> > > > > > > > [..]    
> > > > > > > > > > If it simplifies the kernel implementation to assume single
> > > > > > > > > > kernel-initiator then I think that's more than enough reason to block
> > > > > > > > > > out userspace, and/or provide userspace a method to get into the
> > > > > > > > > > kernel's queue for service.    
> > > > > > > > >
> > > > > > > > > This last suggestion makes sense to me. Let's provide a 'right' way
> > > > > > > > > to access the DOE from user space. I like the idea if it being possible
> > > > > > > > > to run CXL compliance tests from userspace whilst the driver is loaded.    
> > > > > > > >
> > > > > > > > Ah, and I like your observation that once the kernel provides a
> > > > > > > > "right" way to access DOE then userspace direct-access of DOE is
> > > > > > > > indeed a "you get to keep the pieces" event like any other unwanted
> > > > > > > > userspace config-write.
> > > > > > > >    
> > > > > > > > > Bjorn, given this would be a generic PCI thing, any preference for what
> > > > > > > > > this interface might look like?   /dev/pcidoe[xxxxxx].i with ioctls similar
> > > > > > > > > to those for the BAR based CXL mailboxes?    
> > > > > > > >
> > > > > > > > (warning, anti-ioctl bias incoming...)    
> > > > > > >
> > > > > > > I feel very similar about ioctls - my immediate thought was to shove this in
> > > > > > > debugfs, but that feels the wrong choice if we are trying to persuade people
> > > > > > > to use it instead of writing code that directly accesses the config space.
> > > > > > >    
> > > > > > > >
> > > > > > > > Hmm, DOE has an enumeration capability, could the DOE driver use a
> > > > > > > > scheme to have a sysfs bin_attr per discovered object type? This would
> > > > > > > > make it simliar to the pci-vpd sysfs interface.    
> > > > > > >
> > > > > > > We can discover the protocols, but anything beyond that is protocol
> > > > > > > specific.  I don't think there is a enough info available by any standards
> > > > > > > defined method. Also part of the reason to allow a safe userspace interface
> > > > > > > would be to provide a generic interface for vendor protocols and things like
> > > > > > > CXL compliance tests where we will almost certainly never provide a more
> > > > > > > specific kernel interface.
> > > > > > >
> > > > > > > Whilst sysfs would work for CDAT, some protocols are challenge response rather
> > > > > > > than simple read back and that really doesn't fit well for sysfs model.
> > > > > > > If we get other protocols that are simple data read back, then I would
> > > > > > > advocate giving them a simple sysfs interface much like proposed for CDAT
> > > > > > > as it will always be simpler to use + self describing.
> > > > > > >
> > > > > > > On a lesser note it might be helpful to provide sysfs attrs for
> > > > > > > what protocols are supported.  The alternative is to let userspace run
> > > > > > > the discovery protocol. Perhaps we can do this as a later phase.
> > > > > > >    
> > > > > > > >
> > > > > > > > Then the kernel could cache objects like CDAT that don't change
> > > > > > > > outside of some invalidation event.    
> > > > > > >
> > > > > > > It's been a while since I last saw any conversation on sysfs bin_attrs
> > > > > > > but mostly I thought the feeling was pretty strongly against them for anything
> > > > > > > but a few niche usecases.
> > > > > > >
> > > > > > > Feels to me like it would break most of the usual rules in a way vpd does
> > > > > > > not (IIRC VPD is supposed to be a simple in the sense that if you write a value
> > > > > > > to a writable part, you will read back the same value).
> > > > > > >
> > > > > > > +CC Greg who is a fount of knowledge in this area (and regularly + correctly
> > > > > > > screams at the ways I try to abuse sysfs :)  Note I don't think Dan was
> > > > > > > suggesting implementing response / request directly, but I think that is
> > > > > > > all we could do given DOE protocols can be vendor specific and the standard
> > > > > > > discovery protocol doesn't let us know the fine grained support (what commands
> > > > > > > within a given protocol).    
> > > > > >
> > > > > > I'm not all that interested in supporting vendor defined DOE
> > > > > > shenanigans. There's more than enough published DOE protocols that the
> > > > > > kernel could limit its support to the known set. This is similar to
> > > > > > how ACPI DSMs are not generically supported, but when they appear in a
> > > > > > published specification the kernel may then grow the support. The
> > > > > > supported protocols could be limited to: CDAT, PCIe IDE, CXL
> > > > > > Compliance, etc...
> > > > > >
> > > > > > Vendor specific DOE is in the same class as unfettered /dev/mem
> > > > > > access, first you need to disable the kernel's integrity and
> > > > > > confidentiality protections, and then you can do whatever you want. If
> > > > > > a vendor wants a DOE protocol supported in the "trusted" set they can
> > > > > > simply publish the specification and send the proper support patches.    
> > > > >
> > > > > Fair enough, though the interface should be root only, so a vendor shooting
> > > > > themselves in the foot this way would be no different to using pcitools
> > > > > to access the device directly (we are just providing safety from concurrency
> > > > > point of view).
> > > > >
> > > > > Anyway, I can see two options for how to do this.
> > > > >
> > > > > 1) Per protocol interface. Would not be generic, as these work in entirely
> > > > >    different ways (some are simple read back of tables, some require complex
> > > > >    cycles of operations in the right order with data flowing in both directions)
> > > > > 2) White list those protocols we are going to let through a generic interface
> > > > >    Not including CXL compliance for instance as that has nasty side effects!
> > > > >
> > > > > If we want to enable userspace DOE access, I prefer option 2.
> > > > >
> > > > > Note that I wasn't that keen on a userspace interface in the first place as
> > > > > in my view these should all be handled in kernel.
> > > > > Ultimately we should have case 1 if userspace access make sense.
> > > > > However, if we do this we shouldn't pretend we are providing userspace
> > > > > access to the DOE at all.  We are providing interfaces to things that just
> > > > > happen to be implemented using DOE under the hood.
> > > > >
> > > > > I have a prototype of a trivial ioctl based interface. I'll send it out
> > > > > as an RFC later this week.  Might add a white list, depending on where
> > > > > this discussion goes.
> > > > >    
> > > >
> > > > I'd say let's do this in typical Linux fashion and not solve future
> > > > problems before they need to be solved. I.e. start small and build
> > > > incrementally. To me that looks like a sysfs interface to convey a
> > > > cached copy of a CDAT with an internal interface for a driver to
> > > > trigger invalidations and re-reads on the next access. This would
> > > > assume that userspace may have left the DOE in an indeterminate state
> > > > and an abort cycle may be needed. A 1 second delay for the rare case
> > > > where a collision is detected seems reasonable for just CDAT
> > > > retrieval.    
> > >
> > > The problem is you can not detect a collision.    
> > 
> > This discussion started because Ira questioned the handling of the
> > busy status. If the DOE is busy and the kernel did not make it busy
> > then there was a collision, no?  
> 
> True, but not complete. Not having busy set does not mean there
> wasn't a collision. Busy is an indication that the EP can't recieve
> a new request (no space in buffer or similar), not that there is a response
> still to be sent back.  We have no way to tell if there is a response
> going to come back in the future. There is no 'exchange in flight' flag.

Perhaps useful to add a quote from the DOE ECN here.
This is in an implementation note on Page 6.

"The DOE Busy bit can be used to indicate that the DOE responder is
 temporarily unable to accept a data object. It is necessary for a
 DOE requester to ensure that individual data object transfers are
 completed, and that a request/response contract is completed, for
 example using a mutex mechanism to block other conflicting traffic
 for cases where such conflicts are possible."

Given you must not issue a second request to a given protocol until
the response is received (unless you issue an abort) you can't safely
do anything if we assume userspace might be using the DOE directly
other than by issuing an abort every time.

I had previously missed that a DOE busy clear can cause an interrupt
but that would only avoid us polling for busy (probably not worth
the handling code). It doesn't help us with this problem.

> 
> >   
> > > Hence it's a reset every
> > > time you use the DOE from in the kernel.    
> > 
> > I would expect:
> > 
> > pci_cfg_access_lock()
> > if (busy)
> >     reset();  
> 
> No, it would need to be unconditional, or you might get a response
> to a previously issued request mid way through your operation, and you
> can't in general tell that it wasn't the response to your request.
> For example you might get the wrong part of CDAT. You'd need to read
> the whole of CDAT to then verify the checksum to detect this had happened.
> 
> > do_doe_operation();
> > pci_cfg_access_unlock();
> >   
> > > Personally I think that this
> > > is fixing a problem that doesn't exist.    
> > 
> > Again this is all stemming from the driver handling "busy" especially
> > with the knowledged that it did not busy it itself.  
> 
> As above, busy doesn't do what it's name might suggest. It doesn't
> mean there is a request / response pair in flight.
> 
> >   
> > > Userspace should not access
> > > the DOE when a driver is loaded in exactly the same way it shouldn't
> > > be writing to anywhere else in config space under normal circumstances.    
> > 
> > DOE has data that is similar to what can be retrieved via config read,
> > it just so happens that reading that data triggers writes. DOE is in a
> > different class than other configuration writes in terms of the
> > userspace motivation to access it.  
> 
> For CDAT yes, for other protocols, (e.g. IDE, CXL compliance) definitely not.
> Using the protocols is as destructive as a write to config space.
> 
> >   
> > > I really don't see this as special. If we think it is special then
> > > we should provide a safe interface.    
> > 
> >   
> > >
> > > Given it's nearly done, I might send out the ioctl proposal and
> > > we can can just decide to leave it unmerged for now, pending real
> > > usecases being established.    
> > 
> > Fair enough.  
> 


  reply index

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-19 16:54 [RFC PATCH v3 0/4] PCI Data Object Exchange support + CXL CDAT Jonathan Cameron
2021-04-19 16:54 ` [RFC PATCH v3 1/4] PCI: Add vendor ID for the PCI SIG Jonathan Cameron
2021-04-19 16:54 ` [RFC PATCH v3 2/4] PCI/doe: Add Data Object Exchange support Jonathan Cameron
2021-05-06 21:59   ` Ira Weiny
2021-05-11 16:50     ` Jonathan Cameron
2021-05-13 21:20       ` Dan Williams
2021-05-14  8:47         ` Jonathan Cameron
2021-05-14 11:15           ` Lorenzo Pieralisi
2021-05-14 12:39             ` Jonathan Cameron
2021-05-14 18:37           ` Dan Williams
2021-05-17  8:40             ` Jonathan Cameron
2021-05-17  8:51               ` Greg KH
2021-05-17 17:21               ` Dan Williams
2021-05-18 10:04                 ` Jonathan Cameron
2021-05-19 14:18                   ` Dan Williams
2021-05-19 15:11                     ` Jonathan Cameron
2021-05-19 15:29                       ` Dan Williams
2021-05-19 16:20                         ` Jonathan Cameron
2021-05-19 16:33                           ` Jonathan Cameron [this message]
2021-05-19 16:53                             ` Dan Williams
2021-05-19 17:00                               ` Jonathan Cameron
2021-05-19 19:20                                 ` Dan Williams
2021-05-19 20:18                                   ` Jonathan Cameron
2021-05-19 23:51                                     ` Dan Williams
2021-05-20  0:16                                       ` Dan Williams
2021-05-20  8:22                                       ` Jonathan Cameron
2021-05-07  9:36   ` Jonathan Cameron
2021-05-07 23:10   ` Bjorn Helgaas
2021-05-12 12:44     ` Jonathan Cameron
2021-04-19 16:54 ` [RFC PATCH v3 3/4] cxl/mem: Add CDAT table reading from DOE Jonathan Cameron
2021-04-19 16:54 ` [RFC PATCH v3 4/4] cxl/mem: Add a debug parser for CDAT commands Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210519173352.000026fe@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=ben.widawsky@intel.com \
    --cc=cbrowy@avery-design.com \
    --cc=dan.j.williams@intel.com \
    --cc=f.fangjian@huawei.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=helgaas@kernel.org \
    --cc=ira.weiny@intel.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ACPI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-acpi/0 linux-acpi/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-acpi linux-acpi/ https://lore.kernel.org/linux-acpi \
		linux-acpi@vger.kernel.org
	public-inbox-index linux-acpi

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-acpi


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git