Linux-ACPI Archive on lore.kernel.org
 help / color / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Ira Weiny <ira.weiny@intel.com>,
	linux-cxl@vger.kernel.org, Linux PCI <linux-pci@vger.kernel.org>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Ben Widawsky <ben.widawsky@intel.com>,
	Chris Browy <cbrowy@avery-design.com>,
	Linux ACPI <linux-acpi@vger.kernel.org>,
	"Schofield, Alison" <alison.schofield@intel.com>,
	Vishal L Verma <vishal.l.verma@intel.com>,
	Linuxarm <linuxarm@huawei.com>, Fangjian <f.fangjian@huawei.com>
Subject: Re: [RFC PATCH v3 2/4] PCI/doe: Add Data Object Exchange support
Date: Thu, 13 May 2021 14:20:38 -0700
Message-ID: <CAPcyv4j=uww+85b4AbWmoPNPry_+JLEpEnuywpdC8PonXmRmEg@mail.gmail.com> (raw)
In-Reply-To: <20210511175006.00007861@Huawei.com>

On Tue, May 11, 2021 at 9:52 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Thu, 6 May 2021 14:59:34 -0700
> Ira Weiny <ira.weiny@intel.com> wrote:
>
> > On Tue, Apr 20, 2021 at 12:54:49AM +0800, Jonathan Cameron wrote:
> > > +
> > > +static int pci_doe_send_req(struct pci_doe *doe, struct pci_doe_exchange *ex)
> > > +{
> > > +   struct pci_dev *pdev = doe->pdev;
> > > +   u32 val;
> > > +   int i;
> > > +
> > > +   /*
> > > +    * Check the DOE busy bit is not set. If it is set, this could indicate
> > > +    * someone other than Linux (e.g. firmware) is using the mailbox. Note
> > > +    * it is expected that firmware and OS will negotiate access rights via
> > > +    * an, as yet to be defined method.
> > > +    */
> > > +   pci_read_config_dword(pdev, doe->cap + PCI_DOE_STATUS, &val);
> > > +   if (FIELD_GET(PCI_DOE_STATUS_BUSY, val))
> > > +           return -EBUSY;
> >
> > In discussion with Dan we believe that user space could also be issuing
> > commands and would potentially cause us to be locked out.
> >
> > We agree that firmware should be out of the way here and if it is blocking
> > the OS there is not much we can do about it.
> >
> > However, if user space is using the mailbox we need to synchronize with them
> > via pci_cfg_access_[try]lock().  This should avoid this EBUSY condition.
>
> Hi Ira, thanks for taking a look.
>
> So the question here is whether we can ever safely work with a
> userspace that is accessing the DOE.  I think the answer is no we can't.
>
> We'd have no way of knowing that userspace left the DOE in a clean state
> without resetting every time we want to use it (which can take 1 second)
> or doing significant sanity checking (can we tell if something is
> in flight?).  Note that if userspace and kernel were talking different
> protocols nothing sensible could be done to prevent them receiving each
> other's answers (unless you can rely on userspace holding the lock until
> it is done - which you can't as who trusts userspace?)

There is no ability for userpsace to lock out the kernel, only kernel
locking out userspace.

> You could do
> something horrible like back off after peeking at the protocol to see
> if it might be yours, but even that only works assuming the two are
> trying to talk different protocols (talking the same protocol isn't allowed
> but no way to enforce that using just pci_cfg_access_lock()).

Wait why isn't pci_cfg_access_lock() sufficient? The userspace DOE
transfer is halted, the kernel validates the state of DOE, does it's
work and releases the lock.

> I can't see a way to tell that the DOE might not have responded to an
> earlier request.  DOE busy indicates the write mailbox register cannot
> receive data at the moment.  If it's set then there is a message in
> flight, but if it is not set there might still be a message in flight.
> Busy only indicates if the write mailbox register can sink a request
> which doesn't in general tell us anything about the underlying state.
>
> So if userspace sent a request then quit.  Kernel driver would have
> no way of knowing if the next response was due to the request it sent
> or some earlier one (other than matching IDs)  Note you aren't allowed
> to have multiple requests for a single protocol in flight at the same
> time.  With just a lock you would have no way of preventing this.
>
> So we are back to every request the kernel sent having to be proceeded
> by an abort and potentially a 1 second delay whilst some chunk of the
> device firmware reboots.
>
> This came up in dicussion when Dan proposed the patch
> [PATCH] PCI: Allow drivers to claim exclusive access to config regions
> https://lore.kernel.org/linux-pci/161663543465.1867664.5674061943008380442.stgit@dwillia2-desk3.amr.corp.intel.com/
> Summarizing outcome of that thread.
>
> 1) Reads of DOE registers are always safe, so we shouldn't stop lspci
> and similar accessing config space.
> 2) You are on your own if any userspace writes to pci config space.
> There are loads of ways it can break the system so it doesn't make much
> sense to protect against one more.

I'm not quite as enthusiastic about Greg's assertion that "we're
already broken why not allow more breakage" as he was also the one
supportive of /dev/mem restrictions in the face of obvious collisions.
I'll circle back and say as much to Greg. My mistake was not realizing
the write dependency in the protocol, so the pushback was warranted
that the kernel does not need to block out all access.

Given that /dev/mem is optionally disabled for userspace access
outside of kernel-lockdown scenarios, I think it is reasonable to have
the kernel disable config writes to a register block and the request
of a driver.

Consider that userspace can certainly trash the system by writing to
the BAR registers, for example, but a non-malicious userspace has no
reason to do that. Unfortunately DOE has some utility for a
non-malicious userspace to access so there is a rationale to figure
out a cooperation scheme.

>
> If there is a reason to provide a userspace interface to a DOE for a
> device with a driver attached, then I would agree with Dan's suggestion
> to use a proper driver for it.
>
> Dan briefly mentioned that temporary blocking might be needed. I'm guessing
> that was to try and let userspace safely use the DOE.
>
> The driver would work fine ignoring busy entirely and would perhaps be
> less confusing as a result.  We reset the DOE at startup anyway and that
> would clear existing busy.  Any future times busy is set would have no
> impact on the flow.

If it simplifies the kernel implementation to assume single
kernel-initiator then I think that's more than enough reason to block
out userspace, and/or provide userspace a method to get into the
kernel's queue for service.

  reply index

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-19 16:54 [RFC PATCH v3 0/4] PCI Data Object Exchange support + CXL CDAT Jonathan Cameron
2021-04-19 16:54 ` [RFC PATCH v3 1/4] PCI: Add vendor ID for the PCI SIG Jonathan Cameron
2021-04-19 16:54 ` [RFC PATCH v3 2/4] PCI/doe: Add Data Object Exchange support Jonathan Cameron
2021-05-06 21:59   ` Ira Weiny
2021-05-11 16:50     ` Jonathan Cameron
2021-05-13 21:20       ` Dan Williams [this message]
2021-05-14  8:47         ` Jonathan Cameron
2021-05-14 11:15           ` Lorenzo Pieralisi
2021-05-14 12:39             ` Jonathan Cameron
2021-05-14 18:37           ` Dan Williams
2021-05-17  8:40             ` Jonathan Cameron
2021-05-17  8:51               ` Greg KH
2021-05-17 17:21               ` Dan Williams
2021-05-18 10:04                 ` Jonathan Cameron
2021-05-19 14:18                   ` Dan Williams
2021-05-19 15:11                     ` Jonathan Cameron
2021-05-19 15:29                       ` Dan Williams
2021-05-19 16:20                         ` Jonathan Cameron
2021-05-19 16:33                           ` Jonathan Cameron
2021-05-19 16:53                             ` Dan Williams
2021-05-19 17:00                               ` Jonathan Cameron
2021-05-19 19:20                                 ` Dan Williams
2021-05-19 20:18                                   ` Jonathan Cameron
2021-05-19 23:51                                     ` Dan Williams
2021-05-20  0:16                                       ` Dan Williams
2021-05-20  8:22                                       ` Jonathan Cameron
2021-05-07  9:36   ` Jonathan Cameron
2021-05-07 23:10   ` Bjorn Helgaas
2021-05-12 12:44     ` Jonathan Cameron
2021-04-19 16:54 ` [RFC PATCH v3 3/4] cxl/mem: Add CDAT table reading from DOE Jonathan Cameron
2021-04-19 16:54 ` [RFC PATCH v3 4/4] cxl/mem: Add a debug parser for CDAT commands Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPcyv4j=uww+85b4AbWmoPNPry_+JLEpEnuywpdC8PonXmRmEg@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=ben.widawsky@intel.com \
    --cc=cbrowy@avery-design.com \
    --cc=f.fangjian@huawei.com \
    --cc=helgaas@kernel.org \
    --cc=ira.weiny@intel.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ACPI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-acpi/0 linux-acpi/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-acpi linux-acpi/ https://lore.kernel.org/linux-acpi \
		linux-acpi@vger.kernel.org
	public-inbox-index linux-acpi

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-acpi


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git