linux-kernel.vger.kernel.org archive mirror
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: Ira Weiny <ira.weiny@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Christoph Hellwig <hch@infradead.org>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Ben Widawsky <ben.widawsky@intel.com>,
	<linux-kernel@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
	<linux-pci@vger.kernel.org>
Subject: Re: [PATCH V8 03/10] PCI: Create PCI library functions in support of DOE mailboxes.
Date: Wed, 1 Jun 2022 15:23:23 +0100
Message-ID: <20220601152323.00004b9e@Huawei.com>
In-Reply-To: <20220601071808.GA19924@wunner.de>

On Wed, 1 Jun 2022 09:18:08 +0200
Lukas Wunner <lukas@wunner.de> wrote:

> On Tue, May 31, 2022 at 07:59:21PM -0700, Ira Weiny wrote:
> > On Tue, May 31, 2022 at 11:33:50AM +0100, Jonathan Cameron wrote:  
> > > On Mon, 30 May 2022 21:06:57 +0200 Lukas Wunner <lukas@wunner.de> wrote:  
> > > > On Thu, Apr 14, 2022 at 01:32:30PM -0700, ira.weiny@intel.com wrote:  
> > > > > +	/* First 2 dwords have already been read */
> > > > > +	length -= 2;
> > > > > +	/* Read the rest of the response payload */
> > > > > +	for (i = 0; i < min(length, task->response_pl_sz / sizeof(u32)); i++) {
> > > > > +		pci_read_config_dword(pdev, offset + PCI_DOE_READ,
> > > > > +				      &task->response_pl[i]);
> > > > > +		pci_write_config_dword(pdev, offset + PCI_DOE_READ, 0);
> > > > > +	}    
> > > > 
> > > > You need to check the Data Object Ready bit.  The device may clear the
> > > > bit prematurely (e.g. as a result of a concurrent FLR or Conventional
> > > > Reset).  You'll continue reading zero dwords from the mailbox and
> > > > pretend success to the caller even though the response is truncated.
> > > > 
> > > > If you're concerned about performance when checking the bit on every
> > > > loop iteration, checking it only on the last but one iteration should
> > > > be sufficient to detect truncation.  
> > > 
> > > Good catch - I hate corner cases.  Thankfully this one is trivial to
> > > check for.  
> > 
> > Ok looking at the spec:  Strictly speaking this needs to happen multiple
> > times both in doe_statemachine_work() and inside pci_doe_recv_resp();
> > not just in this loop.  :-(
> > 
> > This is because, the check in doe_statemachine_work() only covers the
> > 1st dword read IIUC.  
> 
> The spec says "this bit indicates the DOE instance has a *data object*
> available to be read by system firmware/software".
> 
> So, the entire object is available for reading, not just one dword.

Agreed

> 
> You've already got checks in place for the first two dwords which
> cover reading an "all zeroes" response.  No need to amend them.
> 
> You only need to re-check the Data Object Ready bit on the last-but-one
> dword in case the function was reset concurrently.  Per sec. 6.30.2,
> "An FLR to a Function must result in the aborting of any DOE transfer
> in progress."

Ouch, isn't that racy as you can only check it slightly before reading the
last dword and a reset could occur in between?
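
To make the window concrete, here is a rough userspace sketch of the
last-dword check.  It is not the patch code: struct mock_doe,
mock_ready(), mock_read_dword() and doe_read_payload() are made-up
names, and the mock assumes a reset shows up as Data Object Ready
dropping and all later reads returning zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DOE mailbox, not struct pci_doe_mb. */
struct mock_doe {
	const uint32_t *fifo;	/* dwords of the data object */
	size_t fifo_len;	/* number of dwords in the object */
	size_t rd;		/* index of the next dword to read */
	size_t reset_at;	/* simulated reset once rd reaches this */
};

/* Models Data Object Ready: clear after a reset or once consumed. */
static bool mock_ready(const struct mock_doe *mb)
{
	return mb->rd < mb->reset_at && mb->rd < mb->fifo_len;
}

/* Models reading the Read Data Mailbox and writing it to advance. */
static uint32_t mock_read_dword(struct mock_doe *mb)
{
	uint32_t v = mock_ready(mb) ? mb->fifo[mb->rd] : 0;

	mb->rd++;
	return v;
}

/* Returns 0 on success, -1 if the response looks truncated. */
static int doe_read_payload(struct mock_doe *mb, uint32_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		/*
		 * Just before the final dword, Ready must still be set.
		 * A reset between this check and the read below is not
		 * caught; that is the remaining window.
		 */
		if (i == len - 1 && !mock_ready(mb))
			return -1;
		buf[i] = mock_read_dword(mb);
	}
	return 0;
}
```

With a four dword object and a simulated reset after two dwords,
doe_read_payload() reports truncation instead of handing back trailing
zeroes.  It does nothing for a reset that lands after the check.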

> 
> 
> > > > > +static irqreturn_t pci_doe_irq_handler(int irq, void *data)
> > > > > +{
> > > > > +	struct pci_doe_mb *doe_mb = data;
> > > > > +	struct pci_dev *pdev = doe_mb->pdev;
> > > > > +	int offset = doe_mb->cap_offset;
> > > > > +	u32 val;
> > > > > +
> > > > > +	pci_read_config_dword(pdev, offset + PCI_DOE_STATUS, &val);
> > > > > +
> > > > > +	/* Leave the error case to be handled outside IRQ */
> > > > > +	if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) {
> > > > > +		mod_delayed_work(system_wq, &doe_mb->statemachine, 0);
> > > > > +		return IRQ_HANDLED;
> > > > > +	}
> > > > > +
> > > > > +	if (FIELD_GET(PCI_DOE_STATUS_INT_STATUS, val)) {
> > > > > +		pci_write_config_dword(pdev, offset + PCI_DOE_STATUS,
> > > > > +					PCI_DOE_STATUS_INT_STATUS);
> > > > > +		mod_delayed_work(system_wq, &doe_mb->statemachine, 0);
> > > > > +		return IRQ_HANDLED;
> > > > > +	}
> > > > > +
> > > > > +	return IRQ_NONE;
> > > > > +}    
> > > > 
> > > > PCIe 6.0, table 7-316 says that an interrupt is also raised when
> > > > "the DOE Busy bit has been Cleared", yet such an interrupt is
> > > > not handled here.  It is incorrectly treated as a spurious
> > > > interrupt by returning IRQ_NONE.  The right thing to do
> > > > is probably to wake the state machine in case it's polling
> > > > for the Busy flag to clear.  
> > > 
> > > Ah. I remember testing this via a lot of hacking on the QEMU code
> > > to inject the various races that can occur (it was really ugly to do).
> > > 
> > > Guess we lost the handling at some point.  I think your fix
> > > is the right one.  
> > 
> > Perhaps I am missing something but digging into this more.  I disagree
> > that the handler fails to handle this case.  If I read the spec correctly
> > DOE Interrupt Status must be set when an interrupt is generated.
> > The handler wakes the state machine in that case.  The state machine
> > then checks for busy if there is work to be done.  
> 
> Right, I was mistaken, sorry for the noise.

Ah. Makes sense - managed to confuse me too ;)
I really don't like the absence of a dedicated status bit for the
"DOE Busy bit has been Cleared" interrupt source, but glad we did handle it.

> 
> 
> > Normally we would not even need to check for status error.  But that is
> > special cased because clearing that status is left to the state machine.  
> 
> That however looks wrong because the DOE Interrupt Status bit is never
> cleared after a DOE Error is signaled.  The state machine performs an
> explicit abort upon an error by setting the DOE Abort bit, but that
> doesn't seem to clear DOE Interrupt Status:
> 
> Per section 6.30.2, "At any time, the system firmware/software is
> permitted to set the DOE Abort bit in the DOE Control Register,
> and the DOE instance must Clear the Data Object Ready bit,
> if not already Clear, and Clear the DOE Error bit, if already Set,
> in the DOE Status Register, within 1 second."
> 
> No mention of the DOE Interrupt Status bit, so we cannot assume that
> it's cleared by a DOE Abort and we must clear it explicitly.

Gah.
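
Something along these lines might do, as a sketch only.  It is a
userspace mock rather than a patch: struct mock_doe_irq,
mock_write_status() and mock_irq_handler() are invented names, and the
bit positions are assumed to mirror the DOE Status register layout.  It
relies on DOE Interrupt Status being write-1-to-clear (as the existing
handler already assumes) and acks it before waking the state machine,
so the error path no longer leaves it set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the DOE Status register bit layout. */
#define MOCK_STATUS_INT_STATUS	(1u << 1)
#define MOCK_STATUS_ERROR	(1u << 2)

/* Hypothetical mailbox model, not struct pci_doe_mb. */
struct mock_doe_irq {
	uint32_t status;	/* models the DOE Status register */
	bool woken;		/* set when the state machine is kicked */
};

/* Only Interrupt Status is modelled as write-1-to-clear. */
static void mock_write_status(struct mock_doe_irq *mb, uint32_t val)
{
	mb->status &= ~(val & MOCK_STATUS_INT_STATUS);
}

/* Returns true if the interrupt was ours (IRQ_HANDLED equivalent). */
static bool mock_irq_handler(struct mock_doe_irq *mb)
{
	uint32_t val = mb->status;

	if (!(val & MOCK_STATUS_INT_STATUS))
		return false;

	/*
	 * Ack unconditionally, including when DOE Error is set.  The
	 * Error bit itself is left for the state machine to deal with
	 * via DOE Abort.
	 */
	mock_write_status(mb, MOCK_STATUS_INT_STATUS);
	mb->woken = true;
	return true;
}
```

After an error interrupt the mock ends up with Interrupt Status clear
and DOE Error still set for the state machine to abort.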

Jonathan

> 
> Thanks,
> 
> Lukas
> 


