linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: kelvin.cao@microchip.com
Cc: kurt.schwemmer@microsemi.com, logang@deltatee.com,
	bhelgaas@google.com, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, kelvincao@outlook.com
Subject: Re: [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access
Date: Fri, 1 Oct 2021 15:18:22 -0500	[thread overview]
Message-ID: <20211001201822.GA962472@bhelgaas> (raw)
In-Reply-To: <20210924110842.6323-2-kelvin.cao@microchip.com>

On Fri, Sep 24, 2021 at 11:08:38AM +0000, kelvin.cao@microchip.com wrote:
> From: Kelvin Cao <kelvin.cao@microchip.com>
> 
> After a firmware hard reset, MRPC command executions, which are based
> on the PCI BAR (which Microchip refers to as GAS) read/write, will hang
> indefinitely. This is because after a reset, the host will fail all GAS
> reads (get all 1s), in which case the driver won't get a valid MRPC
> status.

Trying to write a merge commit log for this, but having a hard time
summarizing it.  It sounds like it covers both Switchtec-specific
(firmware and MRPC commands) and generic PCIe behavior (MMIO read
failures).

This has something to do with a firmware hard reset.  What is that?
Is that like a firmware reboot?  A device reset, e.g., FLR or
secondary bus reset, that causes a firmware reboot?  A device reset
initiated by firmware?

Anyway, apparently when that happens, MMIO reads to the switch fail
(timeout or error completion on PCIe) for a while.  If a device reset
is involved, that much is standard PCIe behavior.  And the driver sees
~0 data from those failed reads.  That's not part of the PCIe spec,
but is typical root complex behavior.

But you said the MRPC commands hang indefinitely.  Presumably MMIO
reads would start succeeding eventually when the device becomes ready,
so I don't know how that translates to "indefinitely."

Weird to refer to a PCI BAR as "GAS".  Maybe expanding the acronym
would help it make sense.

What does "host" refer to?  I guess it's the switch (the
switchtec_dev), since you say it fails MMIO reads?

Naming comment below.

> Add a read check to GAS access when a MRPC command execution doesn't
> response timely, error out if the check fails.
> 
> Signed-off-by: Kelvin Cao <kelvin.cao@microchip.com>
> ---
>  drivers/pci/switch/switchtec.c | 59 ++++++++++++++++++++++++++++++----
>  1 file changed, 52 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
> index 0b301f8be9ed..092653487021 100644
> --- a/drivers/pci/switch/switchtec.c
> +++ b/drivers/pci/switch/switchtec.c
> @@ -45,6 +45,7 @@ enum mrpc_state {
>  	MRPC_QUEUED,
>  	MRPC_RUNNING,
>  	MRPC_DONE,
> +	MRPC_IO_ERROR,
>  };
>  
>  struct switchtec_user {
> @@ -66,6 +67,13 @@ struct switchtec_user {
>  	int event_cnt;
>  };
>  
> +static int check_access(struct switchtec_dev *stdev)
> +{
> +	u32 device = ioread32(&stdev->mmio_sys_info->device_id);
> +
> +	return stdev->pdev->device == device;
> +}

Didn't notice this before, but the "check_access()" name is not very
helpful because it doesn't give a clue about what the return value
means.  Does 0 mean no error?  Does 1 mean no error?  From reading the
implementation, I can see that 0 is actually the error case, but I
can't tell from the name.

>  static struct switchtec_user *stuser_create(struct switchtec_dev *stdev)
>  {
>  	struct switchtec_user *stuser;
> @@ -113,6 +121,7 @@ static void stuser_set_state(struct switchtec_user *stuser,
>  		[MRPC_QUEUED] = "QUEUED",
>  		[MRPC_RUNNING] = "RUNNING",
>  		[MRPC_DONE] = "DONE",
> +		[MRPC_IO_ERROR] = "IO_ERROR",
>  	};
>  
>  	stuser->state = state;
> @@ -184,6 +193,21 @@ static int mrpc_queue_cmd(struct switchtec_user *stuser)
>  	return 0;
>  }
>  
> +static void mrpc_cleanup_cmd(struct switchtec_dev *stdev)
> +{
> +	/* requires the mrpc_mutex to already be held when called */
> +	struct switchtec_user *stuser = list_entry(stdev->mrpc_queue.next,
> +						   struct switchtec_user, list);
> +
> +	stuser->cmd_done = true;
> +	wake_up_interruptible(&stuser->cmd_comp);
> +	list_del_init(&stuser->list);
> +	stuser_put(stuser);
> +	stdev->mrpc_busy = 0;
> +
> +	mrpc_cmd_submit(stdev);
> +}
> +
>  static void mrpc_complete_cmd(struct switchtec_dev *stdev)
>  {
>  	/* requires the mrpc_mutex to already be held when called */
> @@ -223,13 +247,7 @@ static void mrpc_complete_cmd(struct switchtec_dev *stdev)
>  		memcpy_fromio(stuser->data, &stdev->mmio_mrpc->output_data,
>  			      stuser->read_len);
>  out:
> -	stuser->cmd_done = true;
> -	wake_up_interruptible(&stuser->cmd_comp);
> -	list_del_init(&stuser->list);
> -	stuser_put(stuser);
> -	stdev->mrpc_busy = 0;
> -
> -	mrpc_cmd_submit(stdev);
> +	mrpc_cleanup_cmd(stdev);
>  }
>  
>  static void mrpc_event_work(struct work_struct *work)
> @@ -246,6 +264,23 @@ static void mrpc_event_work(struct work_struct *work)
>  	mutex_unlock(&stdev->mrpc_mutex);
>  }
>  
> +static void mrpc_error_complete_cmd(struct switchtec_dev *stdev)
> +{
> +	/* requires the mrpc_mutex to already be held when called */
> +
> +	struct switchtec_user *stuser;
> +
> +	if (list_empty(&stdev->mrpc_queue))
> +		return;
> +
> +	stuser = list_entry(stdev->mrpc_queue.next,
> +			    struct switchtec_user, list);
> +
> +	stuser_set_state(stuser, MRPC_IO_ERROR);
> +
> +	mrpc_cleanup_cmd(stdev);
> +}
> +
>  static void mrpc_timeout_work(struct work_struct *work)
>  {
>  	struct switchtec_dev *stdev;
> @@ -257,6 +292,11 @@ static void mrpc_timeout_work(struct work_struct *work)
>  
>  	mutex_lock(&stdev->mrpc_mutex);
>  
> +	if (!check_access(stdev)) {
> +		mrpc_error_complete_cmd(stdev);
> +		goto out;
> +	}
> +
>  	if (stdev->dma_mrpc)
>  		status = stdev->dma_mrpc->status;
>  	else
> @@ -544,6 +584,11 @@ static ssize_t switchtec_dev_read(struct file *filp, char __user *data,
>  	if (rc)
>  		return rc;
>  
> +	if (stuser->state == MRPC_IO_ERROR) {
> +		mutex_unlock(&stdev->mrpc_mutex);
> +		return -EIO;
> +	}
> +
>  	if (stuser->state != MRPC_DONE) {
>  		mutex_unlock(&stdev->mrpc_mutex);
>  		return -EBADE;
> -- 
> 2.25.1
> 

  reply	other threads:[~2021-10-01 20:18 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-24 11:08 [PATCH 0/5] Switchtec Fixes and Improvements kelvin.cao
2021-09-24 11:08 ` [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access kelvin.cao
2021-10-01 20:18   ` Bjorn Helgaas [this message]
2021-10-01 20:29     ` Logan Gunthorpe
2021-10-01 23:49       ` Kelvin.Cao
2021-10-02 15:11         ` Bjorn Helgaas
2021-10-04 20:51           ` Kelvin.Cao
2021-10-05 20:11             ` Bjorn Helgaas
2021-10-06  0:37               ` Kelvin.Cao
2021-10-06  2:33                 ` Bjorn Helgaas
2021-10-06  5:49                   ` Kelvin.Cao
2021-10-06 14:19                     ` Bjorn Helgaas
2021-10-06 19:00                       ` Kelvin.Cao
2021-10-06 20:20                         ` Bjorn Helgaas
2021-10-06 21:27                           ` Kelvin.Cao
2021-10-07 21:23                             ` Bjorn Helgaas
2021-10-08  0:06                               ` Kelvin.Cao
2021-10-08 11:03                                 ` Bjorn Helgaas
2021-10-01 22:58     ` Kelvin.Cao
2021-10-01 23:52       ` Logan Gunthorpe
2021-10-02  0:05         ` Kelvin.Cao
2021-09-24 11:08 ` [PATCH 2/5] PCI/switchtec: Fix a MRPC error status handling issue kelvin.cao
2021-09-24 11:08 ` [PATCH 3/5] PCI/switchtec: Update the way of getting management VEP instance ID kelvin.cao
2021-09-24 11:08 ` [PATCH 4/5] PCI/switchtec: Replace ENOTSUPP with EOPNOTSUPP kelvin.cao
2021-09-24 11:08 ` [PATCH 5/5] PCI/switchtec: Add check of event support kelvin.cao
2021-09-24 15:53 ` [PATCH 0/5] Switchtec Fixes and Improvements Logan Gunthorpe
2021-09-25  5:27   ` Kelvin.Cao
2021-09-27 16:39 ` Bjorn Helgaas
2021-09-27 18:25   ` Kelvin.Cao
2021-10-08 17:05 ` Bjorn Helgaas
2021-10-08 17:23   ` Logan Gunthorpe
2021-10-08 18:25     ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211001201822.GA962472@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=kelvin.cao@microchip.com \
    --cc=kelvincao@outlook.com \
    --cc=kurt.schwemmer@microsemi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=logang@deltatee.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).