All of lore.kernel.org
 help / color / mirror / Atom feed
From: <Kelvin.Cao@microchip.com>
To: <helgaas@kernel.org>
Cc: <kurt.schwemmer@microsemi.com>, <bhelgaas@google.com>,
	<kelvincao@outlook.com>, <logang@deltatee.com>,
	<linux-kernel@vger.kernel.org>, <linux-pci@vger.kernel.org>
Subject: Re: [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access
Date: Fri, 1 Oct 2021 22:58:29 +0000	[thread overview]
Message-ID: <2f7b4e6debbf7156a4da26bad0373d9df9667e66.camel@microchip.com> (raw)
In-Reply-To: <20211001201822.GA962472@bhelgaas>

On Fri, 2021-10-01 at 15:18 -0500, Bjorn Helgaas wrote:
> On Fri, Sep 24, 2021 at 11:08:38AM +0000, kelvin.cao@microchip.com
> wrote:
> > From: Kelvin Cao <kelvin.cao@microchip.com>
> > 
> > After a firmware hard reset, MRPC command executions, which are
> > based
> > on the PCI BAR (which Microchip refers to as GAS) read/write, will
> > hang
> > indefinitely. This is because after a reset, the host will fail all
> > GAS
> > reads (get all 1s), in which case the driver won't get a valid MRPC
> > status.
> 
> Trying to write a merge commit log for this, but having a hard time
> summarizing it.  It sounds like it covers both Switchtec-specific
> (firmware and MRPC commands) and generic PCIe behavior (MMIO read
> failures).
> 
> This has something to do with a firmware hard reset.  What is that?
> Is that like a firmware reboot?  A device reset, e.g., FLR or
> secondary bus reset, that causes a firmware reboot?  A device reset
> initiated by firmware?
> 
> Anyway, apparently when that happens, MMIO reads to the switch fail
> (timeout or error completion on PCIe) for a while.  If a device reset
> is involved, that much is standard PCIe behavior.  And the driver
> sees
> ~0 data from those failed reads.  That's not part of the PCIe spec,
> but is typical root complex behavior.
> 
> But you said the MRPC commands hang indefinitely.  Presumably MMIO
> reads would start succeeding eventually when the device becomes
> ready,
> so I don't know how that translates to "indefinitely."
> 
> Weird to refer to a PCI BAR as "GAS".  Maybe expanding the acronym
> would help it make sense.
> 
> What does "host" refer to?  I guess it's the switch (the
> switchtec_dev), since you say it fails MMIO reads?
> 
> Naming comment below.
> 
> > Add a read check to GAS access when a MRPC command execution
> > doesn't
> > response timely, error out if the check fails.
> > 
> > Signed-off-by: Kelvin Cao <kelvin.cao@microchip.com>
> > ---
> >  drivers/pci/switch/switchtec.c | 59
> > ++++++++++++++++++++++++++++++----
> >  1 file changed, 52 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/pci/switch/switchtec.c
> > b/drivers/pci/switch/switchtec.c
> > index 0b301f8be9ed..092653487021 100644
> > --- a/drivers/pci/switch/switchtec.c
> > +++ b/drivers/pci/switch/switchtec.c
> > @@ -45,6 +45,7 @@ enum mrpc_state {
> >       MRPC_QUEUED,
> >       MRPC_RUNNING,
> >       MRPC_DONE,
> > +     MRPC_IO_ERROR,
> >  };
> > 
> >  struct switchtec_user {
> > @@ -66,6 +67,13 @@ struct switchtec_user {
> >       int event_cnt;
> >  };
> > 
> > +static int check_access(struct switchtec_dev *stdev)
> > +{
> > +     u32 device = ioread32(&stdev->mmio_sys_info->device_id);
> > +
> > +     return stdev->pdev->device == device;
> > +}
> 
> Didn't notice this before, but the "check_access()" name is not very
> helpful because it doesn't give a clue about what the return value
> means.  Does 0 mean no error?  Does 1 mean no error?  From reading
> the
> implementation, I can see that 0 is actually the error case, but I
> can't tell from the name.

Yes, will improve the naming, like change it to "has_gas_access()" in
v2 if a v2 patchset is preferred.
> 
> >  static struct switchtec_user *stuser_create(struct switchtec_dev
> > *stdev)
> >  {
> >       struct switchtec_user *stuser;
> > @@ -113,6 +121,7 @@ static void stuser_set_state(struct
> > switchtec_user *stuser,
> >               [MRPC_QUEUED] = "QUEUED",
> >               [MRPC_RUNNING] = "RUNNING",
> >               [MRPC_DONE] = "DONE",
> > +             [MRPC_IO_ERROR] = "IO_ERROR",
> >       };
> > 
> >       stuser->state = state;
> > @@ -184,6 +193,21 @@ static int mrpc_queue_cmd(struct
> > switchtec_user *stuser)
> >       return 0;
> >  }
> > 
> > +static void mrpc_cleanup_cmd(struct switchtec_dev *stdev)
> > +{
> > +     /* requires the mrpc_mutex to already be held when called */
> > +     struct switchtec_user *stuser = list_entry(stdev-
> > >mrpc_queue.next,
> > +                                                struct
> > switchtec_user, list);
> > +
> > +     stuser->cmd_done = true;
> > +     wake_up_interruptible(&stuser->cmd_comp);
> > +     list_del_init(&stuser->list);
> > +     stuser_put(stuser);
> > +     stdev->mrpc_busy = 0;
> > +
> > +     mrpc_cmd_submit(stdev);
> > +}
> > +
> >  static void mrpc_complete_cmd(struct switchtec_dev *stdev)
> >  {
> >       /* requires the mrpc_mutex to already be held when called */
> > @@ -223,13 +247,7 @@ static void mrpc_complete_cmd(struct
> > switchtec_dev *stdev)
> >               memcpy_fromio(stuser->data, &stdev->mmio_mrpc-
> > >output_data,
> >                             stuser->read_len);
> >  out:
> > -     stuser->cmd_done = true;
> > -     wake_up_interruptible(&stuser->cmd_comp);
> > -     list_del_init(&stuser->list);
> > -     stuser_put(stuser);
> > -     stdev->mrpc_busy = 0;
> > -
> > -     mrpc_cmd_submit(stdev);
> > +     mrpc_cleanup_cmd(stdev);
> >  }
> > 
> >  static void mrpc_event_work(struct work_struct *work)
> > @@ -246,6 +264,23 @@ static void mrpc_event_work(struct work_struct
> > *work)
> >       mutex_unlock(&stdev->mrpc_mutex);
> >  }
> > 
> > +static void mrpc_error_complete_cmd(struct switchtec_dev *stdev)
> > +{
> > +     /* requires the mrpc_mutex to already be held when called */
> > +
> > +     struct switchtec_user *stuser;
> > +
> > +     if (list_empty(&stdev->mrpc_queue))
> > +             return;
> > +
> > +     stuser = list_entry(stdev->mrpc_queue.next,
> > +                         struct switchtec_user, list);
> > +
> > +     stuser_set_state(stuser, MRPC_IO_ERROR);
> > +
> > +     mrpc_cleanup_cmd(stdev);
> > +}
> > +
> >  static void mrpc_timeout_work(struct work_struct *work)
> >  {
> >       struct switchtec_dev *stdev;
> > @@ -257,6 +292,11 @@ static void mrpc_timeout_work(struct
> > work_struct *work)
> > 
> >       mutex_lock(&stdev->mrpc_mutex);
> > 
> > +     if (!check_access(stdev)) {
> > +             mrpc_error_complete_cmd(stdev);
> > +             goto out;
> > +     }
> > +
> >       if (stdev->dma_mrpc)
> >               status = stdev->dma_mrpc->status;
> >       else
> > @@ -544,6 +584,11 @@ static ssize_t switchtec_dev_read(struct file
> > *filp, char __user *data,
> >       if (rc)
> >               return rc;
> > 
> > +     if (stuser->state == MRPC_IO_ERROR) {
> > +             mutex_unlock(&stdev->mrpc_mutex);
> > +             return -EIO;
> > +     }
> > +
> >       if (stuser->state != MRPC_DONE) {
> >               mutex_unlock(&stdev->mrpc_mutex);
> >               return -EBADE;
> > --
> > 2.25.1
> > 

  parent reply	other threads:[~2021-10-01 22:58 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-24 11:08 [PATCH 0/5] Switchtec Fixes and Improvements kelvin.cao
2021-09-24 11:08 ` [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access kelvin.cao
2021-10-01 20:18   ` Bjorn Helgaas
2021-10-01 20:29     ` Logan Gunthorpe
2021-10-01 23:49       ` Kelvin.Cao
2021-10-02 15:11         ` Bjorn Helgaas
2021-10-04 20:51           ` Kelvin.Cao
2021-10-05 20:11             ` Bjorn Helgaas
2021-10-06  0:37               ` Kelvin.Cao
2021-10-06  2:33                 ` Bjorn Helgaas
2021-10-06  5:49                   ` Kelvin.Cao
2021-10-06 14:19                     ` Bjorn Helgaas
2021-10-06 19:00                       ` Kelvin.Cao
2021-10-06 20:20                         ` Bjorn Helgaas
2021-10-06 21:27                           ` Kelvin.Cao
2021-10-07 21:23                             ` Bjorn Helgaas
2021-10-08  0:06                               ` Kelvin.Cao
2021-10-08 11:03                                 ` Bjorn Helgaas
2021-10-01 22:58     ` Kelvin.Cao [this message]
2021-10-01 23:52       ` Logan Gunthorpe
2021-10-02  0:05         ` Kelvin.Cao
2021-09-24 11:08 ` [PATCH 2/5] PCI/switchtec: Fix a MRPC error status handling issue kelvin.cao
2021-09-24 11:08 ` [PATCH 3/5] PCI/switchtec: Update the way of getting management VEP instance ID kelvin.cao
2021-09-24 11:08 ` [PATCH 4/5] PCI/switchtec: Replace ENOTSUPP with EOPNOTSUPP kelvin.cao
2021-09-24 11:08 ` [PATCH 5/5] PCI/switchtec: Add check of event support kelvin.cao
2021-09-24 15:53 ` [PATCH 0/5] Switchtec Fixes and Improvements Logan Gunthorpe
2021-09-25  5:27   ` Kelvin.Cao
2021-09-27 16:39 ` Bjorn Helgaas
2021-09-27 18:25   ` Kelvin.Cao
2021-10-08 17:05 ` Bjorn Helgaas
2021-10-08 17:23   ` Logan Gunthorpe
2021-10-08 18:25     ` Bjorn Helgaas
  -- strict thread matches above, loose matches on Subject: below --
2021-09-24 10:24 [PATCH 0/5] Switchtec Miscellaneous " kelvin.cao
2021-09-24 10:24 ` [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access kelvin.cao
2021-09-24 10:22 [PATCH 0/5] Switchtec Miscellaneous Fixes and Improvements kelvin.cao
2021-09-24 10:22 ` [PATCH 1/5] PCI/switchtec: Error out MRPC execution when no GAS access kelvin.cao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2f7b4e6debbf7156a4da26bad0373d9df9667e66.camel@microchip.com \
    --to=kelvin.cao@microchip.com \
    --cc=bhelgaas@google.com \
    --cc=helgaas@kernel.org \
    --cc=kelvincao@outlook.com \
    --cc=kurt.schwemmer@microsemi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=logang@deltatee.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.