All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ben Widawsky <ben.widawsky@intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-cxl@vger.kernel.org,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Alison Schofield <alison.schofield@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>
Subject: Re: [PATCH 3/9] cxl/pci: Extract device status check
Date: Thu, 2 Dec 2021 09:24:35 -0800	[thread overview]
Message-ID: <20211202172435.2f5hlz22cea4l3a3@intel.com> (raw)
In-Reply-To: <CAPcyv4iWNPA98x6GsOpeqF+11YeWgZternCJ0wgQtWOOknViSw@mail.gmail.com>

On 21-12-02 09:09:44, Dan Williams wrote:
> On Mon, Nov 29, 2021 at 1:47 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> >
> > The Memory Device Status register is inspected in the same way for at
> > least two flows in the CXL Type 3 Memory Device Software Guide
> > (Revision: 1.0): 2.13.9 Device discovery and mailbox ready sequence,
> > and 2.13.10 Media ready sequence. Extract this common functionality for
> > use by both.
> >
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > ---
> >  drivers/cxl/pci.c | 33 +++++++++++++++++++++++++--------
> >  1 file changed, 25 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index a6ea9811a05b..6c8d09fb3a17 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -182,6 +182,27 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
> >         return 0;
> >  }
> >
> > +/*
> > + * Implements roughly the bottom half of Figure 42 of the CXL Type 3 Memory
> > + * Device Software Guide
> > + */
> 
> This version did not address the feedback about referencing the CXL
> specification instead of the software guide.
> 

Apologies, I missed that request. I apparently missed your email entirely as I
was going through the revision.

> > +static int check_device_status(struct cxl_dev_state *cxlds)
> > +{
> > +       const u64 md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> > +
> > +       if (md_status & CXLMDEV_DEV_FATAL) {
> > +               dev_err(cxlds->dev, "Fatal: replace device\n");
> > +               return -EIO;
> > +       }
> > +
> > +       if (md_status & CXLMDEV_FW_HALT) {
> > +               dev_err(cxlds->dev, "FWHalt: reset or replace device\n");
> > +               return -EBUSY;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >  /**
> >   * cxl_pci_mbox_get() - Acquire exclusive access to the mailbox.
> >   * @cxlds: The device state to gain access to.
> > @@ -231,17 +252,13 @@ static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds)
> >          * Hardware shouldn't allow a ready status but also have failure bits
> >          * set. Spit out an error, this should be a bug report
> >          */
> > -       rc = -EFAULT;
> > -       if (md_status & CXLMDEV_DEV_FATAL) {
> > -               dev_err(dev, "mbox: reported ready, but fatal\n");
> > +       rc = check_device_status(cxlds);
> > +       if (rc)
> >                 goto out;
> > -       }
> > -       if (md_status & CXLMDEV_FW_HALT) {
> > -               dev_err(dev, "mbox: reported ready, but halted\n");
> > -               goto out;
> > -       }
> > +
> >         if (CXLMDEV_RESET_NEEDED(md_status)) {
> >                 dev_err(dev, "mbox: reported ready, but reset needed\n");
> > +               rc = -EFAULT;
> >                 goto out;
> 
> It also did not address the feedback that the reset needed check can
> be dropped because per the spec* it is redundant with the other checks
> above. The driver need not give up on mailbox commands just because
> the media is not ready. So this wants a lead-in patch to clean that up
> first.

Media ready droppage comes in the next patch. I can reorder them if you'd like,
or I suppose you could do it.

> 
> "This field returns non-zero value if FW Halt is set or Media Status
> is not in the ready state."
> 
> Going further I do not think the driver should preemptively abort
> commands due to device-fatal, or even firmware halt for that matter.
> The true sign of failure and dead firmware is timed-out responses to
> incoming commands. Just in case there are some commands still
> functional in a device-fatal condition, or the firmware-halt indicator
> was a false positive, let the command through, and then if it fails
> report these forensic statuses.

I admit I'm not looking for spec reference at this exact moment. I think trying
to throw commands at the device that has already told us is fatal, is a bad
idea.

> 
> "Mailbox Interfaces Ready" is the only status that needs to be checked
> preemptively, and only once at the beginning of time until the next
> command timeout where the driver collects it to find out what
> happened.

I think this happens in a later patch too, if I catch your meaning.

> 
> I'll take a shot at a lead-in patch to that effect.

  reply	other threads:[~2021-12-02 17:24 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-29 21:47 [PATCH 0/9] CXL port prep work Ben Widawsky
2021-11-29 21:47 ` [PATCH 1/9] cxl: Rename CXL_MEM to CXL_PCI Ben Widawsky
2021-11-29 21:47 ` [PATCH 2/9] cxl: Flesh out register names Ben Widawsky
2021-11-29 21:47 ` [PATCH 3/9] cxl/pci: Extract device status check Ben Widawsky
2021-12-02 17:09   ` Dan Williams
2021-12-02 17:24     ` Ben Widawsky [this message]
2021-12-02 17:32       ` Dan Williams
2021-12-04  1:18         ` Ben Widawsky
2021-12-04  1:37           ` Dan Williams
2021-11-29 21:47 ` [PATCH 4/9] cxl/pci: Implement Interface Ready Timeout Ben Widawsky
2021-11-30 13:19   ` Jonathan Cameron
2021-12-02  4:45     ` [PATCH v2 " Ben Widawsky
2021-12-02  9:54       ` Jonathan Cameron
2021-11-29 21:47 ` [PATCH 5/9] cxl/pci: Don't poll doorbell for mailbox access Ben Widawsky
2021-11-29 21:47 ` [PATCH 6/9] cxl/pci: Add new DVSEC definitions Ben Widawsky
2021-11-29 21:47 ` [PATCH 7/9] cxl/acpi: Map component registers for Root Ports Ben Widawsky
2021-11-30 13:22   ` Jonathan Cameron
2021-12-04  4:37   ` Dan Williams
2021-12-04  5:13     ` Ben Widawsky
2021-12-15 15:04   ` Jonathan Cameron
2021-11-29 21:47 ` [PATCH 8/9] cxl: Introduce module_cxl_driver Ben Widawsky
2021-11-30 13:23   ` Jonathan Cameron
2021-11-29 21:47 ` [PATCH 9/9] cxl/core: Convert decoder range to resource Ben Widawsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211202172435.2f5hlz22cea4l3a3@intel.com \
    --to=ben.widawsky@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.