linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: linux-cxl@vger.kernel.org
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	linux-pci@vger.kernel.org, nvdimm@lists.linux.dev
Subject: [PATCH v3 03/40] cxl/pci: Defer mailbox status checks to command timeouts
Date: Sun, 23 Jan 2022 16:28:54 -0800	[thread overview]
Message-ID: <164298413480.3018233.9643395389297971819.stgit@dwillia2-desk3.amr.corp.intel.com> (raw)
In-Reply-To: <164298411792.3018233.7493009997525360044.stgit@dwillia2-desk3.amr.corp.intel.com>

Device status can change without warning at any point in time. This
effectively means that no amount of status checking before a command is
submitted can guarantee that the device is not in an error condition
when the command is later submitted. The clearest signal that a device
is not able to process commands is if it fails to process commands.

With the above understanding in hand, update cxl_pci_setup_mailbox() to
validate the readiness of the mailbox once at the beginning of time, and
then use timeouts and busy sequencing errors as the only occasions to
report status.

Just as before, unless and until the driver gains a reset recovery path,
doorbell clearing failures by the device are fatal to mailbox
operations.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/pci.c |  134 +++++++++++++----------------------------------------
 1 file changed, 33 insertions(+), 101 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index ed8de9eac970..91de2e4aff6f 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -73,14 +73,16 @@ static int cxl_pci_mbox_wait_for_doorbell(struct cxl_dev_state *cxlds)
 	return 0;
 }
 
-static void cxl_pci_mbox_timeout(struct cxl_dev_state *cxlds,
-				 struct cxl_mbox_cmd *mbox_cmd)
-{
-	struct device *dev = cxlds->dev;
+#define cxl_err(dev, status, msg)                                        \
+	dev_err_ratelimited(dev, msg ", device state %s%s\n",                  \
+			    status & CXLMDEV_DEV_FATAL ? " fatal" : "",        \
+			    status & CXLMDEV_FW_HALT ? " firmware-halt" : "")
 
-	dev_dbg(dev, "Mailbox command (opcode: %#x size: %zub) timed out\n",
-		mbox_cmd->opcode, mbox_cmd->size_in);
-}
+#define cxl_cmd_err(dev, cmd, status, msg)                               \
+	dev_err_ratelimited(dev, msg " (opcode: %#x), device state %s%s\n",    \
+			    (cmd)->opcode,                                     \
+			    status & CXLMDEV_DEV_FATAL ? " fatal" : "",        \
+			    status & CXLMDEV_FW_HALT ? " firmware-halt" : "")
 
 /**
  * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
@@ -134,7 +136,11 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
 
 	/* #1 */
 	if (cxl_doorbell_busy(cxlds)) {
-		dev_err_ratelimited(dev, "Mailbox re-busy after acquiring\n");
+		u64 md_status =
+			readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
+
+		cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
+			    "mailbox queue busy");
 		return -EBUSY;
 	}
 
@@ -160,7 +166,9 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
 	/* #5 */
 	rc = cxl_pci_mbox_wait_for_doorbell(cxlds);
 	if (rc == -ETIMEDOUT) {
-		cxl_pci_mbox_timeout(cxlds, mbox_cmd);
+		u64 md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
+
+		cxl_cmd_err(cxlds->dev, mbox_cmd, md_status, "mailbox timeout");
 		return rc;
 	}
 
@@ -198,98 +206,13 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
-/**
- * cxl_pci_mbox_get() - Acquire exclusive access to the mailbox.
- * @cxlds: The device state to gain access to.
- *
- * Context: Any context. Takes the mbox_mutex.
- * Return: 0 if exclusive access was acquired.
- */
-static int cxl_pci_mbox_get(struct cxl_dev_state *cxlds)
-{
-	struct device *dev = cxlds->dev;
-	u64 md_status;
-	int rc;
-
-	mutex_lock_io(&cxlds->mbox_mutex);
-
-	/*
-	 * XXX: There is some amount of ambiguity in the 2.0 version of the spec
-	 * around the mailbox interface ready (8.2.8.5.1.1).  The purpose of the
-	 * bit is to allow firmware running on the device to notify the driver
-	 * that it's ready to receive commands. It is unclear if the bit needs
-	 * to be read for each transaction mailbox, ie. the firmware can switch
-	 * it on and off as needed. Second, there is no defined timeout for
-	 * mailbox ready, like there is for the doorbell interface.
-	 *
-	 * Assumptions:
-	 * 1. The firmware might toggle the Mailbox Interface Ready bit, check
-	 *    it for every command.
-	 *
-	 * 2. If the doorbell is clear, the firmware should have first set the
-	 *    Mailbox Interface Ready bit. Therefore, waiting for the doorbell
-	 *    to be ready is sufficient.
-	 */
-	rc = cxl_pci_mbox_wait_for_doorbell(cxlds);
-	if (rc) {
-		dev_warn(dev, "Mailbox interface not ready\n");
-		goto out;
-	}
-
-	md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
-	if (!(md_status & CXLMDEV_MBOX_IF_READY && CXLMDEV_READY(md_status))) {
-		dev_err(dev, "mbox: reported doorbell ready, but not mbox ready\n");
-		rc = -EBUSY;
-		goto out;
-	}
-
-	/*
-	 * Hardware shouldn't allow a ready status but also have failure bits
-	 * set. Spit out an error, this should be a bug report
-	 */
-	rc = -EFAULT;
-	if (md_status & CXLMDEV_DEV_FATAL) {
-		dev_err(dev, "mbox: reported ready, but fatal\n");
-		goto out;
-	}
-	if (md_status & CXLMDEV_FW_HALT) {
-		dev_err(dev, "mbox: reported ready, but halted\n");
-		goto out;
-	}
-	if (CXLMDEV_RESET_NEEDED(md_status)) {
-		dev_err(dev, "mbox: reported ready, but reset needed\n");
-		goto out;
-	}
-
-	/* with lock held */
-	return 0;
-
-out:
-	mutex_unlock(&cxlds->mbox_mutex);
-	return rc;
-}
-
-/**
- * cxl_pci_mbox_put() - Release exclusive access to the mailbox.
- * @cxlds: The device state to communicate with.
- *
- * Context: Any context. Expects mbox_mutex to be held.
- */
-static void cxl_pci_mbox_put(struct cxl_dev_state *cxlds)
-{
-	mutex_unlock(&cxlds->mbox_mutex);
-}
-
 static int cxl_pci_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
 {
 	int rc;
 
-	rc = cxl_pci_mbox_get(cxlds);
-	if (rc)
-		return rc;
-
+	mutex_lock_io(&cxlds->mbox_mutex);
 	rc = __cxl_pci_mbox_send_cmd(cxlds, cmd);
-	cxl_pci_mbox_put(cxlds);
+	mutex_unlock(&cxlds->mbox_mutex);
 
 	return rc;
 }
@@ -310,11 +233,20 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
 	} while (!time_after(jiffies, timeout));
 
 	if (!(md_status & CXLMDEV_MBOX_IF_READY)) {
-		dev_err(cxlds->dev,
-			"timeout awaiting mailbox ready, device state:%s%s\n",
-			md_status & CXLMDEV_DEV_FATAL ? " fatal" : "",
-			md_status & CXLMDEV_FW_HALT ? " firmware-halt" : "");
-		return -EIO;
+		cxl_err(cxlds->dev, md_status,
+			"timeout awaiting mailbox ready");
+		return -ETIMEDOUT;
+	}
+
+	/*
+	 * A command may be in flight from a previous driver instance,
+	 * think kexec, do one doorbell wait so that
+	 * __cxl_pci_mbox_send_cmd() can assume that it is the only
+	 * source for future doorbell busy events.
+	 */
+	if (cxl_pci_mbox_wait_for_doorbell(cxlds) != 0) {
+		cxl_err(cxlds->dev, md_status, "timeout awaiting mailbox idle");
+		return -ETIMEDOUT;
 	}
 
 	cxlds->mbox_send = cxl_pci_mbox_send;


  parent reply	other threads:[~2022-01-24  0:28 UTC|newest]

Thread overview: 172+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-24  0:28 [PATCH v3 00/40] CXL.mem Topology Discovery and Hotplug Support Dan Williams
2022-01-24  0:28 ` [PATCH v3 01/40] cxl: Rename CXL_MEM to CXL_PCI Dan Williams
2022-01-24  0:28 ` [PATCH v3 02/40] cxl/pci: Implement Interface Ready Timeout Dan Williams
2022-01-31 22:21   ` Ben Widawsky
2022-01-31 23:11     ` Dan Williams
2022-01-31 23:25       ` Ben Widawsky
2022-01-31 23:47         ` Dan Williams
2022-01-31 23:51   ` [PATCH v4 " Dan Williams
2022-01-24  0:28 ` Dan Williams [this message]
2022-01-31 22:28   ` [PATCH v3 03/40] cxl/pci: Defer mailbox status checks to command timeouts Ben Widawsky
2022-01-24  0:29 ` [PATCH v3 04/40] cxl: Flesh out register names Dan Williams
2022-01-24  0:29 ` [PATCH v3 05/40] cxl/pci: Add new DVSEC definitions Dan Williams
2022-01-24  0:29 ` [PATCH v3 06/40] cxl/acpi: Map component registers for Root Ports Dan Williams
2022-01-24  0:29 ` [PATCH v3 07/40] cxl: Introduce module_cxl_driver Dan Williams
2022-01-24  0:29 ` [PATCH v3 08/40] cxl/core/port: Rename bus.c to port.c Dan Williams
2022-01-31 22:34   ` Ben Widawsky
2022-01-24  0:29 ` [PATCH v3 09/40] cxl/decoder: Hide physical address information from non-root Dan Williams
2022-01-31 14:14   ` Jonathan Cameron
2022-01-31 22:34   ` Ben Widawsky
2022-01-24  0:29 ` [PATCH v3 10/40] cxl/core: Convert decoder range to resource Dan Williams
2022-01-24  0:29 ` [PATCH v3 11/40] cxl/core/port: Clarify decoder creation Dan Williams
2022-01-31 14:46   ` Jonathan Cameron
2022-01-31 21:17     ` Dan Williams
2022-01-31 21:33   ` [PATCH v4 " Dan Williams
2022-02-01 10:49     ` Jonathan Cameron
2022-01-24  0:29 ` [PATCH v3 12/40] cxl/core: Fix cxl_probe_component_regs() error message Dan Williams
2022-01-31 14:53   ` Jonathan Cameron
2022-01-31 22:29     ` Dan Williams
2022-01-31 22:39   ` Ben Widawsky
2022-01-24  0:29 ` [PATCH v3 13/40] cxl/core/port: Make passthrough decoder init implicit Dan Williams
2022-01-31 14:56   ` Jonathan Cameron
2022-01-24  0:29 ` [PATCH v3 14/40] cxl/core: Track port depth Dan Williams
2022-01-31 14:57   ` Jonathan Cameron
2022-01-24  0:29 ` [PATCH v3 15/40] cxl: Prove CXL locking Dan Williams
2022-01-31 15:48   ` Jonathan Cameron
2022-01-31 19:43     ` Dan Williams
2022-01-31 19:50   ` [PATCH v4 " Dan Williams
2022-01-31 23:23     ` Ben Widawsky
2022-01-24  0:30 ` [PATCH v3 16/40] cxl/core/port: Use dedicated lock for decoder target list Dan Williams
2022-01-26  2:54   ` [PATCH v4 " Dan Williams
2022-01-31 15:59     ` Jonathan Cameron
2022-01-31 23:31       ` Dan Williams
2022-01-31 23:34     ` Ben Widawsky
2022-01-31 23:38       ` Dan Williams
2022-01-31 23:42         ` Ben Widawsky
2022-01-31 23:58           ` Dan Williams
2022-01-31 23:35     ` [PATCH v5 " Dan Williams
2022-02-01 10:52       ` Jonathan Cameron
2022-01-24  0:30 ` [PATCH v3 17/40] cxl/port: Introduce cxl_port_to_pci_bus() Dan Williams
2022-01-31 16:04   ` Jonathan Cameron
2022-01-31 16:44   ` [PATCH v4 " Dan Williams
2022-01-31 23:41     ` Ben Widawsky
2022-01-24  0:30 ` [PATCH v3 18/40] cxl/pmem: Introduce a find_cxl_root() helper Dan Williams
2022-01-26 18:55   ` [PATCH v4 " Dan Williams
2022-01-26 23:59     ` [PATCH v5 " Dan Williams
2022-01-31 16:18       ` Jonathan Cameron
2022-02-01  0:22         ` Dan Williams
2022-02-01 10:58           ` Jonathan Cameron
2022-02-01  0:34       ` [PATCH v6 " Dan Williams
2022-02-01 10:59         ` Jonathan Cameron
2022-01-24  0:30 ` [PATCH v3 19/40] cxl/port: Up-level cxl_add_dport() locking requirements to the caller Dan Williams
2022-01-31 16:20   ` Jonathan Cameron
2022-01-31 23:47   ` Ben Widawsky
2022-02-01  0:43     ` Dan Williams
2022-02-01  1:07   ` [PATCH v4 " Dan Williams
2022-02-01 11:00     ` Jonathan Cameron
2022-01-24  0:30 ` [PATCH v3 20/40] cxl/pci: Rename pci.h to cxlpci.h Dan Williams
2022-01-31 16:22   ` Jonathan Cameron
2022-02-01  0:00     ` Dan Williams
2022-01-31 23:48   ` Ben Widawsky
2022-01-24  0:30 ` [PATCH v3 21/40] cxl/core: Generalize dport enumeration in the core Dan Williams
2022-01-31 17:02   ` Jonathan Cameron
2022-02-01  1:58     ` Dan Williams
2022-02-01  2:10   ` [PATCH v4 " Dan Williams
2022-02-01 11:03     ` Jonathan Cameron
2022-01-24  0:30 ` [PATCH v3 22/40] cxl/core/hdm: Add CXL standard decoder enumeration to " Dan Williams
2022-01-26  3:09   ` [PATCH v4 " Dan Williams
2022-01-31 14:26     ` Jonathan Cameron
2022-01-31 17:51     ` Jonathan Cameron
2022-02-01  5:10       ` Dan Williams
2022-02-01 20:24     ` [PATCH v5 " Dan Williams
2022-02-02  9:31       ` Jonathan Cameron
2022-02-01  0:24   ` [PATCH v3 " Ben Widawsky
2022-02-01  4:58     ` Dan Williams
2022-01-24  0:30 ` [PATCH v3 23/40] cxl/core: Emit modalias for CXL devices Dan Williams
2022-01-31 17:57   ` Jonathan Cameron
2022-02-01 15:11   ` Ben Widawsky
2022-01-24  0:30 ` [PATCH v3 24/40] cxl/port: Add a driver for 'struct cxl_port' objects Dan Williams
2022-01-26 20:16   ` [PATCH v4 " Dan Williams
2022-01-31 18:11     ` Jonathan Cameron
2022-02-01 20:43       ` Dan Williams
2022-02-02  9:33         ` Jonathan Cameron
2022-02-01 21:07     ` [PATCH v5 " Dan Williams
2022-01-24  0:30 ` [PATCH v3 25/40] cxl/core/port: Remove @host argument for dport + decoder enumeration Dan Williams
2022-01-31 14:32   ` Jonathan Cameron
2022-01-31 18:14   ` Jonathan Cameron
2022-02-01 15:17   ` Ben Widawsky
2022-02-01 21:09     ` Dan Williams
2022-02-01 21:23   ` [PATCH v4 " Dan Williams
2022-01-24  0:30 ` [PATCH v3 26/40] cxl/pci: Store component register base in cxlds Dan Williams
2022-01-31 18:15   ` Jonathan Cameron
2022-02-01 21:28   ` [PATCH v4 " Dan Williams
2022-01-24  0:31 ` [PATCH v3 27/40] cxl/pci: Cache device DVSEC offset Dan Williams
2022-01-31 18:19   ` Jonathan Cameron
2022-02-01 15:24     ` Ben Widawsky
2022-02-01 21:41       ` Dan Williams
2022-02-01 22:11         ` Ben Widawsky
2022-02-01 22:15           ` Dan Williams
2022-02-01 22:20             ` Ben Widawsky
2022-02-01 22:24               ` Dan Williams
2022-02-02  9:36                 ` Jonathan Cameron
2022-02-01 22:06   ` [PATCH v4 " Dan Williams
2022-02-02  9:36     ` Jonathan Cameron
2022-01-24  0:31 ` [PATCH v3 28/40] cxl/pci: Retrieve CXL DVSEC memory info Dan Williams
2022-01-31 18:25   ` Jonathan Cameron
2022-02-01 22:52     ` Dan Williams
2022-02-01 23:48   ` [PATCH v4 " Dan Williams
2022-02-02  9:39     ` Jonathan Cameron
2022-01-24  0:31 ` [PATCH v3 29/40] cxl/pci: Implement wait for media active Dan Williams
2022-01-31 18:29   ` Jonathan Cameron
2022-02-01 23:56     ` Dan Williams
2022-01-24  0:31 ` [PATCH v3 30/40] cxl/pci: Emit device serial number Dan Williams
2022-01-31 18:33   ` Jonathan Cameron
2022-01-31 21:43     ` Dan Williams
2022-01-31 21:56   ` [PATCH v4 " Dan Williams
2022-01-24  0:31 ` [PATCH v3 31/40] cxl/memdev: Add numa_node attribute Dan Williams
2022-01-31 18:41   ` Jonathan Cameron
2022-02-01 23:57     ` Dan Williams
2022-02-02  9:44       ` Jonathan Cameron
2022-02-02 15:44         ` Dan Williams
2022-02-03  9:41           ` Jonathan Cameron
2022-02-03 16:59             ` Dan Williams
2022-02-03 18:05               ` Jonathan Cameron
2022-02-04  4:25                 ` Dan Williams
2022-02-01 15:31   ` Ben Widawsky
2022-02-01 15:49     ` Jonathan Cameron
2022-02-01 16:35       ` Ben Widawsky
2022-02-01 17:38         ` Jonathan Cameron
2022-02-01 23:59     ` Dan Williams
2022-02-02  1:18     ` Dan Williams
2022-01-24  0:31 ` [PATCH v3 32/40] cxl/core/port: Add switch port enumeration Dan Williams
2022-02-01 12:13   ` Jonathan Cameron
2022-02-02  5:26     ` Dan Williams
2022-02-01 17:37   ` Ben Widawsky
2022-02-02  6:03     ` Dan Williams
2022-02-02 17:07   ` [PATCH v4 " Dan Williams
2022-02-03  9:55     ` Jonathan Cameron
2022-02-04 15:08     ` [PATCH v5 " Dan Williams
2022-01-24  0:31 ` [PATCH v3 33/40] cxl/mem: Add the cxl_mem driver Dan Williams
2022-01-26  3:16   ` [PATCH v4 " Dan Williams
2022-02-01 12:45     ` Jonathan Cameron
2022-02-01 17:44       ` Ben Widawsky
2022-02-03  2:49       ` Dan Williams
2022-02-03  9:59         ` Jonathan Cameron
2022-02-04 14:54           ` Dan Williams
2022-02-03  3:56     ` [PATCH v5 " Dan Williams
2022-02-03 12:07       ` Jonathan Cameron
2022-02-04 15:18       ` [PATCH v6 " Dan Williams
2022-01-24  0:31 ` [PATCH v3 34/40] cxl/core: Move target_list out of base decoder attributes Dan Williams
2022-01-31 18:45   ` Jonathan Cameron
2022-02-01 17:45   ` Ben Widawsky
2022-01-24  0:31 ` [PATCH v3 35/40] cxl/core/port: Add endpoint decoders Dan Williams
2022-02-01 12:47   ` Jonathan Cameron
2022-02-03  4:02   ` [PATCH v4 " Dan Williams
2022-02-14 17:45     ` Jonathan Cameron
2022-02-14 19:14       ` Dan Williams
2022-01-24  0:31 ` [PATCH v3 36/40] tools/testing/cxl: Mock dvsec_ranges() Dan Williams
2022-01-24  0:31 ` [PATCH v3 37/40] tools/testing/cxl: Fix root port to host bridge assignment Dan Williams
2022-01-24  0:32 ` [PATCH v3 38/40] tools/testing/cxl: Mock one level of switches Dan Williams
2022-01-24  0:32 ` [PATCH v3 39/40] tools/testing/cxl: Enumerate mock decoders Dan Williams
2022-01-24  0:32 ` [PATCH v3 40/40] tools/testing/cxl: Add a physical_node link Dan Williams
2022-02-01 12:53   ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=164298413480.3018233.9643395389297971819.stgit@dwillia2-desk3.amr.corp.intel.com \
    --to=dan.j.williams@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=nvdimm@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).