* [PATCH V12 0/5] mmc: Add Command Queue support
@ 2017-10-24  8:40 Adrian Hunter
  2017-10-24  8:40 ` [PATCH V12 1/5] mmc: core: Add parameter use_blk_mq Adrian Hunter
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Adrian Hunter @ 2017-10-24  8:40 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

Hi

Here is V12 of the hardware command queue patches, without the software
command queue patches, now using blk-mq, and now also with blk-mq support
for non-CQE I/O.

HW CMDQ offers 25% - 50% better random multi-threaded I/O.  I see a slight
2% drop in sequential read speed but no change to sequential write.

Non-CQE blk-mq showed a 3% decrease in sequential read performance.  This
seemed to come from the higher latency of running work items compared with
a dedicated thread.  Hacking the blk-mq workqueue to be unbound reduced the
degradation from 3% to 1%.
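
For reference, "unbound" here means adding WQ_UNBOUND when the workqueue is
allocated, so work items are no longer pinned to the submitting CPU.  A
minimal sketch, assuming the relevant queue is kblockd (the real allocation
lives in block/blk-core.c and its exact flags may differ):

#include <linux/init.h>
#include <linux/workqueue.h>

static struct workqueue_struct *kblockd_workqueue;

static int __init blk_dev_init_sketch(void)
{
	/* WQ_UNBOUND lets dispatch/completion work run on any CPU */
	kblockd_workqueue = alloc_workqueue("kblockd",
					    WQ_MEM_RECLAIM | WQ_HIGHPRI |
					    WQ_UNBOUND, 0);
	return kblockd_workqueue ? 0 : -ENOMEM;
}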

While we should look at changing blk-mq to give better workqueue performance,
a bigger gain is likely to be made by adding a new host API to enable the
next already-prepared request to be issued directly from within the ->done()
callback of the current request.
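
For illustration only, such an API could let the block driver start the next
prepared request while still in the completion context.  A purely hypothetical
sketch - mq->next_prepared_mrq and the overall API do not exist in this series:

/* Hypothetical sketch - not part of this patch set */
static void mmc_blk_mq_req_done_direct(struct mmc_request *mrq)
{
	struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req,
						  brq.mrq);
	struct request *req = mmc_queue_req_to_req(mqrq);
	struct mmc_queue *mq = req->q->queuedata;
	struct mmc_host *host = mq->card->host;

	/*
	 * Instead of waking a waiter or queueing work, hand the next
	 * already-prepared request to the host while still in the ->done()
	 * context of the current one, avoiding work-item latency.
	 */
	if (mq->next_prepared_mrq)			/* hypothetical field */
		mmc_start_request(host, mq->next_prepared_mrq);

	blk_mq_complete_request(req);
}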


Changes since V11:
      Split "mmc: block: Add CQE and blk-mq support" into 2 patches

Changes since V10:
      mmc: core: Remove unnecessary host claim
      mmc: core: Introduce host claiming by context
      mmc: core: Add support for handling CQE requests
      mmc: mmc: Enable Command Queuing
      mmc: mmc: Enable CQE's
      mmc: block: Use local variables in mmc_blk_data_prep()
      mmc: block: Prepare CQE data
      mmc: block: Factor out mmc_setup_queue()
      mmc: core: Add parameter use_blk_mq
      mmc: core: Export mmc_start_bkops()
      mmc: core: Export mmc_start_request()
      mmc: core: Export mmc_retune_hold_now() and mmc_retune_release()
	Dropped because they have been applied
      mmc: block: Add CQE and blk-mq support
	Extend blk-mq support for asynchronous read / writes to all host
	controllers including those that require polling. The direct
	completion path is still available but depends on a new capability
	flag.
	Drop blk-mq support for synchronous read / writes.

Venkat Gopalakrishnan (1):
      mmc: cqhci: support for command queue enabled host

Changes since V9:
      mmc: block: Add CQE and blk-mq support
	- reinstate mq support for REQ_OP_DRV_IN/OUT that was removed because
	it was incorrectly assumed to be handled by the rpmb character device
	- don't check for rpmb block device anymore
      mmc: cqhci: support for command queue enabled host
	Fix cqhci_set_irqs() as per Haibo Chen

Changes since V8:
	Re-based
      mmc: core: Introduce host claiming by context
	Slightly simplified as per Ulf
      mmc: core: Export mmc_retune_hold_now() and mmc_retune_release()
	New patch.
      mmc: block: Add CQE and blk-mq support
	Fix missing ->post_req() on the error path

Changes since V7:
	Re-based
      mmc: core: Introduce host claiming by context
	Slightly simplified
      mmc: core: Add parameter use_blk_mq
	New patch.
      mmc: core: Remove unnecessary host claim
	New patch.
      mmc: core: Export mmc_start_bkops()
	New patch.
      mmc: core: Export mmc_start_request()
	New patch.
      mmc: block: Add CQE and blk-mq support
	Add blk-mq support for non_CQE requests

Changes since V6:
      mmc: core: Introduce host claiming by context
	New patch.
      mmc: core: Move mmc_start_areq() declaration
	Dropped because it has been applied
      mmc: block: Fix block status codes
	Dropped because it has been applied
      mmc: host: Add CQE interface
	Dropped because it has been applied
      mmc: core: Turn off CQE before sending commands
	Dropped because it has been applied
      mmc: block: Factor out mmc_setup_queue()
	New patch.
      mmc: block: Add CQE support
	Drop legacy support and add blk-mq support

Changes since V5:
	Re-based
      mmc: core: Add mmc_retune_hold_now()
	Dropped because it has been applied
      mmc: core: Add members to mmc_request and mmc_data for CQE's
	Dropped because it has been applied
      mmc: core: Move mmc_start_areq() declaration
	New patch at Ulf's request
      mmc: block: Fix block status codes
	Another unrelated patch
      mmc: host: Add CQE interface
	Move recovery_notifier() callback to struct mmc_request
      mmc: core: Add support for handling CQE requests
	Roll __mmc_cqe_request_done() into mmc_cqe_request_done()
	Move function declarations requested by Ulf
      mmc: core: Remove unused MMC_CAP2_PACKED_CMD
	Dropped because it has been applied
      mmc: block: Add CQE support
	Add explanation to commit message
	Adjustment for changed recovery_notifier() callback
      mmc: cqhci: support for command queue enabled host
	Adjustment for changed recovery_notifier() callback
      mmc: sdhci-pci: Add CQHCI support for Intel GLK
	Add DCMD capability for Intel controllers except GLK

Changes since V4:
      mmc: core: Add mmc_retune_hold_now()
	Add explanation to commit message.
      mmc: host: Add CQE interface
	Add comments to callback declarations.
      mmc: core: Turn off CQE before sending commands
	Add explanation to commit message.
      mmc: core: Add support for handling CQE requests
	Add comments as requested by Ulf.
      mmc: core: Remove unused MMC_CAP2_PACKED_CMD
	New patch.
      mmc: mmc: Enable Command Queuing
	Adjust for removal of MMC_CAP2_PACKED_CMD.
	Add a comment about Packed Commands.
      mmc: mmc: Enable CQE's
	Remove unnecessary check for MMC_CAP2_CQE
      mmc: block: Use local variables in mmc_blk_data_prep()
	New patch.
      mmc: block: Prepare CQE data
	Adjust due to "mmc: block: Use local variables in mmc_blk_data_prep()"
	Remove priority setting.
	Add explanation to commit message.
      mmc: cqhci: support for command queue enabled host
	Fix transfer descriptor setting in cqhci_set_tran_desc() for 32-bit DMA

Changes since V3:
	Adjusted ...blk_end_request...() for new block status codes
	Fixed CQHCI transaction descriptor for "no DCMD" case

Changes since V2:
	Dropped patches that have been applied.
	Re-based
	Added "mmc: sdhci-pci: Add CQHCI support for Intel GLK"

Changes since V1:

	"Share mmc request array between partitions" is dependent
	on changes in "Introduce queue semantics", so added that
	and block fixes:

	Added "Fix is_waiting_last_req set incorrectly"
	Added "Fix cmd error reset failure path"
	Added "Use local var for mqrq_cur"
	Added "Introduce queue semantics"

Changes since RFC:

	Re-based on next.
	Added comment about command queue priority.
	Added some acks and reviews.


Adrian Hunter (4):
      mmc: core: Add parameter use_blk_mq
      mmc: block: Add blk-mq support
      mmc: block: Add CQE support
      mmc: sdhci-pci: Add CQHCI support for Intel GLK

Venkat Gopalakrishnan (1):
      mmc: cqhci: support for command queue enabled host

 drivers/mmc/Kconfig               |   11 +
 drivers/mmc/core/block.c          |  801 +++++++++++++++++++++++++-
 drivers/mmc/core/block.h          |   12 +
 drivers/mmc/core/core.c           |    7 +
 drivers/mmc/core/core.h           |    2 +
 drivers/mmc/core/host.c           |    2 +
 drivers/mmc/core/host.h           |    4 +
 drivers/mmc/core/queue.c          |  426 +++++++++++++-
 drivers/mmc/core/queue.h          |   56 ++
 drivers/mmc/host/Kconfig          |   14 +
 drivers/mmc/host/Makefile         |    1 +
 drivers/mmc/host/cqhci.c          | 1150 +++++++++++++++++++++++++++++++++++++
 drivers/mmc/host/cqhci.h          |  240 ++++++++
 drivers/mmc/host/sdhci-pci-core.c |  155 ++++-
 include/linux/mmc/host.h          |    2 +
 15 files changed, 2852 insertions(+), 31 deletions(-)
 create mode 100644 drivers/mmc/host/cqhci.c
 create mode 100644 drivers/mmc/host/cqhci.h


Regards
Adrian


* [PATCH V12 1/5] mmc: core: Add parameter use_blk_mq
  2017-10-24  8:40 [PATCH V12 0/5] mmc: Add Command Queue support Adrian Hunter
@ 2017-10-24  8:40 ` Adrian Hunter
  2017-10-24  8:40 ` [PATCH V12 2/5] mmc: block: Add blk-mq support Adrian Hunter
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Adrian Hunter @ 2017-10-24  8:40 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

Until mmc has blk-mq support fully implemented and tested, add a
parameter use_blk_mq, defaulting to false unless the config option
MMC_MQ_DEFAULT is selected.
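
A minimal sketch of how the flag is meant to be consumed when the queue is
created (the real switch is added to mmc_init_queue() by a later patch in
this series; mmc_legacy_queue_init() is a hypothetical stand-in for the
existing request-queue setup):

static int mmc_init_queue_sketch(struct mmc_queue *mq, struct mmc_card *card,
				 spinlock_t *lock)
{
	struct mmc_host *host = card->host;

	mq->card = card;

	/* host->use_blk_mq is seeded from the mmc_core.use_blk_mq parameter */
	if (mmc_host_use_blk_mq(host))
		return mmc_mq_init(mq, card, lock);

	return mmc_legacy_queue_init(mq, card, lock);	/* hypothetical */
}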

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/Kconfig      | 11 +++++++++++
 drivers/mmc/core/core.c  |  7 +++++++
 drivers/mmc/core/core.h  |  2 ++
 drivers/mmc/core/host.c  |  2 ++
 drivers/mmc/core/host.h  |  4 ++++
 include/linux/mmc/host.h |  1 +
 6 files changed, 27 insertions(+)

diff --git a/drivers/mmc/Kconfig b/drivers/mmc/Kconfig
index ec21388311db..98202934bd29 100644
--- a/drivers/mmc/Kconfig
+++ b/drivers/mmc/Kconfig
@@ -12,6 +12,17 @@ menuconfig MMC
 	  If you want MMC/SD/SDIO support, you should say Y here and
 	  also to your specific host controller driver.
 
+config MMC_MQ_DEFAULT
+	bool "MMC: use blk-mq I/O path by default"
+	depends on MMC && BLOCK
+	---help---
+	  This option enables the new blk-mq based I/O path for MMC block
+	  devices by default.  With the option the mmc_core.use_blk_mq
+	  module/boot option defaults to Y, without it to N, but it can
+	  still be overridden either way.
+
+	  If unsure say N.
+
 if MMC
 
 source "drivers/mmc/core/Kconfig"
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 1f0f44f4dd5f..5df03cb73be7 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -66,6 +66,13 @@
 bool use_spi_crc = 1;
 module_param(use_spi_crc, bool, 0);
 
+#ifdef CONFIG_MMC_MQ_DEFAULT
+bool mmc_use_blk_mq = true;
+#else
+bool mmc_use_blk_mq = false;
+#endif
+module_param_named(use_blk_mq, mmc_use_blk_mq, bool, S_IWUSR | S_IRUGO);
+
 static int mmc_schedule_delayed_work(struct delayed_work *work,
 				     unsigned long delay)
 {
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index 71e6c6d7ceb7..8c5dd8d31400 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -35,6 +35,8 @@ struct mmc_bus_ops {
 	int (*reset)(struct mmc_host *);
 };
 
+extern bool mmc_use_blk_mq;
+
 void mmc_attach_bus(struct mmc_host *host, const struct mmc_bus_ops *ops);
 void mmc_detach_bus(struct mmc_host *host);
 
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index 35a9e4fd1a9f..62ef6cb0ece4 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -404,6 +404,8 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
 
 	host->fixed_drv_type = -EINVAL;
 
+	host->use_blk_mq = mmc_use_blk_mq;
+
 	return host;
 }
 
diff --git a/drivers/mmc/core/host.h b/drivers/mmc/core/host.h
index fb689a1065ed..6eaf558e62d6 100644
--- a/drivers/mmc/core/host.h
+++ b/drivers/mmc/core/host.h
@@ -74,6 +74,10 @@ static inline bool mmc_card_hs400es(struct mmc_card *card)
 	return card->host->ios.enhanced_strobe;
 }
 
+static inline bool mmc_host_use_blk_mq(struct mmc_host *host)
+{
+	return host->use_blk_mq;
+}
 
 #endif
 
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index e7743eca1021..ce2075d6f429 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -380,6 +380,7 @@ struct mmc_host {
 	unsigned int		doing_retune:1;	/* re-tuning in progress */
 	unsigned int		retune_now:1;	/* do re-tuning at next req */
 	unsigned int		retune_paused:1; /* re-tuning is temporarily disabled */
+	unsigned int		use_blk_mq:1;	/* use blk-mq */
 
 	int			rescan_disable;	/* disable card detection */
 	int			rescan_entered;	/* used with nonremovable devices */
-- 
1.9.1


* [PATCH V12 2/5] mmc: block: Add blk-mq support
  2017-10-24  8:40 [PATCH V12 0/5] mmc: Add Command Queue support Adrian Hunter
  2017-10-24  8:40 ` [PATCH V12 1/5] mmc: core: Add parameter use_blk_mq Adrian Hunter
@ 2017-10-24  8:40 ` Adrian Hunter
  2017-10-27  9:23   ` Ulf Hansson
  2017-10-24  8:40 ` [PATCH V12 3/5] mmc: block: Add CQE support Adrian Hunter
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Adrian Hunter @ 2017-10-24  8:40 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

Define and use a blk-mq queue. Discards and flushes are processed
synchronously, but reads and writes asynchronously. To accommodate slow DMA
unmapping, DMA unmapping is not done until after the next request is started,
which means the request is not completed until then. If there is no next
request, the completion is done by queued work.
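
In other words, issuing request N+1 is interleaved with finishing request N.
A condensed sketch of the flow implemented by mmc_blk_mq_issue_rw_rq() below
(error handling and the retune/direct-completion details are omitted):

static int mmc_blk_mq_issue_rw_rq_sketch(struct mmc_queue *mq,
					 struct request *req)
{
	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
	struct mmc_host *host = mq->card->host;
	struct request *prev_req = NULL;

	mmc_blk_rw_rq_prep(mqrq, mq->card, 0, mq);	  /* prepare the new request */
	if (host->ops->pre_req)
		host->ops->pre_req(host, &mqrq->brq.mrq); /* map DMA early */

	mmc_blk_rw_wait(mq, &prev_req);		/* wait for the previous request to finish on the card */

	mmc_start_request(host, &mqrq->brq.mrq);	/* start the new request */

	if (prev_req)
		mmc_blk_mq_post_req(mq, prev_req); /* only now unmap DMA and complete it */

	return 0;
}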

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/core/block.c | 655 ++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/mmc/core/block.h |  10 +
 drivers/mmc/core/queue.c | 302 ++++++++++++++++++++--
 drivers/mmc/core/queue.h |  41 +++
 include/linux/mmc/host.h |   1 +
 5 files changed, 979 insertions(+), 30 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index ea80ff4cd7f9..002446e8dc5d 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1264,7 +1264,10 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
 		break;
 	}
 	mq_rq->drv_op_result = ret;
-	blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+	if (req->mq_ctx)
+		blk_mq_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+	else
+		blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
 static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
@@ -1307,7 +1310,10 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
 	else
 		mmc_blk_reset_success(md, type);
 fail:
-	blk_end_request(req, status, blk_rq_bytes(req));
+	if (req->mq_ctx)
+		blk_mq_end_request(req, status);
+	else
+		blk_end_request(req, status, blk_rq_bytes(req));
 }
 
 static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
@@ -1377,7 +1383,10 @@ static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
 	if (!err)
 		mmc_blk_reset_success(md, type);
 out:
-	blk_end_request(req, status, blk_rq_bytes(req));
+	if (req->mq_ctx)
+		blk_mq_end_request(req, status);
+	else
+		blk_end_request(req, status, blk_rq_bytes(req));
 }
 
 static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
@@ -1387,7 +1396,10 @@ static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
 	int ret = 0;
 
 	ret = mmc_flush_cache(card);
-	blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+	if (req->mq_ctx)
+		blk_mq_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+	else
+		blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
 /*
@@ -1413,15 +1425,18 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq,
 	}
 }
 
-#define CMD_ERRORS							\
-	(R1_OUT_OF_RANGE |	/* Command argument out of range */	\
-	 R1_ADDRESS_ERROR |	/* Misaligned address */		\
+#define CMD_ERRORS_EXCL_OOR						\
+	(R1_ADDRESS_ERROR |	/* Misaligned address */		\
 	 R1_BLOCK_LEN_ERROR |	/* Transferred block length incorrect */\
 	 R1_WP_VIOLATION |	/* Tried to write to protected block */	\
 	 R1_CARD_ECC_FAILED |	/* Card ECC failed */			\
 	 R1_CC_ERROR |		/* Card controller error */		\
 	 R1_ERROR)		/* General/unknown error */
 
+#define CMD_ERRORS							\
+	(CMD_ERRORS_EXCL_OOR |						\
+	 R1_OUT_OF_RANGE)	/* Command argument out of range */	\
+
 static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq)
 {
 	u32 val;
@@ -1766,6 +1781,632 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 	mqrq->areq.err_check = mmc_blk_err_check;
 }
 
+#define MMC_MAX_RETRIES		5
+#define MMC_DATA_RETRIES	2
+#define MMC_NO_RETRIES		(MMC_MAX_RETRIES + 1)
+
+/* Single sector read during recovery */
+static void mmc_blk_ss_read(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	blk_status_t status;
+
+	while (1) {
+		mmc_blk_rw_rq_prep(mqrq, mq->card, 1, mq);
+
+		mmc_wait_for_req(mq->card->host, &mqrq->brq.mrq);
+
+		/*
+		 * Not expecting command errors, so just give up in that case.
+		 * If there are retries remaining, the request will get
+		 * requeued.
+		 */
+		if (mqrq->brq.cmd.error)
+			return;
+
+		if (blk_rq_bytes(req) <= 512)
+			break;
+
+		status = mqrq->brq.data.error ? BLK_STS_IOERR : BLK_STS_OK;
+
+		blk_update_request(req, status, 512);
+	}
+
+	mqrq->retries = MMC_NO_RETRIES;
+}
+
+static inline bool mmc_blk_oor_valid(struct mmc_blk_request *brq)
+{
+	return !!brq->mrq.sbc;
+}
+
+static inline u32 mmc_blk_stop_err_bits(struct mmc_blk_request *brq)
+{
+	return mmc_blk_oor_valid(brq) ? CMD_ERRORS : CMD_ERRORS_EXCL_OOR;
+}
+
+static inline bool mmc_blk_in_tran_state(u32 status)
+{
+	/*
+	 * Some cards mishandle the status bits, so make sure to check both the
+	 * busy indication and the card state.
+	 */
+	return status & R1_READY_FOR_DATA &&
+	       (R1_CURRENT_STATE(status) == R1_STATE_TRAN);
+}
+
+/*
+ * Check for errors the host controller driver might not have seen such as
+ * response mode errors or invalid card state.
+ */
+static bool mmc_blk_status_error(struct request *req, u32 status)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_blk_request *brq = &mqrq->brq;
+	u32 stop_err_bits = mmc_blk_stop_err_bits(brq);
+
+	return brq->cmd.resp[0]  & CMD_ERRORS    ||
+	       brq->stop.resp[0] & stop_err_bits ||
+	       status            & stop_err_bits ||
+	       (rq_data_dir(req) == WRITE && !mmc_blk_in_tran_state(status));
+}
+
+static inline bool mmc_blk_cmd_started(struct mmc_blk_request *brq)
+{
+	return !brq->sbc.error && !brq->cmd.error &&
+	       !(brq->cmd.resp[0] & CMD_ERRORS);
+}
+
+static unsigned int mmc_blk_clock_khz(struct mmc_host *host)
+{
+	if (host->actual_clock)
+		return host->actual_clock / 1000;
+
+	/* Clock may be subject to a divisor, fudge it by a factor of 2. */
+	if (host->ios.clock)
+		return host->ios.clock / 2000;
+
+	/* How can there be no clock */
+	WARN_ON_ONCE(1);
+	return 100; /* 100 kHz is minimum possible value */
+}
+
+static unsigned long mmc_blk_data_timeout_jiffies(struct mmc_host *host,
+						  struct mmc_data *data)
+{
+	unsigned int ms = DIV_ROUND_UP(data->timeout_ns, 1000000);
+	unsigned int khz;
+
+	if (data->timeout_clks) {
+		khz = mmc_blk_clock_khz(host);
+		ms += DIV_ROUND_UP(data->timeout_clks, khz);
+	}
+
+	return msecs_to_jiffies(ms);
+}
+
+static int mmc_blk_card_stuck(struct mmc_card *card, struct request *req,
+			      u32 *resp_errs)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_data *data = &mqrq->brq.data;
+	unsigned long timeout;
+	u32 status;
+	int err;
+
+	timeout = jiffies + mmc_blk_data_timeout_jiffies(card->host, data);
+
+	while (1) {
+		bool done = time_after(jiffies, timeout);
+
+		err = __mmc_send_status(card, &status, 5);
+		if (err) {
+			pr_err("%s: error %d requesting status\n",
+			       req->rq_disk->disk_name, err);
+			break;
+		}
+
+		/* Accumulate any response error bits seen */
+		if (resp_errs)
+			*resp_errs |= status;
+
+		if (mmc_blk_in_tran_state(status))
+			break;
+
+		/* Timeout if the device never becomes ready */
+		if (done) {
+			pr_err("%s: Card stuck in wrong state! %s %s\n",
+				mmc_hostname(card->host),
+				req->rq_disk->disk_name, __func__);
+			err = -ETIMEDOUT;
+			break;
+		}
+	}
+
+	return err;
+}
+
+static int mmc_blk_send_stop(struct mmc_card *card)
+{
+	struct mmc_command cmd = {
+		.opcode = MMC_STOP_TRANSMISSION,
+		.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_AC,
+	};
+
+	return mmc_wait_for_cmd(card->host, &cmd, 5);
+}
+
+static int mmc_blk_fix_state(struct mmc_card *card, struct request *req)
+{
+	int err;
+
+	mmc_retune_hold_now(card->host);
+
+	mmc_blk_send_stop(card);
+
+	err = mmc_blk_card_stuck(card, req, NULL);
+
+	mmc_retune_release(card->host);
+
+	return err;
+}
+
+static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
+{
+	int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_blk_request *brq = &mqrq->brq;
+	struct mmc_blk_data *md = mq->blkdata;
+	struct mmc_card *card = mq->card;
+	u32 status;
+	u32 blocks;
+	int err;
+
+	/*
+	 * Status error bits might get lost during re-tuning so don't allow
+	 * re-tuning yet.
+	 */
+	mmc_retune_hold_now(card->host);
+
+	/*
+	 * Some errors the host driver might not have seen. Set the number of
+	 * bytes transferred to zero in that case.
+	 */
+	err = __mmc_send_status(card, &status, 0);
+	if (err || mmc_blk_status_error(req, status))
+		brq->data.bytes_xfered = 0;
+
+	mmc_retune_release(card->host);
+
+	/*
+	 * Try again to get the status. This also provides an opportunity for
+	 * re-tuning.
+	 */
+	if (err)
+		err = __mmc_send_status(card, &status, 0);
+
+	/*
+	 * Nothing more to do after the number of bytes transferred has been
+	 * updated and there is no card.
+	 */
+	if (err && mmc_detect_card_removed(card->host))
+		return;
+
+	/* Try to get back to "tran" state */
+	if (err || !mmc_blk_in_tran_state(status))
+		err = mmc_blk_fix_state(mq->card, req);
+
+	/*
+	 * Special case for SD cards where the card might record the number of
+	 * blocks written.
+	 */
+	if (!err && mmc_blk_cmd_started(brq) && mmc_card_sd(card) &&
+	    rq_data_dir(req) == WRITE && !mmc_sd_num_wr_blocks(card, &blocks))
+		brq->data.bytes_xfered = blocks << 9;
+
+	/* Reset if the card is in a bad state */
+	if (err && mmc_blk_reset(md, card->host, type)) {
+		pr_err("%s: recovery failed!\n", req->rq_disk->disk_name);
+		mqrq->retries = MMC_NO_RETRIES;
+		return;
+	}
+
+	/*
+	 * If anything was done, just return and if there is anything remaining
+	 * on the request it will get requeued.
+	 */
+	if (brq->data.bytes_xfered)
+		return;
+
+	/* Reset before last retry */
+	if (mqrq->retries + 1 == MMC_MAX_RETRIES)
+		mmc_blk_reset(md, card->host, type);
+
+	/* Command errors fail fast, so use all MMC_MAX_RETRIES */
+	if (brq->sbc.error || brq->cmd.error)
+		return;
+
+	/* Reduce the remaining retries for data errors */
+	if (mqrq->retries < MMC_MAX_RETRIES - MMC_DATA_RETRIES) {
+		mqrq->retries = MMC_MAX_RETRIES - MMC_DATA_RETRIES;
+		return;
+	}
+
+	/* FIXME: Missing single sector read for large sector size */
+	if (rq_data_dir(req) == READ && !mmc_large_sector(card)) {
+		/* Read one sector at a time */
+		mmc_blk_ss_read(mq, req);
+		return;
+	}
+}
+
+static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq)
+{
+	mmc_blk_eval_resp_error(brq);
+
+	return brq->sbc.error || brq->cmd.error || brq->stop.error ||
+	       brq->data.error || brq->cmd.resp[0] & CMD_ERRORS;
+}
+
+static int mmc_blk_card_busy(struct mmc_card *card, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	u32 status = 0;
+	int err;
+
+	if (mmc_host_is_spi(card->host) || rq_data_dir(req) == READ)
+		return 0;
+
+	mmc_retune_hold_now(card->host);
+
+	err = mmc_blk_card_stuck(card, req, &status);
+
+	mmc_retune_release(card->host);
+
+	/*
+	 * Do not assume data transferred correctly if there are any error bits
+	 * set.
+	 */
+	if (!err && status & mmc_blk_stop_err_bits(&mqrq->brq)) {
+		mqrq->brq.data.bytes_xfered = 0;
+		err = -EIO;
+	}
+
+	/* Copy the exception bit so it will be seen later on */
+	if (mmc_card_mmc(card) && status & R1_EXCEPTION_EVENT)
+		mqrq->brq.cmd.resp[0] |= R1_EXCEPTION_EVENT;
+
+	return err;
+}
+
+static inline void mmc_blk_rw_reset_success(struct mmc_queue *mq,
+					    struct request *req)
+{
+	int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
+
+	mmc_blk_reset_success(mq->blkdata, type);
+}
+
+static void mmc_blk_mq_complete_rq(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	unsigned int nr_bytes = mqrq->brq.data.bytes_xfered;
+
+	if (nr_bytes) {
+		if (blk_update_request(req, BLK_STS_OK, nr_bytes))
+			blk_mq_requeue_request(req, true);
+		else
+			__blk_mq_end_request(req, BLK_STS_OK);
+	} else if (mqrq->retries++ < MMC_MAX_RETRIES) {
+		blk_mq_requeue_request(req, true);
+	} else {
+		if (mmc_card_removed(mq->card))
+			req->rq_flags |= RQF_QUIET;
+		blk_mq_end_request(req, BLK_STS_IOERR);
+	}
+}
+
+static bool mmc_blk_urgent_bkops_needed(struct mmc_queue *mq,
+					struct mmc_queue_req *mqrq)
+{
+	return mmc_card_mmc(mq->card) &&
+	       (mqrq->brq.cmd.resp[0] & R1_EXCEPTION_EVENT ||
+		mqrq->brq.stop.resp[0] & R1_EXCEPTION_EVENT);
+}
+
+static void mmc_blk_urgent_bkops(struct mmc_queue *mq,
+				 struct mmc_queue_req *mqrq)
+{
+	if (mmc_blk_urgent_bkops_needed(mq, mqrq))
+		mmc_start_bkops(mq->card, true);
+}
+
+void mmc_blk_mq_complete(struct request *req)
+{
+	struct mmc_queue *mq = req->q->queuedata;
+
+	mmc_blk_mq_complete_rq(mq, req);
+}
+
+static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
+				       struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_host *host = mq->card->host;
+	bool failed;
+
+	failed = mmc_blk_rq_error(&mqrq->brq) ||
+		 mmc_blk_card_busy(mq->card, req);
+
+	if (!mmc_queue_direct_complete(host))
+		mmc_retune_release(host);
+
+	if (failed)
+		mmc_blk_rw_recovery(mq, req);
+	else
+		mmc_blk_rw_reset_success(mq, req);
+
+	mmc_blk_urgent_bkops(mq, mqrq);
+}
+
+static void mmc_blk_mq_acct_req_done(struct mmc_queue *mq, struct request *req)
+{
+	struct request_queue *q = req->q;
+	unsigned long flags;
+	bool put_card;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+
+	mq->in_flight[mmc_issue_type(mq, req)] -= 1;
+
+	put_card = mmc_tot_in_flight(mq) == 0;
+
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	if (put_card)
+		mmc_put_card(mq->card, &mq->ctx);
+}
+
+static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_request *mrq = &mqrq->brq.mrq;
+	struct mmc_host *host = mq->card->host;
+
+	if (host->ops->post_req)
+		host->ops->post_req(host, mrq, 0);
+
+	/*
+	 * Block layer timeouts race with completions which means the normal
+	 * completion path cannot be used during recovery.
+	 */
+	if (mq->in_recovery)
+		mmc_blk_mq_complete_rq(mq, req);
+	else
+		blk_mq_complete_request(req);
+
+	mmc_blk_mq_acct_req_done(mq, req);
+}
+
+void mmc_blk_mq_recovery(struct mmc_queue *mq)
+{
+	struct request *req = mq->recovery_req;
+
+	mq->recovery_req = NULL;
+	mq->rw_wait = false;
+
+	mmc_blk_mq_poll_completion(mq, req);
+
+	mmc_blk_mq_post_req(mq, req);
+}
+
+static void mmc_blk_mq_complete_prev_req(struct mmc_queue *mq,
+					 struct request **prev_req)
+{
+	if (mmc_queue_direct_complete(mq->card->host))
+		return;
+
+	mutex_lock(&mq->complete_lock);
+
+	if (!mq->complete_req)
+		goto out_unlock;
+
+	mmc_blk_mq_poll_completion(mq, mq->complete_req);
+
+	if (prev_req)
+		*prev_req = mq->complete_req;
+	else
+		mmc_blk_mq_post_req(mq, mq->complete_req);
+
+	mq->complete_req = NULL;
+
+out_unlock:
+	mutex_unlock(&mq->complete_lock);
+}
+
+void mmc_blk_mq_complete_work(struct work_struct *work)
+{
+	struct mmc_queue *mq = container_of(work, struct mmc_queue,
+					    complete_work);
+
+	mmc_blk_mq_complete_prev_req(mq, NULL);
+}
+
+static void mmc_blk_mq_req_done(struct mmc_request *mrq)
+{
+	struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req,
+						  brq.mrq);
+	struct request *req = mmc_queue_req_to_req(mqrq);
+	struct request_queue *q = req->q;
+	struct mmc_queue *mq = q->queuedata;
+	struct mmc_host *host = mq->card->host;
+	unsigned long flags;
+
+	if (mmc_blk_rq_error(&mqrq->brq) ||
+	    mmc_blk_urgent_bkops_needed(mq, mqrq)) {
+		spin_lock_irqsave(q->queue_lock, flags);
+		mq->recovery_needed = true;
+		mq->recovery_req = req;
+		spin_unlock_irqrestore(q->queue_lock, flags);
+		wake_up(&mq->wait);
+		schedule_work(&mq->recovery_work);
+		return;
+	}
+
+	if (!mmc_queue_direct_complete(host)) {
+		bool waiting;
+
+		spin_lock_irqsave(q->queue_lock, flags);
+		mq->complete_req = req;
+		mq->rw_wait = false;
+		waiting = mq->waiting;
+		spin_unlock_irqrestore(q->queue_lock, flags);
+
+		if (waiting)
+			wake_up(&mq->wait);
+		else
+			kblockd_schedule_work(&mq->complete_work);
+
+		return;
+	}
+
+	mmc_blk_rw_reset_success(mq, req);
+
+	mq->rw_wait = false;
+	wake_up(&mq->wait);
+
+	mmc_blk_mq_post_req(mq, req);
+}
+
+static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err)
+{
+	struct request_queue *q = mq->queue;
+	unsigned long flags;
+	bool done;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	if (mq->recovery_needed) {
+		*err = -EBUSY;
+		done = true;
+	} else {
+		done = !mq->rw_wait;
+	}
+	mq->waiting = !done;
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	return done;
+}
+
+static int mmc_blk_rw_wait(struct mmc_queue *mq, struct request **prev_req)
+{
+	int err = 0;
+
+	wait_event(mq->wait, mmc_blk_rw_wait_cond(mq, &err));
+
+	mmc_blk_mq_complete_prev_req(mq, prev_req);
+
+	return err;
+}
+
+static int mmc_blk_mq_issue_rw_rq(struct mmc_queue *mq,
+				  struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_host *host = mq->card->host;
+	struct request *prev_req = NULL;
+	int err = 0;
+
+	mmc_blk_rw_rq_prep(mqrq, mq->card, 0, mq);
+
+	mqrq->brq.mrq.done = mmc_blk_mq_req_done;
+
+	if (host->ops->pre_req)
+		host->ops->pre_req(host, &mqrq->brq.mrq);
+
+	err = mmc_blk_rw_wait(mq, &prev_req);
+	if (err)
+		goto out_post_req;
+
+	mq->rw_wait = true;
+
+	err = mmc_start_request(host, &mqrq->brq.mrq);
+
+	if (prev_req)
+		mmc_blk_mq_post_req(mq, prev_req);
+
+	if (err)
+		mq->rw_wait = false;
+
+	/* Release re-tuning here where there is no synchronization required */
+	if (mmc_queue_direct_complete(host))
+		mmc_retune_release(host);
+
+out_post_req:
+	if (err && host->ops->post_req)
+		host->ops->post_req(host, &mqrq->brq.mrq, err);
+
+	return err;
+}
+
+static int mmc_blk_wait_for_idle(struct mmc_queue *mq, struct mmc_host *host)
+{
+	return mmc_blk_rw_wait(mq, NULL);
+}
+
+enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_blk_data *md = mq->blkdata;
+	struct mmc_card *card = md->queue.card;
+	struct mmc_host *host = card->host;
+	int ret;
+
+	ret = mmc_blk_part_switch(card, md->part_type);
+	if (ret)
+		return MMC_REQ_FAILED_TO_START;
+
+	switch (mmc_issue_type(mq, req)) {
+	case MMC_ISSUE_SYNC:
+		ret = mmc_blk_wait_for_idle(mq, host);
+		if (ret)
+			return MMC_REQ_BUSY;
+		switch (req_op(req)) {
+		case REQ_OP_DRV_IN:
+		case REQ_OP_DRV_OUT:
+			mmc_blk_issue_drv_op(mq, req);
+			break;
+		case REQ_OP_DISCARD:
+			mmc_blk_issue_discard_rq(mq, req);
+			break;
+		case REQ_OP_SECURE_ERASE:
+			mmc_blk_issue_secdiscard_rq(mq, req);
+			break;
+		case REQ_OP_FLUSH:
+			mmc_blk_issue_flush(mq, req);
+			break;
+		default:
+			WARN_ON_ONCE(1);
+			return MMC_REQ_FAILED_TO_START;
+		}
+		return MMC_REQ_FINISHED;
+	case MMC_ISSUE_ASYNC:
+		switch (req_op(req)) {
+		case REQ_OP_READ:
+		case REQ_OP_WRITE:
+			ret = mmc_blk_mq_issue_rw_rq(mq, req);
+			break;
+		default:
+			WARN_ON_ONCE(1);
+			ret = -EINVAL;
+		}
+		if (!ret)
+			return MMC_REQ_STARTED;
+		return ret == -EBUSY ? MMC_REQ_BUSY : MMC_REQ_FAILED_TO_START;
+	default:
+		WARN_ON_ONCE(1);
+		return MMC_REQ_FAILED_TO_START;
+	}
+}
+
 static bool mmc_blk_rw_cmd_err(struct mmc_blk_data *md, struct mmc_card *card,
 			       struct mmc_blk_request *brq, struct request *req,
 			       bool old_req_pending)
diff --git a/drivers/mmc/core/block.h b/drivers/mmc/core/block.h
index 860ca7c8df86..b145ed81402b 100644
--- a/drivers/mmc/core/block.h
+++ b/drivers/mmc/core/block.h
@@ -6,4 +6,14 @@
 
 void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req);
 
+enum mmc_issued;
+
+enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req);
+void mmc_blk_mq_complete(struct request *req);
+void mmc_blk_mq_recovery(struct mmc_queue *mq);
+
+struct work_struct;
+
+void mmc_blk_mq_complete_work(struct work_struct *work);
+
 #endif
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 4f33d277b125..d3b5881615f5 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -22,6 +22,7 @@
 #include "block.h"
 #include "core.h"
 #include "card.h"
+#include "host.h"
 
 /*
  * Prepare a MMC request. This just filters out odd stuff.
@@ -34,10 +35,48 @@ static int mmc_prep_request(struct request_queue *q, struct request *req)
 		return BLKPREP_KILL;
 
 	req->rq_flags |= RQF_DONTPREP;
+	req_to_mmc_queue_req(req)->retries = 0;
 
 	return BLKPREP_OK;
 }
 
+enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req)
+{
+	if (req_op(req) == REQ_OP_READ || req_op(req) == REQ_OP_WRITE)
+		return MMC_ISSUE_ASYNC;
+
+	return MMC_ISSUE_SYNC;
+}
+
+static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
+						 bool reserved)
+{
+	return BLK_EH_RESET_TIMER;
+}
+
+static void mmc_mq_recovery_handler(struct work_struct *work)
+{
+	struct mmc_queue *mq = container_of(work, struct mmc_queue,
+					    recovery_work);
+	struct request_queue *q = mq->queue;
+
+	mmc_get_card(mq->card, &mq->ctx);
+
+	mq->in_recovery = true;
+
+	mmc_blk_mq_recovery(mq);
+
+	mq->in_recovery = false;
+
+	spin_lock_irq(q->queue_lock);
+	mq->recovery_needed = false;
+	spin_unlock_irq(q->queue_lock);
+
+	mmc_put_card(mq->card, &mq->ctx);
+
+	blk_mq_run_hw_queues(q, true);
+}
+
 static int mmc_queue_thread(void *d)
 {
 	struct mmc_queue *mq = d;
@@ -154,11 +193,10 @@ static void mmc_queue_setup_discard(struct request_queue *q,
  * @req: the request
  * @gfp: memory allocation policy
  */
-static int mmc_init_request(struct request_queue *q, struct request *req,
-			    gfp_t gfp)
+static int __mmc_init_request(struct mmc_queue *mq, struct request *req,
+			      gfp_t gfp)
 {
 	struct mmc_queue_req *mq_rq = req_to_mmc_queue_req(req);
-	struct mmc_queue *mq = q->queuedata;
 	struct mmc_card *card = mq->card;
 	struct mmc_host *host = card->host;
 
@@ -169,6 +207,12 @@ static int mmc_init_request(struct request_queue *q, struct request *req,
 	return 0;
 }
 
+static int mmc_init_request(struct request_queue *q, struct request *req,
+			    gfp_t gfp)
+{
+	return __mmc_init_request(q->queuedata, req, gfp);
+}
+
 static void mmc_exit_request(struct request_queue *q, struct request *req)
 {
 	struct mmc_queue_req *mq_rq = req_to_mmc_queue_req(req);
@@ -177,6 +221,110 @@ static void mmc_exit_request(struct request_queue *q, struct request *req)
 	mq_rq->sg = NULL;
 }
 
+static int mmc_mq_init_request(struct blk_mq_tag_set *set, struct request *req,
+			       unsigned int hctx_idx, unsigned int numa_node)
+{
+	return __mmc_init_request(set->driver_data, req, GFP_KERNEL);
+}
+
+static void mmc_mq_exit_request(struct blk_mq_tag_set *set, struct request *req,
+				unsigned int hctx_idx)
+{
+	struct mmc_queue *mq = set->driver_data;
+
+	mmc_exit_request(mq->queue, req);
+}
+
+static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
+				    const struct blk_mq_queue_data *bd)
+{
+	struct request *req = bd->rq;
+	struct request_queue *q = req->q;
+	struct mmc_queue *mq = q->queuedata;
+	struct mmc_card *card = mq->card;
+	enum mmc_issue_type issue_type;
+	enum mmc_issued issued;
+	bool get_card;
+	int ret;
+
+	if (mmc_card_removed(mq->card)) {
+		req->rq_flags |= RQF_QUIET;
+		return BLK_STS_IOERR;
+	}
+
+	issue_type = mmc_issue_type(mq, req);
+
+	spin_lock_irq(q->queue_lock);
+
+	if (mq->recovery_needed) {
+		spin_unlock_irq(q->queue_lock);
+		return BLK_STS_RESOURCE;
+	}
+
+	switch (issue_type) {
+	case MMC_ISSUE_ASYNC:
+		break;
+	default:
+		/*
+		 * Timeouts are handled by mmc core, so set a large value to
+		 * avoid races.
+		 */
+		req->timeout = 600 * HZ;
+		break;
+	}
+
+	mq->in_flight[issue_type] += 1;
+	get_card = mmc_tot_in_flight(mq) == 1;
+
+	spin_unlock_irq(q->queue_lock);
+
+	if (!(req->rq_flags & RQF_DONTPREP)) {
+		req_to_mmc_queue_req(req)->retries = 0;
+		req->rq_flags |= RQF_DONTPREP;
+	}
+
+	if (get_card)
+		mmc_get_card(card, &mq->ctx);
+
+	blk_mq_start_request(req);
+
+	issued = mmc_blk_mq_issue_rq(mq, req);
+
+	switch (issued) {
+	case MMC_REQ_BUSY:
+		ret = BLK_STS_RESOURCE;
+		break;
+	case MMC_REQ_FAILED_TO_START:
+		ret = BLK_STS_IOERR;
+		break;
+	default:
+		ret = BLK_STS_OK;
+		break;
+	}
+
+	if (issued != MMC_REQ_STARTED) {
+		bool put_card = false;
+
+		spin_lock_irq(q->queue_lock);
+		mq->in_flight[issue_type] -= 1;
+		if (mmc_tot_in_flight(mq) == 0)
+			put_card = true;
+		spin_unlock_irq(q->queue_lock);
+		if (put_card)
+			mmc_put_card(card, &mq->ctx);
+	}
+
+	return ret;
+}
+
+static const struct blk_mq_ops mmc_mq_ops = {
+	.queue_rq	= mmc_mq_queue_rq,
+	.init_request	= mmc_mq_init_request,
+	.exit_request	= mmc_mq_exit_request,
+	.complete	= mmc_blk_mq_complete,
+	.timeout	= mmc_mq_timed_out,
+};
+
 static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
 {
 	struct mmc_host *host = card->host;
@@ -198,6 +346,70 @@ static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
 
 	/* Initialize thread_sem even if it is not used */
 	sema_init(&mq->thread_sem, 1);
+
+	INIT_WORK(&mq->recovery_work, mmc_mq_recovery_handler);
+	INIT_WORK(&mq->complete_work, mmc_blk_mq_complete_work);
+
+	mutex_init(&mq->complete_lock);
+
+	init_waitqueue_head(&mq->wait);
+}
+
+static int mmc_mq_init_queue(struct mmc_queue *mq, int q_depth,
+			     const struct blk_mq_ops *mq_ops, spinlock_t *lock)
+{
+	int ret;
+
+	memset(&mq->tag_set, 0, sizeof(mq->tag_set));
+	mq->tag_set.ops = mq_ops;
+	mq->tag_set.queue_depth = q_depth;
+	mq->tag_set.numa_node = NUMA_NO_NODE;
+	mq->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE |
+			    BLK_MQ_F_BLOCKING;
+	mq->tag_set.nr_hw_queues = 1;
+	mq->tag_set.cmd_size = sizeof(struct mmc_queue_req);
+	mq->tag_set.driver_data = mq;
+
+	ret = blk_mq_alloc_tag_set(&mq->tag_set);
+	if (ret)
+		return ret;
+
+	mq->queue = blk_mq_init_queue(&mq->tag_set);
+	if (IS_ERR(mq->queue)) {
+		ret = PTR_ERR(mq->queue);
+		goto free_tag_set;
+	}
+
+	mq->queue->queue_lock = lock;
+	mq->queue->queuedata = mq;
+
+	return 0;
+
+free_tag_set:
+	blk_mq_free_tag_set(&mq->tag_set);
+
+	return ret;
+}
+
+#define MMC_QUEUE_DEPTH 64
+
+static int mmc_mq_init(struct mmc_queue *mq, struct mmc_card *card,
+			 spinlock_t *lock)
+{
+	int q_depth;
+	int ret;
+
+	q_depth = MMC_QUEUE_DEPTH;
+
+	ret = mmc_mq_init_queue(mq, q_depth, &mmc_mq_ops, lock);
+	if (ret)
+		return ret;
+
+	blk_queue_rq_timeout(mq->queue, 60 * HZ);
+
+	mmc_setup_queue(mq, card);
+
+	return 0;
 }
 
 /**
@@ -216,6 +428,10 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 	int ret = -ENOMEM;
 
 	mq->card = card;
+
+	if (mmc_host_use_blk_mq(host))
+		return mmc_mq_init(mq, card, lock);
+
 	mq->queue = blk_alloc_queue(GFP_KERNEL);
 	if (!mq->queue)
 		return -ENOMEM;
@@ -251,11 +467,63 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 	return ret;
 }
 
+static void mmc_mq_queue_suspend(struct mmc_queue *mq)
+{
+	blk_mq_quiesce_queue(mq->queue);
+
+	/*
+	 * The host remains claimed while there are outstanding requests, so
+	 * simply claiming and releasing here ensures there are none.
+	 */
+	mmc_claim_host(mq->card->host);
+	mmc_release_host(mq->card->host);
+}
+
+static void mmc_mq_queue_resume(struct mmc_queue *mq)
+{
+	blk_mq_unquiesce_queue(mq->queue);
+}
+
+static void __mmc_queue_suspend(struct mmc_queue *mq)
+{
+	struct request_queue *q = mq->queue;
+	unsigned long flags;
+
+	if (!mq->suspended) {
+		mq->suspended |= true;
+
+		spin_lock_irqsave(q->queue_lock, flags);
+		blk_stop_queue(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+
+		down(&mq->thread_sem);
+	}
+}
+
+static void __mmc_queue_resume(struct mmc_queue *mq)
+{
+	struct request_queue *q = mq->queue;
+	unsigned long flags;
+
+	if (mq->suspended) {
+		mq->suspended = false;
+
+		up(&mq->thread_sem);
+
+		spin_lock_irqsave(q->queue_lock, flags);
+		blk_start_queue(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
+}
+
 void mmc_cleanup_queue(struct mmc_queue *mq)
 {
 	struct request_queue *q = mq->queue;
 	unsigned long flags;
 
+	if (q->mq_ops)
+		return;
+
 	/* Make sure the queue isn't suspended, as that will deadlock */
 	mmc_queue_resume(mq);
 
@@ -283,17 +551,11 @@ void mmc_cleanup_queue(struct mmc_queue *mq)
 void mmc_queue_suspend(struct mmc_queue *mq)
 {
 	struct request_queue *q = mq->queue;
-	unsigned long flags;
 
-	if (!mq->suspended) {
-		mq->suspended |= true;
-
-		spin_lock_irqsave(q->queue_lock, flags);
-		blk_stop_queue(q);
-		spin_unlock_irqrestore(q->queue_lock, flags);
-
-		down(&mq->thread_sem);
-	}
+	if (q->mq_ops)
+		mmc_mq_queue_suspend(mq);
+	else
+		__mmc_queue_suspend(mq);
 }
 
 /**
@@ -303,17 +565,11 @@ void mmc_queue_suspend(struct mmc_queue *mq)
 void mmc_queue_resume(struct mmc_queue *mq)
 {
 	struct request_queue *q = mq->queue;
-	unsigned long flags;
 
-	if (mq->suspended) {
-		mq->suspended = false;
-
-		up(&mq->thread_sem);
-
-		spin_lock_irqsave(q->queue_lock, flags);
-		blk_start_queue(q);
-		spin_unlock_irqrestore(q->queue_lock, flags);
-	}
+	if (q->mq_ops)
+		mmc_mq_queue_resume(mq);
+	else
+		__mmc_queue_resume(mq);
 }
 
 /*
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index 68f68ecd94ea..00b4ce2ce83c 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -7,6 +7,19 @@
 #include <linux/mmc/core.h>
 #include <linux/mmc/host.h>
 
+enum mmc_issued {
+	MMC_REQ_STARTED,
+	MMC_REQ_BUSY,
+	MMC_REQ_FAILED_TO_START,
+	MMC_REQ_FINISHED,
+};
+
+enum mmc_issue_type {
+	MMC_ISSUE_SYNC,
+	MMC_ISSUE_ASYNC,
+	MMC_ISSUE_MAX,
+};
+
 static inline struct mmc_queue_req *req_to_mmc_queue_req(struct request *rq)
 {
 	return blk_mq_rq_to_pdu(rq);
@@ -56,12 +69,15 @@ struct mmc_queue_req {
 	int			drv_op_result;
 	void			*drv_op_data;
 	unsigned int		ioc_count;
+	int			retries;
 };
 
 struct mmc_queue {
 	struct mmc_card		*card;
 	struct task_struct	*thread;
 	struct semaphore	thread_sem;
+	struct mmc_ctx		ctx;
+	struct blk_mq_tag_set	tag_set;
 	bool			suspended;
 	bool			asleep;
 	struct mmc_blk_data	*blkdata;
@@ -73,6 +89,18 @@ struct mmc_queue {
 	 * associated mmc_queue_req data.
 	 */
 	int			qcnt;
+
+	int			in_flight[MMC_ISSUE_MAX];
+	bool			recovery_needed;
+	bool			in_recovery;
+	bool			rw_wait;
+	bool			waiting;
+	struct work_struct	recovery_work;
+	wait_queue_head_t	wait;
+	struct request		*recovery_req;
+	struct request		*complete_req;
+	struct mutex		complete_lock;
+	struct work_struct	complete_work;
 };
 
 extern int mmc_init_queue(struct mmc_queue *, struct mmc_card *, spinlock_t *,
@@ -83,4 +111,17 @@ extern int mmc_init_queue(struct mmc_queue *, struct mmc_card *, spinlock_t *,
 extern unsigned int mmc_queue_map_sg(struct mmc_queue *,
 				     struct mmc_queue_req *);
 
+enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req);
+
+static inline int mmc_tot_in_flight(struct mmc_queue *mq)
+{
+	return mq->in_flight[MMC_ISSUE_SYNC] +
+	       mq->in_flight[MMC_ISSUE_ASYNC];
+}
+
+static inline bool mmc_queue_direct_complete(struct mmc_host *host)
+{
+	return host->caps & MMC_CAP_DIRECT_COMPLETE;
+}
+
 #endif
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index ce2075d6f429..4b68a95a8818 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -324,6 +324,7 @@ struct mmc_host {
 #define MMC_CAP_DRIVER_TYPE_A	(1 << 23)	/* Host supports Driver Type A */
 #define MMC_CAP_DRIVER_TYPE_C	(1 << 24)	/* Host supports Driver Type C */
 #define MMC_CAP_DRIVER_TYPE_D	(1 << 25)	/* Host supports Driver Type D */
+#define MMC_CAP_DIRECT_COMPLETE	(1 << 27)	/* RW reqs can be completed within mmc_request_done() */
 #define MMC_CAP_CD_WAKE		(1 << 28)	/* Enable card detect wake */
 #define MMC_CAP_CMD_DURING_TFR	(1 << 29)	/* Commands during data transfer */
 #define MMC_CAP_CMD23		(1 << 30)	/* CMD23 supported. */
-- 
1.9.1


* [PATCH V12 3/5] mmc: block: Add CQE support
  2017-10-24  8:40 [PATCH V12 0/5] mmc: Add Command Queue support Adrian Hunter
  2017-10-24  8:40 ` [PATCH V12 1/5] mmc: core: Add parameter use_blk_mq Adrian Hunter
  2017-10-24  8:40 ` [PATCH V12 2/5] mmc: block: Add blk-mq support Adrian Hunter
@ 2017-10-24  8:40 ` Adrian Hunter
  2017-10-24  8:40 ` [PATCH V12 4/5] mmc: cqhci: support for command queue enabled host Adrian Hunter
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Adrian Hunter @ 2017-10-24  8:40 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

Add CQE support to the block driver, including:
    - optionally using DCMD for flush requests
    - "manually" issuing discard requests
    - issuing read / write requests to the CQE
    - supporting block-layer timeouts
    - handling recovery
    - supporting re-tuning

CQE offers 25% - 50% better random multi-threaded I/O.  There is a slight
(e.g. 2%) drop in sequential read speed but no observable change to sequential
write.

CQE automatically sends the commands to complete requests.  However, it only
supports reads / writes and so-called "direct commands" (DCMD).  Furthermore,
DCMD is limited to one command at a time, but discards require 3 commands.
That makes issuing discards through CQE very awkward, and some CQEs don't
support DCMD anyway.  So for discards, the existing non-CQE approach is
taken, where the mmc core code issues the 3 commands one at a time, i.e.
mmc_erase().  DCMD is used for issuing flushes.
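
The resulting dispatch policy, implemented by mmc_cqe_issue_type() in this
patch, can be summarised with this sketch:

/* Summary of mmc_cqe_issue_type() below */
static enum mmc_issue_type mmc_cqe_issue_type_sketch(struct mmc_host *host,
						     struct request *req)
{
	switch (req_op(req)) {
	case REQ_OP_DRV_IN:
	case REQ_OP_DRV_OUT:
	case REQ_OP_DISCARD:
	case REQ_OP_SECURE_ERASE:
		return MMC_ISSUE_SYNC;	/* non-CQE path, e.g. mmc_erase() for discards */
	case REQ_OP_FLUSH:
		/* DCMD flush if the CQE supports it, otherwise synchronous */
		return mmc_cqe_can_dcmd(host) ? MMC_ISSUE_DCMD : MMC_ISSUE_SYNC;
	default:
		return MMC_ISSUE_ASYNC;	/* reads and writes go to the CQE */
	}
}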

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/core/block.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/mmc/core/block.h |   2 +
 drivers/mmc/core/queue.c | 134 ++++++++++++++++++++++++++++++++++++++++--
 drivers/mmc/core/queue.h |  15 +++++
 4 files changed, 294 insertions(+), 7 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 002446e8dc5d..5a1fa678667b 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -112,6 +112,7 @@ struct mmc_blk_data {
 #define MMC_BLK_WRITE		BIT(1)
 #define MMC_BLK_DISCARD		BIT(2)
 #define MMC_BLK_SECDISCARD	BIT(3)
+#define MMC_BLK_CQE_RECOVERY	BIT(4)
 
 	/*
 	 * Only set in main mmc_blk_data associated
@@ -1713,6 +1714,138 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
 		*do_data_tag_p = do_data_tag;
 }
 
+#define MMC_CQE_RETRIES 2
+
+static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_request *mrq = &mqrq->brq.mrq;
+	struct request_queue *q = req->q;
+	struct mmc_host *host = mq->card->host;
+	unsigned long flags;
+	bool put_card;
+	int err;
+
+	mmc_cqe_post_req(host, mrq);
+
+	if (mrq->cmd && mrq->cmd->error)
+		err = mrq->cmd->error;
+	else if (mrq->data && mrq->data->error)
+		err = mrq->data->error;
+	else
+		err = 0;
+
+	if (err) {
+		if (mqrq->retries++ < MMC_CQE_RETRIES)
+			blk_mq_requeue_request(req, true);
+		else
+			blk_mq_end_request(req, BLK_STS_IOERR);
+	} else if (mrq->data) {
+		if (blk_update_request(req, BLK_STS_OK, mrq->data->bytes_xfered))
+			blk_mq_requeue_request(req, true);
+		else
+			__blk_mq_end_request(req, BLK_STS_OK);
+	} else {
+		blk_mq_end_request(req, BLK_STS_OK);
+	}
+
+	spin_lock_irqsave(q->queue_lock, flags);
+
+	mq->in_flight[mmc_issue_type(mq, req)] -= 1;
+
+	put_card = mmc_tot_in_flight(mq) == 0;
+
+	mmc_cqe_check_busy(mq);
+
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	if (!mq->cqe_busy)
+		blk_mq_run_hw_queues(q, true);
+
+	if (put_card)
+		mmc_put_card(mq->card, &mq->ctx);
+}
+
+void mmc_blk_cqe_recovery(struct mmc_queue *mq)
+{
+	struct mmc_card *card = mq->card;
+	struct mmc_host *host = card->host;
+	int err;
+
+	pr_debug("%s: CQE recovery start\n", mmc_hostname(host));
+
+	err = mmc_cqe_recovery(host);
+	if (err)
+		mmc_blk_reset(mq->blkdata, host, MMC_BLK_CQE_RECOVERY);
+	else
+		mmc_blk_reset_success(mq->blkdata, MMC_BLK_CQE_RECOVERY);
+
+	pr_debug("%s: CQE recovery done\n", mmc_hostname(host));
+}
+
+static void mmc_blk_cqe_req_done(struct mmc_request *mrq)
+{
+	struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req,
+						  brq.mrq);
+	struct request *req = mmc_queue_req_to_req(mqrq);
+	struct request_queue *q = req->q;
+	struct mmc_queue *mq = q->queuedata;
+
+	/*
+	 * Block layer timeouts race with completions which means the normal
+	 * completion path cannot be used during recovery.
+	 */
+	if (mq->in_recovery)
+		mmc_blk_cqe_complete_rq(mq, req);
+	else
+		blk_mq_complete_request(req);
+}
+
+static int mmc_blk_cqe_start_req(struct mmc_host *host, struct mmc_request *mrq)
+{
+	mrq->done		= mmc_blk_cqe_req_done;
+	mrq->recovery_notifier	= mmc_cqe_recovery_notifier;
+
+	return mmc_cqe_start_req(host, mrq);
+}
+
+static struct mmc_request *mmc_blk_cqe_prep_dcmd(struct mmc_queue_req *mqrq,
+						 struct request *req)
+{
+	struct mmc_blk_request *brq = &mqrq->brq;
+
+	memset(brq, 0, sizeof(*brq));
+
+	brq->mrq.cmd = &brq->cmd;
+	brq->mrq.tag = req->tag;
+
+	return &brq->mrq;
+}
+
+static int mmc_blk_cqe_issue_flush(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_request *mrq = mmc_blk_cqe_prep_dcmd(mqrq, req);
+
+	mrq->cmd->opcode = MMC_SWITCH;
+	mrq->cmd->arg = (MMC_SWITCH_MODE_WRITE_BYTE << 24) |
+			(EXT_CSD_FLUSH_CACHE << 16) |
+			(1 << 8) |
+			EXT_CSD_CMD_SET_NORMAL;
+	mrq->cmd->flags = MMC_CMD_AC | MMC_RSP_R1B;
+
+	return mmc_blk_cqe_start_req(mq->card->host, mrq);
+}
+
+static int mmc_blk_cqe_issue_rw_rq(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+
+	mmc_blk_data_prep(mq, mqrq, 0, NULL, NULL);
+
+	return mmc_blk_cqe_start_req(mq->card->host, &mqrq->brq.mrq);
+}
+
 static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 			       struct mmc_card *card,
 			       int disable_multi,
@@ -2125,7 +2258,10 @@ void mmc_blk_mq_complete(struct request *req)
 {
 	struct mmc_queue *mq = req->q->queuedata;
 
-	mmc_blk_mq_complete_rq(mq, req);
+	if (mq->use_cqe)
+		mmc_blk_cqe_complete_rq(mq, req);
+	else
+		mmc_blk_mq_complete_rq(mq, req);
 }
 
 static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
@@ -2350,6 +2486,9 @@ static int mmc_blk_mq_issue_rw_rq(struct mmc_queue *mq,
 
 static int mmc_blk_wait_for_idle(struct mmc_queue *mq, struct mmc_host *host)
 {
+	if (mq->use_cqe)
+		return host->cqe_ops->cqe_wait_for_idle(host);
+
 	return mmc_blk_rw_wait(mq, NULL);
 }
 
@@ -2388,11 +2527,18 @@ enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
 			return MMC_REQ_FAILED_TO_START;
 		}
 		return MMC_REQ_FINISHED;
+	case MMC_ISSUE_DCMD:
 	case MMC_ISSUE_ASYNC:
 		switch (req_op(req)) {
+		case REQ_OP_FLUSH:
+			ret = mmc_blk_cqe_issue_flush(mq, req);
+			break;
 		case REQ_OP_READ:
 		case REQ_OP_WRITE:
-			ret = mmc_blk_mq_issue_rw_rq(mq, req);
+			if (mq->use_cqe)
+				ret = mmc_blk_cqe_issue_rw_rq(mq, req);
+			else
+				ret = mmc_blk_mq_issue_rw_rq(mq, req);
 			break;
 		default:
 			WARN_ON_ONCE(1);
diff --git a/drivers/mmc/core/block.h b/drivers/mmc/core/block.h
index b145ed81402b..5ad22c1c0318 100644
--- a/drivers/mmc/core/block.h
+++ b/drivers/mmc/core/block.h
@@ -6,6 +6,8 @@
 
 void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req);
 
+void mmc_blk_cqe_recovery(struct mmc_queue *mq);
+
 enum mmc_issued;
 
 enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req);
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index d3b5881615f5..bcba2995c767 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -40,18 +40,119 @@ static int mmc_prep_request(struct request_queue *q, struct request *req)
 	return BLKPREP_OK;
 }
 
+static inline bool mmc_cqe_dcmd_busy(struct mmc_queue *mq)
+{
+	/* Allow only 1 DCMD at a time */
+	return mq->in_flight[MMC_ISSUE_DCMD];
+}
+
+void mmc_cqe_check_busy(struct mmc_queue *mq)
+{
+	if ((mq->cqe_busy & MMC_CQE_DCMD_BUSY) && !mmc_cqe_dcmd_busy(mq))
+		mq->cqe_busy &= ~MMC_CQE_DCMD_BUSY;
+
+	mq->cqe_busy &= ~MMC_CQE_QUEUE_FULL;
+}
+
+static inline bool mmc_cqe_can_dcmd(struct mmc_host *host)
+{
+	return host->caps2 & MMC_CAP2_CQE_DCMD;
+}
+
+enum mmc_issue_type mmc_cqe_issue_type(struct mmc_host *host,
+				       struct request *req)
+{
+	switch (req_op(req)) {
+	case REQ_OP_DRV_IN:
+	case REQ_OP_DRV_OUT:
+	case REQ_OP_DISCARD:
+	case REQ_OP_SECURE_ERASE:
+		return MMC_ISSUE_SYNC;
+	case REQ_OP_FLUSH:
+		return mmc_cqe_can_dcmd(host) ? MMC_ISSUE_DCMD : MMC_ISSUE_SYNC;
+	default:
+		return MMC_ISSUE_ASYNC;
+	}
+}
+
 enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req)
 {
+	struct mmc_host *host = mq->card->host;
+
+	if (mq->use_cqe)
+		return mmc_cqe_issue_type(host, req);
+
 	if (req_op(req) == REQ_OP_READ || req_op(req) == REQ_OP_WRITE)
 		return MMC_ISSUE_ASYNC;
 
 	return MMC_ISSUE_SYNC;
 }
 
+static void __mmc_cqe_recovery_notifier(struct mmc_queue *mq)
+{
+	if (!mq->recovery_needed) {
+		mq->recovery_needed = true;
+		schedule_work(&mq->recovery_work);
+	}
+}
+
+void mmc_cqe_recovery_notifier(struct mmc_request *mrq)
+{
+	struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req,
+						  brq.mrq);
+	struct request *req = mmc_queue_req_to_req(mqrq);
+	struct request_queue *q = req->q;
+	struct mmc_queue *mq = q->queuedata;
+	unsigned long flags;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	__mmc_cqe_recovery_notifier(mq);
+	spin_unlock_irqrestore(q->queue_lock, flags);
+}
+
+static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_request *mrq = &mqrq->brq.mrq;
+	struct mmc_queue *mq = req->q->queuedata;
+	struct mmc_host *host = mq->card->host;
+	enum mmc_issue_type issue_type = mmc_issue_type(mq, req);
+	bool recovery_needed = false;
+
+	switch (issue_type) {
+	case MMC_ISSUE_ASYNC:
+	case MMC_ISSUE_DCMD:
+		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
+			if (recovery_needed)
+				__mmc_cqe_recovery_notifier(mq);
+			return BLK_EH_RESET_TIMER;
+		}
+		/* No timeout */
+		return BLK_EH_HANDLED;
+	default:
+		/* Timeout is handled by mmc core */
+		return BLK_EH_RESET_TIMER;
+	}
+}
+
 static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
 						 bool reserved)
 {
-	return BLK_EH_RESET_TIMER;
+	struct request_queue *q = req->q;
+	struct mmc_queue *mq = q->queuedata;
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+
+	if (mq->recovery_needed || !mq->use_cqe)
+		ret = BLK_EH_RESET_TIMER;
+	else
+		ret = mmc_cqe_timed_out(req);
+
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	return ret;
 }
 
 static void mmc_mq_recovery_handler(struct work_struct *work)
@@ -64,7 +165,10 @@ static void mmc_mq_recovery_handler(struct work_struct *work)
 
 	mq->in_recovery = true;
 
-	mmc_blk_mq_recovery(mq);
+	if (mq->use_cqe)
+		mmc_blk_cqe_recovery(mq);
+	else
+		mmc_blk_mq_recovery(mq);
 
 	mq->in_recovery = false;
 
@@ -242,9 +346,10 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct request_queue *q = req->q;
 	struct mmc_queue *mq = q->queuedata;
 	struct mmc_card *card = mq->card;
+	struct mmc_host *host = card->host;
 	enum mmc_issue_type issue_type;
 	enum mmc_issued issued;
-	bool get_card;
+	bool get_card, cqe_retune_ok;
 	int ret;
 
 	if (mmc_card_removed(mq->card)) {
@@ -262,6 +367,13 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 	}
 
 	switch (issue_type) {
+	case MMC_ISSUE_DCMD:
+		if (mmc_cqe_dcmd_busy(mq)) {
+			mq->cqe_busy |= MMC_CQE_DCMD_BUSY;
+			spin_unlock_irq(q->queue_lock);
+			return BLK_STS_RESOURCE;
+		}
+		break;
 	case MMC_ISSUE_ASYNC:
 		break;
 	default:
@@ -275,6 +387,7 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	mq->in_flight[issue_type] += 1;
 	get_card = mmc_tot_in_flight(mq) == 1;
+	cqe_retune_ok = mmc_cqe_qcnt(mq) == 1;
 
 	spin_unlock_irq(q->queue_lock);
 
@@ -286,6 +399,11 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (get_card)
 		mmc_get_card(card, &mq->ctx);
 
+	if (mq->use_cqe) {
+		host->retune_now = host->need_retune && cqe_retune_ok &&
+				   !host->hold_retune;
+	}
+
 	blk_mq_start_request(req);
 
 	issued = mmc_blk_mq_issue_rq(mq, req);
@@ -396,10 +514,14 @@ static int mmc_mq_init_queue(struct mmc_queue *mq, int q_depth,
 static int mmc_mq_init(struct mmc_queue *mq, struct mmc_card *card,
 			 spinlock_t *lock)
 {
+	struct mmc_host *host = card->host;
 	int q_depth;
 	int ret;
 
-	q_depth = MMC_QUEUE_DEPTH;
+	if (mq->use_cqe)
+		q_depth = min_t(int, card->ext_csd.cmdq_depth, host->cqe_qdepth);
+	else
+		q_depth = MMC_QUEUE_DEPTH;
 
 	ret = mmc_mq_init_queue(mq, q_depth, &mmc_mq_ops, lock);
 	if (ret)
@@ -429,7 +551,9 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 
 	mq->card = card;
 
-	if (mmc_host_use_blk_mq(host))
+	mq->use_cqe = host->cqe_enabled;
+
+	if (mq->use_cqe || mmc_host_use_blk_mq(host))
 		return mmc_mq_init(mq, card, lock);
 
 	mq->queue = blk_alloc_queue(GFP_KERNEL);
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index 00b4ce2ce83c..9bbfbb1fad7b 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -16,6 +16,7 @@ enum mmc_issued {
 
 enum mmc_issue_type {
 	MMC_ISSUE_SYNC,
+	MMC_ISSUE_DCMD,
 	MMC_ISSUE_ASYNC,
 	MMC_ISSUE_MAX,
 };
@@ -91,6 +92,10 @@ struct mmc_queue {
 	int			qcnt;
 
 	int			in_flight[MMC_ISSUE_MAX];
+	unsigned int		cqe_busy;
+#define MMC_CQE_DCMD_BUSY	BIT(0)
+#define MMC_CQE_QUEUE_FULL	BIT(1)
+	bool			use_cqe;
 	bool			recovery_needed;
 	bool			in_recovery;
 	bool			rw_wait;
@@ -111,11 +116,21 @@ extern int mmc_init_queue(struct mmc_queue *, struct mmc_card *, spinlock_t *,
 extern unsigned int mmc_queue_map_sg(struct mmc_queue *,
 				     struct mmc_queue_req *);
 
+void mmc_cqe_check_busy(struct mmc_queue *mq);
+void mmc_cqe_recovery_notifier(struct mmc_request *mrq);
+
 enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req);
 
 static inline int mmc_tot_in_flight(struct mmc_queue *mq)
 {
 	return mq->in_flight[MMC_ISSUE_SYNC] +
+	       mq->in_flight[MMC_ISSUE_DCMD] +
+	       mq->in_flight[MMC_ISSUE_ASYNC];
+}
+
+static inline int mmc_cqe_qcnt(struct mmc_queue *mq)
+{
+	return mq->in_flight[MMC_ISSUE_DCMD] +
 	       mq->in_flight[MMC_ISSUE_ASYNC];
 }
 
-- 
1.9.1


* [PATCH V12 4/5] mmc: cqhci: support for command queue enabled host
  2017-10-24  8:40 [PATCH V12 0/5] mmc: Add Command Queue support Adrian Hunter
                   ` (2 preceding siblings ...)
  2017-10-24  8:40 ` [PATCH V12 3/5] mmc: block: Add CQE support Adrian Hunter
@ 2017-10-24  8:40 ` Adrian Hunter
  2017-10-24  8:40 ` [PATCH V12 5/5] mmc: sdhci-pci: Add CQHCI support for Intel GLK Adrian Hunter
  2017-10-26 13:32 ` [PATCH V12 0/5] mmc: Add Command Queue support Linus Walleij
  5 siblings, 0 replies; 12+ messages in thread
From: Adrian Hunter @ 2017-10-24  8:40 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

From: Venkat Gopalakrishnan <venkatg@codeaurora.org>

This patch adds CMDQ support for command-queue compatible
hosts.

Command queueing was added in the eMMC 5.1 specification. It
enables the controller to process up to 32 requests at
a time.

Adrian Hunter contributed renaming to cqhci, recovery, suspend
and resume, cqhci_off, cqhci_wait_for_idle, and external timeout
handling.
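
As a rough illustration, a host driver hooks into this interface along
the following lines. This is only a sketch: the foo_* names are invented
glue and most details are trimmed; the sdhci-pci GLK patch later in this
series is a real example.

static irqreturn_t foo_irq(int irq, void *dev_id)
{
	struct mmc_host *mmc = dev_id;
	u32 intmask = foo_read_irq_status(mmc);	/* invented helper */

	/* Pass interrupt status and any decoded cmd/data errors to CQHCI */
	return cqhci_irq(mmc, intmask, 0, 0);
}

static int foo_probe(struct platform_device *pdev)
{
	struct mmc_host *mmc = foo_get_mmc(pdev);	/* invented helper */
	struct cqhci_host *cq_host;
	int ret;

	cq_host = cqhci_pltfm_init(pdev);	/* maps the "cqhci_mem" resource */
	if (IS_ERR(cq_host))
		return PTR_ERR(cq_host);

	mmc->caps2 |= MMC_CAP2_CQE | MMC_CAP2_CQE_DCMD;

	ret = cqhci_init(cq_host, mmc, false);	/* false: 32-bit DMA descriptors */
	if (ret)
		return ret;

	return foo_add_host(mmc);		/* invented helper */
}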

Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Signed-off-by: Konstantin Dorfman <kdorfman@codeaurora.org>
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/host/Kconfig  |   13 +
 drivers/mmc/host/Makefile |    1 +
 drivers/mmc/host/cqhci.c  | 1150 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/mmc/host/cqhci.h  |  240 ++++++++++
 4 files changed, 1404 insertions(+)
 create mode 100644 drivers/mmc/host/cqhci.c
 create mode 100644 drivers/mmc/host/cqhci.h

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 567028c9219a..3092b7085cb5 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -857,6 +857,19 @@ config MMC_SUNXI
 	  This selects support for the SD/MMC Host Controller on
 	  Allwinner sunxi SoCs.
 
+config MMC_CQHCI
+	tristate "Command Queue Host Controller Interface support"
+	depends on HAS_DMA
+	help
+	  This selects the Command Queue Host Controller Interface (CQHCI)
+	  support present in host controllers of Qualcomm Technologies, Inc.,
+	  amongst others.
+	  This controller supports eMMC devices with command queue support.
+
+	  If you have a controller with this interface, say Y or M here.
+
+	  If unsure, say N.
+
 config MMC_TOSHIBA_PCI
 	tristate "Toshiba Type A SD/MMC Card Interface Driver"
 	depends on PCI
diff --git a/drivers/mmc/host/Makefile b/drivers/mmc/host/Makefile
index ab61a3e39c0b..de140e3ef402 100644
--- a/drivers/mmc/host/Makefile
+++ b/drivers/mmc/host/Makefile
@@ -91,6 +91,7 @@ obj-$(CONFIG_MMC_SDHCI_ST)		+= sdhci-st.o
 obj-$(CONFIG_MMC_SDHCI_MICROCHIP_PIC32)	+= sdhci-pic32.o
 obj-$(CONFIG_MMC_SDHCI_BRCMSTB)		+= sdhci-brcmstb.o
 obj-$(CONFIG_MMC_SDHCI_OMAP)		+= sdhci-omap.o
+obj-$(CONFIG_MMC_CQHCI)			+= cqhci.o
 
 ifeq ($(CONFIG_CB710_DEBUG),y)
 	CFLAGS-cb710-mmc	+= -DDEBUG
diff --git a/drivers/mmc/host/cqhci.c b/drivers/mmc/host/cqhci.c
new file mode 100644
index 000000000000..159270e947cf
--- /dev/null
+++ b/drivers/mmc/host/cqhci.c
@@ -0,0 +1,1150 @@
+/* Copyright (c) 2015, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/delay.h>
+#include <linux/highmem.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/dma-mapping.h>
+#include <linux/slab.h>
+#include <linux/scatterlist.h>
+#include <linux/platform_device.h>
+#include <linux/ktime.h>
+
+#include <linux/mmc/mmc.h>
+#include <linux/mmc/host.h>
+#include <linux/mmc/card.h>
+
+#include "cqhci.h"
+
+#define DCMD_SLOT 31
+#define NUM_SLOTS 32
+
+struct cqhci_slot {
+	struct mmc_request *mrq;
+	unsigned int flags;
+#define CQHCI_EXTERNAL_TIMEOUT	BIT(0)
+#define CQHCI_COMPLETED		BIT(1)
+#define CQHCI_HOST_CRC		BIT(2)
+#define CQHCI_HOST_TIMEOUT	BIT(3)
+#define CQHCI_HOST_OTHER	BIT(4)
+};
+
+static inline u8 *get_desc(struct cqhci_host *cq_host, u8 tag)
+{
+	return cq_host->desc_base + (tag * cq_host->slot_sz);
+}
+
+static inline u8 *get_link_desc(struct cqhci_host *cq_host, u8 tag)
+{
+	u8 *desc = get_desc(cq_host, tag);
+
+	return desc + cq_host->task_desc_len;
+}
+
+static inline dma_addr_t get_trans_desc_dma(struct cqhci_host *cq_host, u8 tag)
+{
+	return cq_host->trans_desc_dma_base +
+		(cq_host->mmc->max_segs * tag *
+		 cq_host->trans_desc_len);
+}
+
+static inline u8 *get_trans_desc(struct cqhci_host *cq_host, u8 tag)
+{
+	return cq_host->trans_desc_base +
+		(cq_host->trans_desc_len * cq_host->mmc->max_segs * tag);
+}
+
+static void setup_trans_desc(struct cqhci_host *cq_host, u8 tag)
+{
+	u8 *link_temp;
+	dma_addr_t trans_temp;
+
+	link_temp = get_link_desc(cq_host, tag);
+	trans_temp = get_trans_desc_dma(cq_host, tag);
+
+	memset(link_temp, 0, cq_host->link_desc_len);
+	if (cq_host->link_desc_len > 8)
+		*(link_temp + 8) = 0;
+
+	if (tag == DCMD_SLOT && (cq_host->mmc->caps2 & MMC_CAP2_CQE_DCMD)) {
+		*link_temp = CQHCI_VALID(0) | CQHCI_ACT(0) | CQHCI_END(1);
+		return;
+	}
+
+	*link_temp = CQHCI_VALID(1) | CQHCI_ACT(0x6) | CQHCI_END(0);
+
+	if (cq_host->dma64) {
+		__le64 *data_addr = (__le64 __force *)(link_temp + 4);
+
+		data_addr[0] = cpu_to_le64(trans_temp);
+	} else {
+		__le32 *data_addr = (__le32 __force *)(link_temp + 4);
+
+		data_addr[0] = cpu_to_le32(trans_temp);
+	}
+}
+
+static void cqhci_set_irqs(struct cqhci_host *cq_host, u32 set)
+{
+	cqhci_writel(cq_host, set, CQHCI_ISTE);
+	cqhci_writel(cq_host, set, CQHCI_ISGE);
+}
+
+#define DRV_NAME "cqhci"
+
+#define CQHCI_DUMP(f, x...) \
+	pr_err("%s: " DRV_NAME ": " f, mmc_hostname(mmc), ## x)
+
+static void cqhci_dumpregs(struct cqhci_host *cq_host)
+{
+	struct mmc_host *mmc = cq_host->mmc;
+
+	CQHCI_DUMP("============ CQHCI REGISTER DUMP ===========\n");
+
+	CQHCI_DUMP("Caps:      0x%08x | Version:  0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_CAP),
+		   cqhci_readl(cq_host, CQHCI_VER));
+	CQHCI_DUMP("Config:    0x%08x | Control:  0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_CFG),
+		   cqhci_readl(cq_host, CQHCI_CTL));
+	CQHCI_DUMP("Int stat:  0x%08x | Int enab: 0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_IS),
+		   cqhci_readl(cq_host, CQHCI_ISTE));
+	CQHCI_DUMP("Int sig:   0x%08x | Int Coal: 0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_ISGE),
+		   cqhci_readl(cq_host, CQHCI_IC));
+	CQHCI_DUMP("TDL base:  0x%08x | TDL up32: 0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_TDLBA),
+		   cqhci_readl(cq_host, CQHCI_TDLBAU));
+	CQHCI_DUMP("Doorbell:  0x%08x | TCN:      0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_TDBR),
+		   cqhci_readl(cq_host, CQHCI_TCN));
+	CQHCI_DUMP("Dev queue: 0x%08x | Dev Pend: 0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_DQS),
+		   cqhci_readl(cq_host, CQHCI_DPT));
+	CQHCI_DUMP("Task clr:  0x%08x | SSC1:     0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_TCLR),
+		   cqhci_readl(cq_host, CQHCI_SSC1));
+	CQHCI_DUMP("SSC2:      0x%08x | DCMD rsp: 0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_SSC2),
+		   cqhci_readl(cq_host, CQHCI_CRDCT));
+	CQHCI_DUMP("RED mask:  0x%08x | TERRI:    0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_RMEM),
+		   cqhci_readl(cq_host, CQHCI_TERRI));
+	CQHCI_DUMP("Resp idx:  0x%08x | Resp arg: 0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_CRI),
+		   cqhci_readl(cq_host, CQHCI_CRA));
+
+	if (cq_host->ops->dumpregs)
+		cq_host->ops->dumpregs(mmc);
+	else
+		CQHCI_DUMP(": ===========================================\n");
+}
+
+/**
+ * The allocated descriptor table for task, link & transfer descriptors
+ * looks like:
+ * |----------|
+ * |task desc |  |->|----------|
+ * |----------|  |  |trans desc|
+ * |link desc-|->|  |----------|
+ * |----------|          .
+ *      .                .
+ *  no. of slots      max-segs
+ *      .           |----------|
+ * |----------|
+ * The idea here is to create the [task+trans] table and mark & point the
+ * link desc to the transfer desc table on a per slot basis.
+ */
+static int cqhci_host_alloc_tdl(struct cqhci_host *cq_host)
+{
+	int i = 0;
+
+	/* task descriptor can be 64/128 bit irrespective of arch */
+	if (cq_host->caps & CQHCI_TASK_DESC_SZ_128) {
+		cqhci_writel(cq_host, cqhci_readl(cq_host, CQHCI_CFG) |
+			       CQHCI_TASK_DESC_SZ, CQHCI_CFG);
+		cq_host->task_desc_len = 16;
+	} else {
+		cq_host->task_desc_len = 8;
+	}
+
+	/*
+	 * The transfer descriptor can be 96 bits instead of 128 bits, which
+	 * means ADMA expects the next valid descriptor at either the 96th
+	 * or the 128th bit accordingly.
+	 */
+	if (cq_host->dma64) {
+		if (cq_host->quirks & CQHCI_QUIRK_SHORT_TXFR_DESC_SZ)
+			cq_host->trans_desc_len = 12;
+		else
+			cq_host->trans_desc_len = 16;
+		cq_host->link_desc_len = 16;
+	} else {
+		cq_host->trans_desc_len = 8;
+		cq_host->link_desc_len = 8;
+	}
+
+	/* total size of a slot: 1 task & 1 transfer (link) */
+	cq_host->slot_sz = cq_host->task_desc_len + cq_host->link_desc_len;
+
+	cq_host->desc_size = cq_host->slot_sz * cq_host->num_slots;
+
+	cq_host->data_size = cq_host->trans_desc_len * cq_host->mmc->max_segs *
+		(cq_host->num_slots - 1);
+
+	pr_debug("%s: cqhci: desc_size: %zu data_sz: %zu slot-sz: %d\n",
+		 mmc_hostname(cq_host->mmc), cq_host->desc_size, cq_host->data_size,
+		 cq_host->slot_sz);
+
+	/*
+	 * Allocate one DMA-mapped chunk of memory for the task/link
+	 * descriptors and another for the transfer descriptors, then point
+	 * each slot's link descriptor at its transfer descriptor table.
+	 */
+	cq_host->desc_base = dmam_alloc_coherent(mmc_dev(cq_host->mmc),
+						 cq_host->desc_size,
+						 &cq_host->desc_dma_base,
+						 GFP_KERNEL);
+	cq_host->trans_desc_base = dmam_alloc_coherent(mmc_dev(cq_host->mmc),
+					      cq_host->data_size,
+					      &cq_host->trans_desc_dma_base,
+					      GFP_KERNEL);
+	if (!cq_host->desc_base || !cq_host->trans_desc_base)
+		return -ENOMEM;
+
+	pr_debug("%s: cqhci: desc-base: 0x%p trans-base: 0x%p\n desc_dma 0x%llx trans_dma: 0x%llx\n",
+		 mmc_hostname(cq_host->mmc), cq_host->desc_base, cq_host->trans_desc_base,
+		(unsigned long long)cq_host->desc_dma_base,
+		(unsigned long long)cq_host->trans_desc_dma_base);
+
+	for (; i < (cq_host->num_slots); i++)
+		setup_trans_desc(cq_host, i);
+
+	return 0;
+}
+
+static void __cqhci_enable(struct cqhci_host *cq_host)
+{
+	struct mmc_host *mmc = cq_host->mmc;
+	u32 cqcfg;
+
+	cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
+
+	/* Configuration must not be changed while enabled */
+	if (cqcfg & CQHCI_ENABLE) {
+		cqcfg &= ~CQHCI_ENABLE;
+		cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+	}
+
+	cqcfg &= ~(CQHCI_DCMD | CQHCI_TASK_DESC_SZ);
+
+	if (mmc->caps2 & MMC_CAP2_CQE_DCMD)
+		cqcfg |= CQHCI_DCMD;
+
+	if (cq_host->caps & CQHCI_TASK_DESC_SZ_128)
+		cqcfg |= CQHCI_TASK_DESC_SZ;
+
+	cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+
+	cqhci_writel(cq_host, lower_32_bits(cq_host->desc_dma_base),
+		     CQHCI_TDLBA);
+	cqhci_writel(cq_host, upper_32_bits(cq_host->desc_dma_base),
+		     CQHCI_TDLBAU);
+
+	cqhci_writel(cq_host, cq_host->rca, CQHCI_SSC2);
+
+	cqhci_set_irqs(cq_host, 0);
+
+	cqcfg |= CQHCI_ENABLE;
+
+	cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+
+	mmc->cqe_on = true;
+
+	if (cq_host->ops->enable)
+		cq_host->ops->enable(mmc);
+
+	/* Ensure all writes are done before interrupts are enabled */
+	wmb();
+
+	cqhci_set_irqs(cq_host, CQHCI_IS_MASK);
+
+	cq_host->activated = true;
+}
+
+static void __cqhci_disable(struct cqhci_host *cq_host)
+{
+	u32 cqcfg;
+
+	cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
+	cqcfg &= ~CQHCI_ENABLE;
+	cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+
+	cq_host->mmc->cqe_on = false;
+
+	cq_host->activated = false;
+}
+
+int cqhci_suspend(struct mmc_host *mmc)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+
+	if (cq_host->enabled)
+		__cqhci_disable(cq_host);
+
+	return 0;
+}
+EXPORT_SYMBOL(cqhci_suspend);
+
+int cqhci_resume(struct mmc_host *mmc)
+{
+	/* Re-enable is done upon first request */
+	return 0;
+}
+EXPORT_SYMBOL(cqhci_resume);
+
+static int cqhci_enable(struct mmc_host *mmc, struct mmc_card *card)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	int err;
+
+	if (cq_host->enabled)
+		return 0;
+
+	cq_host->rca = card->rca;
+
+	err = cqhci_host_alloc_tdl(cq_host);
+	if (err)
+		return err;
+
+	__cqhci_enable(cq_host);
+
+	cq_host->enabled = true;
+
+#ifdef DEBUG
+	cqhci_dumpregs(cq_host);
+#endif
+	return 0;
+}
+
+/* CQHCI is idle and should halt immediately, so set a small timeout */
+#define CQHCI_OFF_TIMEOUT 100
+
+static void cqhci_off(struct mmc_host *mmc)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	ktime_t timeout;
+	bool timed_out;
+	u32 reg;
+
+	if (!cq_host->enabled || !mmc->cqe_on || cq_host->recovery_halt)
+		return;
+
+	if (cq_host->ops->disable)
+		cq_host->ops->disable(mmc, false);
+
+	cqhci_writel(cq_host, CQHCI_HALT, CQHCI_CTL);
+
+	timeout = ktime_add_us(ktime_get(), CQHCI_OFF_TIMEOUT);
+	while (1) {
+		timed_out = ktime_compare(ktime_get(), timeout) > 0;
+		reg = cqhci_readl(cq_host, CQHCI_CTL);
+		if ((reg & CQHCI_HALT) || timed_out)
+			break;
+	}
+
+	if (timed_out)
+		pr_err("%s: cqhci: CQE stuck on\n", mmc_hostname(mmc));
+	else
+		pr_debug("%s: cqhci: CQE off\n", mmc_hostname(mmc));
+
+	mmc->cqe_on = false;
+}
+
+static void cqhci_disable(struct mmc_host *mmc)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+
+	if (!cq_host->enabled)
+		return;
+
+	cqhci_off(mmc);
+
+	__cqhci_disable(cq_host);
+
+	dmam_free_coherent(mmc_dev(mmc), cq_host->data_size,
+			   cq_host->trans_desc_base,
+			   cq_host->trans_desc_dma_base);
+
+	dmam_free_coherent(mmc_dev(mmc), cq_host->desc_size,
+			   cq_host->desc_base,
+			   cq_host->desc_dma_base);
+
+	cq_host->trans_desc_base = NULL;
+	cq_host->desc_base = NULL;
+
+	cq_host->enabled = false;
+}
+
+static void cqhci_prep_task_desc(struct mmc_request *mrq,
+					u64 *data, bool intr)
+{
+	u32 req_flags = mrq->data->flags;
+
+	*data = CQHCI_VALID(1) |
+		CQHCI_END(1) |
+		CQHCI_INT(intr) |
+		CQHCI_ACT(0x5) |
+		CQHCI_FORCED_PROG(!!(req_flags & MMC_DATA_FORCED_PRG)) |
+		CQHCI_DATA_TAG(!!(req_flags & MMC_DATA_DAT_TAG)) |
+		CQHCI_DATA_DIR(!!(req_flags & MMC_DATA_READ)) |
+		CQHCI_PRIORITY(!!(req_flags & MMC_DATA_PRIO)) |
+		CQHCI_QBAR(!!(req_flags & MMC_DATA_QBR)) |
+		CQHCI_REL_WRITE(!!(req_flags & MMC_DATA_REL_WR)) |
+		CQHCI_BLK_COUNT(mrq->data->blocks) |
+		CQHCI_BLK_ADDR((u64)mrq->data->blk_addr);
+
+	pr_debug("%s: cqhci: tag %d task descriptor 0x%016llx\n",
+		 mmc_hostname(mrq->host), mrq->tag, (unsigned long long)*data);
+}
+
+static int cqhci_dma_map(struct mmc_host *host, struct mmc_request *mrq)
+{
+	int sg_count;
+	struct mmc_data *data = mrq->data;
+
+	if (!data)
+		return -EINVAL;
+
+	sg_count = dma_map_sg(mmc_dev(host), data->sg,
+			      data->sg_len,
+			      (data->flags & MMC_DATA_WRITE) ?
+			      DMA_TO_DEVICE : DMA_FROM_DEVICE);
+	if (!sg_count) {
+		pr_err("%s: sg-len: %d\n", __func__, data->sg_len);
+		return -ENOMEM;
+	}
+
+	return sg_count;
+}
+
+static void cqhci_set_tran_desc(u8 *desc, dma_addr_t addr, int len, bool end,
+				bool dma64)
+{
+	__le32 *attr = (__le32 __force *)desc;
+
+	*attr = (CQHCI_VALID(1) |
+		 CQHCI_END(end ? 1 : 0) |
+		 CQHCI_INT(0) |
+		 CQHCI_ACT(0x4) |
+		 CQHCI_DAT_LENGTH(len));
+
+	if (dma64) {
+		__le64 *dataddr = (__le64 __force *)(desc + 4);
+
+		dataddr[0] = cpu_to_le64(addr);
+	} else {
+		__le32 *dataddr = (__le32 __force *)(desc + 4);
+
+		dataddr[0] = cpu_to_le32(addr);
+	}
+}
+
+static int cqhci_prep_tran_desc(struct mmc_request *mrq,
+			       struct cqhci_host *cq_host, int tag)
+{
+	struct mmc_data *data = mrq->data;
+	int i, sg_count, len;
+	bool end = false;
+	bool dma64 = cq_host->dma64;
+	dma_addr_t addr;
+	u8 *desc;
+	struct scatterlist *sg;
+
+	sg_count = cqhci_dma_map(mrq->host, mrq);
+	if (sg_count < 0) {
+		pr_err("%s: %s: unable to map sg lists, %d\n",
+				mmc_hostname(mrq->host), __func__, sg_count);
+		return sg_count;
+	}
+
+	desc = get_trans_desc(cq_host, tag);
+
+	for_each_sg(data->sg, sg, sg_count, i) {
+		addr = sg_dma_address(sg);
+		len = sg_dma_len(sg);
+
+		if ((i+1) == sg_count)
+			end = true;
+		cqhci_set_tran_desc(desc, addr, len, end, dma64);
+		desc += cq_host->trans_desc_len;
+	}
+
+	return 0;
+}
+
+static void cqhci_prep_dcmd_desc(struct mmc_host *mmc,
+				   struct mmc_request *mrq)
+{
+	u64 *task_desc = NULL;
+	u64 data = 0;
+	u8 resp_type;
+	u8 *desc;
+	__le64 *dataddr;
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	u8 timing;
+
+	if (!(mrq->cmd->flags & MMC_RSP_PRESENT)) {
+		resp_type = 0x0;
+		timing = 0x1;
+	} else {
+		if (mrq->cmd->flags & MMC_RSP_R1B) {
+			resp_type = 0x3;
+			timing = 0x0;
+		} else {
+			resp_type = 0x2;
+			timing = 0x1;
+		}
+	}
+
+	task_desc = (__le64 __force *)get_desc(cq_host, cq_host->dcmd_slot);
+	memset(task_desc, 0, cq_host->task_desc_len);
+	data |= (CQHCI_VALID(1) |
+		 CQHCI_END(1) |
+		 CQHCI_INT(1) |
+		 CQHCI_QBAR(1) |
+		 CQHCI_ACT(0x5) |
+		 CQHCI_CMD_INDEX(mrq->cmd->opcode) |
+		 CQHCI_CMD_TIMING(timing) | CQHCI_RESP_TYPE(resp_type));
+	*task_desc |= data;
+	desc = (u8 *)task_desc;
+	pr_debug("%s: cqhci: dcmd: cmd: %d timing: %d resp: %d\n",
+		 mmc_hostname(mmc), mrq->cmd->opcode, timing, resp_type);
+	dataddr = (__le64 __force *)(desc + 4);
+	dataddr[0] = cpu_to_le64((u64)mrq->cmd->arg);
+
+}
+
+static void cqhci_post_req(struct mmc_host *host, struct mmc_request *mrq)
+{
+	struct mmc_data *data = mrq->data;
+
+	if (data) {
+		dma_unmap_sg(mmc_dev(host), data->sg, data->sg_len,
+			     (data->flags & MMC_DATA_READ) ?
+			     DMA_FROM_DEVICE : DMA_TO_DEVICE);
+	}
+}
+
+static inline int cqhci_tag(struct mmc_request *mrq)
+{
+	return mrq->cmd ? DCMD_SLOT : mrq->tag;
+}
+
+static int cqhci_request(struct mmc_host *mmc, struct mmc_request *mrq)
+{
+	int err = 0;
+	u64 data = 0;
+	u64 *task_desc = NULL;
+	int tag = cqhci_tag(mrq);
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	unsigned long flags;
+
+	if (!cq_host->enabled) {
+		pr_err("%s: cqhci: not enabled\n", mmc_hostname(mmc));
+		return -EINVAL;
+	}
+
+	/* First request after resume has to re-enable */
+	if (!cq_host->activated)
+		__cqhci_enable(cq_host);
+
+	if (!mmc->cqe_on) {
+		cqhci_writel(cq_host, 0, CQHCI_CTL);
+		mmc->cqe_on = true;
+		pr_debug("%s: cqhci: CQE on\n", mmc_hostname(mmc));
+		if (cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_HALT) {
+			pr_err("%s: cqhci: CQE failed to exit halt state\n",
+			       mmc_hostname(mmc));
+		}
+		if (cq_host->ops->enable)
+			cq_host->ops->enable(mmc);
+	}
+
+	if (mrq->data) {
+		task_desc = (__le64 __force *)get_desc(cq_host, tag);
+		cqhci_prep_task_desc(mrq, &data, 1);
+		*task_desc = cpu_to_le64(data);
+		err = cqhci_prep_tran_desc(mrq, cq_host, tag);
+		if (err) {
+			pr_err("%s: cqhci: failed to setup tx desc: %d\n",
+			       mmc_hostname(mmc), err);
+			return err;
+		}
+	} else {
+		cqhci_prep_dcmd_desc(mmc, mrq);
+	}
+
+	spin_lock_irqsave(&cq_host->lock, flags);
+
+	if (cq_host->recovery_halt) {
+		err = -EBUSY;
+		goto out_unlock;
+	}
+
+	cq_host->slot[tag].mrq = mrq;
+	cq_host->slot[tag].flags = 0;
+
+	cq_host->qcnt += 1;
+
+	cqhci_writel(cq_host, 1 << tag, CQHCI_TDBR);
+	if (!(cqhci_readl(cq_host, CQHCI_TDBR) & (1 << tag)))
+		pr_debug("%s: cqhci: doorbell not set for tag %d\n",
+			 mmc_hostname(mmc), tag);
+out_unlock:
+	spin_unlock_irqrestore(&cq_host->lock, flags);
+
+	if (err)
+		cqhci_post_req(mmc, mrq);
+
+	return err;
+}
+
+static void cqhci_recovery_needed(struct mmc_host *mmc, struct mmc_request *mrq,
+				  bool notify)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+
+	if (!cq_host->recovery_halt) {
+		cq_host->recovery_halt = true;
+		pr_debug("%s: cqhci: recovery needed\n", mmc_hostname(mmc));
+		wake_up(&cq_host->wait_queue);
+		if (notify && mrq->recovery_notifier)
+			mrq->recovery_notifier(mrq);
+	}
+}
+
+static unsigned int cqhci_error_flags(int error1, int error2)
+{
+	int error = error1 ? error1 : error2;
+
+	switch (error) {
+	case -EILSEQ:
+		return CQHCI_HOST_CRC;
+	case -ETIMEDOUT:
+		return CQHCI_HOST_TIMEOUT;
+	default:
+		return CQHCI_HOST_OTHER;
+	}
+}
+
+static void cqhci_error_irq(struct mmc_host *mmc, u32 status, int cmd_error,
+			    int data_error)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	struct cqhci_slot *slot;
+	u32 terri;
+	int tag;
+
+	spin_lock(&cq_host->lock);
+
+	terri = cqhci_readl(cq_host, CQHCI_TERRI);
+
+	pr_debug("%s: cqhci: error IRQ status: 0x%08x cmd error %d data error %d TERRI: 0x%08x\n",
+		 mmc_hostname(mmc), status, cmd_error, data_error, terri);
+
+	/* Forget about errors when recovery has already been triggered */
+	if (cq_host->recovery_halt)
+		goto out_unlock;
+
+	if (!cq_host->qcnt) {
+		WARN_ONCE(1, "%s: cqhci: error when idle. IRQ status: 0x%08x cmd error %d data error %d TERRI: 0x%08x\n",
+			  mmc_hostname(mmc), status, cmd_error, data_error,
+			  terri);
+		goto out_unlock;
+	}
+
+	if (CQHCI_TERRI_C_VALID(terri)) {
+		tag = CQHCI_TERRI_C_TASK(terri);
+		slot = &cq_host->slot[tag];
+		if (slot->mrq) {
+			slot->flags = cqhci_error_flags(cmd_error, data_error);
+			cqhci_recovery_needed(mmc, slot->mrq, true);
+		}
+	}
+
+	if (CQHCI_TERRI_D_VALID(terri)) {
+		tag = CQHCI_TERRI_D_TASK(terri);
+		slot = &cq_host->slot[tag];
+		if (slot->mrq) {
+			slot->flags = cqhci_error_flags(data_error, cmd_error);
+			cqhci_recovery_needed(mmc, slot->mrq, true);
+		}
+	}
+
+	if (!cq_host->recovery_halt) {
+		/*
+		 * The only way to guarantee forward progress is to mark at
+		 * least one task in error, so if none is indicated, pick one.
+		 */
+		for (tag = 0; tag < NUM_SLOTS; tag++) {
+			slot = &cq_host->slot[tag];
+			if (!slot->mrq)
+				continue;
+			slot->flags = cqhci_error_flags(data_error, cmd_error);
+			cqhci_recovery_needed(mmc, slot->mrq, true);
+			break;
+		}
+	}
+
+out_unlock:
+	spin_unlock(&cq_host->lock);
+}
+
+static void cqhci_finish_mrq(struct mmc_host *mmc, unsigned int tag)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	struct cqhci_slot *slot = &cq_host->slot[tag];
+	struct mmc_request *mrq = slot->mrq;
+	struct mmc_data *data;
+
+	if (!mrq) {
+		WARN_ONCE(1, "%s: cqhci: spurious TCN for tag %d\n",
+			  mmc_hostname(mmc), tag);
+		return;
+	}
+
+	/* No completions allowed during recovery */
+	if (cq_host->recovery_halt) {
+		slot->flags |= CQHCI_COMPLETED;
+		return;
+	}
+
+	slot->mrq = NULL;
+
+	cq_host->qcnt -= 1;
+
+	data = mrq->data;
+	if (data) {
+		if (data->error)
+			data->bytes_xfered = 0;
+		else
+			data->bytes_xfered = data->blksz * data->blocks;
+	}
+
+	mmc_cqe_request_done(mmc, mrq);
+}
+
+irqreturn_t cqhci_irq(struct mmc_host *mmc, u32 intmask, int cmd_error,
+		      int data_error)
+{
+	u32 status;
+	unsigned long tag = 0, comp_status;
+	struct cqhci_host *cq_host = mmc->cqe_private;
+
+	status = cqhci_readl(cq_host, CQHCI_IS);
+	cqhci_writel(cq_host, status, CQHCI_IS);
+
+	pr_debug("%s: cqhci: IRQ status: 0x%08x\n", mmc_hostname(mmc), status);
+
+	if ((status & CQHCI_IS_RED) || cmd_error || data_error)
+		cqhci_error_irq(mmc, status, cmd_error, data_error);
+
+	if (status & CQHCI_IS_TCC) {
+		/* read TCN and complete the request */
+		comp_status = cqhci_readl(cq_host, CQHCI_TCN);
+		cqhci_writel(cq_host, comp_status, CQHCI_TCN);
+		pr_debug("%s: cqhci: TCN: 0x%08lx\n",
+			 mmc_hostname(mmc), comp_status);
+
+		spin_lock(&cq_host->lock);
+
+		for_each_set_bit(tag, &comp_status, cq_host->num_slots) {
+			/* complete the corresponding mrq */
+			pr_debug("%s: cqhci: completing tag %lu\n",
+				 mmc_hostname(mmc), tag);
+			cqhci_finish_mrq(mmc, tag);
+		}
+
+		if (cq_host->waiting_for_idle && !cq_host->qcnt) {
+			cq_host->waiting_for_idle = false;
+			wake_up(&cq_host->wait_queue);
+		}
+
+		spin_unlock(&cq_host->lock);
+	}
+
+	if (status & CQHCI_IS_TCL)
+		wake_up(&cq_host->wait_queue);
+
+	if (status & CQHCI_IS_HAC)
+		wake_up(&cq_host->wait_queue);
+
+	return IRQ_HANDLED;
+}
+EXPORT_SYMBOL(cqhci_irq);
+
+static bool cqhci_is_idle(struct cqhci_host *cq_host, int *ret)
+{
+	unsigned long flags;
+	bool is_idle;
+
+	spin_lock_irqsave(&cq_host->lock, flags);
+	is_idle = !cq_host->qcnt || cq_host->recovery_halt;
+	*ret = cq_host->recovery_halt ? -EBUSY : 0;
+	cq_host->waiting_for_idle = !is_idle;
+	spin_unlock_irqrestore(&cq_host->lock, flags);
+
+	return is_idle;
+}
+
+static int cqhci_wait_for_idle(struct mmc_host *mmc)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	int ret;
+
+	wait_event(cq_host->wait_queue, cqhci_is_idle(cq_host, &ret));
+
+	return ret;
+}
+
+static bool cqhci_timeout(struct mmc_host *mmc, struct mmc_request *mrq,
+			  bool *recovery_needed)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	int tag = cqhci_tag(mrq);
+	struct cqhci_slot *slot = &cq_host->slot[tag];
+	unsigned long flags;
+	bool timed_out;
+
+	spin_lock_irqsave(&cq_host->lock, flags);
+	timed_out = slot->mrq == mrq;
+	if (timed_out) {
+		slot->flags |= CQHCI_EXTERNAL_TIMEOUT;
+		cqhci_recovery_needed(mmc, mrq, false);
+		*recovery_needed = cq_host->recovery_halt;
+	}
+	spin_unlock_irqrestore(&cq_host->lock, flags);
+
+	if (timed_out) {
+		pr_err("%s: cqhci: timeout for tag %d\n",
+		       mmc_hostname(mmc), tag);
+		cqhci_dumpregs(cq_host);
+	}
+
+	return timed_out;
+}
+
+static bool cqhci_tasks_cleared(struct cqhci_host *cq_host)
+{
+	return !(cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_CLEAR_ALL_TASKS);
+}
+
+static bool cqhci_clear_all_tasks(struct mmc_host *mmc, unsigned int timeout)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	bool ret;
+	u32 ctl;
+
+	cqhci_set_irqs(cq_host, CQHCI_IS_TCL);
+
+	ctl = cqhci_readl(cq_host, CQHCI_CTL);
+	ctl |= CQHCI_CLEAR_ALL_TASKS;
+	cqhci_writel(cq_host, ctl, CQHCI_CTL);
+
+	wait_event_timeout(cq_host->wait_queue, cqhci_tasks_cleared(cq_host),
+			   msecs_to_jiffies(timeout) + 1);
+
+	cqhci_set_irqs(cq_host, 0);
+
+	ret = cqhci_tasks_cleared(cq_host);
+
+	if (!ret)
+		pr_debug("%s: cqhci: Failed to clear tasks\n",
+			 mmc_hostname(mmc));
+
+	return ret;
+}
+
+static bool cqhci_halted(struct cqhci_host *cq_host)
+{
+	return cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_HALT;
+}
+
+static bool cqhci_halt(struct mmc_host *mmc, unsigned int timeout)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	bool ret;
+	u32 ctl;
+
+	if (cqhci_halted(cq_host))
+		return true;
+
+	cqhci_set_irqs(cq_host, CQHCI_IS_HAC);
+
+	ctl = cqhci_readl(cq_host, CQHCI_CTL);
+	ctl |= CQHCI_HALT;
+	cqhci_writel(cq_host, ctl, CQHCI_CTL);
+
+	wait_event_timeout(cq_host->wait_queue, cqhci_halted(cq_host),
+			   msecs_to_jiffies(timeout) + 1);
+
+	cqhci_set_irqs(cq_host, 0);
+
+	ret = cqhci_halted(cq_host);
+
+	if (!ret)
+		pr_debug("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));
+
+	return ret;
+}
+
+/*
+ * After halting we expect to be able to use the command line. We interpret the
+ * failure to halt to mean the data lines might still be in use (and the upper
+ * layers will need to send a STOP command), so we set the timeout based on a
+ * generous command timeout.
+ */
+#define CQHCI_START_HALT_TIMEOUT	5
+
+static void cqhci_recovery_start(struct mmc_host *mmc)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+
+	pr_debug("%s: cqhci: %s\n", mmc_hostname(mmc), __func__);
+
+	WARN_ON(!cq_host->recovery_halt);
+
+	cqhci_halt(mmc, CQHCI_START_HALT_TIMEOUT);
+
+	if (cq_host->ops->disable)
+		cq_host->ops->disable(mmc, true);
+
+	mmc->cqe_on = false;
+}
+
+static int cqhci_error_from_flags(unsigned int flags)
+{
+	if (!flags)
+		return 0;
+
+	/* CRC errors might indicate re-tuning so prefer to report that */
+	if (flags & CQHCI_HOST_CRC)
+		return -EILSEQ;
+
+	if (flags & (CQHCI_EXTERNAL_TIMEOUT | CQHCI_HOST_TIMEOUT))
+		return -ETIMEDOUT;
+
+	return -EIO;
+}
+
+static void cqhci_recover_mrq(struct cqhci_host *cq_host, unsigned int tag)
+{
+	struct cqhci_slot *slot = &cq_host->slot[tag];
+	struct mmc_request *mrq = slot->mrq;
+	struct mmc_data *data;
+
+	if (!mrq)
+		return;
+
+	slot->mrq = NULL;
+
+	cq_host->qcnt -= 1;
+
+	data = mrq->data;
+	if (data) {
+		data->bytes_xfered = 0;
+		data->error = cqhci_error_from_flags(slot->flags);
+	} else {
+		mrq->cmd->error = cqhci_error_from_flags(slot->flags);
+	}
+
+	mmc_cqe_request_done(cq_host->mmc, mrq);
+}
+
+static void cqhci_recover_mrqs(struct cqhci_host *cq_host)
+{
+	int i;
+
+	for (i = 0; i < cq_host->num_slots; i++)
+		cqhci_recover_mrq(cq_host, i);
+}
+
+/*
+ * By now the command and data lines should be unused so there is no reason for
+ * CQHCI to take a long time to halt, but if it doesn't halt there could be
+ * problems clearing tasks, so be generous.
+ */
+#define CQHCI_FINISH_HALT_TIMEOUT	20
+
+/* CQHCI could be expected to clear its internal state pretty quickly */
+#define CQHCI_CLEAR_TIMEOUT		20
+
+static void cqhci_recovery_finish(struct mmc_host *mmc)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	unsigned long flags;
+	u32 cqcfg;
+	bool ok;
+
+	pr_debug("%s: cqhci: %s\n", mmc_hostname(mmc), __func__);
+
+	WARN_ON(!cq_host->recovery_halt);
+
+	ok = cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);
+
+	if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
+		ok = false;
+
+	/*
+	 * The specification contradicts itself: it says tasks cannot be
+	 * cleared if CQHCI does not halt, and that a CQHCI which does not
+	 * halt should be disabled and re-enabled, yet also that it must not
+	 * be disabled before clearing tasks. Have a go anyway.
+	 */
+	if (!ok) {
+		pr_debug("%s: cqhci: disable / re-enable\n", mmc_hostname(mmc));
+		cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
+		cqcfg &= ~CQHCI_ENABLE;
+		cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+		cqcfg |= CQHCI_ENABLE;
+		cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+		/* Be sure that there are no tasks */
+		ok = cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);
+		if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
+			ok = false;
+		WARN_ON(!ok);
+	}
+
+	cqhci_recover_mrqs(cq_host);
+
+	WARN_ON(cq_host->qcnt);
+
+	spin_lock_irqsave(&cq_host->lock, flags);
+	cq_host->qcnt = 0;
+	cq_host->recovery_halt = false;
+	mmc->cqe_on = false;
+	spin_unlock_irqrestore(&cq_host->lock, flags);
+
+	/* Ensure all writes are done before interrupts are re-enabled */
+	wmb();
+
+	cqhci_writel(cq_host, CQHCI_IS_HAC | CQHCI_IS_TCL, CQHCI_IS);
+
+	cqhci_set_irqs(cq_host, CQHCI_IS_MASK);
+
+	pr_debug("%s: cqhci: recovery done\n", mmc_hostname(mmc));
+}
+
+static const struct mmc_cqe_ops cqhci_cqe_ops = {
+	.cqe_enable = cqhci_enable,
+	.cqe_disable = cqhci_disable,
+	.cqe_request = cqhci_request,
+	.cqe_post_req = cqhci_post_req,
+	.cqe_off = cqhci_off,
+	.cqe_wait_for_idle = cqhci_wait_for_idle,
+	.cqe_timeout = cqhci_timeout,
+	.cqe_recovery_start = cqhci_recovery_start,
+	.cqe_recovery_finish = cqhci_recovery_finish,
+};
+
+struct cqhci_host *cqhci_pltfm_init(struct platform_device *pdev)
+{
+	struct cqhci_host *cq_host;
+	struct resource *cqhci_memres = NULL;
+
+	/* check and setup CMDQ interface */
+	cqhci_memres = platform_get_resource_byname(pdev, IORESOURCE_MEM,
+						   "cqhci_mem");
+	if (!cqhci_memres) {
+		dev_dbg(&pdev->dev, "CMDQ not supported\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	cq_host = devm_kzalloc(&pdev->dev, sizeof(*cq_host), GFP_KERNEL);
+	if (!cq_host)
+		return ERR_PTR(-ENOMEM);
+	cq_host->mmio = devm_ioremap(&pdev->dev,
+				     cqhci_memres->start,
+				     resource_size(cqhci_memres));
+	if (!cq_host->mmio) {
+		dev_err(&pdev->dev, "failed to remap cqhci regs\n");
+		return ERR_PTR(-EBUSY);
+	}
+	dev_dbg(&pdev->dev, "CMDQ ioremap: done\n");
+
+	return cq_host;
+}
+EXPORT_SYMBOL(cqhci_pltfm_init);
+
+static unsigned int cqhci_ver_major(struct cqhci_host *cq_host)
+{
+	return CQHCI_VER_MAJOR(cqhci_readl(cq_host, CQHCI_VER));
+}
+
+static unsigned int cqhci_ver_minor(struct cqhci_host *cq_host)
+{
+	u32 ver = cqhci_readl(cq_host, CQHCI_VER);
+
+	return CQHCI_VER_MINOR1(ver) * 10 + CQHCI_VER_MINOR2(ver);
+}
+
+int cqhci_init(struct cqhci_host *cq_host, struct mmc_host *mmc,
+	      bool dma64)
+{
+	int err;
+
+	cq_host->dma64 = dma64;
+	cq_host->mmc = mmc;
+	cq_host->mmc->cqe_private = cq_host;
+
+	cq_host->num_slots = NUM_SLOTS;
+	cq_host->dcmd_slot = DCMD_SLOT;
+
+	mmc->cqe_ops = &cqhci_cqe_ops;
+
+	mmc->cqe_qdepth = NUM_SLOTS;
+	if (mmc->caps2 & MMC_CAP2_CQE_DCMD)
+		mmc->cqe_qdepth -= 1;
+
+	cq_host->slot = devm_kcalloc(mmc_dev(mmc), cq_host->num_slots,
+				     sizeof(*cq_host->slot), GFP_KERNEL);
+	if (!cq_host->slot) {
+		err = -ENOMEM;
+		goto out_err;
+	}
+
+	spin_lock_init(&cq_host->lock);
+
+	init_completion(&cq_host->halt_comp);
+	init_waitqueue_head(&cq_host->wait_queue);
+
+	pr_info("%s: CQHCI version %u.%02u\n",
+		mmc_hostname(mmc), cqhci_ver_major(cq_host),
+		cqhci_ver_minor(cq_host));
+
+	return 0;
+
+out_err:
+	pr_err("%s: CQHCI version %u.%02u failed to initialize, error %d\n",
+	       mmc_hostname(mmc), cqhci_ver_major(cq_host),
+	       cqhci_ver_minor(cq_host), err);
+	return err;
+}
+EXPORT_SYMBOL(cqhci_init);
+
+MODULE_AUTHOR("Venkat Gopalakrishnan <venkatg@codeaurora.org>");
+MODULE_DESCRIPTION("Command Queue Host Controller Interface driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/mmc/host/cqhci.h b/drivers/mmc/host/cqhci.h
new file mode 100644
index 000000000000..2d39d361b322
--- /dev/null
+++ b/drivers/mmc/host/cqhci.h
@@ -0,0 +1,240 @@
+/* Copyright (c) 2015, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#ifndef LINUX_MMC_CQHCI_H
+#define LINUX_MMC_CQHCI_H
+
+#include <linux/compiler.h>
+#include <linux/bitops.h>
+#include <linux/spinlock_types.h>
+#include <linux/types.h>
+#include <linux/completion.h>
+#include <linux/wait.h>
+#include <linux/irqreturn.h>
+#include <asm/io.h>
+
+/* registers */
+/* version */
+#define CQHCI_VER			0x00
+#define CQHCI_VER_MAJOR(x)		(((x) & GENMASK(11, 8)) >> 8)
+#define CQHCI_VER_MINOR1(x)		(((x) & GENMASK(7, 4)) >> 4)
+#define CQHCI_VER_MINOR2(x)		((x) & GENMASK(3, 0))
+
+/* capabilities */
+#define CQHCI_CAP			0x04
+/* configuration */
+#define CQHCI_CFG			0x08
+#define CQHCI_DCMD			0x00001000
+#define CQHCI_TASK_DESC_SZ		0x00000100
+#define CQHCI_ENABLE			0x00000001
+
+/* control */
+#define CQHCI_CTL			0x0C
+#define CQHCI_CLEAR_ALL_TASKS		0x00000100
+#define CQHCI_HALT			0x00000001
+
+/* interrupt status */
+#define CQHCI_IS			0x10
+#define CQHCI_IS_HAC			BIT(0)
+#define CQHCI_IS_TCC			BIT(1)
+#define CQHCI_IS_RED			BIT(2)
+#define CQHCI_IS_TCL			BIT(3)
+
+#define CQHCI_IS_MASK (CQHCI_IS_TCC | CQHCI_IS_RED)
+
+/* interrupt status enable */
+#define CQHCI_ISTE			0x14
+
+/* interrupt signal enable */
+#define CQHCI_ISGE			0x18
+
+/* interrupt coalescing */
+#define CQHCI_IC			0x1C
+#define CQHCI_IC_ENABLE			BIT(31)
+#define CQHCI_IC_RESET			BIT(16)
+#define CQHCI_IC_ICCTHWEN		BIT(15)
+#define CQHCI_IC_ICCTH(x)		((x & 0x1F) << 8)
+#define CQHCI_IC_ICTOVALWEN		BIT(7)
+#define CQHCI_IC_ICTOVAL(x)		(x & 0x7F)
+
+/* task list base address */
+#define CQHCI_TDLBA			0x20
+
+/* task list base address upper */
+#define CQHCI_TDLBAU			0x24
+
+/* door-bell */
+#define CQHCI_TDBR			0x28
+
+/* task completion notification */
+#define CQHCI_TCN			0x2C
+
+/* device queue status */
+#define CQHCI_DQS			0x30
+
+/* device pending tasks */
+#define CQHCI_DPT			0x34
+
+/* task clear */
+#define CQHCI_TCLR			0x38
+
+/* send status config 1 */
+#define CQHCI_SSC1			0x40
+
+/* send status config 2 */
+#define CQHCI_SSC2			0x44
+
+/* response for dcmd */
+#define CQHCI_CRDCT			0x48
+
+/* response mode error mask */
+#define CQHCI_RMEM			0x50
+
+/* task error info */
+#define CQHCI_TERRI			0x54
+
+#define CQHCI_TERRI_C_INDEX(x)		((x) & GENMASK(5, 0))
+#define CQHCI_TERRI_C_TASK(x)		(((x) & GENMASK(12, 8)) >> 8)
+#define CQHCI_TERRI_C_VALID(x)		((x) & BIT(15))
+#define CQHCI_TERRI_D_INDEX(x)		(((x) & GENMASK(21, 16)) >> 16)
+#define CQHCI_TERRI_D_TASK(x)		(((x) & GENMASK(28, 24)) >> 24)
+#define CQHCI_TERRI_D_VALID(x)		((x) & BIT(31))
+
+/* command response index */
+#define CQHCI_CRI			0x58
+
+/* command response argument */
+#define CQHCI_CRA			0x5C
+
+#define CQHCI_INT_ALL			0xF
+#define CQHCI_IC_DEFAULT_ICCTH		31
+#define CQHCI_IC_DEFAULT_ICTOVAL	1
+
+/* attribute fields */
+#define CQHCI_VALID(x)			((x & 1) << 0)
+#define CQHCI_END(x)			((x & 1) << 1)
+#define CQHCI_INT(x)			((x & 1) << 2)
+#define CQHCI_ACT(x)			((x & 0x7) << 3)
+
+/* data command task descriptor fields */
+#define CQHCI_FORCED_PROG(x)		((x & 1) << 6)
+#define CQHCI_CONTEXT(x)		((x & 0xF) << 7)
+#define CQHCI_DATA_TAG(x)		((x & 1) << 11)
+#define CQHCI_DATA_DIR(x)		((x & 1) << 12)
+#define CQHCI_PRIORITY(x)		((x & 1) << 13)
+#define CQHCI_QBAR(x)			((x & 1) << 14)
+#define CQHCI_REL_WRITE(x)		((x & 1) << 15)
+#define CQHCI_BLK_COUNT(x)		((x & 0xFFFF) << 16)
+#define CQHCI_BLK_ADDR(x)		((x & 0xFFFFFFFF) << 32)
+
+/* direct command task descriptor fields */
+#define CQHCI_CMD_INDEX(x)		((x & 0x3F) << 16)
+#define CQHCI_CMD_TIMING(x)		((x & 1) << 22)
+#define CQHCI_RESP_TYPE(x)		((x & 0x3) << 23)
+
+/* transfer descriptor fields */
+#define CQHCI_DAT_LENGTH(x)		((x & 0xFFFF) << 16)
+#define CQHCI_DAT_ADDR_LO(x)		((x & 0xFFFFFFFF) << 32)
+#define CQHCI_DAT_ADDR_HI(x)		((x & 0xFFFFFFFF) << 0)
+
+struct cqhci_host_ops;
+struct mmc_host;
+struct cqhci_slot;
+
+struct cqhci_host {
+	const struct cqhci_host_ops *ops;
+	void __iomem *mmio;
+	struct mmc_host *mmc;
+
+	spinlock_t lock;
+
+	/* relative card address of device */
+	unsigned int rca;
+
+	/* 64 bit DMA */
+	bool dma64;
+	int num_slots;
+	int qcnt;
+
+	u32 dcmd_slot;
+	u32 caps;
+#define CQHCI_TASK_DESC_SZ_128		0x1
+
+	u32 quirks;
+#define CQHCI_QUIRK_SHORT_TXFR_DESC_SZ	0x1
+
+	bool enabled;
+	bool halted;
+	bool init_done;
+	bool activated;
+	bool waiting_for_idle;
+	bool recovery_halt;
+
+	size_t desc_size;
+	size_t data_size;
+
+	u8 *desc_base;
+
+	/* total descriptor size */
+	u8 slot_sz;
+
+	/* 64/128 bit depends on CQHCI_CFG */
+	u8 task_desc_len;
+
+	/* 64 bit on 32-bit arch, 128 bit on 64-bit */
+	u8 link_desc_len;
+
+	u8 *trans_desc_base;
+	/* same length as transfer descriptor */
+	u8 trans_desc_len;
+
+	dma_addr_t desc_dma_base;
+	dma_addr_t trans_desc_dma_base;
+
+	struct completion halt_comp;
+	wait_queue_head_t wait_queue;
+	struct cqhci_slot *slot;
+};
+
+struct cqhci_host_ops {
+	void (*dumpregs)(struct mmc_host *mmc);
+	void (*write_l)(struct cqhci_host *host, u32 val, int reg);
+	u32 (*read_l)(struct cqhci_host *host, int reg);
+	void (*enable)(struct mmc_host *mmc);
+	void (*disable)(struct mmc_host *mmc, bool recovery);
+};
+
+static inline void cqhci_writel(struct cqhci_host *host, u32 val, int reg)
+{
+	if (unlikely(host->ops->write_l))
+		host->ops->write_l(host, val, reg);
+	else
+		writel_relaxed(val, host->mmio + reg);
+}
+
+static inline u32 cqhci_readl(struct cqhci_host *host, int reg)
+{
+	if (unlikely(host->ops->read_l))
+		return host->ops->read_l(host, reg);
+	else
+		return readl_relaxed(host->mmio + reg);
+}
+
+struct platform_device;
+
+irqreturn_t cqhci_irq(struct mmc_host *mmc, u32 intmask, int cmd_error,
+		      int data_error);
+int cqhci_init(struct cqhci_host *cq_host, struct mmc_host *mmc, bool dma64);
+struct cqhci_host *cqhci_pltfm_init(struct platform_device *pdev);
+int cqhci_suspend(struct mmc_host *mmc);
+int cqhci_resume(struct mmc_host *mmc);
+
+#endif
-- 
1.9.1


* [PATCH V12 5/5] mmc: sdhci-pci: Add CQHCI support for Intel GLK
  2017-10-24  8:40 [PATCH V12 0/5] mmc: Add Command Queue support Adrian Hunter
                   ` (3 preceding siblings ...)
  2017-10-24  8:40 ` [PATCH V12 4/5] mmc: cqhci: support for command queue enabled host Adrian Hunter
@ 2017-10-24  8:40 ` Adrian Hunter
  2017-10-26 13:32 ` [PATCH V12 0/5] mmc: Add Command Queue support Linus Walleij
  5 siblings, 0 replies; 12+ messages in thread
From: Adrian Hunter @ 2017-10-24  8:40 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

Add CQHCI initialization and implement CQHCI operations for Intel GLK.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/host/Kconfig          |   1 +
 drivers/mmc/host/sdhci-pci-core.c | 155 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 155 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 3092b7085cb5..2b02a9788bb6 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -81,6 +81,7 @@ config MMC_SDHCI_BIG_ENDIAN_32BIT_BYTE_SWAPPER
 config MMC_SDHCI_PCI
 	tristate "SDHCI support on PCI bus"
 	depends on MMC_SDHCI && PCI
+	select MMC_CQHCI
 	help
 	  This selects the PCI Secure Digital Host Controller Interface.
 	  Most controllers found today are PCI devices.
diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
index 3e4f04fd5175..110c634cfb43 100644
--- a/drivers/mmc/host/sdhci-pci-core.c
+++ b/drivers/mmc/host/sdhci-pci-core.c
@@ -30,6 +30,8 @@
 #include <linux/mmc/sdhci-pci-data.h>
 #include <linux/acpi.h>
 
+#include "cqhci.h"
+
 #include "sdhci.h"
 #include "sdhci-pci.h"
 
@@ -116,6 +118,28 @@ int sdhci_pci_resume_host(struct sdhci_pci_chip *chip)
 
 	return 0;
 }
+
+static int sdhci_cqhci_suspend(struct sdhci_pci_chip *chip)
+{
+	int ret;
+
+	ret = cqhci_suspend(chip->slots[0]->host->mmc);
+	if (ret)
+		return ret;
+
+	return sdhci_pci_suspend_host(chip);
+}
+
+static int sdhci_cqhci_resume(struct sdhci_pci_chip *chip)
+{
+	int ret;
+
+	ret = sdhci_pci_resume_host(chip);
+	if (ret)
+		return ret;
+
+	return cqhci_resume(chip->slots[0]->host->mmc);
+}
 #endif
 
 #ifdef CONFIG_PM
@@ -166,8 +190,48 @@ static int sdhci_pci_runtime_resume_host(struct sdhci_pci_chip *chip)
 
 	return 0;
 }
+
+static int sdhci_cqhci_runtime_suspend(struct sdhci_pci_chip *chip)
+{
+	int ret;
+
+	ret = cqhci_suspend(chip->slots[0]->host->mmc);
+	if (ret)
+		return ret;
+
+	return sdhci_pci_runtime_suspend_host(chip);
+}
+
+static int sdhci_cqhci_runtime_resume(struct sdhci_pci_chip *chip)
+{
+	int ret;
+
+	ret = sdhci_pci_runtime_resume_host(chip);
+	if (ret)
+		return ret;
+
+	return cqhci_resume(chip->slots[0]->host->mmc);
+}
 #endif
 
+static u32 sdhci_cqhci_irq(struct sdhci_host *host, u32 intmask)
+{
+	int cmd_error = 0;
+	int data_error = 0;
+
+	if (!sdhci_cqe_irq(host, intmask, &cmd_error, &data_error))
+		return intmask;
+
+	cqhci_irq(host->mmc, intmask, cmd_error, data_error);
+
+	return 0;
+}
+
+static void sdhci_pci_dumpregs(struct mmc_host *mmc)
+{
+	sdhci_dumpregs(mmc_priv(mmc));
+}
+
 /*****************************************************************************\
  *                                                                           *
  * Hardware specific quirk handling                                          *
@@ -583,6 +647,18 @@ static void sdhci_intel_voltage_switch(struct sdhci_host *host)
 	.voltage_switch		= sdhci_intel_voltage_switch,
 };
 
+static const struct sdhci_ops sdhci_intel_glk_ops = {
+	.set_clock		= sdhci_set_clock,
+	.set_power		= sdhci_intel_set_power,
+	.enable_dma		= sdhci_pci_enable_dma,
+	.set_bus_width		= sdhci_set_bus_width,
+	.reset			= sdhci_reset,
+	.set_uhs_signaling	= sdhci_set_uhs_signaling,
+	.hw_reset		= sdhci_pci_hw_reset,
+	.voltage_switch		= sdhci_intel_voltage_switch,
+	.irq			= sdhci_cqhci_irq,
+};
+
 static void byt_read_dsm(struct sdhci_pci_slot *slot)
 {
 	struct intel_host *intel_host = sdhci_pci_priv(slot);
@@ -612,12 +688,80 @@ static int glk_emmc_probe_slot(struct sdhci_pci_slot *slot)
 {
 	int ret = byt_emmc_probe_slot(slot);
 
+	slot->host->mmc->caps2 |= MMC_CAP2_CQE;
+
 	if (slot->chip->pdev->device != PCI_DEVICE_ID_INTEL_GLK_EMMC) {
 		slot->host->mmc->caps2 |= MMC_CAP2_HS400_ES,
 		slot->host->mmc_host_ops.hs400_enhanced_strobe =
 						intel_hs400_enhanced_strobe;
+		slot->host->mmc->caps2 |= MMC_CAP2_CQE_DCMD;
+	}
+
+	return ret;
+}
+
+static void glk_cqe_enable(struct mmc_host *mmc)
+{
+	struct sdhci_host *host = mmc_priv(mmc);
+	u32 reg;
+
+	/*
+	 * CQE gets stuck if it sees Buffer Read Enable bit set, which can be
+	 * the case after tuning, so ensure the buffer is drained.
+	 */
+	reg = sdhci_readl(host, SDHCI_PRESENT_STATE);
+	while (reg & SDHCI_DATA_AVAILABLE) {
+		sdhci_readl(host, SDHCI_BUFFER);
+		reg = sdhci_readl(host, SDHCI_PRESENT_STATE);
+	}
+
+	sdhci_cqe_enable(mmc);
+}
+
+static const struct cqhci_host_ops glk_cqhci_ops = {
+	.enable		= glk_cqe_enable,
+	.disable	= sdhci_cqe_disable,
+	.dumpregs	= sdhci_pci_dumpregs,
+};
+
+static int glk_emmc_add_host(struct sdhci_pci_slot *slot)
+{
+	struct device *dev = &slot->chip->pdev->dev;
+	struct sdhci_host *host = slot->host;
+	struct cqhci_host *cq_host;
+	bool dma64;
+	int ret;
+
+	ret = sdhci_setup_host(host);
+	if (ret)
+		return ret;
+
+	cq_host = devm_kzalloc(dev, sizeof(*cq_host), GFP_KERNEL);
+	if (!cq_host) {
+		ret = -ENOMEM;
+		goto cleanup;
 	}
 
+	cq_host->mmio = host->ioaddr + 0x200;
+	cq_host->quirks |= CQHCI_QUIRK_SHORT_TXFR_DESC_SZ;
+	cq_host->ops = &glk_cqhci_ops;
+
+	dma64 = host->flags & SDHCI_USE_64_BIT_DMA;
+	if (dma64)
+		cq_host->caps |= CQHCI_TASK_DESC_SZ_128;
+
+	ret = cqhci_init(cq_host, host->mmc, dma64);
+	if (ret)
+		goto cleanup;
+
+	ret = __sdhci_add_host(host);
+	if (ret)
+		goto cleanup;
+
+	return 0;
+
+cleanup:
+	sdhci_cleanup_host(host);
 	return ret;
 }
 
@@ -699,11 +843,20 @@ static int byt_sd_probe_slot(struct sdhci_pci_slot *slot)
 static const struct sdhci_pci_fixes sdhci_intel_glk_emmc = {
 	.allow_runtime_pm	= true,
 	.probe_slot		= glk_emmc_probe_slot,
+	.add_host		= glk_emmc_add_host,
+#ifdef CONFIG_PM_SLEEP
+	.suspend		= sdhci_cqhci_suspend,
+	.resume			= sdhci_cqhci_resume,
+#endif
+#ifdef CONFIG_PM
+	.runtime_suspend	= sdhci_cqhci_runtime_suspend,
+	.runtime_resume		= sdhci_cqhci_runtime_resume,
+#endif
 	.quirks			= SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC,
 	.quirks2		= SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
 				  SDHCI_QUIRK2_CAPS_BIT63_FOR_HS400 |
 				  SDHCI_QUIRK2_STOP_WITH_TC,
-	.ops			= &sdhci_intel_byt_ops,
+	.ops			= &sdhci_intel_glk_ops,
 	.priv_size		= sizeof(struct intel_host),
 };
 
-- 
1.9.1


* Re: [PATCH V12 0/5] mmc: Add Command Queue support
  2017-10-24  8:40 [PATCH V12 0/5] mmc: Add Command Queue support Adrian Hunter
                   ` (4 preceding siblings ...)
  2017-10-24  8:40 ` [PATCH V12 5/5] mmc: sdhci-pci: Add CQHCI support for Intel GLK Adrian Hunter
@ 2017-10-26 13:32 ` Linus Walleij
  2017-10-26 13:49   ` Adrian Hunter
  5 siblings, 1 reply; 12+ messages in thread
From: Linus Walleij @ 2017-10-26 13:32 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Tue, Oct 24, 2017 at 10:40 AM, Adrian Hunter <adrian.hunter@intel.com> wrote:

> Here is V12 of the hardware command queue patches without the software
> command queue patches, now using blk-mq and now with blk-mq support for
> non-CQE I/O.

Since I had my test setup going I gave this a spin with the same set
of tests that I used before/after my MQ patches.

It is using the same setup and same eMMC, but I had to rebase onto
Ulf's very latest next branch to apply your patches.

I default-enabled multiqueue.

Results:

sync
echo 3 > /proc/sys/vm/drop_caches
sync
time dd if=/dev/mmcblk3 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 24.251922 seconds, 42.2MB/s
real    0m 24.25s
user    0m 0.03s
sys     0m 3.80s

mount /dev/mmcblk3p1 /mnt/
cd /mnt/
sync
echo 3 > /proc/sys/vm/drop_caches
sync
time find . > /dev/null
real    0m 3.24s
user    0m 0.22s
sys     0m 1.23s

sync
echo 3 > /proc/sys/vm/drop_caches
sync
iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test

                                                   random    random
   kB  reclen    write  rewrite    read    reread    read     write
20480       4     1615     1571     6612     6714     6494      531
20480       8     2143     2295    11559    11563    11499     1164
20480      16     3894     4202    17826    17823    17755     1369
20480      32     5816     7489    23741    23759    23709     3016
20480      64     7393     9167    27532    27526    27502     3591
20480     128     7328     8097    29184    29161    29159     5592
20480     256     7194     8752    29424    29434    29424     6700
20480     512     8984     9930    29903    29911    29909     7420
20480    1024     7072     7446    27684    27685    27681     7444
20480    2048     6840     8199    27398    27420    27418     6766
20480    4096     8137     6805    28091    28089    28093     8209
20480    8192     7255     7485    28386    28384    28383     7479
20480   16384     7078     7448    28584    28585    28585     7447

In short: no performance regressions.

Performance-wise this is on par with my own patch set for MQ.

As you know my pet peeve is "enable MQ by default" and I see no
reason from a performance perspective not to enable MQ by default
on this patch set or mine for that matter.

> While we should look at changing blk-mq to give better workqueue performance,
> a bigger gain is likely to be made by adding a new host API to enable the
> next already-prepared request to be issued directly from within ->done()
> callback of the current request.

My patch series switches the stack around to make it possible
to do this. But it doesn't go the whole way to complete the requests
from interrupt context.

Since we have to send commands for retune etc request finalization
cannot easily be done from interrupt context.

But I am thinking about testing to hack it
using some ugly approaches ... like assuming we don't need any
retune etc and just say all is fine and optimistically complete the
request directly in the interrupt handler if all was OK and wait
for errors to happen before retuning.
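
Very roughly I am thinking of something like this in the ->done() path
(just a sketch to show the idea; the foo_* names and mmc_blk_rq_ok()
are invented):

static void foo_done(struct mmc_request *mrq)
{
	struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req,
						  brq.mrq);
	struct request *req = mmc_queue_req_to_req(mqrq);
	struct mmc_queue *mq = req->q->queuedata;

	/* mmc_blk_rq_ok(): invented check for "no error, nothing else to do" */
	if (mmc_blk_rq_ok(mqrq))
		blk_mq_complete_request(req);	/* optimistic completion in IRQ context */
	else
		schedule_work(&mq->recovery_work);	/* errors handled in process context */
}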

Yours,
Linus Walleij


* Re: [PATCH V12 0/5] mmc: Add Command Queue support
  2017-10-26 13:32 ` [PATCH V12 0/5] mmc: Add Command Queue support Linus Walleij
@ 2017-10-26 13:49   ` Adrian Hunter
  2017-10-26 14:25     ` Linus Walleij
  0 siblings, 1 reply; 12+ messages in thread
From: Adrian Hunter @ 2017-10-26 13:49 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 26/10/17 16:32, Linus Walleij wrote:
> On Tue, Oct 24, 2017 at 10:40 AM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>> Here is V12 of the hardware command queue patches without the software
>> command queue patches, now using blk-mq and now with blk-mq support for
>> non-CQE I/O.
> 
> Since I had my test setup going I gave this a spin with the same set
> of tests that I used before/after my MQ patches.
> 
> It is using the same setup and same eMMC, but I hade to rebase onto
> Ulf's very latest next branch to apply your patches.
> 
> I default-enabled multiqueue.
> 
> Results:
> 
> sync
> echo 3 > /proc/sys/vm/drop_caches
> sync
> time dd if=/dev/mmcblk3 of=/dev/null bs=1M count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.0GB) copied, 24.251922 seconds, 42.2MB/s
> real    0m 24.25s
> user    0m 0.03s
> sys     0m 3.80s
> 
> mount /dev/mmcblk3p1 /mnt/
> cd /mnt/
> sync
> echo 3 > /proc/sys/vm/drop_caches
> sync
> time find . > /dev/null
> real    0m 3.24s
> user    0m 0.22s
> sys     0m 1.23s
> 
> sync
> echo 3 > /proc/sys/vm/drop_caches
> sync
> iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test
> 
>                                                    random    random
>    kB  reclen    write  rewrite    read    reread    read     write
> 20480       4     1615     1571     6612     6714     6494      531
> 20480       8     2143     2295    11559    11563    11499     1164
> 20480      16     3894     4202    17826    17823    17755     1369
> 20480      32     5816     7489    23741    23759    23709     3016
> 20480      64     7393     9167    27532    27526    27502     3591
> 20480     128     7328     8097    29184    29161    29159     5592
> 20480     256     7194     8752    29424    29434    29424     6700
> 20480     512     8984     9930    29903    29911    29909     7420
> 20480    1024     7072     7446    27684    27685    27681     7444
> 20480    2048     6840     8199    27398    27420    27418     6766
> 20480    4096     8137     6805    28091    28089    28093     8209
> 20480    8192     7255     7485    28386    28384    28383     7479
> 20480   16384     7078     7448    28584    28585    28585     7447
> 
> In short: no performance regressions.

You really need to test cards that are fast.  A decent UHS-I SD card can do
over 80 MB/s for reads and of course HS400 eMMC can do over 300 MB/s.

> 
> Performance-wise this is on par with my own patch set for MQ.
> 
> As you know my pet peeve is "enable MQ by default" and I see no
> reason from a performance perspective not to enable MQ by default
> on this patch set or mine for that matter.

That is a side-issue.  A single small patch can change that.
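
For scale, such a change could be as small as flipping a default, assuming
the use_blk_mq parameter from patch 1/5 is wired up as a plain bool module
parameter in the core (a rough sketch, not the actual patch):

/* Sketch only: default-enable blk-mq for MMC */
static bool use_blk_mq = true;	/* was: false */
module_param(use_blk_mq, bool, 0444);
MODULE_PARM_DESC(use_blk_mq, "Use blk-mq for the MMC/SD block layer");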

> 
>> While we should look at changing blk-mq to give better workqueue performance,
>> a bigger gain is likely to be made by adding a new host API to enable the
>> next already-prepared request to be issued directly from within ->done()
>> callback of the current request.
> 
> My patch series switches the stack around to make it possible
> to do this. But it doesn't go the whole way to complete the requests
> from interrupt context.
> 
> Since we have to send commands for retune etc request finalization
> cannot easily be done from interrupt context.

Re-tuning and background operations are rare and slow, so there is no reason
to try to start them from interrupt context.

> 
> But I am thinking about testing to hack it
> using some ugly approaches ... like assuming we don't need any
> retune etc and just say all is fine and optimistically complete the
> request directly in the interrupt handler if all was OK and wait
> for errors to happen before retuning.

It already works that way.  Re-tuning happens before you start a request.
We prevent re-tuning in between dependent requests, like between starting a
transfer and CMD13 polling for completion.
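
For reference, the hold/release pattern described above, sketched with the
existing mmc_retune_hold_now()/mmc_retune_release() helpers; the CMD13 loop
is simplified and omits timeout handling:

/*
 * Illustrative sketch: block re-tuning for the span where a CMD13 poll
 * depends on the data transfer that was just started.
 */
static int mmc_issue_dependent_sequence(struct mmc_card *card,
					struct mmc_request *mrq)
{
	struct mmc_host *host = card->host;
	u32 status;
	int err;

	mmc_retune_hold_now(host);	/* no re-tune until released */

	mmc_wait_for_req(host, mrq);	/* start the transfer */

	do {				/* CMD13 poll, timeout omitted */
		err = __mmc_send_status(card, &status, 0);
	} while (!err && !(status & R1_READY_FOR_DATA));

	mmc_retune_release(host);	/* re-tune may run before the next request */

	return err ? err : mrq->cmd->error;
}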


* Re: [PATCH V12 0/5] mmc: Add Command Queue support
  2017-10-26 13:49   ` Adrian Hunter
@ 2017-10-26 14:25     ` Linus Walleij
  0 siblings, 0 replies; 12+ messages in thread
From: Linus Walleij @ 2017-10-26 14:25 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Thu, Oct 26, 2017 at 3:49 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 26/10/17 16:32, Linus Walleij wrote:

>> My patch series switches the stack around to make it possible
>> to do this. But it doesn't go the whole way to complete the requests
>> from interrupt context.
>>
>> Since we have to send commands for retune etc request finalization
>> cannot easily be done from interrupt context.
>
> Re-tuning and background operations are rare and slow, so there is no reason
> to try to start them from interrupt context.

OK I will try to get them out of the way and see what happens,
hehe :)

What I mean is that we were checking - on every command -
if BKOPS or retune needs to happen. And then doing it. Thus
all was done in process context.
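
Roughly, the per-command check being referred to looks like this simplified
sketch (host->need_retune, mmc_retune() and mmc_start_bkops() exist in the
core; collapsing the checks into one helper like this is only illustrative):

/*
 * Simplified sketch: after a request has finished, and while still in
 * process context, issue whatever housekeeping commands are pending.
 */
static void mmc_post_req_housekeeping(struct mmc_card *card,
				      struct mmc_request *mrq)
{
	struct mmc_host *host = card->host;

	if (host->need_retune)		/* safe: more commands can be sent here */
		mmc_retune(host);

	if (mmc_card_mmc(card) && (mrq->cmd->resp[0] & R1_EXCEPTION_EVENT))
		mmc_start_bkops(card, true);	/* urgent BKOPS */
}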

>> But I am thinking about testing to hack it
>> using some ugly approaches ... like assuming we don't need any
>> retune etc and just say all is fine and optimistically complete the
>> request directly in the interrupt handler if all was OK and wait
>> for errors to happen before retuning.
>
> It already works that way.  Re-tuning happens before you start a request.
> We prevent re-tuning in between dependent requests, like between starting a
> transfer and CMD13 polling for completion.

Ah that is what these if()s do ... right. I'll see if I can get around
this then.

Yours,
Linus Walleij


* Re: [PATCH V12 2/5] mmc: block: Add blk-mq support
  2017-10-24  8:40 ` [PATCH V12 2/5] mmc: block: Add blk-mq support Adrian Hunter
@ 2017-10-27  9:23   ` Ulf Hansson
  2017-10-27 11:54     ` Adrian Hunter
  0 siblings, 1 reply; 12+ messages in thread
From: Ulf Hansson @ 2017-10-27  9:23 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

On 24 October 2017 at 10:40, Adrian Hunter <adrian.hunter@intel.com> wrote:
> Define and use a blk-mq queue. Discards and flushes are processed
> synchronously, but reads and writes asynchronously. In order to support
> slow DMA unmapping, DMA unmapping is not done until after the next request
> is started. That means the request is not completed until then. If there is
> no next request then the completion is done by queued work.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  drivers/mmc/core/block.c | 655 ++++++++++++++++++++++++++++++++++++++++++++++-
>  drivers/mmc/core/block.h |  10 +
>  drivers/mmc/core/queue.c | 302 ++++++++++++++++++++--
>  drivers/mmc/core/queue.h |  41 +++
>  include/linux/mmc/host.h |   1 +
>  5 files changed, 979 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index ea80ff4cd7f9..002446e8dc5d 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -1264,7 +1264,10 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
>                 break;
>         }
>         mq_rq->drv_op_result = ret;
> -       blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
> +       if (req->mq_ctx)
> +               blk_mq_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
> +       else
> +               blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
>  }
>
>  static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
> @@ -1307,7 +1310,10 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
>         else
>                 mmc_blk_reset_success(md, type);
>  fail:
> -       blk_end_request(req, status, blk_rq_bytes(req));
> +       if (req->mq_ctx)
> +               blk_mq_end_request(req, status);
> +       else
> +               blk_end_request(req, status, blk_rq_bytes(req));
>  }
>
>  static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
> @@ -1377,7 +1383,10 @@ static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
>         if (!err)
>                 mmc_blk_reset_success(md, type);
>  out:
> -       blk_end_request(req, status, blk_rq_bytes(req));
> +       if (req->mq_ctx)
> +               blk_mq_end_request(req, status);
> +       else
> +               blk_end_request(req, status, blk_rq_bytes(req));
>  }
>
>  static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
> @@ -1387,7 +1396,10 @@ static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
>         int ret = 0;
>
>         ret = mmc_flush_cache(card);
> -       blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
> +       if (req->mq_ctx)
> +               blk_mq_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
> +       else
> +               blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
>  }
>
>  /*
> @@ -1413,15 +1425,18 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq,
>         }
>  }
>
> -#define CMD_ERRORS                                                     \
> -       (R1_OUT_OF_RANGE |      /* Command argument out of range */     \
> -        R1_ADDRESS_ERROR |     /* Misaligned address */                \
> +#define CMD_ERRORS_EXCL_OOR                                            \
> +       (R1_ADDRESS_ERROR |     /* Misaligned address */                \

This looks unrelated to blkmq support.

>          R1_BLOCK_LEN_ERROR |   /* Transferred block length incorrect */\
>          R1_WP_VIOLATION |      /* Tried to write to protected block */ \
>          R1_CARD_ECC_FAILED |   /* Card ECC failed */                   \
>          R1_CC_ERROR |          /* Card controller error */             \
>          R1_ERROR)              /* General/unknown error */
>
> +#define CMD_ERRORS                                                     \
> +       (CMD_ERRORS_EXCL_OOR |                                          \
> +        R1_OUT_OF_RANGE)       /* Command argument out of range */     \
> +

Ditto.

>  static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq)
>  {
>         u32 val;
> @@ -1766,6 +1781,632 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
>         mqrq->areq.err_check = mmc_blk_err_check;
>  }
>
> +#define MMC_MAX_RETRIES                5
> +#define MMC_DATA_RETRIES       2
> +#define MMC_NO_RETRIES         (MMC_MAX_RETRIES + 1)

What are these defines about? Do you intend to use different retries
for the blkmq case compared with the legacy request path? If so, why?

> +
> +/* Single sector read during recovery */
> +static void mmc_blk_ss_read(struct mmc_queue *mq, struct request *req)
> +{
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +       blk_status_t status;
> +
> +       while (1) {
> +               mmc_blk_rw_rq_prep(mqrq, mq->card, 1, mq);
> +
> +               mmc_wait_for_req(mq->card->host, &mqrq->brq.mrq);
> +
> +               /*
> +                * Not expecting command errors, so just give up in that case.
> +                * If there are retries remaining, the request will get
> +                * requeued.
> +                */
> +               if (mqrq->brq.cmd.error)
> +                       return;
> +
> +               if (blk_rq_bytes(req) <= 512)
> +                       break;
> +
> +               status = mqrq->brq.data.error ? BLK_STS_IOERR : BLK_STS_OK;
> +
> +               blk_update_request(req, status, 512);
> +       }
> +
> +       mqrq->retries = MMC_NO_RETRIES;
> +}
> +
> +static inline bool mmc_blk_oor_valid(struct mmc_blk_request *brq)
> +{
> +       return !!brq->mrq.sbc;
> +}
> +
> +static inline u32 mmc_blk_stop_err_bits(struct mmc_blk_request *brq)
> +{
> +       return mmc_blk_oor_valid(brq) ? CMD_ERRORS : CMD_ERRORS_EXCL_OOR;
> +}

Again, this seems like a non-blkmq-specific thing.

> +
> +static inline bool mmc_blk_in_tran_state(u32 status)
> +{
> +       /*
> +        * Some cards mishandle the status bits, so make sure to check both the
> +        * busy indication and the card state.
> +        */
> +       return status & R1_READY_FOR_DATA &&
> +              (R1_CURRENT_STATE(status) == R1_STATE_TRAN);
> +}
> +
> +/*
> + * Check for errors the host controller driver might not have seen such as
> + * response mode errors or invalid card state.
> + */
> +static bool mmc_blk_status_error(struct request *req, u32 status)
> +{
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +       struct mmc_blk_request *brq = &mqrq->brq;
> +       u32 stop_err_bits = mmc_blk_stop_err_bits(brq);
> +
> +       return brq->cmd.resp[0]  & CMD_ERRORS    ||
> +              brq->stop.resp[0] & stop_err_bits ||
> +              status            & stop_err_bits ||
> +              (rq_data_dir(req) == WRITE && !mmc_blk_in_tran_state(status));
> +}
> +
> +static inline bool mmc_blk_cmd_started(struct mmc_blk_request *brq)
> +{
> +       return !brq->sbc.error && !brq->cmd.error &&
> +              !(brq->cmd.resp[0] & CMD_ERRORS);
> +}
> +
> +static unsigned int mmc_blk_clock_khz(struct mmc_host *host)
> +{
> +       if (host->actual_clock)
> +               return host->actual_clock / 1000;
> +
> +       /* Clock may be subject to a divisor, fudge it by a factor of 2. */
> +       if (host->ios.clock)
> +               return host->ios.clock / 2000;
> +
> +       /* How can there be no clock */
> +       WARN_ON_ONCE(1);
> +       return 100; /* 100 kHz is minimum possible value */
> +}
> +
> +static unsigned long mmc_blk_data_timeout_jiffies(struct mmc_host *host,
> +                                                 struct mmc_data *data)
> +{
> +       unsigned int ms = DIV_ROUND_UP(data->timeout_ns, 1000000);
> +       unsigned int khz;
> +
> +       if (data->timeout_clks) {
> +               khz = mmc_blk_clock_khz(host);
> +               ms += DIV_ROUND_UP(data->timeout_clks, khz);
> +       }
> +
> +       return msecs_to_jiffies(ms);
> +}
> +
> +static int mmc_blk_card_stuck(struct mmc_card *card, struct request *req,
> +                             u32 *resp_errs)
> +{
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +       struct mmc_data *data = &mqrq->brq.data;
> +       unsigned long timeout;
> +       u32 status;
> +       int err;
> +
> +       timeout = jiffies + mmc_blk_data_timeout_jiffies(card->host, data);
> +
> +       while (1) {
> +               bool done = time_after(jiffies, timeout);
> +
> +               err = __mmc_send_status(card, &status, 5);
> +               if (err) {
> +                       pr_err("%s: error %d requesting status\n",
> +                              req->rq_disk->disk_name, err);
> +                       break;
> +               }
> +
> +               /* Accumulate any response error bits seen */
> +               if (resp_errs)
> +                       *resp_errs |= status;
> +
> +               if (mmc_blk_in_tran_state(status))
> +                       break;
> +
> +               /* Timeout if the device never becomes ready */
> +               if (done) {
> +                       pr_err("%s: Card stuck in wrong state! %s %s\n",
> +                               mmc_hostname(card->host),
> +                               req->rq_disk->disk_name, __func__);
> +                       err = -ETIMEDOUT;
> +                       break;
> +               }
> +       }
> +
> +       return err;
> +}
> +
> +static int mmc_blk_send_stop(struct mmc_card *card)
> +{
> +       struct mmc_command cmd = {
> +               .opcode = MMC_STOP_TRANSMISSION,
> +               .flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_AC,
> +       };
> +
> +       return mmc_wait_for_cmd(card->host, &cmd, 5);
> +}
> +
> +static int mmc_blk_fix_state(struct mmc_card *card, struct request *req)
> +{
> +       int err;
> +
> +       mmc_retune_hold_now(card->host);
> +
> +       mmc_blk_send_stop(card);
> +
> +       err = mmc_blk_card_stuck(card, req, NULL);
> +
> +       mmc_retune_release(card->host);
> +
> +       return err;
> +}
> +
> +static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
> +{
> +       int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +       struct mmc_blk_request *brq = &mqrq->brq;
> +       struct mmc_blk_data *md = mq->blkdata;
> +       struct mmc_card *card = mq->card;
> +       u32 status;
> +       u32 blocks;
> +       int err;
> +
> +       /*
> +        * Status error bits might get lost during re-tuning so don't allow
> +        * re-tuning yet.
> +        */
> +       mmc_retune_hold_now(card->host);
> +
> +       /*
> +        * Some errors the host driver might not have seen. Set the number of
> +        * bytes transferred to zero in that case.
> +        */
> +       err = __mmc_send_status(card, &status, 0);
> +       if (err || mmc_blk_status_error(req, status))
> +               brq->data.bytes_xfered = 0;
> +
> +       mmc_retune_release(card->host);
> +
> +       /*
> +        * Try again to get the status. This also provides an opportunity for
> +        * re-tuning.
> +        */
> +       if (err)
> +               err = __mmc_send_status(card, &status, 0);
> +
> +       /*
> +        * Nothing more to do after the number of bytes transferred has been
> +        * updated and there is no card.
> +        */
> +       if (err && mmc_detect_card_removed(card->host))
> +               return;
> +
> +       /* Try to get back to "tran" state */
> +       if (err || !mmc_blk_in_tran_state(status))
> +               err = mmc_blk_fix_state(mq->card, req);
> +
> +       /*
> +        * Special case for SD cards where the card might record the number of
> +        * blocks written.
> +        */
> +       if (!err && mmc_blk_cmd_started(brq) && mmc_card_sd(card) &&
> +           rq_data_dir(req) == WRITE && !mmc_sd_num_wr_blocks(card, &blocks))
> +               brq->data.bytes_xfered = blocks << 9;
> +
> +       /* Reset if the card is in a bad state */
> +       if (err && mmc_blk_reset(md, card->host, type)) {
> +               pr_err("%s: recovery failed!\n", req->rq_disk->disk_name);
> +               mqrq->retries = MMC_NO_RETRIES;
> +               return;
> +       }
> +
> +       /*
> +        * If anything was done, just return and if there is anything remaining
> +        * on the request it will get requeued.
> +        */
> +       if (brq->data.bytes_xfered)
> +               return;
> +
> +       /* Reset before last retry */
> +       if (mqrq->retries + 1 == MMC_MAX_RETRIES)
> +               mmc_blk_reset(md, card->host, type);
> +
> +       /* Command errors fail fast, so use all MMC_MAX_RETRIES */
> +       if (brq->sbc.error || brq->cmd.error)
> +               return;
> +
> +       /* Reduce the remaining retries for data errors */
> +       if (mqrq->retries < MMC_MAX_RETRIES - MMC_DATA_RETRIES) {
> +               mqrq->retries = MMC_MAX_RETRIES - MMC_DATA_RETRIES;
> +               return;
> +       }
> +
> +       /* FIXME: Missing single sector read for large sector size */
> +       if (rq_data_dir(req) == READ && !mmc_large_sector(card)) {
> +               /* Read one sector at a time */
> +               mmc_blk_ss_read(mq, req);
> +               return;
> +       }
> +}
> +
> +static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq)
> +{
> +       mmc_blk_eval_resp_error(brq);
> +
> +       return brq->sbc.error || brq->cmd.error || brq->stop.error ||
> +              brq->data.error || brq->cmd.resp[0] & CMD_ERRORS;
> +}
> +
> +static int mmc_blk_card_busy(struct mmc_card *card, struct request *req)
> +{
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +       u32 status = 0;
> +       int err;
> +
> +       if (mmc_host_is_spi(card->host) || rq_data_dir(req) == READ)
> +               return 0;
> +
> +       mmc_retune_hold_now(card->host);
> +
> +       err = mmc_blk_card_stuck(card, req, &status);
> +
> +       mmc_retune_release(card->host);
> +
> +       /*
> +        * Do not assume data transferred correctly if there are any error bits
> +        * set.
> +        */
> +       if (!err && status & mmc_blk_stop_err_bits(&mqrq->brq)) {
> +               mqrq->brq.data.bytes_xfered = 0;
> +               err = -EIO;
> +       }
> +
> +       /* Copy the exception bit so it will be seen later on */
> +       if (mmc_card_mmc(card) && status & R1_EXCEPTION_EVENT)
> +               mqrq->brq.cmd.resp[0] |= R1_EXCEPTION_EVENT;
> +
> +       return err;
> +}
> +
> +static inline void mmc_blk_rw_reset_success(struct mmc_queue *mq,
> +                                           struct request *req)
> +{
> +       int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
> +
> +       mmc_blk_reset_success(mq->blkdata, type);
> +}

I understand that all the above new lines of code (around 300 lines)
are something you need for the blkmq support in the rest of this
patch.

However, it looks like you are adding completely new code that either
already exists in the legacy request path (in some slightly different
form), or could serve as clean-up/re-factoring of the legacy
request path.

This is not the way you should format a patch for converting to blkmq.
The reasons are:
*) It makes it hard to review.
**) There is no need to throw away *all* old mmc blk/core code,
which I assume is your plan for the next step. Instead the proper way is
to re-factor it, and make it possible to re-use those parts that make
sense.

> +
> +static void mmc_blk_mq_complete_rq(struct mmc_queue *mq, struct request *req)
> +{
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +       unsigned int nr_bytes = mqrq->brq.data.bytes_xfered;
> +
> +       if (nr_bytes) {
> +               if (blk_update_request(req, BLK_STS_OK, nr_bytes))
> +                       blk_mq_requeue_request(req, true);
> +               else
> +                       __blk_mq_end_request(req, BLK_STS_OK);
> +       } else if (mqrq->retries++ < MMC_MAX_RETRIES) {
> +               blk_mq_requeue_request(req, true);
> +       } else {
> +               if (mmc_card_removed(mq->card))
> +                       req->rq_flags |= RQF_QUIET;
> +               blk_mq_end_request(req, BLK_STS_IOERR);
> +       }
> +}
> +
> +static bool mmc_blk_urgent_bkops_needed(struct mmc_queue *mq,
> +                                       struct mmc_queue_req *mqrq)
> +{
> +       return mmc_card_mmc(mq->card) &&
> +              (mqrq->brq.cmd.resp[0] & R1_EXCEPTION_EVENT ||
> +               mqrq->brq.stop.resp[0] & R1_EXCEPTION_EVENT);
> +}
> +
> +static void mmc_blk_urgent_bkops(struct mmc_queue *mq,
> +                                struct mmc_queue_req *mqrq)
> +{
> +       if (mmc_blk_urgent_bkops_needed(mq, mqrq))
> +               mmc_start_bkops(mq->card, true);
> +}

Ditto for the two above functions.

> +
> +void mmc_blk_mq_complete(struct request *req)
> +{
> +       struct mmc_queue *mq = req->q->queuedata;
> +
> +       mmc_blk_mq_complete_rq(mq, req);
> +}
> +
> +static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
> +                                      struct request *req)
> +{
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +       struct mmc_host *host = mq->card->host;
> +       bool failed;
> +
> +       failed = mmc_blk_rq_error(&mqrq->brq) ||
> +                mmc_blk_card_busy(mq->card, req);
> +
> +       if (!mmc_queue_direct_complete(host))

Can you please make the changes related to completing the request in
the mmc_request_done() into a separate patch.

It's better to first get the default behavior in place, then we can
improve things on top. Again, that also makes it easier to review.

No, I am giving up at this point. I didn't even get to the actual
blkmq conversion part, which is the core part I should be spending my
time reviewing. Sorry!

Some final comments around the system-wide PM support below.

[...]

>
> +static void mmc_mq_queue_suspend(struct mmc_queue *mq)
> +{
> +       blk_mq_quiesce_queue(mq->queue);
> +
> +       /*
> +        * The host remains claimed while there are outstanding requests, so
> +        * simply claiming and releasing here ensures there are none.
> +        */
> +       mmc_claim_host(mq->card->host);
> +       mmc_release_host(mq->card->host);

This looks fragile.

Seems like an interface in blkmq which flushes the queue and suspends it
is missing; however, there are of course reasons why it doesn't exist.

I assume blk_mq_quiesce_queue() guarantees no new requests are
being pushed to us after calling it, but it still seems a bit racy to
rely on the host claim/release thing.

Let's bring this up as question for the block people experts.

BTW, I was talking with Bart and Rafael about generic suspend issues
for block/fs at Kernelsummit the other day. I will try to follow up on
that to make sure we also do the right things in mmc.

> +}
> +

[...]

Kind regards
Uffe


* Re: [PATCH V12 2/5] mmc: block: Add blk-mq support
  2017-10-27  9:23   ` Ulf Hansson
@ 2017-10-27 11:54     ` Adrian Hunter
  2017-10-27 13:44       ` Ulf Hansson
  0 siblings, 1 reply; 12+ messages in thread
From: Adrian Hunter @ 2017-10-27 11:54 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

On 27/10/17 12:23, Ulf Hansson wrote:
> On 24 October 2017 at 10:40, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> Define and use a blk-mq queue. Discards and flushes are processed
>> synchronously, but reads and writes asynchronously. In order to support
>> slow DMA unmapping, DMA unmapping is not done until after the next request
>> is started. That means the request is not completed until then. If there is
>> no next request then the completion is done by queued work.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  drivers/mmc/core/block.c | 655 ++++++++++++++++++++++++++++++++++++++++++++++-
>>  drivers/mmc/core/block.h |  10 +
>>  drivers/mmc/core/queue.c | 302 ++++++++++++++++++++--
>>  drivers/mmc/core/queue.h |  41 +++
>>  include/linux/mmc/host.h |   1 +
>>  5 files changed, 979 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
>> index ea80ff4cd7f9..002446e8dc5d 100644
>> --- a/drivers/mmc/core/block.c
>> +++ b/drivers/mmc/core/block.c
>> @@ -1264,7 +1264,10 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
>>                 break;
>>         }
>>         mq_rq->drv_op_result = ret;
>> -       blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
>> +       if (req->mq_ctx)
>> +               blk_mq_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
>> +       else
>> +               blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
>>  }
>>
>>  static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
>> @@ -1307,7 +1310,10 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
>>         else
>>                 mmc_blk_reset_success(md, type);
>>  fail:
>> -       blk_end_request(req, status, blk_rq_bytes(req));
>> +       if (req->mq_ctx)
>> +               blk_mq_end_request(req, status);
>> +       else
>> +               blk_end_request(req, status, blk_rq_bytes(req));
>>  }
>>
>>  static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
>> @@ -1377,7 +1383,10 @@ static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
>>         if (!err)
>>                 mmc_blk_reset_success(md, type);
>>  out:
>> -       blk_end_request(req, status, blk_rq_bytes(req));
>> +       if (req->mq_ctx)
>> +               blk_mq_end_request(req, status);
>> +       else
>> +               blk_end_request(req, status, blk_rq_bytes(req));
>>  }
>>
>>  static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
>> @@ -1387,7 +1396,10 @@ static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
>>         int ret = 0;
>>
>>         ret = mmc_flush_cache(card);
>> -       blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
>> +       if (req->mq_ctx)
>> +               blk_mq_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
>> +       else
>> +               blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
>>  }
>>
>>  /*
>> @@ -1413,15 +1425,18 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq,
>>         }
>>  }
>>
>> -#define CMD_ERRORS                                                     \
>> -       (R1_OUT_OF_RANGE |      /* Command argument out of range */     \
>> -        R1_ADDRESS_ERROR |     /* Misaligned address */                \
>> +#define CMD_ERRORS_EXCL_OOR                                            \
>> +       (R1_ADDRESS_ERROR |     /* Misaligned address */                \
> 
> This looks unrelated to blkmq support.

No, it is preserving the existing OOR (out-of-range) functionality.

> 
>>          R1_BLOCK_LEN_ERROR |   /* Transferred block length incorrect */\
>>          R1_WP_VIOLATION |      /* Tried to write to protected block */ \
>>          R1_CARD_ECC_FAILED |   /* Card ECC failed */                   \
>>          R1_CC_ERROR |          /* Card controller error */             \
>>          R1_ERROR)              /* General/unknown error */
>>
>> +#define CMD_ERRORS                                                     \
>> +       (CMD_ERRORS_EXCL_OOR |                                          \
>> +        R1_OUT_OF_RANGE)       /* Command argument out of range */     \
>> +
> 
> Ditto.

And ditto.

> 
>>  static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq)
>>  {
>>         u32 val;
>> @@ -1766,6 +1781,632 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
>>         mqrq->areq.err_check = mmc_blk_err_check;
>>  }
>>
>> +#define MMC_MAX_RETRIES                5
>> +#define MMC_DATA_RETRIES       2
>> +#define MMC_NO_RETRIES         (MMC_MAX_RETRIES + 1)
> 
> What are these defines about? Do you intend to use different retries
> for the blkmq case compared with the legacy request path? If so, why?

Same number of retries but now actually explicit.

> 
>> +
>> +/* Single sector read during recovery */
>> +static void mmc_blk_ss_read(struct mmc_queue *mq, struct request *req)
>> +{
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +       blk_status_t status;
>> +
>> +       while (1) {
>> +               mmc_blk_rw_rq_prep(mqrq, mq->card, 1, mq);
>> +
>> +               mmc_wait_for_req(mq->card->host, &mqrq->brq.mrq);
>> +
>> +               /*
>> +                * Not expecting command errors, so just give up in that case.
>> +                * If there are retries remaining, the request will get
>> +                * requeued.
>> +                */
>> +               if (mqrq->brq.cmd.error)
>> +                       return;
>> +
>> +               if (blk_rq_bytes(req) <= 512)
>> +                       break;
>> +
>> +               status = mqrq->brq.data.error ? BLK_STS_IOERR : BLK_STS_OK;
>> +
>> +               blk_update_request(req, status, 512);
>> +       }
>> +
>> +       mqrq->retries = MMC_NO_RETRIES;
>> +}
>> +
>> +static inline bool mmc_blk_oor_valid(struct mmc_blk_request *brq)
>> +{
>> +       return !!brq->mrq.sbc;
>> +}
>> +
>> +static inline u32 mmc_blk_stop_err_bits(struct mmc_blk_request *brq)
>> +{
>> +       return mmc_blk_oor_valid(brq) ? CMD_ERRORS : CMD_ERRORS_EXCL_OOR;
>> +}
> 
> Again, this seems like a non-blkmq-specific thing.

Again, preserving existing functionality.

> 
>> +
>> +static inline bool mmc_blk_in_tran_state(u32 status)
>> +{
>> +       /*
>> +        * Some cards mishandle the status bits, so make sure to check both the
>> +        * busy indication and the card state.
>> +        */
>> +       return status & R1_READY_FOR_DATA &&
>> +              (R1_CURRENT_STATE(status) == R1_STATE_TRAN);
>> +}
>> +
>> +/*
>> + * Check for errors the host controller driver might not have seen such as
>> + * response mode errors or invalid card state.
>> + */
>> +static bool mmc_blk_status_error(struct request *req, u32 status)
>> +{
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +       struct mmc_blk_request *brq = &mqrq->brq;
>> +       u32 stop_err_bits = mmc_blk_stop_err_bits(brq);
>> +
>> +       return brq->cmd.resp[0]  & CMD_ERRORS    ||
>> +              brq->stop.resp[0] & stop_err_bits ||
>> +              status            & stop_err_bits ||
>> +              (rq_data_dir(req) == WRITE && !mmc_blk_in_tran_state(status));
>> +}
>> +
>> +static inline bool mmc_blk_cmd_started(struct mmc_blk_request *brq)
>> +{
>> +       return !brq->sbc.error && !brq->cmd.error &&
>> +              !(brq->cmd.resp[0] & CMD_ERRORS);
>> +}
>> +
>> +static unsigned int mmc_blk_clock_khz(struct mmc_host *host)
>> +{
>> +       if (host->actual_clock)
>> +               return host->actual_clock / 1000;
>> +
>> +       /* Clock may be subject to a divisor, fudge it by a factor of 2. */
>> +       if (host->ios.clock)
>> +               return host->ios.clock / 2000;
>> +
>> +       /* How can there be no clock */
>> +       WARN_ON_ONCE(1);
>> +       return 100; /* 100 kHz is minimum possible value */
>> +}
>> +
>> +static unsigned long mmc_blk_data_timeout_jiffies(struct mmc_host *host,
>> +                                                 struct mmc_data *data)
>> +{
>> +       unsigned int ms = DIV_ROUND_UP(data->timeout_ns, 1000000);
>> +       unsigned int khz;
>> +
>> +       if (data->timeout_clks) {
>> +               khz = mmc_blk_clock_khz(host);
>> +               ms += DIV_ROUND_UP(data->timeout_clks, khz);
>> +       }
>> +
>> +       return msecs_to_jiffies(ms);
>> +}
>> +
>> +static int mmc_blk_card_stuck(struct mmc_card *card, struct request *req,
>> +                             u32 *resp_errs)
>> +{
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +       struct mmc_data *data = &mqrq->brq.data;
>> +       unsigned long timeout;
>> +       u32 status;
>> +       int err;
>> +
>> +       timeout = jiffies + mmc_blk_data_timeout_jiffies(card->host, data);
>> +
>> +       while (1) {
>> +               bool done = time_after(jiffies, timeout);
>> +
>> +               err = __mmc_send_status(card, &status, 5);
>> +               if (err) {
>> +                       pr_err("%s: error %d requesting status\n",
>> +                              req->rq_disk->disk_name, err);
>> +                       break;
>> +               }
>> +
>> +               /* Accumulate any response error bits seen */
>> +               if (resp_errs)
>> +                       *resp_errs |= status;
>> +
>> +               if (mmc_blk_in_tran_state(status))
>> +                       break;
>> +
>> +               /* Timeout if the device never becomes ready */
>> +               if (done) {
>> +                       pr_err("%s: Card stuck in wrong state! %s %s\n",
>> +                               mmc_hostname(card->host),
>> +                               req->rq_disk->disk_name, __func__);
>> +                       err = -ETIMEDOUT;
>> +                       break;
>> +               }
>> +       }
>> +
>> +       return err;
>> +}
>> +
>> +static int mmc_blk_send_stop(struct mmc_card *card)
>> +{
>> +       struct mmc_command cmd = {
>> +               .opcode = MMC_STOP_TRANSMISSION,
>> +               .flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_AC,
>> +       };
>> +
>> +       return mmc_wait_for_cmd(card->host, &cmd, 5);
>> +}
>> +
>> +static int mmc_blk_fix_state(struct mmc_card *card, struct request *req)
>> +{
>> +       int err;
>> +
>> +       mmc_retune_hold_now(card->host);
>> +
>> +       mmc_blk_send_stop(card);
>> +
>> +       err = mmc_blk_card_stuck(card, req, NULL);
>> +
>> +       mmc_retune_release(card->host);
>> +
>> +       return err;
>> +}
>> +
>> +static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
>> +{
>> +       int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +       struct mmc_blk_request *brq = &mqrq->brq;
>> +       struct mmc_blk_data *md = mq->blkdata;
>> +       struct mmc_card *card = mq->card;
>> +       u32 status;
>> +       u32 blocks;
>> +       int err;
>> +
>> +       /*
>> +        * Status error bits might get lost during re-tuning so don't allow
>> +        * re-tuning yet.
>> +        */
>> +       mmc_retune_hold_now(card->host);
>> +
>> +       /*
>> +        * Some errors the host driver might not have seen. Set the number of
>> +        * bytes transferred to zero in that case.
>> +        */
>> +       err = __mmc_send_status(card, &status, 0);
>> +       if (err || mmc_blk_status_error(req, status))
>> +               brq->data.bytes_xfered = 0;
>> +
>> +       mmc_retune_release(card->host);
>> +
>> +       /*
>> +        * Try again to get the status. This also provides an opportunity for
>> +        * re-tuning.
>> +        */
>> +       if (err)
>> +               err = __mmc_send_status(card, &status, 0);
>> +
>> +       /*
>> +        * Nothing more to do after the number of bytes transferred has been
>> +        * updated and there is no card.
>> +        */
>> +       if (err && mmc_detect_card_removed(card->host))
>> +               return;
>> +
>> +       /* Try to get back to "tran" state */
>> +       if (err || !mmc_blk_in_tran_state(status))
>> +               err = mmc_blk_fix_state(mq->card, req);
>> +
>> +       /*
>> +        * Special case for SD cards where the card might record the number of
>> +        * blocks written.
>> +        */
>> +       if (!err && mmc_blk_cmd_started(brq) && mmc_card_sd(card) &&
>> +           rq_data_dir(req) == WRITE && !mmc_sd_num_wr_blocks(card, &blocks))
>> +               brq->data.bytes_xfered = blocks << 9;
>> +
>> +       /* Reset if the card is in a bad state */
>> +       if (err && mmc_blk_reset(md, card->host, type)) {
>> +               pr_err("%s: recovery failed!\n", req->rq_disk->disk_name);
>> +               mqrq->retries = MMC_NO_RETRIES;
>> +               return;
>> +       }
>> +
>> +       /*
>> +        * If anything was done, just return and if there is anything remaining
>> +        * on the request it will get requeued.
>> +        */
>> +       if (brq->data.bytes_xfered)
>> +               return;
>> +
>> +       /* Reset before last retry */
>> +       if (mqrq->retries + 1 == MMC_MAX_RETRIES)
>> +               mmc_blk_reset(md, card->host, type);
>> +
>> +       /* Command errors fail fast, so use all MMC_MAX_RETRIES */
>> +       if (brq->sbc.error || brq->cmd.error)
>> +               return;
>> +
>> +       /* Reduce the remaining retries for data errors */
>> +       if (mqrq->retries < MMC_MAX_RETRIES - MMC_DATA_RETRIES) {
>> +               mqrq->retries = MMC_MAX_RETRIES - MMC_DATA_RETRIES;
>> +               return;
>> +       }
>> +
>> +       /* FIXME: Missing single sector read for large sector size */
>> +       if (rq_data_dir(req) == READ && !mmc_large_sector(card)) {
>> +               /* Read one sector at a time */
>> +               mmc_blk_ss_read(mq, req);
>> +               return;
>> +       }
>> +}
>> +
>> +static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq)
>> +{
>> +       mmc_blk_eval_resp_error(brq);
>> +
>> +       return brq->sbc.error || brq->cmd.error || brq->stop.error ||
>> +              brq->data.error || brq->cmd.resp[0] & CMD_ERRORS;
>> +}
>> +
>> +static int mmc_blk_card_busy(struct mmc_card *card, struct request *req)
>> +{
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +       u32 status = 0;
>> +       int err;
>> +
>> +       if (mmc_host_is_spi(card->host) || rq_data_dir(req) == READ)
>> +               return 0;
>> +
>> +       mmc_retune_hold_now(card->host);
>> +
>> +       err = mmc_blk_card_stuck(card, req, &status);
>> +
>> +       mmc_retune_release(card->host);
>> +
>> +       /*
>> +        * Do not assume data transferred correctly if there are any error bits
>> +        * set.
>> +        */
>> +       if (!err && status & mmc_blk_stop_err_bits(&mqrq->brq)) {
>> +               mqrq->brq.data.bytes_xfered = 0;
>> +               err = -EIO;
>> +       }
>> +
>> +       /* Copy the exception bit so it will be seen later on */
>> +       if (mmc_card_mmc(card) && status & R1_EXCEPTION_EVENT)
>> +               mqrq->brq.cmd.resp[0] |= R1_EXCEPTION_EVENT;
>> +
>> +       return err;
>> +}
>> +
>> +static inline void mmc_blk_rw_reset_success(struct mmc_queue *mq,
>> +                                           struct request *req)
>> +{
>> +       int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
>> +
>> +       mmc_blk_reset_success(mq->blkdata, type);
>> +}
> 
> I understand that all the above new lines of code (around 300 lines)
> are something you need for the blkmq support in the rest of this
> patch.
>
> However, it looks like you are adding completely new code that either
> already exists in the legacy request path (in some slightly different
> form), or could serve as clean-up/re-factoring of the legacy
> request path.

The legacy error handling is a non-starter.  Messy and convoluted.  There is
no reason to keep using it.  Structurally it doesn't fit with blk-mq because
it is split across different places and mixes in logic that is not related to
error handling.

> 
> This is not the way you should format a patch for converting to blkmq.
> The reasons are:
> *) It makes it hard to review.

But the few comments in this email suggest you have spent maybe 30 minutes.
It really looks like you haven't tried.  Where are the questions?

> **) There is no need to throw away *all* old mmc blk/core code,
> which I assume is your plan for the next step. Instead the proper way is
> to re-factor it, and make it possible to re-use those parts that make
> sense.

That is just nonsense.  We definitely want to throw away the messy
convoluted legacy code.  The new code is much simpler.  Did you read it?

> 
>> +
>> +static void mmc_blk_mq_complete_rq(struct mmc_queue *mq, struct request *req)
>> +{
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +       unsigned int nr_bytes = mqrq->brq.data.bytes_xfered;
>> +
>> +       if (nr_bytes) {
>> +               if (blk_update_request(req, BLK_STS_OK, nr_bytes))
>> +                       blk_mq_requeue_request(req, true);
>> +               else
>> +                       __blk_mq_end_request(req, BLK_STS_OK);
>> +       } else if (mqrq->retries++ < MMC_MAX_RETRIES) {
>> +               blk_mq_requeue_request(req, true);
>> +       } else {
>> +               if (mmc_card_removed(mq->card))
>> +                       req->rq_flags |= RQF_QUIET;
>> +               blk_mq_end_request(req, BLK_STS_IOERR);
>> +       }
>> +}
>> +
>> +static bool mmc_blk_urgent_bkops_needed(struct mmc_queue *mq,
>> +                                       struct mmc_queue_req *mqrq)
>> +{
>> +       return mmc_card_mmc(mq->card) &&
>> +              (mqrq->brq.cmd.resp[0] & R1_EXCEPTION_EVENT ||
>> +               mqrq->brq.stop.resp[0] & R1_EXCEPTION_EVENT);
>> +}
>> +
>> +static void mmc_blk_urgent_bkops(struct mmc_queue *mq,
>> +                                struct mmc_queue_req *mqrq)
>> +{
>> +       if (mmc_blk_urgent_bkops_needed(mq, mqrq))
>> +               mmc_start_bkops(mq->card, true);
>> +}
> 
> Ditto for the two above functions.

Again it is the same functionality.  In what way is it different?

> 
>> +
>> +void mmc_blk_mq_complete(struct request *req)
>> +{
>> +       struct mmc_queue *mq = req->q->queuedata;
>> +
>> +       mmc_blk_mq_complete_rq(mq, req);
>> +}
>> +
>> +static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
>> +                                      struct request *req)
>> +{
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +       struct mmc_host *host = mq->card->host;
>> +       bool failed;
>> +
>> +       failed = mmc_blk_rq_error(&mqrq->brq) ||
>> +                mmc_blk_card_busy(mq->card, req);
>> +
>> +       if (!mmc_queue_direct_complete(host))
> 
> Can you please make the changes related to completing the request in
> the mmc_request_done() into a separate patch.
> 
> It's better to first get the default behavior in place, then we can
> improve things on top. Again, that also makes it easier to review.

I am not sure what you mean.  Did you read the code?
mmc_queue_direct_complete() is a capability that is not yet used; the
default behavior is preserved the way you asked.  You can keep asking for the
same lines of code to be in different patches, but it isn't going to change
the code.
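
To make that concrete, the gate boils down to something like the sketch
below; the cap name follows the MMC_CAP_DIRECT_COMPLETE naming used later
in this thread, and whether it lives in host->caps or host->caps2 is an
assumption here:

/*
 * Sketch of the cap-gated completion path.  With the cap unset (the
 * default), completion still goes through blk_mq_complete_request() and
 * the ->complete() callback; only hosts that opt in complete the request
 * directly from the issuing/polling context.
 */
static void mmc_blk_finish_rq(struct mmc_queue *mq, struct request *req)
{
	struct mmc_host *host = mq->card->host;

	if (host->caps & MMC_CAP_DIRECT_COMPLETE)	/* assumed flag location */
		mmc_blk_mq_complete_rq(mq, req);	/* complete in place */
	else
		blk_mq_complete_request(req);		/* defer to ->complete() */
}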

> 
> No, I am giving up at this point. I didn't even get to the actual
> blkmq conversion part, which is the core part I should be spending my
> time reviewing. Sorry!

It looks much more like you are not trying.  Not a single technical issue
raised!  Not a single technical question!

> 
> Some final comments around the system-wide PM support below.
> 
> [...]
> 
>>
>> +static void mmc_mq_queue_suspend(struct mmc_queue *mq)
>> +{
>> +       blk_mq_quiesce_queue(mq->queue);
>> +
>> +       /*
>> +        * The host remains claimed while there are outstanding requests, so
>> +        * simply claiming and releasing here ensures there are none.
>> +        */
>> +       mmc_claim_host(mq->card->host);
>> +       mmc_release_host(mq->card->host);
> 
> This looks fragile.

It isn't.  The driver would already be broken if we could have requests in
flight but not have the host claimed.

> 
> Seems like an interface in blkmq which flushes the queue and suspends it
> is missing; however, there are of course reasons why it doesn't exist.
>
> I assume blk_mq_quiesce_queue() guarantees no new requests are
> being pushed to us after calling it, but it still seems a bit racy to
> rely on the host claim/release thing.

Race with what?

> 
> Let's bring this up as question for the block people experts.

You have had almost a year to learn about blk-mq.  What have you been doing?

And what are you going to do to get an answer from "the block people
experts" - there is a good chance few of those cc'ed will read this.

This all looks like a weak excuse to do nothing.

You need to stop playing games.  If you are not prepared to put the effort
in, then let the CQE code go in.  After all the dependency is a figment of
your imagination.  CQE has been ready to go for ages.  Why are you making up
reasons for delaying it?

Another possibility is to make me maintainer of the block driver since it
seems I am the one that knows most about it.


* Re: [PATCH V12 2/5] mmc: block: Add blk-mq support
  2017-10-27 11:54     ` Adrian Hunter
@ 2017-10-27 13:44       ` Ulf Hansson
  0 siblings, 0 replies; 12+ messages in thread
From: Ulf Hansson @ 2017-10-27 13:44 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

>>>
>>> -#define CMD_ERRORS                                                     \
>>> -       (R1_OUT_OF_RANGE |      /* Command argument out of range */     \
>>> -        R1_ADDRESS_ERROR |     /* Misaligned address */                \
>>> +#define CMD_ERRORS_EXCL_OOR                                            \
>>> +       (R1_ADDRESS_ERROR |     /* Misaligned address */                \
>>
>> This looks unrelated to blkmq support.
>
> No, it is preserving the existing OOR (out-of-range) functionality.

So why change it in this patch?

[...]

>>> +       /*
>>> +        * Do not assume data transferred correctly if there are any error bits
>>> +        * set.
>>> +        */
>>> +       if (!err && status & mmc_blk_stop_err_bits(&mqrq->brq)) {
>>> +               mqrq->brq.data.bytes_xfered = 0;
>>> +               err = -EIO;
>>> +       }
>>> +
>>> +       /* Copy the exception bit so it will be seen later on */
>>> +       if (mmc_card_mmc(card) && status & R1_EXCEPTION_EVENT)
>>> +               mqrq->brq.cmd.resp[0] |= R1_EXCEPTION_EVENT;
>>> +
>>> +       return err;
>>> +}
>>> +
>>> +static inline void mmc_blk_rw_reset_success(struct mmc_queue *mq,
>>> +                                           struct request *req)
>>> +{
>>> +       int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
>>> +
>>> +       mmc_blk_reset_success(mq->blkdata, type);
>>> +}
>>
>> I understand that all the above new lines of code (around 300 lines)
>> are something you need for the blkmq support in the rest of this
>> patch.
>>
>> However, it looks like you are adding completely new code that either
>> already exists in the legacy request path (in some slightly different
>> form), or could serve as clean-up/re-factoring of the legacy
>> request path.
>
> The legacy error handling is a non-starter.  Messy and convoluted.  There is
> no reason to keep using it.  Structurally it doesn't fit with blk-mq because
> it is split across different places and mixes in logic that is not related to
> error handling.

No matter whether you like it or not, the legacy request error path is what we have now.

The way forward is not to just throw it away and start over, but
instead to take small steps and make parts of it useful also for blkmq.

Yes, it's more work, but on the other hand, small changes which move
things in the right direction are also easier to review.

So, no, sorry, I don't buy it!

>
>>
>> This is not the way you should format a patch for converting to blkmq.
>> The reasons are:
>> *) It makes it hard to review.
>
> But the few comments in this email suggest you have spent maybe 30 minutes.
> It really looks like you haven't tried.  Where are the questions?

I spent a few hours on this version, but really, it isn't easy
to look at ~1000 new lines in one single patch, especially changes
that re-implement the entire mmc request layer.

>
>> **) There is no need to throw away *all* old mmc blk/core code,
>> which I assume is your plan for the next step. Instead the proper way is
>> to re-factor it, and make it possible to re-use those parts that make
>> sense.
>
> That is just nonsense.  We definitely want to throw away the messy
> convoluted legacy code.  The new code is much simpler.  Did you read it?

Yes, the new code is simpler, but as stated, re-factoring the parts of
the old code that can be re-used is the way forward.

[...]

>>> +static bool mmc_blk_urgent_bkops_needed(struct mmc_queue *mq,
>>> +                                       struct mmc_queue_req *mqrq)
>>> +{
>>> +       return mmc_card_mmc(mq->card) &&
>>> +              (mqrq->brq.cmd.resp[0] & R1_EXCEPTION_EVENT ||
>>> +               mqrq->brq.stop.resp[0] & R1_EXCEPTION_EVENT);
>>> +}
>>> +
>>> +static void mmc_blk_urgent_bkops(struct mmc_queue *mq,
>>> +                                struct mmc_queue_req *mqrq)
>>> +{
>>> +       if (mmc_blk_urgent_bkops_needed(mq, mqrq))
>>> +               mmc_start_bkops(mq->card, true);
>>> +}
>>
>> Ditto for the two above functions.
>
> Again it is the same functionality.  In what way is it different?

These checks are also being done in the legacy path, so in principle
adding these and not using them there means open-coding to me.

>
>>
>>> +
>>> +void mmc_blk_mq_complete(struct request *req)
>>> +{
>>> +       struct mmc_queue *mq = req->q->queuedata;
>>> +
>>> +       mmc_blk_mq_complete_rq(mq, req);
>>> +}
>>> +
>>> +static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
>>> +                                      struct request *req)
>>> +{
>>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>>> +       struct mmc_host *host = mq->card->host;
>>> +       bool failed;
>>> +
>>> +       failed = mmc_blk_rq_error(&mqrq->brq) ||
>>> +                mmc_blk_card_busy(mq->card, req);
>>> +
>>> +       if (!mmc_queue_direct_complete(host))
>>
>> Can you please make the changes related to completing the request in
>> the mmc_request_done() into a separate patch.
>>
>> It's better to first get the default behavior in place, then we can
>> improve things on top. Again, that also makes it easier to review.
>
> I am not sure what you mean.  Did you read the code?
> mmc_queue_direct_complete() is a capability that is not yet used; the
> default behavior is preserved the way you asked.  You can keep asking for the
> same lines of code to be in different patches, but it isn't going to change
> the code.

Let me try to make this more clear then.

This is about keeping patches reviewable and about making one change per patch.

More precisely, move all code related to MMC_CAP_DIRECT_COMPLETE (the
new cap which you introduce in this patch) into a separate patch on
top of this one. That makes it easier to review both this patch and
the one that introduces the new cap.

>
>>
>> No, I am giving up at this point. I didn't even get to the actual
>> blkmq conversion part, which is the core part I should be spending my
>> time reviewing. Sorry!
>
> It looks much more like you are not trying.  Not a single technical issue
> raised!  Not a single technical question!
>
>>
>> Some final comments around the system-wide PM support below.
>>
>> [...]
>>
>>>
>>> +static void mmc_mq_queue_suspend(struct mmc_queue *mq)
>>> +{
>>> +       blk_mq_quiesce_queue(mq->queue);
>>> +
>>> +       /*
>>> +        * The host remains claimed while there are outstanding requests, so
>>> +        * simply claiming and releasing here ensures there are none.
>>> +        */
>>> +       mmc_claim_host(mq->card->host);
>>> +       mmc_release_host(mq->card->host);
>>
>> This looks fragile.
>
> It isn't.  The driver would already be broken if we could have requests in
> flight but not have the host claimed.

Yes, but this boils down to the fact that I am not sure what
blk_mq_quiesce_queue() really means from a dispatching point of view.

Would it be possible that a dispatched request became preempted just
before we were about to call mmc_claim_host() for it, so that the above
mmc_claim_host() claims the host first?

Are you sure that can't happen?

>
>>
>> Seems like an interface in blkmq which flushes the queue and suspends it
>> is missing; however, there are of course reasons why it doesn't exist.
>>
>> I assume blk_mq_quiesce_queue() guarantees no new requests are
>> being pushed to us after calling it, but it still seems a bit racy to
>> rely on the host claim/release thing.
>
> Race with what?

See above.

>
>>
>> Let's bring this up as question for the block people experts.
>
> You have had almost a year to learn about blk-mq.  What have you been doing?

Sorry, I haven't looked closely enough at the PM support. Obviously my bad.

>
> And what are you going to do to get an answer from "the block people
> experts" - there is a good chance few of those cc'ed will read this.
>
> This all looks like a weak excuse to do nothing.
>
> You need to stop playing games.  If you are not prepared to put the effort
> in, then let the CQE code go in.  After all the dependency is a figment of
> your imagination.  CQE has been ready to go for ages.  Why are you making up
> reasons for delaying it?
>
> Another possibility is to make me maintainer of the block driver since it
> seems I am the one that knows most about it.

Adrian, I have appreciated your contributions to mmc for a very long
time, and I still do. I don't doubt your expertise in the area!

However, it seems like you don't want to listen to people's opinions
regarding this series, and I really don't get what the deal is. A man
of your skills should easily be able to address those comments, but
you have mostly been arguing and have chosen to ignore them.

As long as I have something to say in this community, I will never
accept poorly written patches; at least that goes for the mmc core
layer.

That said, if other people think you are better suited than me, please
go ahead and make a formal suggestion.

Kind regards
Uffe

