* [PATCH V13 00/10] mmc: Add Command Queue support
@ 2017-11-03 13:20 Adrian Hunter
From: Adrian Hunter @ 2017-11-03 13:20 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

Hi

Here is V13 of the hardware command queue patches without the software
command queue patches, now using blk-mq and now with blk-mq support for
non-CQE I/O.

HW CMDQ offers 25% - 50% better random multi-threaded I/O.  I see a slight
2% drop in sequential read speed but no change to sequential write.

Non-CQE blk-mq showed a 3% decrease in sequential read performance.  This
seemed to come from the worse latency of running work items compared with a
dedicated thread.  Hacking the blk-mq workqueue to be unbound reduced the
performance degradation from 3% to 1%.
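
For context, an "unbound" workqueue is simply one allocated with the
WQ_UNBOUND flag, so its work items may run on any CPU rather than being tied
to the submitting CPU.  A minimal, purely illustrative sketch (this is not
the actual blk-mq change that was tested):

#include <linux/module.h>
#include <linux/workqueue.h>

/*
 * Illustrative only: work queued to an unbound workqueue can run on any
 * CPU, which can reduce the scheduling latency seen when the submitting
 * CPU is busy.
 */
static struct workqueue_struct *example_wq;

static int __init example_init(void)
{
	example_wq = alloc_workqueue("example_unbound", WQ_UNBOUND, 0);
	return example_wq ? 0 : -ENOMEM;
}

static void __exit example_exit(void)
{
	destroy_workqueue(example_wq);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");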

While we should look at changing blk-mq to give better workqueue performance,
a bigger gain is likely to be made by adding a new host API to enable the
next already-prepared request to be issued directly from within the ->done()
callback of the current request.
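
To make that concrete, one purely hypothetical shape such an API could take
(the structure and callback names are illustrative only; nothing in this
series defines them):

#include <linux/mmc/core.h>
#include <linux/mmc/host.h>

/*
 * Hypothetical sketch only: a callback that would let the host issue the
 * already-prepared next request straight from the context that completes
 * the current one, instead of waking a worker or thread to do it.
 */
struct mmc_direct_issue_ops {
	/*
	 * Called while completing @prev.  @next is the next prepared
	 * request, or NULL if there is none.  Returns 0 if @next was
	 * issued, a negative error code otherwise.
	 */
	int (*issue_next)(struct mmc_host *host,
			  struct mmc_request *prev,
			  struct mmc_request *next);
};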


Changes since V12:
      mmc: block: Add error-handling comments
	New patch.
      mmc: block: Add blk-mq support
	Use legacy error handling
      mmc: block: Add CQE support
	Re-base
      mmc: block: blk-mq: Add support for direct completion
	New patch.
      mmc: block: blk-mq: Separate card polling from recovery
	New patch.
      mmc: block: blk-mq: Stop using card_busy_detect()
	New patch.
      mmc: block: blk-mq: Stop using legacy recovery
	New patch.

Changes since V11:
      Split "mmc: block: Add CQE and blk-mq support" into 2 patches

Changes since V10:
      mmc: core: Remove unnecessary host claim
      mmc: core: Introduce host claiming by context
      mmc: core: Add support for handling CQE requests
      mmc: mmc: Enable Command Queuing
      mmc: mmc: Enable CQE's
      mmc: block: Use local variables in mmc_blk_data_prep()
      mmc: block: Prepare CQE data
      mmc: block: Factor out mmc_setup_queue()
      mmc: core: Add parameter use_blk_mq
      mmc: core: Export mmc_start_bkops()
      mmc: core: Export mmc_start_request()
      mmc: core: Export mmc_retune_hold_now() and mmc_retune_release()
	Dropped because they have been applied
      mmc: block: Add CQE and blk-mq support
	Extend blk-mq support for asynchronous read / writes to all host
	controllers including those that require polling. The direct
	completion path is still available but depends on a new capability
	flag.
	Drop blk-mq support for synchronous read / writes.

Venkat Gopalakrishnan (1):
      mmc: cqhci: support for command queue enabled host

Changes since V9:
      mmc: block: Add CQE and blk-mq support
	- reinstate mq support for REQ_OP_DRV_IN/OUT that was removed because
	it was incorrectly assumed to be handled by the rpmb character device
	- don't check for rpmb block device anymore
      mmc: cqhci: support for command queue enabled host
	Fix cqhci_set_irqs() as per Haibo Chen

Changes since V8:
	Re-based
      mmc: core: Introduce host claiming by context
	Slightly simplified as per Ulf
      mmc: core: Export mmc_retune_hold_now() and mmc_retune_release()
	New patch.
      mmc: block: Add CQE and blk-mq support
	Fix missing ->post_req() on the error path

Changes since V7:
	Re-based
      mmc: core: Introduce host claiming by context
	Slightly simplified
      mmc: core: Add parameter use_blk_mq
	New patch.
      mmc: core: Remove unnecessary host claim
	New patch.
      mmc: core: Export mmc_start_bkops()
	New patch.
      mmc: core: Export mmc_start_request()
	New patch.
      mmc: block: Add CQE and blk-mq support
	Add blk-mq support for non_CQE requests

Changes since V6:
      mmc: core: Introduce host claiming by context
	New patch.
      mmc: core: Move mmc_start_areq() declaration
	Dropped because it has been applied
      mmc: block: Fix block status codes
	Dropped because it has been applied
      mmc: host: Add CQE interface
	Dropped because it has been applied
      mmc: core: Turn off CQE before sending commands
	Dropped because it has been applied
      mmc: block: Factor out mmc_setup_queue()
	New patch.
      mmc: block: Add CQE support
	Drop legacy support and add blk-mq support

Changes since V5:
	Re-based
      mmc: core: Add mmc_retune_hold_now()
	Dropped because it has been applied
      mmc: core: Add members to mmc_request and mmc_data for CQE's
	Dropped because it has been applied
      mmc: core: Move mmc_start_areq() declaration
	New patch at Ulf's request
      mmc: block: Fix block status codes
	Another unrelated patch
      mmc: host: Add CQE interface
	Move recovery_notifier() callback to struct mmc_request
      mmc: core: Add support for handling CQE requests
	Roll __mmc_cqe_request_done() into mmc_cqe_request_done()
	Move function declarations requested by Ulf
      mmc: core: Remove unused MMC_CAP2_PACKED_CMD
	Dropped because it has been applied
      mmc: block: Add CQE support
	Add explanation to commit message
	Adjustment for changed recovery_notifier() callback
      mmc: cqhci: support for command queue enabled host
	Adjustment for changed recovery_notifier() callback
      mmc: sdhci-pci: Add CQHCI support for Intel GLK
	Add DCMD capability for Intel controllers except GLK

Changes since V4:
      mmc: core: Add mmc_retune_hold_now()
	Add explanation to commit message.
      mmc: host: Add CQE interface
	Add comments to callback declarations.
      mmc: core: Turn off CQE before sending commands
	Add explanation to commit message.
      mmc: core: Add support for handling CQE requests
	Add comments as requested by Ulf.
      mmc: core: Remove unused MMC_CAP2_PACKED_CMD
	New patch.
      mmc: mmc: Enable Command Queuing
	Adjust for removal of MMC_CAP2_PACKED_CMD.
	Add a comment about Packed Commands.
      mmc: mmc: Enable CQE's
	Remove unnecessary check for MMC_CAP2_CQE
      mmc: block: Use local variables in mmc_blk_data_prep()
	New patch.
      mmc: block: Prepare CQE data
	Adjust due to "mmc: block: Use local variables in mmc_blk_data_prep()"
	Remove priority setting.
	Add explanation to commit message.
      mmc: cqhci: support for command queue enabled host
	Fix transfer descriptor setting in cqhci_set_tran_desc() for 32-bit DMA

Changes since V3:
	Adjusted ...blk_end_request...() for new block status codes
	Fixed CQHCI transaction descriptor for "no DCMD" case

Changes since V2:
	Dropped patches that have been applied.
	Re-based
	Added "mmc: sdhci-pci: Add CQHCI support for Intel GLK"

Changes since V1:

	"Share mmc request array between partitions" is dependent
	on changes in "Introduce queue semantics", so added that
	and block fixes:

	Added "Fix is_waiting_last_req set incorrectly"
	Added "Fix cmd error reset failure path"
	Added "Use local var for mqrq_cur"
	Added "Introduce queue semantics"

Changes since RFC:

	Re-based on next.
	Added comment about command queue priority.
	Added some acks and reviews.


Adrian Hunter (9):
      mmc: core: Add parameter use_blk_mq
      mmc: block: Add error-handling comments
      mmc: block: Add blk-mq support
      mmc: block: Add CQE support
      mmc: sdhci-pci: Add CQHCI support for Intel GLK
      mmc: block: blk-mq: Add support for direct completion
      mmc: block: blk-mq: Separate card polling from recovery
      mmc: block: blk-mq: Stop using card_busy_detect()
      mmc: block: blk-mq: Stop using legacy recovery

Venkat Gopalakrishnan (1):
      mmc: cqhci: support for command queue enabled host

 drivers/mmc/Kconfig               |   11 +
 drivers/mmc/core/block.c          |  850 ++++++++++++++++++++++++++-
 drivers/mmc/core/block.h          |   12 +
 drivers/mmc/core/core.c           |    7 +
 drivers/mmc/core/core.h           |    2 +
 drivers/mmc/core/host.c           |    2 +
 drivers/mmc/core/host.h           |    4 +
 drivers/mmc/core/queue.c          |  426 +++++++++++++-
 drivers/mmc/core/queue.h          |   56 ++
 drivers/mmc/host/Kconfig          |   14 +
 drivers/mmc/host/Makefile         |    1 +
 drivers/mmc/host/cqhci.c          | 1150 +++++++++++++++++++++++++++++++++++++
 drivers/mmc/host/cqhci.h          |  240 ++++++++
 drivers/mmc/host/sdhci-pci-core.c |  155 ++++-
 include/linux/mmc/host.h          |    2 +
 15 files changed, 2900 insertions(+), 32 deletions(-)
 create mode 100644 drivers/mmc/host/cqhci.c
 create mode 100644 drivers/mmc/host/cqhci.h

 

Regards
Adrian


* [PATCH V13 01/10] mmc: core: Add parameter use_blk_mq
From: Adrian Hunter @ 2017-11-03 13:20 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

Until mmc has blk-mq support fully implemented and tested, add a
parameter use_blk_mq, defaulting to false unless the config option
MMC_MQ_DEFAULT is selected.
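
(Not part of the diff below.)  As a hedged illustration of how the flag is
consumed later in the series, patch 3 has queue initialization branch on the
helper added here, roughly:

/* Sketch only, condensed from patch 3: how queue setup branches on the flag. */
static int example_init_queue(struct mmc_queue *mq, struct mmc_card *card,
			      spinlock_t *lock)
{
	struct mmc_host *host = card->host;

	mq->card = card;

	if (mmc_host_use_blk_mq(host))		/* helper added by this patch */
		return mmc_mq_init(mq, card, lock);

	/* ... otherwise the existing legacy queue setup runs unchanged ... */
	return 0;
}

At runtime the default can still be overridden either way with the
mmc_core.use_blk_mq module/boot option (see the Kconfig help text below).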

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/Kconfig      | 11 +++++++++++
 drivers/mmc/core/core.c  |  7 +++++++
 drivers/mmc/core/core.h  |  2 ++
 drivers/mmc/core/host.c  |  2 ++
 drivers/mmc/core/host.h  |  4 ++++
 include/linux/mmc/host.h |  1 +
 6 files changed, 27 insertions(+)

diff --git a/drivers/mmc/Kconfig b/drivers/mmc/Kconfig
index ec21388311db..98202934bd29 100644
--- a/drivers/mmc/Kconfig
+++ b/drivers/mmc/Kconfig
@@ -12,6 +12,17 @@ menuconfig MMC
 	  If you want MMC/SD/SDIO support, you should say Y here and
 	  also to your specific host controller driver.
 
+config MMC_MQ_DEFAULT
+	bool "MMC: use blk-mq I/O path by default"
+	depends on MMC && BLOCK
+	---help---
+	  This option enables the new blk-mq based I/O path for MMC block
+	  devices by default.  With the option the mmc_core.use_blk_mq
+	  module/boot option defaults to Y, without it to N, but it can
+	  still be overridden either way.
+
+	  If unsure say N.
+
 if MMC
 
 source "drivers/mmc/core/Kconfig"
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 1f0f44f4dd5f..5df03cb73be7 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -66,6 +66,13 @@
 bool use_spi_crc = 1;
 module_param(use_spi_crc, bool, 0);
 
+#ifdef CONFIG_MMC_MQ_DEFAULT
+bool mmc_use_blk_mq = true;
+#else
+bool mmc_use_blk_mq = false;
+#endif
+module_param_named(use_blk_mq, mmc_use_blk_mq, bool, S_IWUSR | S_IRUGO);
+
 static int mmc_schedule_delayed_work(struct delayed_work *work,
 				     unsigned long delay)
 {
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index 71e6c6d7ceb7..8c5dd8d31400 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -35,6 +35,8 @@ struct mmc_bus_ops {
 	int (*reset)(struct mmc_host *);
 };
 
+extern bool mmc_use_blk_mq;
+
 void mmc_attach_bus(struct mmc_host *host, const struct mmc_bus_ops *ops);
 void mmc_detach_bus(struct mmc_host *host);
 
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index 35a9e4fd1a9f..62ef6cb0ece4 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -404,6 +404,8 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
 
 	host->fixed_drv_type = -EINVAL;
 
+	host->use_blk_mq = mmc_use_blk_mq;
+
 	return host;
 }
 
diff --git a/drivers/mmc/core/host.h b/drivers/mmc/core/host.h
index fb689a1065ed..6eaf558e62d6 100644
--- a/drivers/mmc/core/host.h
+++ b/drivers/mmc/core/host.h
@@ -74,6 +74,10 @@ static inline bool mmc_card_hs400es(struct mmc_card *card)
 	return card->host->ios.enhanced_strobe;
 }
 
+static inline bool mmc_host_use_blk_mq(struct mmc_host *host)
+{
+	return host->use_blk_mq;
+}
 
 #endif
 
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index e7743eca1021..ce2075d6f429 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -380,6 +380,7 @@ struct mmc_host {
 	unsigned int		doing_retune:1;	/* re-tuning in progress */
 	unsigned int		retune_now:1;	/* do re-tuning at next req */
 	unsigned int		retune_paused:1; /* re-tuning is temporarily disabled */
+	unsigned int		use_blk_mq:1;	/* use blk-mq */
 
 	int			rescan_disable;	/* disable card detection */
 	int			rescan_entered;	/* used with nonremovable devices */
-- 
1.9.1


* [PATCH V13 02/10] mmc: block: Add error-handling comments
From: Adrian Hunter @ 2017-11-03 13:20 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

Add error-handling comments to explain what would also be done for blk-mq
if it used the legacy error-handling.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/core/block.c | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index ea80ff4cd7f9..ad72fa19f082 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1894,7 +1894,11 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
 		case MMC_BLK_SUCCESS:
 		case MMC_BLK_PARTIAL:
 			/*
-			 * A block was successfully transferred.
+			 * Reset success, and accept bytes_xfered. For
+			 * MMC_BLK_PARTIAL re-submit the remaining request. For
+			 * MMC_BLK_SUCCESS error out the remaining request (it
+			 * could not be re-submitted anyway if a next request
+			 * had already begun).
 			 */
 			mmc_blk_reset_success(md, type);
 
@@ -1914,6 +1918,14 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
 			}
 			break;
 		case MMC_BLK_CMD_ERR:
+			/*
+			 * For SD cards, get bytes written, but do not accept
+			 * bytes_xfered if that fails. For MMC cards accept
+			 * bytes_xfered. Then try to reset. If reset fails then
+			 * error out the remaining request, otherwise retry
+			 * once (N.B mmc_blk_reset() will not succeed twice in a
+			 * row).
+			 */
 			req_pending = mmc_blk_rw_cmd_err(md, card, brq, old_req, req_pending);
 			if (mmc_blk_reset(md, card->host, type)) {
 				if (req_pending)
@@ -1930,11 +1942,20 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
 			}
 			break;
 		case MMC_BLK_RETRY:
+			/*
+			 * Do not accept bytes_xfered, but retry up to 5 times,
+			 * otherwise same as abort.
+			 */
 			retune_retry_done = brq->retune_retry_done;
 			if (retry++ < 5)
 				break;
 			/* Fall through */
 		case MMC_BLK_ABORT:
+			/*
+			 * Do not accept bytes_xfered, but try to reset. If
+			 * reset succeeds, try once more, otherwise error out
+			 * the request.
+			 */
 			if (!mmc_blk_reset(md, card->host, type))
 				break;
 			mmc_blk_rw_cmd_abort(mq, card, old_req, mq_rq);
@@ -1943,6 +1964,13 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
 		case MMC_BLK_DATA_ERR: {
 			int err;
 
+			/*
+			 * Do not accept bytes_xfered, but try to reset. If
+			 * reset succeeds, try once more. If reset fails with
+			 * ENODEV which means the partition is wrong, then error
+			 * out the request. Otherwise attempt to read one sector
+			 * at a time.
+			 */
 			err = mmc_blk_reset(md, card->host, type);
 			if (!err)
 				break;
@@ -1954,6 +1982,10 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
 			/* Fall through */
 		}
 		case MMC_BLK_ECC_ERR:
+			/*
+			 * Do not accept bytes_xfered. If reading more than one
+			 * sector, try reading one sector at a time.
+			 */
 			if (brq->data.blocks > 1) {
 				/* Redo read one sector at a time */
 				pr_warn("%s: retrying using single block read\n",
@@ -1975,10 +2007,12 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req)
 			}
 			break;
 		case MMC_BLK_NOMEDIUM:
+			/* Do not accept bytes_xfered. Error out the request */
 			mmc_blk_rw_cmd_abort(mq, card, old_req, mq_rq);
 			mmc_blk_rw_try_restart(mq, new_req, mqrq_cur);
 			return;
 		default:
+			/* Do not accept bytes_xfered. Error out the request */
 			pr_err("%s: Unhandled return value (%d)",
 					old_req->rq_disk->disk_name, status);
 			mmc_blk_rw_cmd_abort(mq, card, old_req, mq_rq);
-- 
1.9.1


* [PATCH V13 03/10] mmc: block: Add blk-mq support
From: Adrian Hunter @ 2017-11-03 13:20 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

Define and use a blk-mq queue. Discards and flushes are processed
synchronously, but reads and writes asynchronously. In order to support
hosts with slow DMA unmapping, DMA unmapping is not done until after the
next request is started, which means the request is not completed until
then. If there is no next request, the completion is done by queued work.
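
The deferred completion is easiest to see in condensed form.  A simplified
sketch of mmc_blk_mq_issue_rw_rq() from the diff below (request prep, the
rw_wait flagging and error paths are omitted):

/*
 * Condensed sketch of the issue path below: the previous request is only
 * completed -- and DMA-unmapped via ->post_req() -- after the next request
 * has already been started.  With no next request, mmc_blk_mq_req_done()
 * falls back to queued (kblockd) work for the completion.
 */
static int example_issue_rw(struct mmc_queue *mq, struct request *req)
{
	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
	struct mmc_host *host = mq->card->host;
	struct request *prev_req = NULL;
	int err;

	if (host->ops->pre_req)				/* map DMA early */
		host->ops->pre_req(host, &mqrq->brq.mrq);

	err = mmc_blk_rw_wait(mq, &prev_req);		/* wait for previous */
	if (err)
		return err;

	err = mmc_start_request(host, &mqrq->brq.mrq);	/* start this one */

	if (prev_req)					/* now finish previous */
		mmc_blk_mq_post_req(mq, prev_req);

	return err;
}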

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/core/block.c | 457 ++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/mmc/core/block.h |   9 +
 drivers/mmc/core/queue.c | 273 +++++++++++++++++++++++++---
 drivers/mmc/core/queue.h |  32 ++++
 4 files changed, 740 insertions(+), 31 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index ad72fa19f082..e2838ff4738e 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1264,7 +1264,10 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
 		break;
 	}
 	mq_rq->drv_op_result = ret;
-	blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+	if (req->mq_ctx)
+		blk_mq_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+	else
+		blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
 static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
@@ -1307,7 +1310,10 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
 	else
 		mmc_blk_reset_success(md, type);
 fail:
-	blk_end_request(req, status, blk_rq_bytes(req));
+	if (req->mq_ctx)
+		blk_mq_end_request(req, status);
+	else
+		blk_end_request(req, status, blk_rq_bytes(req));
 }
 
 static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
@@ -1377,7 +1383,10 @@ static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq,
 	if (!err)
 		mmc_blk_reset_success(md, type);
 out:
-	blk_end_request(req, status, blk_rq_bytes(req));
+	if (req->mq_ctx)
+		blk_mq_end_request(req, status);
+	else
+		blk_end_request(req, status, blk_rq_bytes(req));
 }
 
 static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
@@ -1387,7 +1396,10 @@ static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req)
 	int ret = 0;
 
 	ret = mmc_flush_cache(card);
-	blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+	if (req->mq_ctx)
+		blk_mq_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
+	else
+		blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
 }
 
 /*
@@ -1464,11 +1476,9 @@ static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq)
 	}
 }
 
-static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
-					     struct mmc_async_req *areq)
+static enum mmc_blk_status __mmc_blk_err_check(struct mmc_card *card,
+					       struct mmc_queue_req *mq_mrq)
 {
-	struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
-						    areq);
 	struct mmc_blk_request *brq = &mq_mrq->brq;
 	struct request *req = mmc_queue_req_to_req(mq_mrq);
 	int need_retune = card->host->need_retune;
@@ -1574,6 +1584,15 @@ static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
 	return MMC_BLK_SUCCESS;
 }
 
+static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
+					     struct mmc_async_req *areq)
+{
+	struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
+						    areq);
+
+	return __mmc_blk_err_check(card, mq_mrq);
+}
+
 static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
 			      int disable_multi, bool *do_rel_wr_p,
 			      bool *do_data_tag_p)
@@ -1766,6 +1785,428 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 	mqrq->areq.err_check = mmc_blk_err_check;
 }
 
+#define MMC_MAX_RETRIES		5
+#define MMC_NO_RETRIES		(MMC_MAX_RETRIES + 1)
+
+/* Single sector read during recovery */
+static void mmc_blk_ss_read(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	blk_status_t status;
+
+	while (1) {
+		mmc_blk_rw_rq_prep(mqrq, mq->card, 1, mq);
+
+		mmc_wait_for_req(mq->card->host, &mqrq->brq.mrq);
+
+		/*
+		 * Not expecting command errors, so just give up in that case.
+		 * If there are retries remaining, the request will get
+		 * requeued.
+		 */
+		if (mqrq->brq.cmd.error)
+			return;
+
+		if (blk_rq_bytes(req) <= 512)
+			break;
+
+		status = mqrq->brq.data.error ? BLK_STS_IOERR : BLK_STS_OK;
+
+		blk_update_request(req, status, 512);
+	}
+
+	mqrq->retries = MMC_NO_RETRIES;
+}
+
+static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
+{
+	int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_blk_request *brq = &mqrq->brq;
+	struct mmc_blk_data *md = mq->blkdata;
+	struct mmc_card *card = mq->card;
+	static enum mmc_blk_status status;
+
+	brq->retune_retry_done = mqrq->retries;
+
+	status = __mmc_blk_err_check(card, mqrq);
+
+	mmc_retune_release(card->host);
+
+	/*
+	 * Requests are completed by mmc_blk_mq_complete_rq() which sets simple
+	 * policy:
+	 * 1. A request that has transferred at least some data is considered
+	 * successful and will be requeued if there is remaining data to
+	 * transfer.
+	 * 2. Otherwise the number of retries is incremented and the request
+	 * will be requeued if there are remaining retries.
+	 * 3. Otherwise the request will be errored out.
+	 * That means mmc_blk_mq_complete_rq() is controlled by bytes_xfered and
+	 * mqrq->retries. So there are only 4 possible actions here:
+	 *	1. do not accept the bytes_xfered value i.e. set it to zero
+	 *	2. change mqrq->retries to determine the number of retries
+	 *	3. try to reset the card
+	 *	4. read one sector at a time
+	 */
+	switch (status) {
+	case MMC_BLK_SUCCESS:
+	case MMC_BLK_PARTIAL:
+		/* Reset success, and accept bytes_xfered */
+		mmc_blk_reset_success(md, type);
+		break;
+	case MMC_BLK_CMD_ERR:
+		/*
+		 * For SD cards, get bytes written, but do not accept
+		 * bytes_xfered if that fails. For MMC cards accept
+		 * bytes_xfered. Then try to reset. If reset fails then
+		 * error out the remaining request, otherwise retry
+		 * once (N.B mmc_blk_reset() will not succeed twice in a
+		 * row).
+		 */
+		if (mmc_card_sd(card)) {
+			u32 blocks;
+			int err;
+
+			err = mmc_sd_num_wr_blocks(card, &blocks);
+			if (err)
+				brq->data.bytes_xfered = 0;
+			else
+				brq->data.bytes_xfered = blocks << 9;
+		}
+		if (mmc_blk_reset(md, card->host, type))
+			mqrq->retries = MMC_NO_RETRIES;
+		else
+			mqrq->retries = MMC_MAX_RETRIES - 1;
+		break;
+	case MMC_BLK_RETRY:
+		/*
+		 * Do not accept bytes_xfered, but retry up to 5 times,
+		 * otherwise same as abort.
+		 */
+		brq->data.bytes_xfered = 0;
+		if (mqrq->retries < MMC_MAX_RETRIES)
+			break;
+		/* Fall through */
+	case MMC_BLK_ABORT:
+		/*
+		 * Do not accept bytes_xfered, but try to reset. If
+		 * reset succeeds, try once more, otherwise error out
+		 * the request.
+		 */
+		brq->data.bytes_xfered = 0;
+		if (mmc_blk_reset(md, card->host, type))
+			mqrq->retries = MMC_NO_RETRIES;
+		else
+			mqrq->retries = MMC_MAX_RETRIES - 1;
+		break;
+	case MMC_BLK_DATA_ERR: {
+		int err;
+
+		/*
+		 * Do not accept bytes_xfered, but try to reset. If
+		 * reset succeeds, try once more. If reset fails with
+		 * ENODEV which means the partition is wrong, then error
+		 * out the request. Otherwise attempt to read one sector
+		 * at a time.
+		 */
+		brq->data.bytes_xfered = 0;
+		err = mmc_blk_reset(md, card->host, type);
+		if (!err) {
+			mqrq->retries = MMC_MAX_RETRIES - 1;
+			break;
+		}
+		if (err == -ENODEV) {
+			mqrq->retries = MMC_NO_RETRIES;
+			break;
+		}
+		/* Fall through */
+	}
+	case MMC_BLK_ECC_ERR:
+		/*
+		 * Do not accept bytes_xfered. If reading more than one
+		 * sector, try reading one sector at a time.
+		 */
+		brq->data.bytes_xfered = 0;
+		/* FIXME: Missing single sector read for large sector size */
+		if (brq->data.blocks > 1 && !mmc_large_sector(card)) {
+			/* Redo read one sector at a time */
+			pr_warn("%s: retrying using single block read\n",
+				req->rq_disk->disk_name);
+			mmc_blk_ss_read(mq, req);
+		} else {
+			mqrq->retries = MMC_NO_RETRIES;
+		}
+		break;
+	case MMC_BLK_NOMEDIUM:
+		/* Do not accept bytes_xfered. Error out the request */
+		brq->data.bytes_xfered = 0;
+		mqrq->retries = MMC_NO_RETRIES;
+		break;
+	default:
+		/* Do not accept bytes_xfered. Error out the request */
+		brq->data.bytes_xfered = 0;
+		mqrq->retries = MMC_NO_RETRIES;
+		pr_err("%s: Unhandled return value (%d)",
+		       req->rq_disk->disk_name, status);
+		break;
+	}
+}
+
+static void mmc_blk_mq_complete_rq(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	unsigned int nr_bytes = mqrq->brq.data.bytes_xfered;
+
+	if (nr_bytes) {
+		if (blk_update_request(req, BLK_STS_OK, nr_bytes))
+			blk_mq_requeue_request(req, true);
+		else
+			__blk_mq_end_request(req, BLK_STS_OK);
+	} else if (mqrq->retries++ < MMC_MAX_RETRIES) {
+		blk_mq_requeue_request(req, true);
+	} else {
+		if (mmc_card_removed(mq->card))
+			req->rq_flags |= RQF_QUIET;
+		blk_mq_end_request(req, BLK_STS_IOERR);
+	}
+}
+
+static bool mmc_blk_urgent_bkops_needed(struct mmc_queue *mq,
+					struct mmc_queue_req *mqrq)
+{
+	return mmc_card_mmc(mq->card) &&
+	       (mqrq->brq.cmd.resp[0] & R1_EXCEPTION_EVENT ||
+		mqrq->brq.stop.resp[0] & R1_EXCEPTION_EVENT);
+}
+
+static void mmc_blk_urgent_bkops(struct mmc_queue *mq,
+				 struct mmc_queue_req *mqrq)
+{
+	if (mmc_blk_urgent_bkops_needed(mq, mqrq))
+		mmc_start_bkops(mq->card, true);
+}
+
+void mmc_blk_mq_complete(struct request *req)
+{
+	struct mmc_queue *mq = req->q->queuedata;
+
+	mmc_blk_mq_complete_rq(mq, req);
+}
+
+static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
+				       struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+
+	mmc_blk_rw_recovery(mq, req);
+
+	mmc_blk_urgent_bkops(mq, mqrq);
+}
+
+static void mmc_blk_mq_acct_req_done(struct mmc_queue *mq, struct request *req)
+{
+	struct request_queue *q = req->q;
+	unsigned long flags;
+	bool put_card;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+
+	mq->in_flight[mmc_issue_type(mq, req)] -= 1;
+
+	put_card = mmc_tot_in_flight(mq) == 0;
+
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	if (put_card)
+		mmc_put_card(mq->card, &mq->ctx);
+}
+
+static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_request *mrq = &mqrq->brq.mrq;
+	struct mmc_host *host = mq->card->host;
+
+	if (host->ops->post_req)
+		host->ops->post_req(host, mrq, 0);
+
+	blk_mq_complete_request(req);
+
+	mmc_blk_mq_acct_req_done(mq, req);
+}
+
+static void mmc_blk_mq_complete_prev_req(struct mmc_queue *mq,
+					 struct request **prev_req)
+{
+	mutex_lock(&mq->complete_lock);
+
+	if (!mq->complete_req)
+		goto out_unlock;
+
+	mmc_blk_mq_poll_completion(mq, mq->complete_req);
+
+	if (prev_req)
+		*prev_req = mq->complete_req;
+	else
+		mmc_blk_mq_post_req(mq, mq->complete_req);
+
+	mq->complete_req = NULL;
+
+out_unlock:
+	mutex_unlock(&mq->complete_lock);
+}
+
+void mmc_blk_mq_complete_work(struct work_struct *work)
+{
+	struct mmc_queue *mq = container_of(work, struct mmc_queue,
+					    complete_work);
+
+	mmc_blk_mq_complete_prev_req(mq, NULL);
+}
+
+static void mmc_blk_mq_req_done(struct mmc_request *mrq)
+{
+	struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req,
+						  brq.mrq);
+	struct request *req = mmc_queue_req_to_req(mqrq);
+	struct request_queue *q = req->q;
+	struct mmc_queue *mq = q->queuedata;
+	unsigned long flags;
+	bool waiting;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	mq->complete_req = req;
+	mq->rw_wait = false;
+	waiting = mq->waiting;
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	if (waiting)
+		wake_up(&mq->wait);
+	else
+		kblockd_schedule_work(&mq->complete_work);
+}
+
+static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err)
+{
+	struct request_queue *q = mq->queue;
+	unsigned long flags;
+	bool done;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	done = !mq->rw_wait;
+	mq->waiting = !done;
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	return done;
+}
+
+static int mmc_blk_rw_wait(struct mmc_queue *mq, struct request **prev_req)
+{
+	int err = 0;
+
+	wait_event(mq->wait, mmc_blk_rw_wait_cond(mq, &err));
+
+	mmc_blk_mq_complete_prev_req(mq, prev_req);
+
+	return err;
+}
+
+static int mmc_blk_mq_issue_rw_rq(struct mmc_queue *mq,
+				  struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_host *host = mq->card->host;
+	struct request *prev_req = NULL;
+	int err = 0;
+
+	mmc_blk_rw_rq_prep(mqrq, mq->card, 0, mq);
+
+	mqrq->brq.mrq.done = mmc_blk_mq_req_done;
+
+	if (host->ops->pre_req)
+		host->ops->pre_req(host, &mqrq->brq.mrq);
+
+	err = mmc_blk_rw_wait(mq, &prev_req);
+	if (err)
+		goto out_post_req;
+
+	mq->rw_wait = true;
+
+	err = mmc_start_request(host, &mqrq->brq.mrq);
+
+	if (prev_req)
+		mmc_blk_mq_post_req(mq, prev_req);
+
+	if (err)
+		mq->rw_wait = false;
+
+out_post_req:
+	if (err && host->ops->post_req)
+		host->ops->post_req(host, &mqrq->brq.mrq, err);
+
+	return err;
+}
+
+static int mmc_blk_wait_for_idle(struct mmc_queue *mq, struct mmc_host *host)
+{
+	return mmc_blk_rw_wait(mq, NULL);
+}
+
+enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_blk_data *md = mq->blkdata;
+	struct mmc_card *card = md->queue.card;
+	struct mmc_host *host = card->host;
+	int ret;
+
+	ret = mmc_blk_part_switch(card, md->part_type);
+	if (ret)
+		return MMC_REQ_FAILED_TO_START;
+
+	switch (mmc_issue_type(mq, req)) {
+	case MMC_ISSUE_SYNC:
+		ret = mmc_blk_wait_for_idle(mq, host);
+		if (ret)
+			return MMC_REQ_BUSY;
+		switch (req_op(req)) {
+		case REQ_OP_DRV_IN:
+		case REQ_OP_DRV_OUT:
+			mmc_blk_issue_drv_op(mq, req);
+			break;
+		case REQ_OP_DISCARD:
+			mmc_blk_issue_discard_rq(mq, req);
+			break;
+		case REQ_OP_SECURE_ERASE:
+			mmc_blk_issue_secdiscard_rq(mq, req);
+			break;
+		case REQ_OP_FLUSH:
+			mmc_blk_issue_flush(mq, req);
+			break;
+		default:
+			WARN_ON_ONCE(1);
+			return MMC_REQ_FAILED_TO_START;
+		}
+		return MMC_REQ_FINISHED;
+	case MMC_ISSUE_ASYNC:
+		switch (req_op(req)) {
+		case REQ_OP_READ:
+		case REQ_OP_WRITE:
+			ret = mmc_blk_mq_issue_rw_rq(mq, req);
+			break;
+		default:
+			WARN_ON_ONCE(1);
+			ret = -EINVAL;
+		}
+		if (!ret)
+			return MMC_REQ_STARTED;
+		return ret == -EBUSY ? MMC_REQ_BUSY : MMC_REQ_FAILED_TO_START;
+	default:
+		WARN_ON_ONCE(1);
+		return MMC_REQ_FAILED_TO_START;
+	}
+}
+
 static bool mmc_blk_rw_cmd_err(struct mmc_blk_data *md, struct mmc_card *card,
 			       struct mmc_blk_request *brq, struct request *req,
 			       bool old_req_pending)
diff --git a/drivers/mmc/core/block.h b/drivers/mmc/core/block.h
index 860ca7c8df86..c62e3f3d3a3a 100644
--- a/drivers/mmc/core/block.h
+++ b/drivers/mmc/core/block.h
@@ -6,4 +6,13 @@
 
 void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req);
 
+enum mmc_issued;
+
+enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req);
+void mmc_blk_mq_complete(struct request *req);
+
+struct work_struct;
+
+void mmc_blk_mq_complete_work(struct work_struct *work);
+
 #endif
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 4f33d277b125..a9c2351a9b29 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -22,6 +22,7 @@
 #include "block.h"
 #include "core.h"
 #include "card.h"
+#include "host.h"
 
 /*
  * Prepare a MMC request. This just filters out odd stuff.
@@ -34,10 +35,25 @@ static int mmc_prep_request(struct request_queue *q, struct request *req)
 		return BLKPREP_KILL;
 
 	req->rq_flags |= RQF_DONTPREP;
+	req_to_mmc_queue_req(req)->retries = 0;
 
 	return BLKPREP_OK;
 }
 
+enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req)
+{
+	if (req_op(req) == REQ_OP_READ || req_op(req) == REQ_OP_WRITE)
+		return MMC_ISSUE_ASYNC;
+
+	return MMC_ISSUE_SYNC;
+}
+
+static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
+						 bool reserved)
+{
+	return BLK_EH_RESET_TIMER;
+}
+
 static int mmc_queue_thread(void *d)
 {
 	struct mmc_queue *mq = d;
@@ -154,11 +170,10 @@ static void mmc_queue_setup_discard(struct request_queue *q,
  * @req: the request
  * @gfp: memory allocation policy
  */
-static int mmc_init_request(struct request_queue *q, struct request *req,
-			    gfp_t gfp)
+static int __mmc_init_request(struct mmc_queue *mq, struct request *req,
+			      gfp_t gfp)
 {
 	struct mmc_queue_req *mq_rq = req_to_mmc_queue_req(req);
-	struct mmc_queue *mq = q->queuedata;
 	struct mmc_card *card = mq->card;
 	struct mmc_host *host = card->host;
 
@@ -169,6 +184,12 @@ static int mmc_init_request(struct request_queue *q, struct request *req,
 	return 0;
 }
 
+static int mmc_init_request(struct request_queue *q, struct request *req,
+			    gfp_t gfp)
+{
+	return __mmc_init_request(q->queuedata, req, gfp);
+}
+
 static void mmc_exit_request(struct request_queue *q, struct request *req)
 {
 	struct mmc_queue_req *mq_rq = req_to_mmc_queue_req(req);
@@ -177,6 +198,105 @@ static void mmc_exit_request(struct request_queue *q, struct request *req)
 	mq_rq->sg = NULL;
 }
 
+static int mmc_mq_init_request(struct blk_mq_tag_set *set, struct request *req,
+			       unsigned int hctx_idx, unsigned int numa_node)
+{
+	return __mmc_init_request(set->driver_data, req, GFP_KERNEL);
+}
+
+static void mmc_mq_exit_request(struct blk_mq_tag_set *set, struct request *req,
+				unsigned int hctx_idx)
+{
+	struct mmc_queue *mq = set->driver_data;
+
+	mmc_exit_request(mq->queue, req);
+}
+
+static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
+				    const struct blk_mq_queue_data *bd)
+{
+	struct request *req = bd->rq;
+	struct request_queue *q = req->q;
+	struct mmc_queue *mq = q->queuedata;
+	struct mmc_card *card = mq->card;
+	enum mmc_issue_type issue_type;
+	enum mmc_issued issued;
+	bool get_card;
+	int ret;
+
+	if (mmc_card_removed(mq->card)) {
+		req->rq_flags |= RQF_QUIET;
+		return BLK_STS_IOERR;
+	}
+
+	issue_type = mmc_issue_type(mq, req);
+
+	spin_lock_irq(q->queue_lock);
+
+	switch (issue_type) {
+	case MMC_ISSUE_ASYNC:
+		break;
+	default:
+		/*
+		 * Timeouts are handled by mmc core, so set a large value to
+		 * avoid races.
+		 */
+		req->timeout = 600 * HZ;
+		break;
+	}
+
+	mq->in_flight[issue_type] += 1;
+	get_card = mmc_tot_in_flight(mq) == 1;
+
+	spin_unlock_irq(q->queue_lock);
+
+	if (!(req->rq_flags & RQF_DONTPREP)) {
+		req_to_mmc_queue_req(req)->retries = 0;
+		req->rq_flags |= RQF_DONTPREP;
+	}
+
+	if (get_card)
+		mmc_get_card(card, &mq->ctx);
+
+	blk_mq_start_request(req);
+
+	issued = mmc_blk_mq_issue_rq(mq, req);
+
+	switch (issued) {
+	case MMC_REQ_BUSY:
+		ret = BLK_STS_RESOURCE;
+		break;
+	case MMC_REQ_FAILED_TO_START:
+		ret = BLK_STS_IOERR;
+		break;
+	default:
+		ret = BLK_STS_OK;
+		break;
+	}
+
+	if (issued != MMC_REQ_STARTED) {
+		bool put_card = false;
+
+		spin_lock_irq(q->queue_lock);
+		mq->in_flight[issue_type] -= 1;
+		if (mmc_tot_in_flight(mq) == 0)
+			put_card = true;
+		spin_unlock_irq(q->queue_lock);
+		if (put_card)
+			mmc_put_card(card, &mq->ctx);
+	}
+
+	return ret;
+}
+
+static const struct blk_mq_ops mmc_mq_ops = {
+	.queue_rq	= mmc_mq_queue_rq,
+	.init_request	= mmc_mq_init_request,
+	.exit_request	= mmc_mq_exit_request,
+	.complete	= mmc_blk_mq_complete,
+	.timeout	= mmc_mq_timed_out,
+};
+
 static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
 {
 	struct mmc_host *host = card->host;
@@ -198,6 +318,69 @@ static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
 
 	/* Initialize thread_sem even if it is not used */
 	sema_init(&mq->thread_sem, 1);
+
+	INIT_WORK(&mq->complete_work, mmc_blk_mq_complete_work);
+
+	mutex_init(&mq->complete_lock);
+
+	init_waitqueue_head(&mq->wait);
+}
+
+static int mmc_mq_init_queue(struct mmc_queue *mq, int q_depth,
+			     const struct blk_mq_ops *mq_ops, spinlock_t *lock)
+{
+	int ret;
+
+	memset(&mq->tag_set, 0, sizeof(mq->tag_set));
+	mq->tag_set.ops = mq_ops;
+	mq->tag_set.queue_depth = q_depth;
+	mq->tag_set.numa_node = NUMA_NO_NODE;
+	mq->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE |
+			    BLK_MQ_F_BLOCKING;
+	mq->tag_set.nr_hw_queues = 1;
+	mq->tag_set.cmd_size = sizeof(struct mmc_queue_req);
+	mq->tag_set.driver_data = mq;
+
+	ret = blk_mq_alloc_tag_set(&mq->tag_set);
+	if (ret)
+		return ret;
+
+	mq->queue = blk_mq_init_queue(&mq->tag_set);
+	if (IS_ERR(mq->queue)) {
+		ret = PTR_ERR(mq->queue);
+		goto free_tag_set;
+	}
+
+	mq->queue->queue_lock = lock;
+	mq->queue->queuedata = mq;
+
+	return 0;
+
+free_tag_set:
+	blk_mq_free_tag_set(&mq->tag_set);
+
+	return ret;
+}
+
+#define MMC_QUEUE_DEPTH 64
+
+static int mmc_mq_init(struct mmc_queue *mq, struct mmc_card *card,
+			 spinlock_t *lock)
+{
+	int q_depth;
+	int ret;
+
+	q_depth = MMC_QUEUE_DEPTH;
+
+	ret = mmc_mq_init_queue(mq, q_depth, &mmc_mq_ops, lock);
+	if (ret)
+		return ret;
+
+	blk_queue_rq_timeout(mq->queue, 60 * HZ);
+
+	mmc_setup_queue(mq, card);
+
+	return 0;
 }
 
 /**
@@ -216,6 +399,10 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 	int ret = -ENOMEM;
 
 	mq->card = card;
+
+	if (mmc_host_use_blk_mq(host))
+		return mmc_mq_init(mq, card, lock);
+
 	mq->queue = blk_alloc_queue(GFP_KERNEL);
 	if (!mq->queue)
 		return -ENOMEM;
@@ -251,11 +438,63 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 	return ret;
 }
 
+static void mmc_mq_queue_suspend(struct mmc_queue *mq)
+{
+	blk_mq_quiesce_queue(mq->queue);
+
+	/*
+	 * The host remains claimed while there are outstanding requests, so
+	 * simply claiming and releasing here ensures there are none.
+	 */
+	mmc_claim_host(mq->card->host);
+	mmc_release_host(mq->card->host);
+}
+
+static void mmc_mq_queue_resume(struct mmc_queue *mq)
+{
+	blk_mq_unquiesce_queue(mq->queue);
+}
+
+static void __mmc_queue_suspend(struct mmc_queue *mq)
+{
+	struct request_queue *q = mq->queue;
+	unsigned long flags;
+
+	if (!mq->suspended) {
+		mq->suspended |= true;
+
+		spin_lock_irqsave(q->queue_lock, flags);
+		blk_stop_queue(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+
+		down(&mq->thread_sem);
+	}
+}
+
+static void __mmc_queue_resume(struct mmc_queue *mq)
+{
+	struct request_queue *q = mq->queue;
+	unsigned long flags;
+
+	if (mq->suspended) {
+		mq->suspended = false;
+
+		up(&mq->thread_sem);
+
+		spin_lock_irqsave(q->queue_lock, flags);
+		blk_start_queue(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
+}
+
 void mmc_cleanup_queue(struct mmc_queue *mq)
 {
 	struct request_queue *q = mq->queue;
 	unsigned long flags;
 
+	if (q->mq_ops)
+		return;
+
 	/* Make sure the queue isn't suspended, as that will deadlock */
 	mmc_queue_resume(mq);
 
@@ -283,17 +522,11 @@ void mmc_cleanup_queue(struct mmc_queue *mq)
 void mmc_queue_suspend(struct mmc_queue *mq)
 {
 	struct request_queue *q = mq->queue;
-	unsigned long flags;
-
-	if (!mq->suspended) {
-		mq->suspended |= true;
-
-		spin_lock_irqsave(q->queue_lock, flags);
-		blk_stop_queue(q);
-		spin_unlock_irqrestore(q->queue_lock, flags);
 
-		down(&mq->thread_sem);
-	}
+	if (q->mq_ops)
+		mmc_mq_queue_suspend(mq);
+	else
+		__mmc_queue_suspend(mq);
 }
 
 /**
@@ -303,17 +536,11 @@ void mmc_queue_suspend(struct mmc_queue *mq)
 void mmc_queue_resume(struct mmc_queue *mq)
 {
 	struct request_queue *q = mq->queue;
-	unsigned long flags;
 
-	if (mq->suspended) {
-		mq->suspended = false;
-
-		up(&mq->thread_sem);
-
-		spin_lock_irqsave(q->queue_lock, flags);
-		blk_start_queue(q);
-		spin_unlock_irqrestore(q->queue_lock, flags);
-	}
+	if (q->mq_ops)
+		mmc_mq_queue_resume(mq);
+	else
+		__mmc_queue_resume(mq);
 }
 
 /*
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index 68f68ecd94ea..a5424ee100ef 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -7,6 +7,19 @@
 #include <linux/mmc/core.h>
 #include <linux/mmc/host.h>
 
+enum mmc_issued {
+	MMC_REQ_STARTED,
+	MMC_REQ_BUSY,
+	MMC_REQ_FAILED_TO_START,
+	MMC_REQ_FINISHED,
+};
+
+enum mmc_issue_type {
+	MMC_ISSUE_SYNC,
+	MMC_ISSUE_ASYNC,
+	MMC_ISSUE_MAX,
+};
+
 static inline struct mmc_queue_req *req_to_mmc_queue_req(struct request *rq)
 {
 	return blk_mq_rq_to_pdu(rq);
@@ -56,12 +69,15 @@ struct mmc_queue_req {
 	int			drv_op_result;
 	void			*drv_op_data;
 	unsigned int		ioc_count;
+	int			retries;
 };
 
 struct mmc_queue {
 	struct mmc_card		*card;
 	struct task_struct	*thread;
 	struct semaphore	thread_sem;
+	struct mmc_ctx		ctx;
+	struct blk_mq_tag_set	tag_set;
 	bool			suspended;
 	bool			asleep;
 	struct mmc_blk_data	*blkdata;
@@ -73,6 +89,14 @@ struct mmc_queue {
 	 * associated mmc_queue_req data.
 	 */
 	int			qcnt;
+
+	int			in_flight[MMC_ISSUE_MAX];
+	bool			rw_wait;
+	bool			waiting;
+	wait_queue_head_t	wait;
+	struct request		*complete_req;
+	struct mutex		complete_lock;
+	struct work_struct	complete_work;
 };
 
 extern int mmc_init_queue(struct mmc_queue *, struct mmc_card *, spinlock_t *,
@@ -83,4 +107,12 @@ extern int mmc_init_queue(struct mmc_queue *, struct mmc_card *, spinlock_t *,
 extern unsigned int mmc_queue_map_sg(struct mmc_queue *,
 				     struct mmc_queue_req *);
 
+enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req);
+
+static inline int mmc_tot_in_flight(struct mmc_queue *mq)
+{
+	return mq->in_flight[MMC_ISSUE_SYNC] +
+	       mq->in_flight[MMC_ISSUE_ASYNC];
+}
+
 #endif
-- 
1.9.1


* [PATCH V13 04/10] mmc: block: Add CQE support
From: Adrian Hunter @ 2017-11-03 13:20 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

Add CQE support to the block driver, including:
    - optionally using DCMD for flush requests
    - "manually" issuing discard requests
    - issuing read / write requests to the CQE
    - supporting block-layer timeouts
    - handling recovery
    - supporting re-tuning

CQE offers 25% - 50% better random multi-threaded I/O.  There is a slight
(e.g. 2%) drop in sequential read speed but no observable change to sequential
write.

CQE automatically sends the commands to complete requests.  However it only
supports reads / writes and so-called "direct commands" (DCMD).  Furthermore
DCMD is limited to one command at a time, but discards require 3 commands.
That makes issuing discards through CQE very awkward, and some CQEs don't
support DCMD anyway.  So for discards, the existing non-CQE approach is
taken, where the mmc core code issues the 3 commands one at a time, i.e.
mmc_erase().  DCMD is instead used for issuing flushes.
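
That routing is captured by mmc_cqe_issue_type() in the diff below;
annotated here for reference:

/*
 * Condensed from mmc_cqe_issue_type() below: how requests are routed when
 * a CQE is in use.
 */
static enum mmc_issue_type example_cqe_issue_type(struct mmc_host *host,
						  struct request *req)
{
	switch (req_op(req)) {
	case REQ_OP_DRV_IN:
	case REQ_OP_DRV_OUT:
	case REQ_OP_DISCARD:
	case REQ_OP_SECURE_ERASE:
		/*
		 * Discards stay synchronous: mmc_erase() issues its usual
		 * commands one at a time outside the CQE.
		 */
		return MMC_ISSUE_SYNC;
	case REQ_OP_FLUSH:
		/* Flush becomes a DCMD where the host supports it */
		return mmc_cqe_can_dcmd(host) ? MMC_ISSUE_DCMD : MMC_ISSUE_SYNC;
	default:
		/* Reads and writes are issued asynchronously to the CQE */
		return MMC_ISSUE_ASYNC;
	}
}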

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/core/block.c | 150 +++++++++++++++++++++++++++++++++++++++++++-
 drivers/mmc/core/block.h |   2 +
 drivers/mmc/core/queue.c | 158 +++++++++++++++++++++++++++++++++++++++++++++--
 drivers/mmc/core/queue.h |  18 ++++++
 4 files changed, 322 insertions(+), 6 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index e2838ff4738e..e8be17152884 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -112,6 +112,7 @@ struct mmc_blk_data {
 #define MMC_BLK_WRITE		BIT(1)
 #define MMC_BLK_DISCARD		BIT(2)
 #define MMC_BLK_SECDISCARD	BIT(3)
+#define MMC_BLK_CQE_RECOVERY	BIT(4)
 
 	/*
 	 * Only set in main mmc_blk_data associated
@@ -1717,6 +1718,138 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
 		*do_data_tag_p = do_data_tag;
 }
 
+#define MMC_CQE_RETRIES 2
+
+static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_request *mrq = &mqrq->brq.mrq;
+	struct request_queue *q = req->q;
+	struct mmc_host *host = mq->card->host;
+	unsigned long flags;
+	bool put_card;
+	int err;
+
+	mmc_cqe_post_req(host, mrq);
+
+	if (mrq->cmd && mrq->cmd->error)
+		err = mrq->cmd->error;
+	else if (mrq->data && mrq->data->error)
+		err = mrq->data->error;
+	else
+		err = 0;
+
+	if (err) {
+		if (mqrq->retries++ < MMC_CQE_RETRIES)
+			blk_mq_requeue_request(req, true);
+		else
+			blk_mq_end_request(req, BLK_STS_IOERR);
+	} else if (mrq->data) {
+		if (blk_update_request(req, BLK_STS_OK, mrq->data->bytes_xfered))
+			blk_mq_requeue_request(req, true);
+		else
+			__blk_mq_end_request(req, BLK_STS_OK);
+	} else {
+		blk_mq_end_request(req, BLK_STS_OK);
+	}
+
+	spin_lock_irqsave(q->queue_lock, flags);
+
+	mq->in_flight[mmc_issue_type(mq, req)] -= 1;
+
+	put_card = mmc_tot_in_flight(mq) == 0;
+
+	mmc_cqe_check_busy(mq);
+
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	if (!mq->cqe_busy)
+		blk_mq_run_hw_queues(q, true);
+
+	if (put_card)
+		mmc_put_card(mq->card, &mq->ctx);
+}
+
+void mmc_blk_cqe_recovery(struct mmc_queue *mq)
+{
+	struct mmc_card *card = mq->card;
+	struct mmc_host *host = card->host;
+	int err;
+
+	pr_debug("%s: CQE recovery start\n", mmc_hostname(host));
+
+	err = mmc_cqe_recovery(host);
+	if (err)
+		mmc_blk_reset(mq->blkdata, host, MMC_BLK_CQE_RECOVERY);
+	else
+		mmc_blk_reset_success(mq->blkdata, MMC_BLK_CQE_RECOVERY);
+
+	pr_debug("%s: CQE recovery done\n", mmc_hostname(host));
+}
+
+static void mmc_blk_cqe_req_done(struct mmc_request *mrq)
+{
+	struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req,
+						  brq.mrq);
+	struct request *req = mmc_queue_req_to_req(mqrq);
+	struct request_queue *q = req->q;
+	struct mmc_queue *mq = q->queuedata;
+
+	/*
+	 * Block layer timeouts race with completions which means the normal
+	 * completion path cannot be used during recovery.
+	 */
+	if (mq->in_recovery)
+		mmc_blk_cqe_complete_rq(mq, req);
+	else
+		blk_mq_complete_request(req);
+}
+
+static int mmc_blk_cqe_start_req(struct mmc_host *host, struct mmc_request *mrq)
+{
+	mrq->done		= mmc_blk_cqe_req_done;
+	mrq->recovery_notifier	= mmc_cqe_recovery_notifier;
+
+	return mmc_cqe_start_req(host, mrq);
+}
+
+static struct mmc_request *mmc_blk_cqe_prep_dcmd(struct mmc_queue_req *mqrq,
+						 struct request *req)
+{
+	struct mmc_blk_request *brq = &mqrq->brq;
+
+	memset(brq, 0, sizeof(*brq));
+
+	brq->mrq.cmd = &brq->cmd;
+	brq->mrq.tag = req->tag;
+
+	return &brq->mrq;
+}
+
+static int mmc_blk_cqe_issue_flush(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_request *mrq = mmc_blk_cqe_prep_dcmd(mqrq, req);
+
+	mrq->cmd->opcode = MMC_SWITCH;
+	mrq->cmd->arg = (MMC_SWITCH_MODE_WRITE_BYTE << 24) |
+			(EXT_CSD_FLUSH_CACHE << 16) |
+			(1 << 8) |
+			EXT_CSD_CMD_SET_NORMAL;
+	mrq->cmd->flags = MMC_CMD_AC | MMC_RSP_R1B;
+
+	return mmc_blk_cqe_start_req(mq->card->host, mrq);
+}
+
+static int mmc_blk_cqe_issue_rw_rq(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+
+	mmc_blk_data_prep(mq, mqrq, 0, NULL, NULL);
+
+	return mmc_blk_cqe_start_req(mq->card->host, &mqrq->brq.mrq);
+}
+
 static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 			       struct mmc_card *card,
 			       int disable_multi,
@@ -1991,7 +2124,10 @@ void mmc_blk_mq_complete(struct request *req)
 {
 	struct mmc_queue *mq = req->q->queuedata;
 
-	mmc_blk_mq_complete_rq(mq, req);
+	if (mq->use_cqe)
+		mmc_blk_cqe_complete_rq(mq, req);
+	else
+		mmc_blk_mq_complete_rq(mq, req);
 }
 
 static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
@@ -2150,6 +2286,9 @@ static int mmc_blk_mq_issue_rw_rq(struct mmc_queue *mq,
 
 static int mmc_blk_wait_for_idle(struct mmc_queue *mq, struct mmc_host *host)
 {
+	if (mq->use_cqe)
+		return host->cqe_ops->cqe_wait_for_idle(host);
+
 	return mmc_blk_rw_wait(mq, NULL);
 }
 
@@ -2188,11 +2327,18 @@ enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
 			return MMC_REQ_FAILED_TO_START;
 		}
 		return MMC_REQ_FINISHED;
+	case MMC_ISSUE_DCMD:
 	case MMC_ISSUE_ASYNC:
 		switch (req_op(req)) {
+		case REQ_OP_FLUSH:
+			ret = mmc_blk_cqe_issue_flush(mq, req);
+			break;
 		case REQ_OP_READ:
 		case REQ_OP_WRITE:
-			ret = mmc_blk_mq_issue_rw_rq(mq, req);
+			if (mq->use_cqe)
+				ret = mmc_blk_cqe_issue_rw_rq(mq, req);
+			else
+				ret = mmc_blk_mq_issue_rw_rq(mq, req);
 			break;
 		default:
 			WARN_ON_ONCE(1);
diff --git a/drivers/mmc/core/block.h b/drivers/mmc/core/block.h
index c62e3f3d3a3a..6c0e98c1af71 100644
--- a/drivers/mmc/core/block.h
+++ b/drivers/mmc/core/block.h
@@ -6,6 +6,8 @@
 
 void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req);
 
+void mmc_blk_cqe_recovery(struct mmc_queue *mq);
+
 enum mmc_issued;
 
 enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req);
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index a9c2351a9b29..971f97698866 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -40,18 +40,142 @@ static int mmc_prep_request(struct request_queue *q, struct request *req)
 	return BLKPREP_OK;
 }
 
+static inline bool mmc_cqe_dcmd_busy(struct mmc_queue *mq)
+{
+	/* Allow only 1 DCMD at a time */
+	return mq->in_flight[MMC_ISSUE_DCMD];
+}
+
+void mmc_cqe_check_busy(struct mmc_queue *mq)
+{
+	if ((mq->cqe_busy & MMC_CQE_DCMD_BUSY) && !mmc_cqe_dcmd_busy(mq))
+		mq->cqe_busy &= ~MMC_CQE_DCMD_BUSY;
+
+	mq->cqe_busy &= ~MMC_CQE_QUEUE_FULL;
+}
+
+static inline bool mmc_cqe_can_dcmd(struct mmc_host *host)
+{
+	return host->caps2 & MMC_CAP2_CQE_DCMD;
+}
+
+enum mmc_issue_type mmc_cqe_issue_type(struct mmc_host *host,
+				       struct request *req)
+{
+	switch (req_op(req)) {
+	case REQ_OP_DRV_IN:
+	case REQ_OP_DRV_OUT:
+	case REQ_OP_DISCARD:
+	case REQ_OP_SECURE_ERASE:
+		return MMC_ISSUE_SYNC;
+	case REQ_OP_FLUSH:
+		return mmc_cqe_can_dcmd(host) ? MMC_ISSUE_DCMD : MMC_ISSUE_SYNC;
+	default:
+		return MMC_ISSUE_ASYNC;
+	}
+}
+
 enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req)
 {
+	struct mmc_host *host = mq->card->host;
+
+	if (mq->use_cqe)
+		return mmc_cqe_issue_type(host, req);
+
 	if (req_op(req) == REQ_OP_READ || req_op(req) == REQ_OP_WRITE)
 		return MMC_ISSUE_ASYNC;
 
 	return MMC_ISSUE_SYNC;
 }
 
+static void __mmc_cqe_recovery_notifier(struct mmc_queue *mq)
+{
+	if (!mq->recovery_needed) {
+		mq->recovery_needed = true;
+		schedule_work(&mq->recovery_work);
+	}
+}
+
+void mmc_cqe_recovery_notifier(struct mmc_request *mrq)
+{
+	struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req,
+						  brq.mrq);
+	struct request *req = mmc_queue_req_to_req(mqrq);
+	struct request_queue *q = req->q;
+	struct mmc_queue *mq = q->queuedata;
+	unsigned long flags;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	__mmc_cqe_recovery_notifier(mq);
+	spin_unlock_irqrestore(q->queue_lock, flags);
+}
+
+static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_request *mrq = &mqrq->brq.mrq;
+	struct mmc_queue *mq = req->q->queuedata;
+	struct mmc_host *host = mq->card->host;
+	enum mmc_issue_type issue_type = mmc_issue_type(mq, req);
+	bool recovery_needed = false;
+
+	switch (issue_type) {
+	case MMC_ISSUE_ASYNC:
+	case MMC_ISSUE_DCMD:
+		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
+			if (recovery_needed)
+				__mmc_cqe_recovery_notifier(mq);
+			return BLK_EH_RESET_TIMER;
+		}
+		/* No timeout */
+		return BLK_EH_HANDLED;
+	default:
+		/* Timeout is handled by mmc core */
+		return BLK_EH_RESET_TIMER;
+	}
+}
+
 static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
 						 bool reserved)
 {
-	return BLK_EH_RESET_TIMER;
+	struct request_queue *q = req->q;
+	struct mmc_queue *mq = q->queuedata;
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+
+	if (mq->recovery_needed || !mq->use_cqe)
+		ret = BLK_EH_RESET_TIMER;
+	else
+		ret = mmc_cqe_timed_out(req);
+
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	return ret;
+}
+
+static void mmc_mq_recovery_handler(struct work_struct *work)
+{
+	struct mmc_queue *mq = container_of(work, struct mmc_queue,
+					    recovery_work);
+	struct request_queue *q = mq->queue;
+
+	mmc_get_card(mq->card, &mq->ctx);
+
+	mq->in_recovery = true;
+
+	mmc_blk_cqe_recovery(mq);
+
+	mq->in_recovery = false;
+
+	spin_lock_irq(q->queue_lock);
+	mq->recovery_needed = false;
+	spin_unlock_irq(q->queue_lock);
+
+	mmc_put_card(mq->card, &mq->ctx);
+
+	blk_mq_run_hw_queues(q, true);
 }
 
 static int mmc_queue_thread(void *d)
@@ -219,9 +343,10 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct request_queue *q = req->q;
 	struct mmc_queue *mq = q->queuedata;
 	struct mmc_card *card = mq->card;
+	struct mmc_host *host = card->host;
 	enum mmc_issue_type issue_type;
 	enum mmc_issued issued;
-	bool get_card;
+	bool get_card, cqe_retune_ok;
 	int ret;
 
 	if (mmc_card_removed(mq->card)) {
@@ -233,7 +358,19 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	spin_lock_irq(q->queue_lock);
 
+	if (mq->recovery_needed) {
+		spin_unlock_irq(q->queue_lock);
+		return BLK_STS_RESOURCE;
+	}
+
 	switch (issue_type) {
+	case MMC_ISSUE_DCMD:
+		if (mmc_cqe_dcmd_busy(mq)) {
+			mq->cqe_busy |= MMC_CQE_DCMD_BUSY;
+			spin_unlock_irq(q->queue_lock);
+			return BLK_STS_RESOURCE;
+		}
+		break;
 	case MMC_ISSUE_ASYNC:
 		break;
 	default:
@@ -247,6 +384,7 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	mq->in_flight[issue_type] += 1;
 	get_card = mmc_tot_in_flight(mq) == 1;
+	cqe_retune_ok = mmc_cqe_qcnt(mq) == 1;
 
 	spin_unlock_irq(q->queue_lock);
 
@@ -258,6 +396,11 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (get_card)
 		mmc_get_card(card, &mq->ctx);
 
+	if (mq->use_cqe) {
+		host->retune_now = host->need_retune && cqe_retune_ok &&
+				   !host->hold_retune;
+	}
+
 	blk_mq_start_request(req);
 
 	issued = mmc_blk_mq_issue_rq(mq, req);
@@ -319,6 +462,7 @@ static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
 	/* Initialize thread_sem even if it is not used */
 	sema_init(&mq->thread_sem, 1);
 
+	INIT_WORK(&mq->recovery_work, mmc_mq_recovery_handler);
 	INIT_WORK(&mq->complete_work, mmc_blk_mq_complete_work);
 
 	mutex_init(&mq->complete_lock);
@@ -367,10 +511,14 @@ static int mmc_mq_init_queue(struct mmc_queue *mq, int q_depth,
 static int mmc_mq_init(struct mmc_queue *mq, struct mmc_card *card,
 			 spinlock_t *lock)
 {
+	struct mmc_host *host = card->host;
 	int q_depth;
 	int ret;
 
-	q_depth = MMC_QUEUE_DEPTH;
+	if (mq->use_cqe)
+		q_depth = min_t(int, card->ext_csd.cmdq_depth, host->cqe_qdepth);
+	else
+		q_depth = MMC_QUEUE_DEPTH;
 
 	ret = mmc_mq_init_queue(mq, q_depth, &mmc_mq_ops, lock);
 	if (ret)
@@ -400,7 +548,9 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 
 	mq->card = card;
 
-	if (mmc_host_use_blk_mq(host))
+	mq->use_cqe = host->cqe_enabled;
+
+	if (mq->use_cqe || mmc_host_use_blk_mq(host))
 		return mmc_mq_init(mq, card, lock);
 
 	mq->queue = blk_alloc_queue(GFP_KERNEL);
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index a5424ee100ef..f05b5a9d2f87 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -16,6 +16,7 @@ enum mmc_issued {
 
 enum mmc_issue_type {
 	MMC_ISSUE_SYNC,
+	MMC_ISSUE_DCMD,
 	MMC_ISSUE_ASYNC,
 	MMC_ISSUE_MAX,
 };
@@ -91,8 +92,15 @@ struct mmc_queue {
 	int			qcnt;
 
 	int			in_flight[MMC_ISSUE_MAX];
+	unsigned int		cqe_busy;
+#define MMC_CQE_DCMD_BUSY	BIT(0)
+#define MMC_CQE_QUEUE_FULL	BIT(1)
+	bool			use_cqe;
+	bool			recovery_needed;
+	bool			in_recovery;
 	bool			rw_wait;
 	bool			waiting;
+	struct work_struct	recovery_work;
 	wait_queue_head_t	wait;
 	struct request		*complete_req;
 	struct mutex		complete_lock;
@@ -107,11 +115,21 @@ extern int mmc_init_queue(struct mmc_queue *, struct mmc_card *, spinlock_t *,
 extern unsigned int mmc_queue_map_sg(struct mmc_queue *,
 				     struct mmc_queue_req *);
 
+void mmc_cqe_check_busy(struct mmc_queue *mq);
+void mmc_cqe_recovery_notifier(struct mmc_request *mrq);
+
 enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req);
 
 static inline int mmc_tot_in_flight(struct mmc_queue *mq)
 {
 	return mq->in_flight[MMC_ISSUE_SYNC] +
+	       mq->in_flight[MMC_ISSUE_DCMD] +
+	       mq->in_flight[MMC_ISSUE_ASYNC];
+}
+
+static inline int mmc_cqe_qcnt(struct mmc_queue *mq)
+{
+	return mq->in_flight[MMC_ISSUE_DCMD] +
 	       mq->in_flight[MMC_ISSUE_ASYNC];
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH V13 05/10] mmc: cqhci: support for command queue enabled host
  2017-11-03 13:20 [PATCH V13 00/10] mmc: Add Command Queue support Adrian Hunter
                   ` (3 preceding siblings ...)
  2017-11-03 13:20 ` [PATCH V13 04/10] mmc: block: Add CQE support Adrian Hunter
@ 2017-11-03 13:20 ` Adrian Hunter
  2017-11-08  9:22   ` Linus Walleij
  2017-11-09 13:41   ` Ulf Hansson
  2017-11-03 13:20 ` [PATCH V13 06/10] mmc: sdhci-pci: Add CQHCI support for Intel GLK Adrian Hunter
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Adrian Hunter @ 2017-11-03 13:20 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

From: Venkat Gopalakrishnan <venkatg@codeaurora.org>

This patch adds CMDQ support for command-queue compatible
hosts.

Command queueing was added in the eMMC 5.1 specification. It enables the
controller to process up to 32 requests at a time.

Adrian Hunter contributed the renaming to cqhci, recovery, suspend and
resume, cqhci_off, cqhci_wait_for_idle, and external timeout handling.
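
For reference, here is a minimal sketch (not part of this patch) of how a
host controller driver is expected to hook into the cqhci library, using
the cqhci_* API declared in cqhci.h below.  The foo_* names are
hypothetical and error handling is trimmed for brevity:

#include <linux/err.h>
#include <linux/mmc/host.h>
#include <linux/platform_device.h>

#include "cqhci.h"

/* All callbacks are optional; the library falls back to plain MMIO */
static const struct cqhci_host_ops foo_cqhci_ops = {
};

static int foo_add_cqe(struct platform_device *pdev, struct mmc_host *mmc,
		       bool dma64)
{
	struct cqhci_host *cq_host;

	/* Maps the "cqhci_mem" platform resource */
	cq_host = cqhci_pltfm_init(pdev);
	if (IS_ERR(cq_host))
		return PTR_ERR(cq_host);

	cq_host->ops = &foo_cqhci_ops;

	/* Advertise CQE (and optionally DCMD) support to the mmc core */
	mmc->caps2 |= MMC_CAP2_CQE | MMC_CAP2_CQE_DCMD;

	return cqhci_init(cq_host, mmc, dma64);
}

The driver's interrupt handler then forwards CQE interrupts to
cqhci_irq(mmc, intmask, cmd_error, data_error), as the Intel GLK patch
later in this series does in its sdhci_cqhci_irq() handler.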

Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Signed-off-by: Konstantin Dorfman <kdorfman@codeaurora.org>
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/host/Kconfig  |   13 +
 drivers/mmc/host/Makefile |    1 +
 drivers/mmc/host/cqhci.c  | 1150 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/mmc/host/cqhci.h  |  240 ++++++++++
 4 files changed, 1404 insertions(+)
 create mode 100644 drivers/mmc/host/cqhci.c
 create mode 100644 drivers/mmc/host/cqhci.h

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 567028c9219a..3092b7085cb5 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -857,6 +857,19 @@ config MMC_SUNXI
 	  This selects support for the SD/MMC Host Controller on
 	  Allwinner sunxi SoCs.
 
+config MMC_CQHCI
+	tristate "Command Queue Host Controller Interface support"
+	depends on HAS_DMA
+	help
+	  This selects the Command Queue Host Controller Interface (CQHCI)
+	  support present in host controllers of Qualcomm Technologies, Inc.,
+	  amongst others.
+	  It is used with eMMC devices that support command queueing.
+
+	  If you have a controller with this interface, say Y or M here.
+
+	  If unsure, say N.
+
 config MMC_TOSHIBA_PCI
 	tristate "Toshiba Type A SD/MMC Card Interface Driver"
 	depends on PCI
diff --git a/drivers/mmc/host/Makefile b/drivers/mmc/host/Makefile
index ab61a3e39c0b..de140e3ef402 100644
--- a/drivers/mmc/host/Makefile
+++ b/drivers/mmc/host/Makefile
@@ -91,6 +91,7 @@ obj-$(CONFIG_MMC_SDHCI_ST)		+= sdhci-st.o
 obj-$(CONFIG_MMC_SDHCI_MICROCHIP_PIC32)	+= sdhci-pic32.o
 obj-$(CONFIG_MMC_SDHCI_BRCMSTB)		+= sdhci-brcmstb.o
 obj-$(CONFIG_MMC_SDHCI_OMAP)		+= sdhci-omap.o
+obj-$(CONFIG_MMC_CQHCI)			+= cqhci.o
 
 ifeq ($(CONFIG_CB710_DEBUG),y)
 	CFLAGS-cb710-mmc	+= -DDEBUG
diff --git a/drivers/mmc/host/cqhci.c b/drivers/mmc/host/cqhci.c
new file mode 100644
index 000000000000..159270e947cf
--- /dev/null
+++ b/drivers/mmc/host/cqhci.c
@@ -0,0 +1,1150 @@
+/* Copyright (c) 2015, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/delay.h>
+#include <linux/highmem.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/dma-mapping.h>
+#include <linux/slab.h>
+#include <linux/scatterlist.h>
+#include <linux/platform_device.h>
+#include <linux/ktime.h>
+
+#include <linux/mmc/mmc.h>
+#include <linux/mmc/host.h>
+#include <linux/mmc/card.h>
+
+#include "cqhci.h"
+
+#define DCMD_SLOT 31
+#define NUM_SLOTS 32
+
+struct cqhci_slot {
+	struct mmc_request *mrq;
+	unsigned int flags;
+#define CQHCI_EXTERNAL_TIMEOUT	BIT(0)
+#define CQHCI_COMPLETED		BIT(1)
+#define CQHCI_HOST_CRC		BIT(2)
+#define CQHCI_HOST_TIMEOUT	BIT(3)
+#define CQHCI_HOST_OTHER	BIT(4)
+};
+
+static inline u8 *get_desc(struct cqhci_host *cq_host, u8 tag)
+{
+	return cq_host->desc_base + (tag * cq_host->slot_sz);
+}
+
+static inline u8 *get_link_desc(struct cqhci_host *cq_host, u8 tag)
+{
+	u8 *desc = get_desc(cq_host, tag);
+
+	return desc + cq_host->task_desc_len;
+}
+
+static inline dma_addr_t get_trans_desc_dma(struct cqhci_host *cq_host, u8 tag)
+{
+	return cq_host->trans_desc_dma_base +
+		(cq_host->mmc->max_segs * tag *
+		 cq_host->trans_desc_len);
+}
+
+static inline u8 *get_trans_desc(struct cqhci_host *cq_host, u8 tag)
+{
+	return cq_host->trans_desc_base +
+		(cq_host->trans_desc_len * cq_host->mmc->max_segs * tag);
+}
+
+static void setup_trans_desc(struct cqhci_host *cq_host, u8 tag)
+{
+	u8 *link_temp;
+	dma_addr_t trans_temp;
+
+	link_temp = get_link_desc(cq_host, tag);
+	trans_temp = get_trans_desc_dma(cq_host, tag);
+
+	memset(link_temp, 0, cq_host->link_desc_len);
+	if (cq_host->link_desc_len > 8)
+		*(link_temp + 8) = 0;
+
+	if (tag == DCMD_SLOT && (cq_host->mmc->caps2 & MMC_CAP2_CQE_DCMD)) {
+		*link_temp = CQHCI_VALID(0) | CQHCI_ACT(0) | CQHCI_END(1);
+		return;
+	}
+
+	*link_temp = CQHCI_VALID(1) | CQHCI_ACT(0x6) | CQHCI_END(0);
+
+	if (cq_host->dma64) {
+		__le64 *data_addr = (__le64 __force *)(link_temp + 4);
+
+		data_addr[0] = cpu_to_le64(trans_temp);
+	} else {
+		__le32 *data_addr = (__le32 __force *)(link_temp + 4);
+
+		data_addr[0] = cpu_to_le32(trans_temp);
+	}
+}
+
+static void cqhci_set_irqs(struct cqhci_host *cq_host, u32 set)
+{
+	cqhci_writel(cq_host, set, CQHCI_ISTE);
+	cqhci_writel(cq_host, set, CQHCI_ISGE);
+}
+
+#define DRV_NAME "cqhci"
+
+#define CQHCI_DUMP(f, x...) \
+	pr_err("%s: " DRV_NAME ": " f, mmc_hostname(mmc), ## x)
+
+static void cqhci_dumpregs(struct cqhci_host *cq_host)
+{
+	struct mmc_host *mmc = cq_host->mmc;
+
+	CQHCI_DUMP("============ CQHCI REGISTER DUMP ===========\n");
+
+	CQHCI_DUMP("Caps:      0x%08x | Version:  0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_CAP),
+		   cqhci_readl(cq_host, CQHCI_VER));
+	CQHCI_DUMP("Config:    0x%08x | Control:  0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_CFG),
+		   cqhci_readl(cq_host, CQHCI_CTL));
+	CQHCI_DUMP("Int stat:  0x%08x | Int enab: 0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_IS),
+		   cqhci_readl(cq_host, CQHCI_ISTE));
+	CQHCI_DUMP("Int sig:   0x%08x | Int Coal: 0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_ISGE),
+		   cqhci_readl(cq_host, CQHCI_IC));
+	CQHCI_DUMP("TDL base:  0x%08x | TDL up32: 0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_TDLBA),
+		   cqhci_readl(cq_host, CQHCI_TDLBAU));
+	CQHCI_DUMP("Doorbell:  0x%08x | TCN:      0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_TDBR),
+		   cqhci_readl(cq_host, CQHCI_TCN));
+	CQHCI_DUMP("Dev queue: 0x%08x | Dev Pend: 0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_DQS),
+		   cqhci_readl(cq_host, CQHCI_DPT));
+	CQHCI_DUMP("Task clr:  0x%08x | SSC1:     0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_TCLR),
+		   cqhci_readl(cq_host, CQHCI_SSC1));
+	CQHCI_DUMP("SSC2:      0x%08x | DCMD rsp: 0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_SSC2),
+		   cqhci_readl(cq_host, CQHCI_CRDCT));
+	CQHCI_DUMP("RED mask:  0x%08x | TERRI:    0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_RMEM),
+		   cqhci_readl(cq_host, CQHCI_TERRI));
+	CQHCI_DUMP("Resp idx:  0x%08x | Resp arg: 0x%08x\n",
+		   cqhci_readl(cq_host, CQHCI_CRI),
+		   cqhci_readl(cq_host, CQHCI_CRA));
+
+	if (cq_host->ops->dumpregs)
+		cq_host->ops->dumpregs(mmc);
+	else
+		CQHCI_DUMP(": ===========================================\n");
+}
+
+/*
+ * The allocated descriptor table for task, link & transfer descriptors
+ * looks like:
+ * |----------|
+ * |task desc |  |->|----------|
+ * |----------|  |  |trans desc|
+ * |link desc-|->|  |----------|
+ * |----------|          .
+ *      .                .
+ *  no. of slots      max-segs
+ *      .           |----------|
+ * |----------|
+ * The idea here is to create the [task+trans] table and mark & point the
+ * link desc to the transfer desc table on a per slot basis.
+ */
+static int cqhci_host_alloc_tdl(struct cqhci_host *cq_host)
+{
+	int i;
+
+	/* task descriptor can be 64/128 bit irrespective of arch */
+	if (cq_host->caps & CQHCI_TASK_DESC_SZ_128) {
+		cqhci_writel(cq_host, cqhci_readl(cq_host, CQHCI_CFG) |
+			       CQHCI_TASK_DESC_SZ, CQHCI_CFG);
+		cq_host->task_desc_len = 16;
+	} else {
+		cq_host->task_desc_len = 8;
+	}
+
+	/*
+	 * The transfer descriptor can be 96 bits long instead of 128 bits,
+	 * which means ADMA expects the next valid descriptor at the 96th
+	 * bit or the 128th bit accordingly.
+	 */
+	if (cq_host->dma64) {
+		if (cq_host->quirks & CQHCI_QUIRK_SHORT_TXFR_DESC_SZ)
+			cq_host->trans_desc_len = 12;
+		else
+			cq_host->trans_desc_len = 16;
+		cq_host->link_desc_len = 16;
+	} else {
+		cq_host->trans_desc_len = 8;
+		cq_host->link_desc_len = 8;
+	}
+
+	/* total size of a slot: 1 task & 1 transfer (link) */
+	cq_host->slot_sz = cq_host->task_desc_len + cq_host->link_desc_len;
+
+	cq_host->desc_size = cq_host->slot_sz * cq_host->num_slots;
+
+	cq_host->data_size = cq_host->trans_desc_len * cq_host->mmc->max_segs *
+		(cq_host->num_slots - 1);
+
+	pr_debug("%s: cqhci: desc_size: %zu data_sz: %zu slot-sz: %d\n",
+		 mmc_hostname(cq_host->mmc), cq_host->desc_size, cq_host->data_size,
+		 cq_host->slot_sz);
+
+	/*
+	 * allocate a dma-mapped chunk of memory for the descriptors
+	 * allocate a dma-mapped chunk of memory for link descriptors
+	 * setup each link-desc memory offset per slot-number to
+	 * the descriptor table.
+	 */
+	cq_host->desc_base = dmam_alloc_coherent(mmc_dev(cq_host->mmc),
+						 cq_host->desc_size,
+						 &cq_host->desc_dma_base,
+						 GFP_KERNEL);
+	cq_host->trans_desc_base = dmam_alloc_coherent(mmc_dev(cq_host->mmc),
+					      cq_host->data_size,
+					      &cq_host->trans_desc_dma_base,
+					      GFP_KERNEL);
+	if (!cq_host->desc_base || !cq_host->trans_desc_base)
+		return -ENOMEM;
+
+	pr_debug("%s: cqhci: desc-base: 0x%p trans-base: 0x%p\n desc_dma 0x%llx trans_dma: 0x%llx\n",
+		 mmc_hostname(cq_host->mmc), cq_host->desc_base, cq_host->trans_desc_base,
+		(unsigned long long)cq_host->desc_dma_base,
+		(unsigned long long)cq_host->trans_desc_dma_base);
+
+	for (i = 0; i < cq_host->num_slots; i++)
+		setup_trans_desc(cq_host, i);
+
+	return 0;
+}
+
+static void __cqhci_enable(struct cqhci_host *cq_host)
+{
+	struct mmc_host *mmc = cq_host->mmc;
+	u32 cqcfg;
+
+	cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
+
+	/* Configuration must not be changed while enabled */
+	if (cqcfg & CQHCI_ENABLE) {
+		cqcfg &= ~CQHCI_ENABLE;
+		cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+	}
+
+	cqcfg &= ~(CQHCI_DCMD | CQHCI_TASK_DESC_SZ);
+
+	if (mmc->caps2 & MMC_CAP2_CQE_DCMD)
+		cqcfg |= CQHCI_DCMD;
+
+	if (cq_host->caps & CQHCI_TASK_DESC_SZ_128)
+		cqcfg |= CQHCI_TASK_DESC_SZ;
+
+	cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+
+	cqhci_writel(cq_host, lower_32_bits(cq_host->desc_dma_base),
+		     CQHCI_TDLBA);
+	cqhci_writel(cq_host, upper_32_bits(cq_host->desc_dma_base),
+		     CQHCI_TDLBAU);
+
+	cqhci_writel(cq_host, cq_host->rca, CQHCI_SSC2);
+
+	cqhci_set_irqs(cq_host, 0);
+
+	cqcfg |= CQHCI_ENABLE;
+
+	cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+
+	mmc->cqe_on = true;
+
+	if (cq_host->ops->enable)
+		cq_host->ops->enable(mmc);
+
+	/* Ensure all writes are done before interrupts are enabled */
+	wmb();
+
+	cqhci_set_irqs(cq_host, CQHCI_IS_MASK);
+
+	cq_host->activated = true;
+}
+
+static void __cqhci_disable(struct cqhci_host *cq_host)
+{
+	u32 cqcfg;
+
+	cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
+	cqcfg &= ~CQHCI_ENABLE;
+	cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+
+	cq_host->mmc->cqe_on = false;
+
+	cq_host->activated = false;
+}
+
+int cqhci_suspend(struct mmc_host *mmc)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+
+	if (cq_host->enabled)
+		__cqhci_disable(cq_host);
+
+	return 0;
+}
+EXPORT_SYMBOL(cqhci_suspend);
+
+int cqhci_resume(struct mmc_host *mmc)
+{
+	/* Re-enable is done upon first request */
+	return 0;
+}
+EXPORT_SYMBOL(cqhci_resume);
+
+static int cqhci_enable(struct mmc_host *mmc, struct mmc_card *card)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	int err;
+
+	if (cq_host->enabled)
+		return 0;
+
+	cq_host->rca = card->rca;
+
+	err = cqhci_host_alloc_tdl(cq_host);
+	if (err)
+		return err;
+
+	__cqhci_enable(cq_host);
+
+	cq_host->enabled = true;
+
+#ifdef DEBUG
+	cqhci_dumpregs(cq_host);
+#endif
+	return 0;
+}
+
+/* CQHCI is idle and should halt immediately, so set a small timeout in us */
+#define CQHCI_OFF_TIMEOUT 100
+
+static void cqhci_off(struct mmc_host *mmc)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	ktime_t timeout;
+	bool timed_out;
+	u32 reg;
+
+	if (!cq_host->enabled || !mmc->cqe_on || cq_host->recovery_halt)
+		return;
+
+	if (cq_host->ops->disable)
+		cq_host->ops->disable(mmc, false);
+
+	cqhci_writel(cq_host, CQHCI_HALT, CQHCI_CTL);
+
+	timeout = ktime_add_us(ktime_get(), CQHCI_OFF_TIMEOUT);
+	while (1) {
+		timed_out = ktime_compare(ktime_get(), timeout) > 0;
+		reg = cqhci_readl(cq_host, CQHCI_CTL);
+		if ((reg & CQHCI_HALT) || timed_out)
+			break;
+	}
+
+	if (timed_out)
+		pr_err("%s: cqhci: CQE stuck on\n", mmc_hostname(mmc));
+	else
+		pr_debug("%s: cqhci: CQE off\n", mmc_hostname(mmc));
+
+	mmc->cqe_on = false;
+}
+
+static void cqhci_disable(struct mmc_host *mmc)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+
+	if (!cq_host->enabled)
+		return;
+
+	cqhci_off(mmc);
+
+	__cqhci_disable(cq_host);
+
+	dmam_free_coherent(mmc_dev(mmc), cq_host->data_size,
+			   cq_host->trans_desc_base,
+			   cq_host->trans_desc_dma_base);
+
+	dmam_free_coherent(mmc_dev(mmc), cq_host->desc_size,
+			   cq_host->desc_base,
+			   cq_host->desc_dma_base);
+
+	cq_host->trans_desc_base = NULL;
+	cq_host->desc_base = NULL;
+
+	cq_host->enabled = false;
+}
+
+static void cqhci_prep_task_desc(struct mmc_request *mrq,
+					u64 *data, bool intr)
+{
+	u32 req_flags = mrq->data->flags;
+
+	*data = CQHCI_VALID(1) |
+		CQHCI_END(1) |
+		CQHCI_INT(intr) |
+		CQHCI_ACT(0x5) |
+		CQHCI_FORCED_PROG(!!(req_flags & MMC_DATA_FORCED_PRG)) |
+		CQHCI_DATA_TAG(!!(req_flags & MMC_DATA_DAT_TAG)) |
+		CQHCI_DATA_DIR(!!(req_flags & MMC_DATA_READ)) |
+		CQHCI_PRIORITY(!!(req_flags & MMC_DATA_PRIO)) |
+		CQHCI_QBAR(!!(req_flags & MMC_DATA_QBR)) |
+		CQHCI_REL_WRITE(!!(req_flags & MMC_DATA_REL_WR)) |
+		CQHCI_BLK_COUNT(mrq->data->blocks) |
+		CQHCI_BLK_ADDR((u64)mrq->data->blk_addr);
+
+	pr_debug("%s: cqhci: tag %d task descriptor 0x%016llx\n",
+		 mmc_hostname(mrq->host), mrq->tag, (unsigned long long)*data);
+}
+
+static int cqhci_dma_map(struct mmc_host *host, struct mmc_request *mrq)
+{
+	int sg_count;
+	struct mmc_data *data = mrq->data;
+
+	if (!data)
+		return -EINVAL;
+
+	sg_count = dma_map_sg(mmc_dev(host), data->sg,
+			      data->sg_len,
+			      (data->flags & MMC_DATA_WRITE) ?
+			      DMA_TO_DEVICE : DMA_FROM_DEVICE);
+	if (!sg_count) {
+		pr_err("%s: sg-len: %d\n", __func__, data->sg_len);
+		return -ENOMEM;
+	}
+
+	return sg_count;
+}
+
+static void cqhci_set_tran_desc(u8 *desc, dma_addr_t addr, int len, bool end,
+				bool dma64)
+{
+	__le32 *attr = (__le32 __force *)desc;
+
+	*attr = (CQHCI_VALID(1) |
+		 CQHCI_END(end ? 1 : 0) |
+		 CQHCI_INT(0) |
+		 CQHCI_ACT(0x4) |
+		 CQHCI_DAT_LENGTH(len));
+
+	if (dma64) {
+		__le64 *dataddr = (__le64 __force *)(desc + 4);
+
+		dataddr[0] = cpu_to_le64(addr);
+	} else {
+		__le32 *dataddr = (__le32 __force *)(desc + 4);
+
+		dataddr[0] = cpu_to_le32(addr);
+	}
+}
+
+static int cqhci_prep_tran_desc(struct mmc_request *mrq,
+			       struct cqhci_host *cq_host, int tag)
+{
+	struct mmc_data *data = mrq->data;
+	int i, sg_count, len;
+	bool end = false;
+	bool dma64 = cq_host->dma64;
+	dma_addr_t addr;
+	u8 *desc;
+	struct scatterlist *sg;
+
+	sg_count = cqhci_dma_map(mrq->host, mrq);
+	if (sg_count < 0) {
+		pr_err("%s: %s: unable to map sg lists, %d\n",
+				mmc_hostname(mrq->host), __func__, sg_count);
+		return sg_count;
+	}
+
+	desc = get_trans_desc(cq_host, tag);
+
+	for_each_sg(data->sg, sg, sg_count, i) {
+		addr = sg_dma_address(sg);
+		len = sg_dma_len(sg);
+
+		if ((i+1) == sg_count)
+			end = true;
+		cqhci_set_tran_desc(desc, addr, len, end, dma64);
+		desc += cq_host->trans_desc_len;
+	}
+
+	return 0;
+}
+
+static void cqhci_prep_dcmd_desc(struct mmc_host *mmc,
+				   struct mmc_request *mrq)
+{
+	u64 *task_desc = NULL;
+	u64 data = 0;
+	u8 resp_type;
+	u8 *desc;
+	__le64 *dataddr;
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	u8 timing;
+
+	if (!(mrq->cmd->flags & MMC_RSP_PRESENT)) {
+		resp_type = 0x0;
+		timing = 0x1;
+	} else {
+		if (mrq->cmd->flags & MMC_RSP_R1B) {
+			resp_type = 0x3;
+			timing = 0x0;
+		} else {
+			resp_type = 0x2;
+			timing = 0x1;
+		}
+	}
+
+	task_desc = (__le64 __force *)get_desc(cq_host, cq_host->dcmd_slot);
+	memset(task_desc, 0, cq_host->task_desc_len);
+	data |= (CQHCI_VALID(1) |
+		 CQHCI_END(1) |
+		 CQHCI_INT(1) |
+		 CQHCI_QBAR(1) |
+		 CQHCI_ACT(0x5) |
+		 CQHCI_CMD_INDEX(mrq->cmd->opcode) |
+		 CQHCI_CMD_TIMING(timing) | CQHCI_RESP_TYPE(resp_type));
+	*task_desc |= data;
+	desc = (u8 *)task_desc;
+	pr_debug("%s: cqhci: dcmd: cmd: %d timing: %d resp: %d\n",
+		 mmc_hostname(mmc), mrq->cmd->opcode, timing, resp_type);
+	dataddr = (__le64 __force *)(desc + 4);
+	dataddr[0] = cpu_to_le64((u64)mrq->cmd->arg);
+
+}
+
+static void cqhci_post_req(struct mmc_host *host, struct mmc_request *mrq)
+{
+	struct mmc_data *data = mrq->data;
+
+	if (data) {
+		dma_unmap_sg(mmc_dev(host), data->sg, data->sg_len,
+			     (data->flags & MMC_DATA_READ) ?
+			     DMA_FROM_DEVICE : DMA_TO_DEVICE);
+	}
+}
+
+static inline int cqhci_tag(struct mmc_request *mrq)
+{
+	return mrq->cmd ? DCMD_SLOT : mrq->tag;
+}
+
+static int cqhci_request(struct mmc_host *mmc, struct mmc_request *mrq)
+{
+	int err = 0;
+	u64 data = 0;
+	u64 *task_desc = NULL;
+	int tag = cqhci_tag(mrq);
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	unsigned long flags;
+
+	if (!cq_host->enabled) {
+		pr_err("%s: cqhci: not enabled\n", mmc_hostname(mmc));
+		return -EINVAL;
+	}
+
+	/* First request after resume has to re-enable */
+	if (!cq_host->activated)
+		__cqhci_enable(cq_host);
+
+	if (!mmc->cqe_on) {
+		cqhci_writel(cq_host, 0, CQHCI_CTL);
+		mmc->cqe_on = true;
+		pr_debug("%s: cqhci: CQE on\n", mmc_hostname(mmc));
+		if (cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_HALT) {
+			pr_err("%s: cqhci: CQE failed to exit halt state\n",
+			       mmc_hostname(mmc));
+		}
+		if (cq_host->ops->enable)
+			cq_host->ops->enable(mmc);
+	}
+
+	if (mrq->data) {
+		task_desc = (__le64 __force *)get_desc(cq_host, tag);
+		cqhci_prep_task_desc(mrq, &data, 1);
+		*task_desc = cpu_to_le64(data);
+		err = cqhci_prep_tran_desc(mrq, cq_host, tag);
+		if (err) {
+			pr_err("%s: cqhci: failed to setup tx desc: %d\n",
+			       mmc_hostname(mmc), err);
+			return err;
+		}
+	} else {
+		cqhci_prep_dcmd_desc(mmc, mrq);
+	}
+
+	spin_lock_irqsave(&cq_host->lock, flags);
+
+	if (cq_host->recovery_halt) {
+		err = -EBUSY;
+		goto out_unlock;
+	}
+
+	cq_host->slot[tag].mrq = mrq;
+	cq_host->slot[tag].flags = 0;
+
+	cq_host->qcnt += 1;
+
+	cqhci_writel(cq_host, 1 << tag, CQHCI_TDBR);
+	if (!(cqhci_readl(cq_host, CQHCI_TDBR) & (1 << tag)))
+		pr_debug("%s: cqhci: doorbell not set for tag %d\n",
+			 mmc_hostname(mmc), tag);
+out_unlock:
+	spin_unlock_irqrestore(&cq_host->lock, flags);
+
+	if (err)
+		cqhci_post_req(mmc, mrq);
+
+	return err;
+}
+
+static void cqhci_recovery_needed(struct mmc_host *mmc, struct mmc_request *mrq,
+				  bool notify)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+
+	if (!cq_host->recovery_halt) {
+		cq_host->recovery_halt = true;
+		pr_debug("%s: cqhci: recovery needed\n", mmc_hostname(mmc));
+		wake_up(&cq_host->wait_queue);
+		if (notify && mrq->recovery_notifier)
+			mrq->recovery_notifier(mrq);
+	}
+}
+
+static unsigned int cqhci_error_flags(int error1, int error2)
+{
+	int error = error1 ? error1 : error2;
+
+	switch (error) {
+	case -EILSEQ:
+		return CQHCI_HOST_CRC;
+	case -ETIMEDOUT:
+		return CQHCI_HOST_TIMEOUT;
+	default:
+		return CQHCI_HOST_OTHER;
+	}
+}
+
+static void cqhci_error_irq(struct mmc_host *mmc, u32 status, int cmd_error,
+			    int data_error)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	struct cqhci_slot *slot;
+	u32 terri;
+	int tag;
+
+	spin_lock(&cq_host->lock);
+
+	terri = cqhci_readl(cq_host, CQHCI_TERRI);
+
+	pr_debug("%s: cqhci: error IRQ status: 0x%08x cmd error %d data error %d TERRI: 0x%08x\n",
+		 mmc_hostname(mmc), status, cmd_error, data_error, terri);
+
+	/* Forget about errors when recovery has already been triggered */
+	if (cq_host->recovery_halt)
+		goto out_unlock;
+
+	if (!cq_host->qcnt) {
+		WARN_ONCE(1, "%s: cqhci: error when idle. IRQ status: 0x%08x cmd error %d data error %d TERRI: 0x%08x\n",
+			  mmc_hostname(mmc), status, cmd_error, data_error,
+			  terri);
+		goto out_unlock;
+	}
+
+	if (CQHCI_TERRI_C_VALID(terri)) {
+		tag = CQHCI_TERRI_C_TASK(terri);
+		slot = &cq_host->slot[tag];
+		if (slot->mrq) {
+			slot->flags = cqhci_error_flags(cmd_error, data_error);
+			cqhci_recovery_needed(mmc, slot->mrq, true);
+		}
+	}
+
+	if (CQHCI_TERRI_D_VALID(terri)) {
+		tag = CQHCI_TERRI_D_TASK(terri);
+		slot = &cq_host->slot[tag];
+		if (slot->mrq) {
+			slot->flags = cqhci_error_flags(data_error, cmd_error);
+			cqhci_recovery_needed(mmc, slot->mrq, true);
+		}
+	}
+
+	if (!cq_host->recovery_halt) {
+		/*
+		 * The only way to guarantee forward progress is to mark at
+		 * least one task in error, so if none is indicated, pick one.
+		 */
+		for (tag = 0; tag < NUM_SLOTS; tag++) {
+			slot = &cq_host->slot[tag];
+			if (!slot->mrq)
+				continue;
+			slot->flags = cqhci_error_flags(data_error, cmd_error);
+			cqhci_recovery_needed(mmc, slot->mrq, true);
+			break;
+		}
+	}
+
+out_unlock:
+	spin_unlock(&cq_host->lock);
+}
+
+static void cqhci_finish_mrq(struct mmc_host *mmc, unsigned int tag)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	struct cqhci_slot *slot = &cq_host->slot[tag];
+	struct mmc_request *mrq = slot->mrq;
+	struct mmc_data *data;
+
+	if (!mrq) {
+		WARN_ONCE(1, "%s: cqhci: spurious TCN for tag %d\n",
+			  mmc_hostname(mmc), tag);
+		return;
+	}
+
+	/* No completions allowed during recovery */
+	if (cq_host->recovery_halt) {
+		slot->flags |= CQHCI_COMPLETED;
+		return;
+	}
+
+	slot->mrq = NULL;
+
+	cq_host->qcnt -= 1;
+
+	data = mrq->data;
+	if (data) {
+		if (data->error)
+			data->bytes_xfered = 0;
+		else
+			data->bytes_xfered = data->blksz * data->blocks;
+	}
+
+	mmc_cqe_request_done(mmc, mrq);
+}
+
+irqreturn_t cqhci_irq(struct mmc_host *mmc, u32 intmask, int cmd_error,
+		      int data_error)
+{
+	u32 status;
+	unsigned long tag = 0, comp_status;
+	struct cqhci_host *cq_host = mmc->cqe_private;
+
+	status = cqhci_readl(cq_host, CQHCI_IS);
+	cqhci_writel(cq_host, status, CQHCI_IS);
+
+	pr_debug("%s: cqhci: IRQ status: 0x%08x\n", mmc_hostname(mmc), status);
+
+	if ((status & CQHCI_IS_RED) || cmd_error || data_error)
+		cqhci_error_irq(mmc, status, cmd_error, data_error);
+
+	if (status & CQHCI_IS_TCC) {
+		/* read TCN and complete the request */
+		comp_status = cqhci_readl(cq_host, CQHCI_TCN);
+		cqhci_writel(cq_host, comp_status, CQHCI_TCN);
+		pr_debug("%s: cqhci: TCN: 0x%08lx\n",
+			 mmc_hostname(mmc), comp_status);
+
+		spin_lock(&cq_host->lock);
+
+		for_each_set_bit(tag, &comp_status, cq_host->num_slots) {
+			/* complete the corresponding mrq */
+			pr_debug("%s: cqhci: completing tag %lu\n",
+				 mmc_hostname(mmc), tag);
+			cqhci_finish_mrq(mmc, tag);
+		}
+
+		if (cq_host->waiting_for_idle && !cq_host->qcnt) {
+			cq_host->waiting_for_idle = false;
+			wake_up(&cq_host->wait_queue);
+		}
+
+		spin_unlock(&cq_host->lock);
+	}
+
+	if (status & CQHCI_IS_TCL)
+		wake_up(&cq_host->wait_queue);
+
+	if (status & CQHCI_IS_HAC)
+		wake_up(&cq_host->wait_queue);
+
+	return IRQ_HANDLED;
+}
+EXPORT_SYMBOL(cqhci_irq);
+
+static bool cqhci_is_idle(struct cqhci_host *cq_host, int *ret)
+{
+	unsigned long flags;
+	bool is_idle;
+
+	spin_lock_irqsave(&cq_host->lock, flags);
+	is_idle = !cq_host->qcnt || cq_host->recovery_halt;
+	*ret = cq_host->recovery_halt ? -EBUSY : 0;
+	cq_host->waiting_for_idle = !is_idle;
+	spin_unlock_irqrestore(&cq_host->lock, flags);
+
+	return is_idle;
+}
+
+static int cqhci_wait_for_idle(struct mmc_host *mmc)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	int ret;
+
+	wait_event(cq_host->wait_queue, cqhci_is_idle(cq_host, &ret));
+
+	return ret;
+}
+
+static bool cqhci_timeout(struct mmc_host *mmc, struct mmc_request *mrq,
+			  bool *recovery_needed)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	int tag = cqhci_tag(mrq);
+	struct cqhci_slot *slot = &cq_host->slot[tag];
+	unsigned long flags;
+	bool timed_out;
+
+	spin_lock_irqsave(&cq_host->lock, flags);
+	timed_out = slot->mrq == mrq;
+	if (timed_out) {
+		slot->flags |= CQHCI_EXTERNAL_TIMEOUT;
+		cqhci_recovery_needed(mmc, mrq, false);
+		*recovery_needed = cq_host->recovery_halt;
+	}
+	spin_unlock_irqrestore(&cq_host->lock, flags);
+
+	if (timed_out) {
+		pr_err("%s: cqhci: timeout for tag %d\n",
+		       mmc_hostname(mmc), tag);
+		cqhci_dumpregs(cq_host);
+	}
+
+	return timed_out;
+}
+
+static bool cqhci_tasks_cleared(struct cqhci_host *cq_host)
+{
+	return !(cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_CLEAR_ALL_TASKS);
+}
+
+static bool cqhci_clear_all_tasks(struct mmc_host *mmc, unsigned int timeout)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	bool ret;
+	u32 ctl;
+
+	cqhci_set_irqs(cq_host, CQHCI_IS_TCL);
+
+	ctl = cqhci_readl(cq_host, CQHCI_CTL);
+	ctl |= CQHCI_CLEAR_ALL_TASKS;
+	cqhci_writel(cq_host, ctl, CQHCI_CTL);
+
+	wait_event_timeout(cq_host->wait_queue, cqhci_tasks_cleared(cq_host),
+			   msecs_to_jiffies(timeout) + 1);
+
+	cqhci_set_irqs(cq_host, 0);
+
+	ret = cqhci_tasks_cleared(cq_host);
+
+	if (!ret)
+		pr_debug("%s: cqhci: Failed to clear tasks\n",
+			 mmc_hostname(mmc));
+
+	return ret;
+}
+
+static bool cqhci_halted(struct cqhci_host *cq_host)
+{
+	return cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_HALT;
+}
+
+static bool cqhci_halt(struct mmc_host *mmc, unsigned int timeout)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	bool ret;
+	u32 ctl;
+
+	if (cqhci_halted(cq_host))
+		return true;
+
+	cqhci_set_irqs(cq_host, CQHCI_IS_HAC);
+
+	ctl = cqhci_readl(cq_host, CQHCI_CTL);
+	ctl |= CQHCI_HALT;
+	cqhci_writel(cq_host, ctl, CQHCI_CTL);
+
+	wait_event_timeout(cq_host->wait_queue, cqhci_halted(cq_host),
+			   msecs_to_jiffies(timeout) + 1);
+
+	cqhci_set_irqs(cq_host, 0);
+
+	ret = cqhci_halted(cq_host);
+
+	if (!ret)
+		pr_debug("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));
+
+	return ret;
+}
+
+/*
+ * After halting we expect to be able to use the command line. We interpret the
+ * failure to halt to mean the data lines might still be in use (and the upper
+ * layers will need to send a STOP command), so we set the timeout based on a
+ * generous command timeout.
+ */
+#define CQHCI_START_HALT_TIMEOUT	5
+
+static void cqhci_recovery_start(struct mmc_host *mmc)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+
+	pr_debug("%s: cqhci: %s\n", mmc_hostname(mmc), __func__);
+
+	WARN_ON(!cq_host->recovery_halt);
+
+	cqhci_halt(mmc, CQHCI_START_HALT_TIMEOUT);
+
+	if (cq_host->ops->disable)
+		cq_host->ops->disable(mmc, true);
+
+	mmc->cqe_on = false;
+}
+
+static int cqhci_error_from_flags(unsigned int flags)
+{
+	if (!flags)
+		return 0;
+
+	/* CRC errors might indicate re-tuning so prefer to report that */
+	if (flags & CQHCI_HOST_CRC)
+		return -EILSEQ;
+
+	if (flags & (CQHCI_EXTERNAL_TIMEOUT | CQHCI_HOST_TIMEOUT))
+		return -ETIMEDOUT;
+
+	return -EIO;
+}
+
+static void cqhci_recover_mrq(struct cqhci_host *cq_host, unsigned int tag)
+{
+	struct cqhci_slot *slot = &cq_host->slot[tag];
+	struct mmc_request *mrq = slot->mrq;
+	struct mmc_data *data;
+
+	if (!mrq)
+		return;
+
+	slot->mrq = NULL;
+
+	cq_host->qcnt -= 1;
+
+	data = mrq->data;
+	if (data) {
+		data->bytes_xfered = 0;
+		data->error = cqhci_error_from_flags(slot->flags);
+	} else {
+		mrq->cmd->error = cqhci_error_from_flags(slot->flags);
+	}
+
+	mmc_cqe_request_done(cq_host->mmc, mrq);
+}
+
+static void cqhci_recover_mrqs(struct cqhci_host *cq_host)
+{
+	int i;
+
+	for (i = 0; i < cq_host->num_slots; i++)
+		cqhci_recover_mrq(cq_host, i);
+}
+
+/*
+ * By now the command and data lines should be unused so there is no reason for
+ * CQHCI to take a long time to halt, but if it doesn't halt there could be
+ * problems clearing tasks, so be generous.
+ */
+#define CQHCI_FINISH_HALT_TIMEOUT	20
+
+/* CQHCI could be expected to clear its internal state pretty quickly */
+#define CQHCI_CLEAR_TIMEOUT		20
+
+static void cqhci_recovery_finish(struct mmc_host *mmc)
+{
+	struct cqhci_host *cq_host = mmc->cqe_private;
+	unsigned long flags;
+	u32 cqcfg;
+	bool ok;
+
+	pr_debug("%s: cqhci: %s\n", mmc_hostname(mmc), __func__);
+
+	WARN_ON(!cq_host->recovery_halt);
+
+	ok = cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);
+
+	if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
+		ok = false;
+
+	/*
+	 * The specification contradicts itself: tasks cannot be cleared if
+	 * CQHCI does not halt, yet if CQHCI does not halt it should be
+	 * disabled and re-enabled, but not disabled before clearing tasks.
+	 * Have a go anyway.
+	 */
+	if (!ok) {
+		pr_debug("%s: cqhci: disable / re-enable\n", mmc_hostname(mmc));
+		cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
+		cqcfg &= ~CQHCI_ENABLE;
+		cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+		cqcfg |= CQHCI_ENABLE;
+		cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+		/* Be sure that there are no tasks */
+		ok = cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);
+		if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
+			ok = false;
+		WARN_ON(!ok);
+	}
+
+	cqhci_recover_mrqs(cq_host);
+
+	WARN_ON(cq_host->qcnt);
+
+	spin_lock_irqsave(&cq_host->lock, flags);
+	cq_host->qcnt = 0;
+	cq_host->recovery_halt = false;
+	mmc->cqe_on = false;
+	spin_unlock_irqrestore(&cq_host->lock, flags);
+
+	/* Ensure all writes are done before interrupts are re-enabled */
+	wmb();
+
+	cqhci_writel(cq_host, CQHCI_IS_HAC | CQHCI_IS_TCL, CQHCI_IS);
+
+	cqhci_set_irqs(cq_host, CQHCI_IS_MASK);
+
+	pr_debug("%s: cqhci: recovery done\n", mmc_hostname(mmc));
+}
+
+static const struct mmc_cqe_ops cqhci_cqe_ops = {
+	.cqe_enable = cqhci_enable,
+	.cqe_disable = cqhci_disable,
+	.cqe_request = cqhci_request,
+	.cqe_post_req = cqhci_post_req,
+	.cqe_off = cqhci_off,
+	.cqe_wait_for_idle = cqhci_wait_for_idle,
+	.cqe_timeout = cqhci_timeout,
+	.cqe_recovery_start = cqhci_recovery_start,
+	.cqe_recovery_finish = cqhci_recovery_finish,
+};
+
+struct cqhci_host *cqhci_pltfm_init(struct platform_device *pdev)
+{
+	struct cqhci_host *cq_host;
+	struct resource *cqhci_memres = NULL;
+
+	/* check and setup CMDQ interface */
+	cqhci_memres = platform_get_resource_byname(pdev, IORESOURCE_MEM,
+						   "cqhci_mem");
+	if (!cqhci_memres) {
+		dev_dbg(&pdev->dev, "CMDQ not supported\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	cq_host = devm_kzalloc(&pdev->dev, sizeof(*cq_host), GFP_KERNEL);
+	if (!cq_host)
+		return ERR_PTR(-ENOMEM);
+	cq_host->mmio = devm_ioremap(&pdev->dev,
+				     cqhci_memres->start,
+				     resource_size(cqhci_memres));
+	if (!cq_host->mmio) {
+		dev_err(&pdev->dev, "failed to remap cqhci regs\n");
+		return ERR_PTR(-EBUSY);
+	}
+	dev_dbg(&pdev->dev, "CMDQ ioremap: done\n");
+
+	return cq_host;
+}
+EXPORT_SYMBOL(cqhci_pltfm_init);
+
+static unsigned int cqhci_ver_major(struct cqhci_host *cq_host)
+{
+	return CQHCI_VER_MAJOR(cqhci_readl(cq_host, CQHCI_VER));
+}
+
+static unsigned int cqhci_ver_minor(struct cqhci_host *cq_host)
+{
+	u32 ver = cqhci_readl(cq_host, CQHCI_VER);
+
+	return CQHCI_VER_MINOR1(ver) * 10 + CQHCI_VER_MINOR2(ver);
+}
+
+int cqhci_init(struct cqhci_host *cq_host, struct mmc_host *mmc,
+	      bool dma64)
+{
+	int err;
+
+	cq_host->dma64 = dma64;
+	cq_host->mmc = mmc;
+	cq_host->mmc->cqe_private = cq_host;
+
+	cq_host->num_slots = NUM_SLOTS;
+	cq_host->dcmd_slot = DCMD_SLOT;
+
+	mmc->cqe_ops = &cqhci_cqe_ops;
+
+	mmc->cqe_qdepth = NUM_SLOTS;
+	if (mmc->caps2 & MMC_CAP2_CQE_DCMD)
+		mmc->cqe_qdepth -= 1;
+
+	cq_host->slot = devm_kcalloc(mmc_dev(mmc), cq_host->num_slots,
+				     sizeof(*cq_host->slot), GFP_KERNEL);
+	if (!cq_host->slot) {
+		err = -ENOMEM;
+		goto out_err;
+	}
+
+	spin_lock_init(&cq_host->lock);
+
+	init_completion(&cq_host->halt_comp);
+	init_waitqueue_head(&cq_host->wait_queue);
+
+	pr_info("%s: CQHCI version %u.%02u\n",
+		mmc_hostname(mmc), cqhci_ver_major(cq_host),
+		cqhci_ver_minor(cq_host));
+
+	return 0;
+
+out_err:
+	pr_err("%s: CQHCI version %u.%02u failed to initialize, error %d\n",
+	       mmc_hostname(mmc), cqhci_ver_major(cq_host),
+	       cqhci_ver_minor(cq_host), err);
+	return err;
+}
+EXPORT_SYMBOL(cqhci_init);
+
+MODULE_AUTHOR("Venkat Gopalakrishnan <venkatg@codeaurora.org>");
+MODULE_DESCRIPTION("Command Queue Host Controller Interface driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/mmc/host/cqhci.h b/drivers/mmc/host/cqhci.h
new file mode 100644
index 000000000000..2d39d361b322
--- /dev/null
+++ b/drivers/mmc/host/cqhci.h
@@ -0,0 +1,240 @@
+/* Copyright (c) 2015, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#ifndef LINUX_MMC_CQHCI_H
+#define LINUX_MMC_CQHCI_H
+
+#include <linux/compiler.h>
+#include <linux/bitops.h>
+#include <linux/spinlock_types.h>
+#include <linux/types.h>
+#include <linux/completion.h>
+#include <linux/wait.h>
+#include <linux/irqreturn.h>
+#include <asm/io.h>
+
+/* registers */
+/* version */
+#define CQHCI_VER			0x00
+#define CQHCI_VER_MAJOR(x)		(((x) & GENMASK(11, 8)) >> 8)
+#define CQHCI_VER_MINOR1(x)		(((x) & GENMASK(7, 4)) >> 4)
+#define CQHCI_VER_MINOR2(x)		((x) & GENMASK(3, 0))
+
+/* capabilities */
+#define CQHCI_CAP			0x04
+/* configuration */
+#define CQHCI_CFG			0x08
+#define CQHCI_DCMD			0x00001000
+#define CQHCI_TASK_DESC_SZ		0x00000100
+#define CQHCI_ENABLE			0x00000001
+
+/* control */
+#define CQHCI_CTL			0x0C
+#define CQHCI_CLEAR_ALL_TASKS		0x00000100
+#define CQHCI_HALT			0x00000001
+
+/* interrupt status */
+#define CQHCI_IS			0x10
+#define CQHCI_IS_HAC			BIT(0)
+#define CQHCI_IS_TCC			BIT(1)
+#define CQHCI_IS_RED			BIT(2)
+#define CQHCI_IS_TCL			BIT(3)
+
+#define CQHCI_IS_MASK (CQHCI_IS_TCC | CQHCI_IS_RED)
+
+/* interrupt status enable */
+#define CQHCI_ISTE			0x14
+
+/* interrupt signal enable */
+#define CQHCI_ISGE			0x18
+
+/* interrupt coalescing */
+#define CQHCI_IC			0x1C
+#define CQHCI_IC_ENABLE			BIT(31)
+#define CQHCI_IC_RESET			BIT(16)
+#define CQHCI_IC_ICCTHWEN		BIT(15)
+#define CQHCI_IC_ICCTH(x)		((x & 0x1F) << 8)
+#define CQHCI_IC_ICTOVALWEN		BIT(7)
+#define CQHCI_IC_ICTOVAL(x)		(x & 0x7F)
+
+/* task list base address */
+#define CQHCI_TDLBA			0x20
+
+/* task list base address upper */
+#define CQHCI_TDLBAU			0x24
+
+/* door-bell */
+#define CQHCI_TDBR			0x28
+
+/* task completion notification */
+#define CQHCI_TCN			0x2C
+
+/* device queue status */
+#define CQHCI_DQS			0x30
+
+/* device pending tasks */
+#define CQHCI_DPT			0x34
+
+/* task clear */
+#define CQHCI_TCLR			0x38
+
+/* send status config 1 */
+#define CQHCI_SSC1			0x40
+
+/* send status config 2 */
+#define CQHCI_SSC2			0x44
+
+/* response for dcmd */
+#define CQHCI_CRDCT			0x48
+
+/* response mode error mask */
+#define CQHCI_RMEM			0x50
+
+/* task error info */
+#define CQHCI_TERRI			0x54
+
+#define CQHCI_TERRI_C_INDEX(x)		((x) & GENMASK(5, 0))
+#define CQHCI_TERRI_C_TASK(x)		(((x) & GENMASK(12, 8)) >> 8)
+#define CQHCI_TERRI_C_VALID(x)		((x) & BIT(15))
+#define CQHCI_TERRI_D_INDEX(x)		(((x) & GENMASK(21, 16)) >> 16)
+#define CQHCI_TERRI_D_TASK(x)		(((x) & GENMASK(28, 24)) >> 24)
+#define CQHCI_TERRI_D_VALID(x)		((x) & BIT(31))
+
+/* command response index */
+#define CQHCI_CRI			0x58
+
+/* command response argument */
+#define CQHCI_CRA			0x5C
+
+#define CQHCI_INT_ALL			0xF
+#define CQHCI_IC_DEFAULT_ICCTH		31
+#define CQHCI_IC_DEFAULT_ICTOVAL	1
+
+/* attribute fields */
+#define CQHCI_VALID(x)			((x & 1) << 0)
+#define CQHCI_END(x)			((x & 1) << 1)
+#define CQHCI_INT(x)			((x & 1) << 2)
+#define CQHCI_ACT(x)			((x & 0x7) << 3)
+
+/* data command task descriptor fields */
+#define CQHCI_FORCED_PROG(x)		((x & 1) << 6)
+#define CQHCI_CONTEXT(x)		((x & 0xF) << 7)
+#define CQHCI_DATA_TAG(x)		((x & 1) << 11)
+#define CQHCI_DATA_DIR(x)		((x & 1) << 12)
+#define CQHCI_PRIORITY(x)		((x & 1) << 13)
+#define CQHCI_QBAR(x)			((x & 1) << 14)
+#define CQHCI_REL_WRITE(x)		((x & 1) << 15)
+#define CQHCI_BLK_COUNT(x)		((x & 0xFFFF) << 16)
+#define CQHCI_BLK_ADDR(x)		((x & 0xFFFFFFFF) << 32)
+
+/* direct command task descriptor fields */
+#define CQHCI_CMD_INDEX(x)		((x & 0x3F) << 16)
+#define CQHCI_CMD_TIMING(x)		((x & 1) << 22)
+#define CQHCI_RESP_TYPE(x)		((x & 0x3) << 23)
+
+/* transfer descriptor fields */
+#define CQHCI_DAT_LENGTH(x)		((x & 0xFFFF) << 16)
+#define CQHCI_DAT_ADDR_LO(x)		((x & 0xFFFFFFFF) << 32)
+#define CQHCI_DAT_ADDR_HI(x)		((x & 0xFFFFFFFF) << 0)
+
+struct cqhci_host_ops;
+struct mmc_host;
+struct cqhci_slot;
+
+struct cqhci_host {
+	const struct cqhci_host_ops *ops;
+	void __iomem *mmio;
+	struct mmc_host *mmc;
+
+	spinlock_t lock;
+
+	/* relative card address of device */
+	unsigned int rca;
+
+	/* 64 bit DMA */
+	bool dma64;
+	int num_slots;
+	int qcnt;
+
+	u32 dcmd_slot;
+	u32 caps;
+#define CQHCI_TASK_DESC_SZ_128		0x1
+
+	u32 quirks;
+#define CQHCI_QUIRK_SHORT_TXFR_DESC_SZ	0x1
+
+	bool enabled;
+	bool halted;
+	bool init_done;
+	bool activated;
+	bool waiting_for_idle;
+	bool recovery_halt;
+
+	size_t desc_size;
+	size_t data_size;
+
+	u8 *desc_base;
+
+	/* total descriptor size */
+	u8 slot_sz;
+
+	/* 64/128 bit depends on CQHCI_CFG */
+	u8 task_desc_len;
+
+	/* 64 bit on 32-bit arch, 128 bit on 64-bit */
+	u8 link_desc_len;
+
+	u8 *trans_desc_base;
+	/* same length as transfer descriptor */
+	u8 trans_desc_len;
+
+	dma_addr_t desc_dma_base;
+	dma_addr_t trans_desc_dma_base;
+
+	struct completion halt_comp;
+	wait_queue_head_t wait_queue;
+	struct cqhci_slot *slot;
+};
+
+struct cqhci_host_ops {
+	void (*dumpregs)(struct mmc_host *mmc);
+	void (*write_l)(struct cqhci_host *host, u32 val, int reg);
+	u32 (*read_l)(struct cqhci_host *host, int reg);
+	void (*enable)(struct mmc_host *mmc);
+	void (*disable)(struct mmc_host *mmc, bool recovery);
+};
+
+static inline void cqhci_writel(struct cqhci_host *host, u32 val, int reg)
+{
+	if (unlikely(host->ops->write_l))
+		host->ops->write_l(host, val, reg);
+	else
+		writel_relaxed(val, host->mmio + reg);
+}
+
+static inline u32 cqhci_readl(struct cqhci_host *host, int reg)
+{
+	if (unlikely(host->ops->read_l))
+		return host->ops->read_l(host, reg);
+	else
+		return readl_relaxed(host->mmio + reg);
+}
+
+struct platform_device;
+
+irqreturn_t cqhci_irq(struct mmc_host *mmc, u32 intmask, int cmd_error,
+		      int data_error);
+int cqhci_init(struct cqhci_host *cq_host, struct mmc_host *mmc, bool dma64);
+struct cqhci_host *cqhci_pltfm_init(struct platform_device *pdev);
+int cqhci_suspend(struct mmc_host *mmc);
+int cqhci_resume(struct mmc_host *mmc);
+
+#endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH V13 06/10] mmc: sdhci-pci: Add CQHCI support for Intel GLK
  2017-11-03 13:20 [PATCH V13 00/10] mmc: Add Command Queue support Adrian Hunter
                   ` (4 preceding siblings ...)
  2017-11-03 13:20 ` [PATCH V13 05/10] mmc: cqhci: support for command queue enabled host Adrian Hunter
@ 2017-11-03 13:20 ` Adrian Hunter
  2017-11-08  9:24   ` Linus Walleij
  2017-11-09 13:37   ` Ulf Hansson
  2017-11-03 13:20 ` [PATCH V13 07/10] mmc: block: blk-mq: Add support for direct completion Adrian Hunter
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Adrian Hunter @ 2017-11-03 13:20 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

Add CQHCI initialization and implement CQHCI operations for Intel GLK.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/host/Kconfig          |   1 +
 drivers/mmc/host/sdhci-pci-core.c | 155 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 155 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 3092b7085cb5..2b02a9788bb6 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -81,6 +81,7 @@ config MMC_SDHCI_BIG_ENDIAN_32BIT_BYTE_SWAPPER
 config MMC_SDHCI_PCI
 	tristate "SDHCI support on PCI bus"
 	depends on MMC_SDHCI && PCI
+	select MMC_CQHCI
 	help
 	  This selects the PCI Secure Digital Host Controller Interface.
 	  Most controllers found today are PCI devices.
diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
index 3e4f04fd5175..110c634cfb43 100644
--- a/drivers/mmc/host/sdhci-pci-core.c
+++ b/drivers/mmc/host/sdhci-pci-core.c
@@ -30,6 +30,8 @@
 #include <linux/mmc/sdhci-pci-data.h>
 #include <linux/acpi.h>
 
+#include "cqhci.h"
+
 #include "sdhci.h"
 #include "sdhci-pci.h"
 
@@ -116,6 +118,28 @@ int sdhci_pci_resume_host(struct sdhci_pci_chip *chip)
 
 	return 0;
 }
+
+static int sdhci_cqhci_suspend(struct sdhci_pci_chip *chip)
+{
+	int ret;
+
+	ret = cqhci_suspend(chip->slots[0]->host->mmc);
+	if (ret)
+		return ret;
+
+	return sdhci_pci_suspend_host(chip);
+}
+
+static int sdhci_cqhci_resume(struct sdhci_pci_chip *chip)
+{
+	int ret;
+
+	ret = sdhci_pci_resume_host(chip);
+	if (ret)
+		return ret;
+
+	return cqhci_resume(chip->slots[0]->host->mmc);
+}
 #endif
 
 #ifdef CONFIG_PM
@@ -166,8 +190,48 @@ static int sdhci_pci_runtime_resume_host(struct sdhci_pci_chip *chip)
 
 	return 0;
 }
+
+static int sdhci_cqhci_runtime_suspend(struct sdhci_pci_chip *chip)
+{
+	int ret;
+
+	ret = cqhci_suspend(chip->slots[0]->host->mmc);
+	if (ret)
+		return ret;
+
+	return sdhci_pci_runtime_suspend_host(chip);
+}
+
+static int sdhci_cqhci_runtime_resume(struct sdhci_pci_chip *chip)
+{
+	int ret;
+
+	ret = sdhci_pci_runtime_resume_host(chip);
+	if (ret)
+		return ret;
+
+	return cqhci_resume(chip->slots[0]->host->mmc);
+}
 #endif
 
+static u32 sdhci_cqhci_irq(struct sdhci_host *host, u32 intmask)
+{
+	int cmd_error = 0;
+	int data_error = 0;
+
+	if (!sdhci_cqe_irq(host, intmask, &cmd_error, &data_error))
+		return intmask;
+
+	cqhci_irq(host->mmc, intmask, cmd_error, data_error);
+
+	return 0;
+}
+
+static void sdhci_pci_dumpregs(struct mmc_host *mmc)
+{
+	sdhci_dumpregs(mmc_priv(mmc));
+}
+
 /*****************************************************************************\
  *                                                                           *
  * Hardware specific quirk handling                                          *
@@ -583,6 +647,18 @@ static void sdhci_intel_voltage_switch(struct sdhci_host *host)
 	.voltage_switch		= sdhci_intel_voltage_switch,
 };
 
+static const struct sdhci_ops sdhci_intel_glk_ops = {
+	.set_clock		= sdhci_set_clock,
+	.set_power		= sdhci_intel_set_power,
+	.enable_dma		= sdhci_pci_enable_dma,
+	.set_bus_width		= sdhci_set_bus_width,
+	.reset			= sdhci_reset,
+	.set_uhs_signaling	= sdhci_set_uhs_signaling,
+	.hw_reset		= sdhci_pci_hw_reset,
+	.voltage_switch		= sdhci_intel_voltage_switch,
+	.irq			= sdhci_cqhci_irq,
+};
+
 static void byt_read_dsm(struct sdhci_pci_slot *slot)
 {
 	struct intel_host *intel_host = sdhci_pci_priv(slot);
@@ -612,12 +688,80 @@ static int glk_emmc_probe_slot(struct sdhci_pci_slot *slot)
 {
 	int ret = byt_emmc_probe_slot(slot);
 
+	slot->host->mmc->caps2 |= MMC_CAP2_CQE;
+
 	if (slot->chip->pdev->device != PCI_DEVICE_ID_INTEL_GLK_EMMC) {
 		slot->host->mmc->caps2 |= MMC_CAP2_HS400_ES,
 		slot->host->mmc_host_ops.hs400_enhanced_strobe =
 						intel_hs400_enhanced_strobe;
+		slot->host->mmc->caps2 |= MMC_CAP2_CQE_DCMD;
+	}
+
+	return ret;
+}
+
+static void glk_cqe_enable(struct mmc_host *mmc)
+{
+	struct sdhci_host *host = mmc_priv(mmc);
+	u32 reg;
+
+	/*
+	 * CQE gets stuck if it sees Buffer Read Enable bit set, which can be
+	 * the case after tuning, so ensure the buffer is drained.
+	 */
+	reg = sdhci_readl(host, SDHCI_PRESENT_STATE);
+	while (reg & SDHCI_DATA_AVAILABLE) {
+		sdhci_readl(host, SDHCI_BUFFER);
+		reg = sdhci_readl(host, SDHCI_PRESENT_STATE);
+	}
+
+	sdhci_cqe_enable(mmc);
+}
+
+static const struct cqhci_host_ops glk_cqhci_ops = {
+	.enable		= glk_cqe_enable,
+	.disable	= sdhci_cqe_disable,
+	.dumpregs	= sdhci_pci_dumpregs,
+};
+
+static int glk_emmc_add_host(struct sdhci_pci_slot *slot)
+{
+	struct device *dev = &slot->chip->pdev->dev;
+	struct sdhci_host *host = slot->host;
+	struct cqhci_host *cq_host;
+	bool dma64;
+	int ret;
+
+	ret = sdhci_setup_host(host);
+	if (ret)
+		return ret;
+
+	cq_host = devm_kzalloc(dev, sizeof(*cq_host), GFP_KERNEL);
+	if (!cq_host) {
+		ret = -ENOMEM;
+		goto cleanup;
 	}
 
+	cq_host->mmio = host->ioaddr + 0x200;
+	cq_host->quirks |= CQHCI_QUIRK_SHORT_TXFR_DESC_SZ;
+	cq_host->ops = &glk_cqhci_ops;
+
+	dma64 = host->flags & SDHCI_USE_64_BIT_DMA;
+	if (dma64)
+		cq_host->caps |= CQHCI_TASK_DESC_SZ_128;
+
+	ret = cqhci_init(cq_host, host->mmc, dma64);
+	if (ret)
+		goto cleanup;
+
+	ret = __sdhci_add_host(host);
+	if (ret)
+		goto cleanup;
+
+	return 0;
+
+cleanup:
+	sdhci_cleanup_host(host);
 	return ret;
 }
 
@@ -699,11 +843,20 @@ static int byt_sd_probe_slot(struct sdhci_pci_slot *slot)
 static const struct sdhci_pci_fixes sdhci_intel_glk_emmc = {
 	.allow_runtime_pm	= true,
 	.probe_slot		= glk_emmc_probe_slot,
+	.add_host		= glk_emmc_add_host,
+#ifdef CONFIG_PM_SLEEP
+	.suspend		= sdhci_cqhci_suspend,
+	.resume			= sdhci_cqhci_resume,
+#endif
+#ifdef CONFIG_PM
+	.runtime_suspend	= sdhci_cqhci_runtime_suspend,
+	.runtime_resume		= sdhci_cqhci_runtime_resume,
+#endif
 	.quirks			= SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC,
 	.quirks2		= SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
 				  SDHCI_QUIRK2_CAPS_BIT63_FOR_HS400 |
 				  SDHCI_QUIRK2_STOP_WITH_TC,
-	.ops			= &sdhci_intel_byt_ops,
+	.ops			= &sdhci_intel_glk_ops,
 	.priv_size		= sizeof(struct intel_host),
 };
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH V13 07/10] mmc: block: blk-mq: Add support for direct completion
  2017-11-03 13:20 [PATCH V13 00/10] mmc: Add Command Queue support Adrian Hunter
                   ` (5 preceding siblings ...)
  2017-11-03 13:20 ` [PATCH V13 06/10] mmc: sdhci-pci: Add CQHCI support for Intel GLK Adrian Hunter
@ 2017-11-03 13:20 ` Adrian Hunter
  2017-11-08  9:28   ` Linus Walleij
  2017-11-09 13:07   ` Ulf Hansson
  2017-11-03 13:20 ` [PATCH V13 08/10] mmc: block: blk-mq: Separate card polling from recovery Adrian Hunter
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Adrian Hunter @ 2017-11-03 13:20 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

For blk-mq, add support for completing requests directly in the ->done
callback. That means that error handling and urgent background operations
must be handled by recovery_work in that case.
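
As an illustration (hypothetical host name, not part of this series), a
host controller whose ->done() path runs in a context where the request
can be completed immediately would opt in with the new capability flag
added by this patch:

#include <linux/mmc/host.h>

static void foo_host_enable_direct_complete(struct mmc_host *mmc)
{
	/*
	 * RW requests may then be completed within mmc_request_done();
	 * errors and urgent BKOPS are deferred to recovery_work instead.
	 */
	mmc->caps |= MMC_CAP_DIRECT_COMPLETE;
}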

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/core/block.c | 100 +++++++++++++++++++++++++++++++++++++++++------
 drivers/mmc/core/block.h |   1 +
 drivers/mmc/core/queue.c |   5 ++-
 drivers/mmc/core/queue.h |   6 +++
 include/linux/mmc/host.h |   1 +
 5 files changed, 101 insertions(+), 12 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index e8be17152884..cbb4b35a592d 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2086,6 +2086,22 @@ static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
 	}
 }
 
+static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq)
+{
+	mmc_blk_eval_resp_error(brq);
+
+	return brq->sbc.error || brq->cmd.error || brq->stop.error ||
+	       brq->data.error || brq->cmd.resp[0] & CMD_ERRORS;
+}
+
+static inline void mmc_blk_rw_reset_success(struct mmc_queue *mq,
+					    struct request *req)
+{
+	int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
+
+	mmc_blk_reset_success(mq->blkdata, type);
+}
+
 static void mmc_blk_mq_complete_rq(struct mmc_queue *mq, struct request *req)
 {
 	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
@@ -2167,14 +2183,43 @@ static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req)
 	if (host->ops->post_req)
 		host->ops->post_req(host, mrq, 0);
 
-	blk_mq_complete_request(req);
+	/*
+	 * Block layer timeouts race with completions which means the normal
+	 * completion path cannot be used during recovery.
+	 */
+	if (mq->in_recovery)
+		mmc_blk_mq_complete_rq(mq, req);
+	else
+		blk_mq_complete_request(req);
 
 	mmc_blk_mq_acct_req_done(mq, req);
 }
 
+void mmc_blk_mq_recovery(struct mmc_queue *mq)
+{
+	struct request *req = mq->recovery_req;
+	struct mmc_host *host = mq->card->host;
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+
+	mq->recovery_req = NULL;
+	mq->rw_wait = false;
+
+	if (mmc_blk_rq_error(&mqrq->brq)) {
+		mmc_retune_hold_now(host);
+		mmc_blk_rw_recovery(mq, req);
+	}
+
+	mmc_blk_urgent_bkops(mq, mqrq);
+
+	mmc_blk_mq_post_req(mq, req);
+}
+
 static void mmc_blk_mq_complete_prev_req(struct mmc_queue *mq,
 					 struct request **prev_req)
 {
+	if (mmc_queue_direct_complete(mq->card->host))
+		return;
+
 	mutex_lock(&mq->complete_lock);
 
 	if (!mq->complete_req)
@@ -2208,19 +2253,43 @@ static void mmc_blk_mq_req_done(struct mmc_request *mrq)
 	struct request *req = mmc_queue_req_to_req(mqrq);
 	struct request_queue *q = req->q;
 	struct mmc_queue *mq = q->queuedata;
+	struct mmc_host *host = mq->card->host;
 	unsigned long flags;
-	bool waiting;
 
-	spin_lock_irqsave(q->queue_lock, flags);
-	mq->complete_req = req;
-	mq->rw_wait = false;
-	waiting = mq->waiting;
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	if (!mmc_queue_direct_complete(host)) {
+		bool waiting;
+
+		spin_lock_irqsave(q->queue_lock, flags);
+		mq->complete_req = req;
+		mq->rw_wait = false;
+		waiting = mq->waiting;
+		spin_unlock_irqrestore(q->queue_lock, flags);
+
+		if (waiting)
+			wake_up(&mq->wait);
+		else
+			kblockd_schedule_work(&mq->complete_work);
 
-	if (waiting)
+		return;
+	}
+
+	if (mmc_blk_rq_error(&mqrq->brq) ||
+	    mmc_blk_urgent_bkops_needed(mq, mqrq)) {
+		spin_lock_irqsave(q->queue_lock, flags);
+		mq->recovery_needed = true;
+		mq->recovery_req = req;
+		spin_unlock_irqrestore(q->queue_lock, flags);
 		wake_up(&mq->wait);
-	else
-		kblockd_schedule_work(&mq->complete_work);
+		schedule_work(&mq->recovery_work);
+		return;
+	}
+
+	mmc_blk_rw_reset_success(mq, req);
+
+	mq->rw_wait = false;
+	wake_up(&mq->wait);
+
+	mmc_blk_mq_post_req(mq, req);
 }
 
 static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err)
@@ -2230,7 +2299,12 @@ static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err)
 	bool done;
 
 	spin_lock_irqsave(q->queue_lock, flags);
-	done = !mq->rw_wait;
+	if (mq->recovery_needed) {
+		*err = -EBUSY;
+		done = true;
+	} else {
+		done = !mq->rw_wait;
+	}
 	mq->waiting = !done;
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
@@ -2277,6 +2351,10 @@ static int mmc_blk_mq_issue_rw_rq(struct mmc_queue *mq,
 	if (err)
 		mq->rw_wait = false;
 
+	/* Release re-tuning here where there is no synchronization required */
+	if (mmc_queue_direct_complete(host))
+		mmc_retune_release(host);
+
 out_post_req:
 	if (err && host->ops->post_req)
 		host->ops->post_req(host, &mqrq->brq.mrq, err);
diff --git a/drivers/mmc/core/block.h b/drivers/mmc/core/block.h
index 6c0e98c1af71..5ad22c1c0318 100644
--- a/drivers/mmc/core/block.h
+++ b/drivers/mmc/core/block.h
@@ -12,6 +12,7 @@
 
 enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req);
 void mmc_blk_mq_complete(struct request *req);
+void mmc_blk_mq_recovery(struct mmc_queue *mq);
 
 struct work_struct;
 
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 971f97698866..bcba2995c767 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -165,7 +165,10 @@ static void mmc_mq_recovery_handler(struct work_struct *work)
 
 	mq->in_recovery = true;
 
-	mmc_blk_cqe_recovery(mq);
+	if (mq->use_cqe)
+		mmc_blk_cqe_recovery(mq);
+	else
+		mmc_blk_mq_recovery(mq);
 
 	mq->in_recovery = false;
 
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index f05b5a9d2f87..9bbfbb1fad7b 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -102,6 +102,7 @@ struct mmc_queue {
 	bool			waiting;
 	struct work_struct	recovery_work;
 	wait_queue_head_t	wait;
+	struct request		*recovery_req;
 	struct request		*complete_req;
 	struct mutex		complete_lock;
 	struct work_struct	complete_work;
@@ -133,4 +134,9 @@ static inline int mmc_cqe_qcnt(struct mmc_queue *mq)
 	       mq->in_flight[MMC_ISSUE_ASYNC];
 }
 
+static inline bool mmc_queue_direct_complete(struct mmc_host *host)
+{
+	return host->caps & MMC_CAP_DIRECT_COMPLETE;
+}
+
 #endif
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index ce2075d6f429..4b68a95a8818 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -324,6 +324,7 @@ struct mmc_host {
 #define MMC_CAP_DRIVER_TYPE_A	(1 << 23)	/* Host supports Driver Type A */
 #define MMC_CAP_DRIVER_TYPE_C	(1 << 24)	/* Host supports Driver Type C */
 #define MMC_CAP_DRIVER_TYPE_D	(1 << 25)	/* Host supports Driver Type D */
+#define MMC_CAP_DIRECT_COMPLETE	(1 << 27)	/* RW reqs can be completed within mmc_request_done() */
 #define MMC_CAP_CD_WAKE		(1 << 28)	/* Enable card detect wake */
 #define MMC_CAP_CMD_DURING_TFR	(1 << 29)	/* Commands during data transfer */
 #define MMC_CAP_CMD23		(1 << 30)	/* CMD23 supported. */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH V13 08/10] mmc: block: blk-mq: Separate card polling from recovery
  2017-11-03 13:20 [PATCH V13 00/10] mmc: Add Command Queue support Adrian Hunter
                   ` (6 preceding siblings ...)
  2017-11-03 13:20 ` [PATCH V13 07/10] mmc: block: blk-mq: Add support for direct completion Adrian Hunter
@ 2017-11-03 13:20 ` Adrian Hunter
  2017-11-08  9:30   ` Linus Walleij
  2017-11-03 13:20 ` [PATCH V13 09/10] mmc: block: blk-mq: Stop using card_busy_detect() Adrian Hunter
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Adrian Hunter @ 2017-11-03 13:20 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

Recovery is simpler to understand if it is only used for errors. Create a
separate function for card polling.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/core/block.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index cbb4b35a592d..0c29b1d8d545 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2094,6 +2094,24 @@ static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq)
 	       brq->data.error || brq->cmd.resp[0] & CMD_ERRORS;
 }
 
+static int mmc_blk_card_busy(struct mmc_card *card, struct request *req)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	bool gen_err = false;
+	int err;
+
+	if (mmc_host_is_spi(card->host) || rq_data_dir(req) == READ)
+		return 0;
+
+	err = card_busy_detect(card, MMC_BLK_TIMEOUT_MS, false, req, &gen_err);
+
+	/* Copy the general error bit so it will be seen later on */
+	if (gen_err)
+		mqrq->brq.stop.resp[0] |= R1_ERROR;
+
+	return err;
+}
+
 static inline void mmc_blk_rw_reset_success(struct mmc_queue *mq,
 					    struct request *req)
 {
@@ -2150,8 +2168,15 @@ static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
 				       struct request *req)
 {
 	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_host *host = mq->card->host;
 
-	mmc_blk_rw_recovery(mq, req);
+	if (mmc_blk_rq_error(&mqrq->brq) ||
+	    mmc_blk_card_busy(mq->card, req)) {
+		mmc_blk_rw_recovery(mq, req);
+	} else {
+		mmc_blk_rw_reset_success(mq, req);
+		mmc_retune_release(host);
+	}
 
 	mmc_blk_urgent_bkops(mq, mqrq);
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH V13 09/10] mmc: block: blk-mq: Stop using card_busy_detect()
  2017-11-03 13:20 [PATCH V13 00/10] mmc: Add Command Queue support Adrian Hunter
                   ` (7 preceding siblings ...)
  2017-11-03 13:20 ` [PATCH V13 08/10] mmc: block: blk-mq: Separate card polling from recovery Adrian Hunter
@ 2017-11-03 13:20 ` Adrian Hunter
  2017-11-09 13:36   ` Ulf Hansson
  2017-11-03 13:20 ` [PATCH V13 10/10] mmc: block: blk-mq: Stop using legacy recovery Adrian Hunter
       [not found] ` <CGME20171116094642epcas1p14018cb1c475efa38942109dc24cd6da9@epcas1p1.samsung.com>
  10 siblings, 1 reply; 55+ messages in thread
From: Adrian Hunter @ 2017-11-03 13:20 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

card_busy_detect() doesn't set a correct timeout, and it doesn't take care
of error status bits. Stop using it for blk-mq.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/core/block.c | 117 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 109 insertions(+), 8 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 0c29b1d8d545..5c5ff3c34313 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1426,15 +1426,18 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq,
 	}
 }
 
-#define CMD_ERRORS							\
-	(R1_OUT_OF_RANGE |	/* Command argument out of range */	\
-	 R1_ADDRESS_ERROR |	/* Misaligned address */		\
+#define CMD_ERRORS_EXCL_OOR						\
+	(R1_ADDRESS_ERROR |	/* Misaligned address */		\
 	 R1_BLOCK_LEN_ERROR |	/* Transferred block length incorrect */\
 	 R1_WP_VIOLATION |	/* Tried to write to protected block */	\
 	 R1_CARD_ECC_FAILED |	/* Card ECC failed */			\
 	 R1_CC_ERROR |		/* Card controller error */		\
 	 R1_ERROR)		/* General/unknown error */
 
+#define CMD_ERRORS							\
+	(CMD_ERRORS_EXCL_OOR |						\
+	 R1_OUT_OF_RANGE)	/* Command argument out of range */	\
+
 static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq)
 {
 	u32 val;
@@ -1951,6 +1954,95 @@ static void mmc_blk_ss_read(struct mmc_queue *mq, struct request *req)
 	mqrq->retries = MMC_NO_RETRIES;
 }
 
+static inline bool mmc_blk_oor_valid(struct mmc_blk_request *brq)
+{
+	return !!brq->mrq.sbc;
+}
+
+static inline u32 mmc_blk_stop_err_bits(struct mmc_blk_request *brq)
+{
+	return mmc_blk_oor_valid(brq) ? CMD_ERRORS : CMD_ERRORS_EXCL_OOR;
+}
+
+static inline bool mmc_blk_in_tran_state(u32 status)
+{
+	/*
+	 * Some cards mishandle the status bits, so make sure to check both the
+	 * busy indication and the card state.
+	 */
+	return status & R1_READY_FOR_DATA &&
+	       (R1_CURRENT_STATE(status) == R1_STATE_TRAN);
+}
+
+static unsigned int mmc_blk_clock_khz(struct mmc_host *host)
+{
+	if (host->actual_clock)
+		return host->actual_clock / 1000;
+
+	/* Clock may be subject to a divisor, fudge it by a factor of 2. */
+	if (host->ios.clock)
+		return host->ios.clock / 2000;
+
+	/* How can there be no clock */
+	WARN_ON_ONCE(1);
+	return 100; /* 100 kHz is minimum possible value */
+}
+
+static unsigned long mmc_blk_data_timeout_jiffies(struct mmc_host *host,
+						  struct mmc_data *data)
+{
+	unsigned int ms = DIV_ROUND_UP(data->timeout_ns, 1000000);
+	unsigned int khz;
+
+	if (data->timeout_clks) {
+		khz = mmc_blk_clock_khz(host);
+		ms += DIV_ROUND_UP(data->timeout_clks, khz);
+	}
+
+	return msecs_to_jiffies(ms);
+}
+
+static int mmc_blk_card_stuck(struct mmc_card *card, struct request *req,
+			      u32 *resp_errs)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_data *data = &mqrq->brq.data;
+	unsigned long timeout;
+	u32 status;
+	int err;
+
+	timeout = jiffies + mmc_blk_data_timeout_jiffies(card->host, data);
+
+	while (1) {
+		bool done = time_after(jiffies, timeout);
+
+		err = __mmc_send_status(card, &status, 5);
+		if (err) {
+			pr_err("%s: error %d requesting status\n",
+			       req->rq_disk->disk_name, err);
+			break;
+		}
+
+		/* Accumulate any response error bits seen */
+		if (resp_errs)
+			*resp_errs |= status;
+
+		if (mmc_blk_in_tran_state(status))
+			break;
+
+		/* Timeout if the device never becomes ready */
+		if (done) {
+			pr_err("%s: Card stuck in wrong state! %s %s\n",
+				mmc_hostname(card->host),
+				req->rq_disk->disk_name, __func__);
+			err = -ETIMEDOUT;
+			break;
+		}
+	}
+
+	return err;
+}
+
 static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
 {
 	int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
@@ -2097,17 +2189,26 @@ static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq)
 static int mmc_blk_card_busy(struct mmc_card *card, struct request *req)
 {
 	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
-	bool gen_err = false;
+	u32 status = 0;
 	int err;
 
 	if (mmc_host_is_spi(card->host) || rq_data_dir(req) == READ)
 		return 0;
 
-	err = card_busy_detect(card, MMC_BLK_TIMEOUT_MS, false, req, &gen_err);
+	err = mmc_blk_card_stuck(card, req, &status);
+
+	/*
+	 * Do not assume data transferred correctly if there are any error bits
+	 * set.
+	 */
+	if (!err && status & mmc_blk_stop_err_bits(&mqrq->brq)) {
+		mqrq->brq.data.bytes_xfered = 0;
+		err = -EIO;
+	}
 
-	/* Copy the general error bit so it will be seen later on */
-	if (gen_err)
-		mqrq->brq.stop.resp[0] |= R1_ERROR;
+	/* Copy the exception bit so it will be seen later on */
+	if (mmc_card_mmc(card) && status & R1_EXCEPTION_EVENT)
+		mqrq->brq.cmd.resp[0] |= R1_EXCEPTION_EVENT;
 
 	return err;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH V13 10/10] mmc: block: blk-mq: Stop using legacy recovery
  2017-11-03 13:20 [PATCH V13 00/10] mmc: Add Command Queue support Adrian Hunter
                   ` (8 preceding siblings ...)
  2017-11-03 13:20 ` [PATCH V13 09/10] mmc: block: blk-mq: Stop using card_busy_detect() Adrian Hunter
@ 2017-11-03 13:20 ` Adrian Hunter
  2017-11-08  9:38   ` Linus Walleij
       [not found] ` <CGME20171116094642epcas1p14018cb1c475efa38942109dc24cd6da9@epcas1p1.samsung.com>
  10 siblings, 1 reply; 55+ messages in thread
From: Adrian Hunter @ 2017-11-03 13:20 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

There are only a few things the recovery needs to do. Primarily, it just
needs to:
	Determine the number of bytes transferred
	Get the card back to transfer state
	Determine whether to retry

There are also a couple of additional features:
	Reset the card before the last retry
	Read one sector at a time

The legacy code spent much effort analyzing command errors, but commands
fail fast, so it is simpler just to give all command errors the same number
of retries.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/core/block.c | 261 ++++++++++++++++++++++++-----------------------
 1 file changed, 135 insertions(+), 126 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 5c5ff3c34313..623fa2be7077 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1480,9 +1480,11 @@ static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq)
 	}
 }
 
-static enum mmc_blk_status __mmc_blk_err_check(struct mmc_card *card,
-					       struct mmc_queue_req *mq_mrq)
+static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
+					     struct mmc_async_req *areq)
 {
+	struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
+						    areq);
 	struct mmc_blk_request *brq = &mq_mrq->brq;
 	struct request *req = mmc_queue_req_to_req(mq_mrq);
 	int need_retune = card->host->need_retune;
@@ -1588,15 +1590,6 @@ static enum mmc_blk_status __mmc_blk_err_check(struct mmc_card *card,
 	return MMC_BLK_SUCCESS;
 }
 
-static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
-					     struct mmc_async_req *areq)
-{
-	struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
-						    areq);
-
-	return __mmc_blk_err_check(card, mq_mrq);
-}
-
 static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
 			      int disable_multi, bool *do_rel_wr_p,
 			      bool *do_data_tag_p)
@@ -1922,6 +1915,7 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 }
 
 #define MMC_MAX_RETRIES		5
+#define MMC_DATA_RETRIES	2
 #define MMC_NO_RETRIES		(MMC_MAX_RETRIES + 1)
 
 /* Single sector read during recovery */
@@ -1974,6 +1968,28 @@ static inline bool mmc_blk_in_tran_state(u32 status)
 	       (R1_CURRENT_STATE(status) == R1_STATE_TRAN);
 }
 
+/*
+ * Check for errors the host controller driver might not have seen such as
+ * response mode errors or invalid card state.
+ */
+static bool mmc_blk_status_error(struct request *req, u32 status)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_blk_request *brq = &mqrq->brq;
+	u32 stop_err_bits = mmc_blk_stop_err_bits(brq);
+
+	return brq->cmd.resp[0]  & CMD_ERRORS    ||
+	       brq->stop.resp[0] & stop_err_bits ||
+	       status            & stop_err_bits ||
+	       (rq_data_dir(req) == WRITE && !mmc_blk_in_tran_state(status));
+}
+
+static inline bool mmc_blk_cmd_started(struct mmc_blk_request *brq)
+{
+	return !brq->sbc.error && !brq->cmd.error &&
+	       !(brq->cmd.resp[0] & CMD_ERRORS);
+}
+
 static unsigned int mmc_blk_clock_khz(struct mmc_host *host)
 {
 	if (host->actual_clock)
@@ -2043,6 +2059,47 @@ static int mmc_blk_card_stuck(struct mmc_card *card, struct request *req,
 	return err;
 }
 
+static int mmc_blk_send_stop(struct mmc_card *card)
+{
+	struct mmc_command cmd = {
+		.opcode = MMC_STOP_TRANSMISSION,
+		.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_AC,
+	};
+
+	return mmc_wait_for_cmd(card->host, &cmd, 5);
+}
+
+static int mmc_blk_fix_state(struct mmc_card *card, struct request *req)
+{
+	int err;
+
+	mmc_retune_hold_now(card->host);
+
+	mmc_blk_send_stop(card);
+
+	err = mmc_blk_card_stuck(card, req, NULL);
+
+	mmc_retune_release(card->host);
+
+	return err;
+}
+
+/*
+ * Requests are completed by mmc_blk_mq_complete_rq() which sets simple
+ * policy:
+ * 1. A request that has transferred at least some data is considered
+ * successful and will be requeued if there is remaining data to
+ * transfer.
+ * 2. Otherwise the number of retries is incremented and the request
+ * will be requeued if there are remaining retries.
+ * 3. Otherwise the request will be errored out.
+ * That means mmc_blk_mq_complete_rq() is controlled by bytes_xfered and
+ * mqrq->retries. So there are only 4 possible actions here:
+ *	1. do not accept the bytes_xfered value i.e. set it to zero
+ *	2. change mqrq->retries to determine the number of retries
+ *	3. try to reset the card
+ *	4. read one sector at a time
+ */
 static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
 {
 	int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
@@ -2050,131 +2107,83 @@ static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
 	struct mmc_blk_request *brq = &mqrq->brq;
 	struct mmc_blk_data *md = mq->blkdata;
 	struct mmc_card *card = mq->card;
-	static enum mmc_blk_status status;
-
-	brq->retune_retry_done = mqrq->retries;
+	u32 status;
+	u32 blocks;
+	int err;
 
-	status = __mmc_blk_err_check(card, mqrq);
+	/*
+	 * Some errors the host driver might not have seen. Set the number of
+	 * bytes transferred to zero in that case.
+	 */
+	err = __mmc_send_status(card, &status, 0);
+	if (err || mmc_blk_status_error(req, status))
+		brq->data.bytes_xfered = 0;
 
 	mmc_retune_release(card->host);
 
 	/*
-	 * Requests are completed by mmc_blk_mq_complete_rq() which sets simple
-	 * policy:
-	 * 1. A request that has transferred at least some data is considered
-	 * successful and will be requeued if there is remaining data to
-	 * transfer.
-	 * 2. Otherwise the number of retries is incremented and the request
-	 * will be requeued if there are remaining retries.
-	 * 3. Otherwise the request will be errored out.
-	 * That means mmc_blk_mq_complete_rq() is controlled by bytes_xfered and
-	 * mqrq->retries. So there are only 4 possible actions here:
-	 *	1. do not accept the bytes_xfered value i.e. set it to zero
-	 *	2. change mqrq->retries to determine the number of retries
-	 *	3. try to reset the card
-	 *	4. read one sector at a time
+	 * Try again to get the status. This also provides an opportunity for
+	 * re-tuning.
 	 */
-	switch (status) {
-	case MMC_BLK_SUCCESS:
-	case MMC_BLK_PARTIAL:
-		/* Reset success, and accept bytes_xfered */
-		mmc_blk_reset_success(md, type);
-		break;
-	case MMC_BLK_CMD_ERR:
-		/*
-		 * For SD cards, get bytes written, but do not accept
-		 * bytes_xfered if that fails. For MMC cards accept
-		 * bytes_xfered. Then try to reset. If reset fails then
-		 * error out the remaining request, otherwise retry
-		 * once (N.B mmc_blk_reset() will not succeed twice in a
-		 * row).
-		 */
-		if (mmc_card_sd(card)) {
-			u32 blocks;
-			int err;
+	if (err)
+		err = __mmc_send_status(card, &status, 0);
 
-			err = mmc_sd_num_wr_blocks(card, &blocks);
-			if (err)
-				brq->data.bytes_xfered = 0;
-			else
-				brq->data.bytes_xfered = blocks << 9;
-		}
-		if (mmc_blk_reset(md, card->host, type))
-			mqrq->retries = MMC_NO_RETRIES;
-		else
-			mqrq->retries = MMC_MAX_RETRIES - 1;
-		break;
-	case MMC_BLK_RETRY:
-		/*
-		 * Do not accept bytes_xfered, but retry up to 5 times,
-		 * otherwise same as abort.
-		 */
-		brq->data.bytes_xfered = 0;
-		if (mqrq->retries < MMC_MAX_RETRIES)
-			break;
-		/* Fall through */
-	case MMC_BLK_ABORT:
-		/*
-		 * Do not accept bytes_xfered, but try to reset. If
-		 * reset succeeds, try once more, otherwise error out
-		 * the request.
-		 */
-		brq->data.bytes_xfered = 0;
-		if (mmc_blk_reset(md, card->host, type))
-			mqrq->retries = MMC_NO_RETRIES;
-		else
-			mqrq->retries = MMC_MAX_RETRIES - 1;
-		break;
-	case MMC_BLK_DATA_ERR: {
-		int err;
+	/*
+	 * Nothing more to do after the number of bytes transferred has been
+	 * updated and there is no card.
+	 */
+	if (err && mmc_detect_card_removed(card->host))
+		return;
 
-		/*
-		 * Do not accept bytes_xfered, but try to reset. If
-		 * reset succeeds, try once more. If reset fails with
-		 * ENODEV which means the partition is wrong, then error
-		 * out the request. Otherwise attempt to read one sector
-		 * at a time.
-		 */
-		brq->data.bytes_xfered = 0;
-		err = mmc_blk_reset(md, card->host, type);
-		if (!err) {
-			mqrq->retries = MMC_MAX_RETRIES - 1;
-			break;
-		}
-		if (err == -ENODEV) {
-			mqrq->retries = MMC_NO_RETRIES;
-			break;
-		}
-		/* Fall through */
+	/* Try to get back to "tran" state */
+	if (err || !mmc_blk_in_tran_state(status))
+		err = mmc_blk_fix_state(mq->card, req);
+
+	/*
+	 * Special case for SD cards where the card might record the number of
+	 * blocks written.
+	 */
+	if (!err && mmc_blk_cmd_started(brq) && mmc_card_sd(card) &&
+	    rq_data_dir(req) == WRITE) {
+		if (mmc_sd_num_wr_blocks(card, &blocks))
+			brq->data.bytes_xfered = 0;
+		else
+			brq->data.bytes_xfered = blocks << 9;
 	}
-	case MMC_BLK_ECC_ERR:
-		/*
-		 * Do not accept bytes_xfered. If reading more than one
-		 * sector, try reading one sector at a time.
-		 */
-		brq->data.bytes_xfered = 0;
-		/* FIXME: Missing single sector read for large sector size */
-		if (brq->data.blocks > 1 && !mmc_large_sector(card)) {
-			/* Redo read one sector at a time */
-			pr_warn("%s: retrying using single block read\n",
-				req->rq_disk->disk_name);
-			mmc_blk_ss_read(mq, req);
-		} else {
-			mqrq->retries = MMC_NO_RETRIES;
-		}
-		break;
-	case MMC_BLK_NOMEDIUM:
-		/* Do not accept bytes_xfered. Error out the request */
-		brq->data.bytes_xfered = 0;
-		mqrq->retries = MMC_NO_RETRIES;
-		break;
-	default:
-		/* Do not accept bytes_xfered. Error out the request */
-		brq->data.bytes_xfered = 0;
+
+	/* Reset if the card is in a bad state */
+	if (err && mmc_blk_reset(md, card->host, type)) {
+		pr_err("%s: recovery failed!\n", req->rq_disk->disk_name);
 		mqrq->retries = MMC_NO_RETRIES;
-		pr_err("%s: Unhandled return value (%d)",
-		       req->rq_disk->disk_name, status);
-		break;
+		return;
+	}
+
+	/*
+	 * If anything was done, just return and if there is anything remaining
+	 * on the request it will get requeued.
+	 */
+	if (brq->data.bytes_xfered)
+		return;
+
+	/* Reset before last retry */
+	if (mqrq->retries + 1 == MMC_MAX_RETRIES)
+		mmc_blk_reset(md, card->host, type);
+
+	/* Command errors fail fast, so use all MMC_MAX_RETRIES */
+	if (brq->sbc.error || brq->cmd.error)
+		return;
+
+	/* Reduce the remaining retries for data errors */
+	if (mqrq->retries < MMC_MAX_RETRIES - MMC_DATA_RETRIES) {
+		mqrq->retries = MMC_MAX_RETRIES - MMC_DATA_RETRIES;
+		return;
+	}
+
+	/* FIXME: Missing single sector read for large sector size */
+	if (rq_data_dir(req) == READ && !mmc_large_sector(card)) {
+		/* Read one sector at a time */
+		mmc_blk_ss_read(mq, req);
+		return;
 	}
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 01/10] mmc: core: Add parameter use_blk_mq
  2017-11-03 13:20 ` [PATCH V13 01/10] mmc: core: Add parameter use_blk_mq Adrian Hunter
@ 2017-11-06  8:38   ` Linus Walleij
  0 siblings, 0 replies; 55+ messages in thread
From: Linus Walleij @ 2017-11-06  8:38 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:

> Until mmc has blk-mq support fully implemented and tested, add a
> parameter use_blk_mq, default to false unless config option MMC_MQ_DEFAULT
> is selected.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
(...)

> +config MMC_MQ_DEFAULT
> +       bool "MMC: use blk-mq I/O path by default"
> +       depends on MMC && BLOCK

default y

The reason being: I want it to be tested, I want it to be deployed.
We should rather use it as a fallback option if something turns out
to be problematic with MQ which, at this point, is highly unlikely.

> +++ b/include/linux/mmc/host.h
> @@ -380,6 +380,7 @@ struct mmc_host {
>         unsigned int            doing_retune:1; /* re-tuning in progress */
>         unsigned int            retune_now:1;   /* do re-tuning at next req */
>         unsigned int            retune_paused:1; /* re-tuning is temporarily disabled */
> +       unsigned int            use_blk_mq:1;   /* use blk-mq */

unsigned int foo:1 is really a weird way of saying "bool".

We should just fix it I guess, but it's another patch.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 02/10] mmc: block: Add error-handling comments
  2017-11-03 13:20 ` [PATCH V13 02/10] mmc: block: Add error-handling comments Adrian Hunter
@ 2017-11-06  8:39   ` Linus Walleij
  0 siblings, 0 replies; 55+ messages in thread
From: Linus Walleij @ 2017-11-06  8:39 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:

> Add error-handling comments to explain what would also be done for blk-mq
> if it used the legacy error-handling.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

Safe to apply right now, during -rc8 as well.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 03/10] mmc: block: Add blk-mq support
  2017-11-03 13:20 ` [PATCH V13 03/10] mmc: block: Add blk-mq support Adrian Hunter
@ 2017-11-08  8:54   ` Linus Walleij
  2017-11-09 10:42     ` Adrian Hunter
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Walleij @ 2017-11-08  8:54 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:

> Define and use a blk-mq queue. Discards and flushes are processed
> synchronously, but reads and writes asynchronously. In order to support
> slow DMA unmapping, DMA unmapping is not done until after the next request
> is started. That means the request is not completed until then. If there is
> no next request then the completion is done by queued work.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

> -       blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
> +       if (req->mq_ctx)
> +               blk_mq_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
> +       else
> +               blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);

I think this quite obvious code duplication is unfortunate.

What my patches do is get rid of the old block layer in order
to be able to focus on the new stuff using just MQ.

One reason is that the code is hairy enough as it is; by just
supporting MQ the above is still just one line of code, and the same
goes for the other instances below.

At least you could do what I did and break out a helper like
this:

/*
 * This reports status back to the block layer for a finished request.
 */
static void mmc_blk_complete(struct mmc_queue_req *mq_rq,
                            blk_status_t status)
{
       struct request *req = mmc_queue_req_to_req(mq_rq);

       if (req->mq_ctx)
               blk_mq_end_request(req, status);
       else
               blk_end_request_all(req, status);
}

> +/* Single sector read during recovery */
> +static void mmc_blk_ss_read(struct mmc_queue *mq, struct request *req)
> +{
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +       blk_status_t status;
> +
> +       while (1) {
> +               mmc_blk_rw_rq_prep(mqrq, mq->card, 1, mq);
> +
> +               mmc_wait_for_req(mq->card->host, &mqrq->brq.mrq);
> +
> +               /*
> +                * Not expecting command errors, so just give up in that case.
> +                * If there are retries remaining, the request will get
> +                * requeued.
> +                */
> +               if (mqrq->brq.cmd.error)
> +                       return;
> +
> +               if (blk_rq_bytes(req) <= 512)
> +                       break;
> +
> +               status = mqrq->brq.data.error ? BLK_STS_IOERR : BLK_STS_OK;
> +
> +               blk_update_request(req, status, 512);
> +       }
> +
> +       mqrq->retries = MMC_NO_RETRIES;
> +}
> +
> +static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
> +{
> +       int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +       struct mmc_blk_request *brq = &mqrq->brq;
> +       struct mmc_blk_data *md = mq->blkdata;
> +       struct mmc_card *card = mq->card;
> +       static enum mmc_blk_status status;
> +
> +       brq->retune_retry_done = mqrq->retries;
> +
> +       status = __mmc_blk_err_check(card, mqrq);
> +
> +       mmc_retune_release(card->host);
> +
> +       /*
> +        * Requests are completed by mmc_blk_mq_complete_rq() which sets simple
> +        * policy:
> +        * 1. A request that has transferred at least some data is considered
> +        * successful and will be requeued if there is remaining data to
> +        * transfer.
> +        * 2. Otherwise the number of retries is incremented and the request
> +        * will be requeued if there are remaining retries.
> +        * 3. Otherwise the request will be errored out.
> +        * That means mmc_blk_mq_complete_rq() is controlled by bytes_xfered and
> +        * mqrq->retries. So there are only 4 possible actions here:
> +        *      1. do not accept the bytes_xfered value i.e. set it to zero
> +        *      2. change mqrq->retries to determine the number of retries
> +        *      3. try to reset the card
> +        *      4. read one sector at a time
> +        */
> +       switch (status) {
> +       case MMC_BLK_SUCCESS:
> +       case MMC_BLK_PARTIAL:
> +               /* Reset success, and accept bytes_xfered */
> +               mmc_blk_reset_success(md, type);
> +               break;
> +       case MMC_BLK_CMD_ERR:
> +               /*
> +                * For SD cards, get bytes written, but do not accept
> +                * bytes_xfered if that fails. For MMC cards accept
> +                * bytes_xfered. Then try to reset. If reset fails then
> +                * error out the remaining request, otherwise retry
> +                * once (N.B mmc_blk_reset() will not succeed twice in a
> +                * row).
> +                */
> +               if (mmc_card_sd(card)) {
> +                       u32 blocks;
> +                       int err;
> +
> +                       err = mmc_sd_num_wr_blocks(card, &blocks);
> +                       if (err)
> +                               brq->data.bytes_xfered = 0;
> +                       else
> +                               brq->data.bytes_xfered = blocks << 9;
> +               }
> +               if (mmc_blk_reset(md, card->host, type))
> +                       mqrq->retries = MMC_NO_RETRIES;
> +               else
> +                       mqrq->retries = MMC_MAX_RETRIES - 1;
> +               break;
> +       case MMC_BLK_RETRY:
> +               /*
> +                * Do not accept bytes_xfered, but retry up to 5 times,
> +                * otherwise same as abort.
> +                */
> +               brq->data.bytes_xfered = 0;
> +               if (mqrq->retries < MMC_MAX_RETRIES)
> +                       break;
> +               /* Fall through */
> +       case MMC_BLK_ABORT:
> +               /*
> +                * Do not accept bytes_xfered, but try to reset. If
> +                * reset succeeds, try once more, otherwise error out
> +                * the request.
> +                */
> +               brq->data.bytes_xfered = 0;
> +               if (mmc_blk_reset(md, card->host, type))
> +                       mqrq->retries = MMC_NO_RETRIES;
> +               else
> +                       mqrq->retries = MMC_MAX_RETRIES - 1;
> +               break;
> +       case MMC_BLK_DATA_ERR: {
> +               int err;
> +
> +               /*
> +                * Do not accept bytes_xfered, but try to reset. If
> +                * reset succeeds, try once more. If reset fails with
> +                * ENODEV which means the partition is wrong, then error
> +                * out the request. Otherwise attempt to read one sector
> +                * at a time.
> +                */
> +               brq->data.bytes_xfered = 0;
> +               err = mmc_blk_reset(md, card->host, type);
> +               if (!err) {
> +                       mqrq->retries = MMC_MAX_RETRIES - 1;
> +                       break;
> +               }
> +               if (err == -ENODEV) {
> +                       mqrq->retries = MMC_NO_RETRIES;
> +                       break;
> +               }
> +               /* Fall through */
> +       }
> +       case MMC_BLK_ECC_ERR:
> +               /*
> +                * Do not accept bytes_xfered. If reading more than one
> +                * sector, try reading one sector at a time.
> +                */
> +               brq->data.bytes_xfered = 0;
> +               /* FIXME: Missing single sector read for large sector size */
> +               if (brq->data.blocks > 1 && !mmc_large_sector(card)) {
> +                       /* Redo read one sector at a time */
> +                       pr_warn("%s: retrying using single block read\n",
> +                               req->rq_disk->disk_name);
> +                       mmc_blk_ss_read(mq, req);
> +               } else {
> +                       mqrq->retries = MMC_NO_RETRIES;
> +               }
> +               break;
> +       case MMC_BLK_NOMEDIUM:
> +               /* Do not accept bytes_xfered. Error out the request */
> +               brq->data.bytes_xfered = 0;
> +               mqrq->retries = MMC_NO_RETRIES;
> +               break;
> +       default:
> +               /* Do not accept bytes_xfered. Error out the request */
> +               brq->data.bytes_xfered = 0;
> +               mqrq->retries = MMC_NO_RETRIES;
> +               pr_err("%s: Unhandled return value (%d)",
> +                      req->rq_disk->disk_name, status);
> +               break;
> +       }
> +}
> +
> +static void mmc_blk_mq_complete_rq(struct mmc_queue *mq, struct request *req)
> +{
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +       unsigned int nr_bytes = mqrq->brq.data.bytes_xfered;
> +
> +       if (nr_bytes) {
> +               if (blk_update_request(req, BLK_STS_OK, nr_bytes))
> +                       blk_mq_requeue_request(req, true);
> +               else
> +                       __blk_mq_end_request(req, BLK_STS_OK);
> +       } else if (mqrq->retries++ < MMC_MAX_RETRIES) {
> +               blk_mq_requeue_request(req, true);
> +       } else {
> +               if (mmc_card_removed(mq->card))
> +                       req->rq_flags |= RQF_QUIET;
> +               blk_mq_end_request(req, BLK_STS_IOERR);
> +       }
> +}

This retry and error handling using requeue is very elegant.
I really like this.

If we could also go for MQ-only, only this nice code
remains in the tree.

The problem: you have just reimplemented the whole error
handling we had in the old block layer and now we have to
maintain two copies and keep them in sync.

This is not OK IMO, we will inevitable screw it up, so we
need to get *one* error path.

> +static bool mmc_blk_urgent_bkops_needed(struct mmc_queue *mq,
> +                                       struct mmc_queue_req *mqrq)
> +{
> +       return mmc_card_mmc(mq->card) &&
> +              (mqrq->brq.cmd.resp[0] & R1_EXCEPTION_EVENT ||
> +               mqrq->brq.stop.resp[0] & R1_EXCEPTION_EVENT);
> +}
> +
> +static void mmc_blk_urgent_bkops(struct mmc_queue *mq,
> +                                struct mmc_queue_req *mqrq)
> +{
> +       if (mmc_blk_urgent_bkops_needed(mq, mqrq))
> +               mmc_start_bkops(mq->card, true);
> +}
> +
> +void mmc_blk_mq_complete(struct request *req)
> +{
> +       struct mmc_queue *mq = req->q->queuedata;
> +
> +       mmc_blk_mq_complete_rq(mq, req);
> +}

So this is called from the struct blk_mq_ops .complete()
callback. And this calls blk_mq_end_request().

So the semantic order needs to be complete -> end.

I see this pattern in newer MQ code, I got it wrong in
my patch set so I try to fix it up.
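
To illustrate that ordering, this is roughly how the callbacks end up
wired together (a trimmed sketch, not the exact hunk from this patch):

static const struct blk_mq_ops mmc_mq_ops = {
        .queue_rq = mmc_mq_queue_rq,
        /*
         * blk_mq_complete_request(), called from mmc_blk_mq_post_req(),
         * invokes .complete(), which then ends or requeues the request.
         */
        .complete = mmc_blk_mq_complete,
};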

> +static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
> +                                      struct request *req)
> +{
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +
> +       mmc_blk_rw_recovery(mq, req);
> +
> +       mmc_blk_urgent_bkops(mq, mqrq);
> +}

This looks nice.

> +static void mmc_blk_mq_acct_req_done(struct mmc_queue *mq, struct request *req)

What does "acct" mean in the above function name?
Accounting? Actual? I'm lost.

> +{
> +       struct request_queue *q = req->q;
> +       unsigned long flags;
> +       bool put_card;
> +
> +       spin_lock_irqsave(q->queue_lock, flags);
> +
> +       mq->in_flight[mmc_issue_type(mq, req)] -= 1;
> +
> +       put_card = mmc_tot_in_flight(mq) == 0;

This in_flight[] business seems a bit kludgy, but I
don't really understand it fully. Magic numbers like
-1 to mark that something is not going on etc, not
super-elegant.

I believe it is necessary for CQE though as you need
to keep track of outstanding requests?

> +
> +       spin_unlock_irqrestore(q->queue_lock, flags);
> +
> +       if (put_card)
> +               mmc_put_card(mq->card, &mq->ctx);

I think you should try not to sprinkle mmc_put_card() inside
the different functions but instead you can put this in the
.complete callback I guess mmc_blk_mq_complete() in your
patch set.

Also you do not need to avoid calling it several times with
that put_card variable. It's fully reentrant thanks to your
own code in the lock and all calls come from the same block
layer process if you call it in .complete() I think?
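
A minimal sketch of what I mean, assuming the context-based claiming
really is reentrant and one get/put per request is acceptable:

void mmc_blk_mq_complete(struct request *req)
{
        struct mmc_queue *mq = req->q->queuedata;

        mmc_blk_mq_complete_rq(mq, req);

        /* One put per request, matching a get when the request was queued */
        mmc_put_card(mq->card, &mq->ctx);
}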

> +static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req)
> +{
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +       struct mmc_request *mrq = &mqrq->brq.mrq;
> +       struct mmc_host *host = mq->card->host;
> +
> +       if (host->ops->post_req)
> +               host->ops->post_req(host, mrq, 0);
> +
> +       blk_mq_complete_request(req);
> +
> +       mmc_blk_mq_acct_req_done(mq, req);
> +}

Now the problem Ulf has pointed out starts to show up in the
patch: a lot of code duplication on the MQ path compared to
the ordinary block layer path.

My approach was structured partly to avoid this: first refactor
the old path, then switch to (only) MQ to avoid code duplication.

> +static void mmc_blk_mq_complete_prev_req(struct mmc_queue *mq,
> +                                        struct request **prev_req)
> +{
> +       mutex_lock(&mq->complete_lock);
> +
> +       if (!mq->complete_req)
> +               goto out_unlock;
> +
> +       mmc_blk_mq_poll_completion(mq, mq->complete_req);
> +
> +       if (prev_req)
> +               *prev_req = mq->complete_req;
> +       else
> +               mmc_blk_mq_post_req(mq, mq->complete_req);
> +
> +       mq->complete_req = NULL;
> +
> +out_unlock:
> +       mutex_unlock(&mq->complete_lock);
> +}

This looks a bit like it is reimplementing the kernel
completion abstraction using a mutex and a variable named
.complete_req?

We were using a completion in the old block layer so
why did you not use it for MQ?

> +static void mmc_blk_mq_req_done(struct mmc_request *mrq)
> +{
> +       struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req,
> +                                                 brq.mrq);
> +       struct request *req = mmc_queue_req_to_req(mqrq);
> +       struct request_queue *q = req->q;
> +       struct mmc_queue *mq = q->queuedata;
> +       unsigned long flags;
> +       bool waiting;
> +
> +       spin_lock_irqsave(q->queue_lock, flags);
> +       mq->complete_req = req;
> +       mq->rw_wait = false;
> +       waiting = mq->waiting;
> +       spin_unlock_irqrestore(q->queue_lock, flags);
> +
> +       if (waiting)
> +               wake_up(&mq->wait);

I would contest using a waitqueue for this. The name even says
"complete_req" so why is a completion not the right thing to
hang on rather than a waitqueue?

The completion already contains a waitqueue, so I think you're
just essentially reimplementing it.

Just complete(&mq->mq_req_complete) or something should do
the trick.
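
A minimal sketch of that, assuming a struct completion member is added
to struct mmc_queue (the name rw_done here is made up):

        /* in mmc_blk_mq_req_done(), instead of the waitqueue/bool pair */
        mq->complete_req = req;
        complete(&mq->rw_done);

        /* and in mmc_blk_rw_wait(), before picking up complete_req */
        wait_for_completion(&mq->rw_done);
        reinit_completion(&mq->rw_done);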

> +       else
> +               kblockd_schedule_work(&mq->complete_work);

I did not use the kblockd workqueue for this, out of fear
that it would interfere and disturb the block layer work items.
My intuitive idea was that the MMC layer needed its own
worker (like in the past it used a thread) in order to avoid
congestion in the block layer queue leading to unnecessary
delays.

On the other hand, this likely avoids a context switch if there
is no congestion on the queue.

I am uncertain when it is advisable to use the block layer
queue for subsystems like MMC/SD.

Would be nice to see some direction from the block layer
folks here, it is indeed exposed to us...

My performance tests show no problems with this approach
though.
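
For comparison, the dedicated-worker alternative is only a couple of
lines (a sketch; the complete_wq member, the queue name and the flags
are all made up here):

        /* at queue init time */
        mq->complete_wq = alloc_workqueue("mmc_complete", WQ_HIGHPRI, 0);

        /* instead of kblockd_schedule_work() in mmc_blk_mq_req_done() */
        queue_work(mq->complete_wq, &mq->complete_work);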

> +static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err)
> +{
> +       struct request_queue *q = mq->queue;
> +       unsigned long flags;
> +       bool done;
> +
> +       spin_lock_irqsave(q->queue_lock, flags);
> +       done = !mq->rw_wait;
> +       mq->waiting = !done;
> +       spin_unlock_irqrestore(q->queue_lock, flags);

This makes it look like a reimplementation of completion_done()
so I think you should use the completion abstraction again. The
struct completion even contains a variable named "done".

> +static int mmc_blk_mq_issue_rw_rq(struct mmc_queue *mq,
> +                                 struct request *req)
> +{
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +       struct mmc_host *host = mq->card->host;
> +       struct request *prev_req = NULL;
> +       int err = 0;
> +
> +       mmc_blk_rw_rq_prep(mqrq, mq->card, 0, mq);
> +
> +       mqrq->brq.mrq.done = mmc_blk_mq_req_done;
> +
> +       if (host->ops->pre_req)
> +               host->ops->pre_req(host, &mqrq->brq.mrq);
> +
> +       err = mmc_blk_rw_wait(mq, &prev_req);
> +       if (err)
> +               goto out_post_req;
> +
> +       mq->rw_wait = true;
> +
> +       err = mmc_start_request(host, &mqrq->brq.mrq);
> +
> +       if (prev_req)
> +               mmc_blk_mq_post_req(mq, prev_req);
> +
> +       if (err)
> +               mq->rw_wait = false;
> +
> +out_post_req:
> +       if (err && host->ops->post_req)
> +               host->ops->post_req(host, &mqrq->brq.mrq, err);
> +
> +       return err;
> +}

This is pretty straight-forward (pending the comments above).
Again it has the downside of duplicating the same code for the
old block layer instead of refactoring.

> +enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
> +{
> +       struct mmc_blk_data *md = mq->blkdata;
> +       struct mmc_card *card = md->queue.card;
> +       struct mmc_host *host = card->host;
> +       int ret;
> +
> +       ret = mmc_blk_part_switch(card, md->part_type);
> +       if (ret)
> +               return MMC_REQ_FAILED_TO_START;
> +
> +       switch (mmc_issue_type(mq, req)) {
> +       case MMC_ISSUE_SYNC:
> +               ret = mmc_blk_wait_for_idle(mq, host);
> +               if (ret)
> +                       return MMC_REQ_BUSY;
> +               switch (req_op(req)) {
> +               case REQ_OP_DRV_IN:
> +               case REQ_OP_DRV_OUT:
> +                       mmc_blk_issue_drv_op(mq, req);
> +                       break;
> +               case REQ_OP_DISCARD:
> +                       mmc_blk_issue_discard_rq(mq, req);
> +                       break;
> +               case REQ_OP_SECURE_ERASE:
> +                       mmc_blk_issue_secdiscard_rq(mq, req);
> +                       break;
> +               case REQ_OP_FLUSH:
> +                       mmc_blk_issue_flush(mq, req);
> +                       break;
> +               default:
> +                       WARN_ON_ONCE(1);
> +                       return MMC_REQ_FAILED_TO_START;
> +               }
> +               return MMC_REQ_FINISHED;
> +       case MMC_ISSUE_ASYNC:
> +               switch (req_op(req)) {
> +               case REQ_OP_READ:
> +               case REQ_OP_WRITE:
> +                       ret = mmc_blk_mq_issue_rw_rq(mq, req);
> +                       break;
> +               default:
> +                       WARN_ON_ONCE(1);
> +                       ret = -EINVAL;
> +               }
> +               if (!ret)
> +                       return MMC_REQ_STARTED;
> +               return ret == -EBUSY ? MMC_REQ_BUSY : MMC_REQ_FAILED_TO_START;
> +       default:
> +               WARN_ON_ONCE(1);
> +               return MMC_REQ_FAILED_TO_START;
> +       }
> +}

Again looks fine, again duplicates code. In this case I don't even
see why the MQ code needs its own copy of the issue function.

> +enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req)
> +{
> +       if (req_op(req) == REQ_OP_READ || req_op(req) == REQ_OP_WRITE)
> +               return MMC_ISSUE_ASYNC;
> +
> +       return MMC_ISSUE_SYNC;
> +}

Distinguishing between SYNC and ASYNC operations and using
that as abstraction is nice.

But you only do this in the new MQ code.

Instead, make this a separate patch and first refactor the old
code to use this distinction between SYNC and ASYNC.

Unfortunately I think Ulf's earlier criticism that you're rewriting
the world instead of refactoring what we have still stands on many
accounts here.

It makes it even harder to understand your persistence in keeping
the old block layer around. If you're introducing new concepts and
cleaner code in the MQ path and kind of discarding the old
block layer path, why keep it around at all?

I would have a much easier time accepting this patch if it
deleted as much as it was adding, i.e. introduce all this new
nice MQ code, but also tossing out the old block layer and error
handling code. Even if it is a massive rewrite, at least there
is just one body of code to maintain going forward.

That said, I would strongly prefer a refactoring of the old block
layer leading up to transitioning to MQ. But I am indeed biased
since I took that approach myself.

> +static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
> +                                                bool reserved)
> +{
> +       return BLK_EH_RESET_TIMER;
> +}

This timeout looks like something I need to pick up in my patch
set as well. It seems good for stability to support this. But what happened
here? Did you experience a bunch of timeouts during development,
or let's say how was this engineered, I guess it is for the case when
something randomly locks up for a long time and we don't really know
what has happened, like a watchdog?

> +static int mmc_init_request(struct request_queue *q, struct request *req,
> +                           gfp_t gfp)
> +{
> +       return __mmc_init_request(q->queuedata, req, gfp);
> +}
> +
(...)
> +static int mmc_mq_init_request(struct blk_mq_tag_set *set, struct request *req,
> +                              unsigned int hctx_idx, unsigned int numa_node)
> +{
> +       return __mmc_init_request(set->driver_data, req, GFP_KERNEL);
> +}
> +
> +static void mmc_mq_exit_request(struct blk_mq_tag_set *set, struct request *req,
> +                               unsigned int hctx_idx)
> +{
> +       struct mmc_queue *mq = set->driver_data;
> +
> +       mmc_exit_request(mq->queue, req);
> +}

Here is more code duplication just to keep both the old block layer
and MQ around. Including introducing another inner __foo function
which I am strongly against personally (I might be
crazily picky, because I see many people do this).

> +static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
> +                                   const struct blk_mq_queue_data *bd)
> +{
> +       struct request *req = bd->rq;
> +       struct request_queue *q = req->q;
> +       struct mmc_queue *mq = q->queuedata;
> +       struct mmc_card *card = mq->card;
> +       enum mmc_issue_type issue_type;
> +       enum mmc_issued issued;
> +       bool get_card;
> +       int ret;
> +
> +       if (mmc_card_removed(mq->card)) {
> +               req->rq_flags |= RQF_QUIET;
> +               return BLK_STS_IOERR;
> +       }
> +
> +       issue_type = mmc_issue_type(mq, req);
> +
> +       spin_lock_irq(q->queue_lock);
> +
> +       switch (issue_type) {
> +       case MMC_ISSUE_ASYNC:
> +               break;
> +       default:
> +               /*
> +                * Timeouts are handled by mmc core, so set a large value to
> +                * avoid races.
> +                */
> +               req->timeout = 600 * HZ;
> +               break;
> +       }

These timeouts again, does this mean we have competing timeout
code in the block layer and MMC?

This mentions timeouts in the MMC core, but they are actually
coming from the *MMC* core, when below you set:
blk_queue_rq_timeout(mq->queue, 60 * HZ);?

Isn't the actual case that the per-queue timeout is set up to
occur before the per-request timeout, and that you are hacking
around the block layer core having two different timeouts?

It's a bit confusing so I'd really like to know what's going on...

> +       mq->in_flight[issue_type] += 1;
> +       get_card = mmc_tot_in_flight(mq) == 1;

Parentheses around the logical expression are preferred I guess:
get_card = (mmc_tot_in_flight(mq) == 1);
(Isn't checkpatch complaining about this?)

Then:
(...)
> +       if (get_card)
> +               mmc_get_card(card, &mq->ctx);

I simply took the card on every request. Since the context is the
same for all block layer business and the lock is now fully
reentrant this if (get_card) is not necessary. Just take it for
every request and release it in the .complete() callback.

> +#define MMC_QUEUE_DEPTH 64
> +
> +static int mmc_mq_init(struct mmc_queue *mq, struct mmc_card *card,
> +                        spinlock_t *lock)
> +{
> +       int q_depth;
> +       int ret;
> +
> +       q_depth = MMC_QUEUE_DEPTH;
> +
> +       ret = mmc_mq_init_queue(mq, q_depth, &mmc_mq_ops, lock);

Apart from using a define, then assigning the define to a
variable and then passing that variable instead of just
passing the define: why 64? Is that the depth of the CQE
queue? In that case we need an if (cqe) and set it down
to 2 for non-CQE.
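
I.e. something like this (a sketch, assuming the queue depth parsed
from ext_csd is available here; 2 for non-CQE matches the two requests
the asynchronous path keeps in flight):

        q_depth = mq->use_cqe ? card->ext_csd.cmdq_depth : 2;

        ret = mmc_mq_init_queue(mq, q_depth, &mmc_mq_ops, lock);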

> +       if (ret)
> +               return ret;
> +
> +       blk_queue_rq_timeout(mq->queue, 60 * HZ);

And requests time out after 1 minute I take it.

I suspect both of these have some relation to CQE, so that is where
you find these long execution times etc?

> +static void mmc_mq_queue_suspend(struct mmc_queue *mq)
> +{
> +       blk_mq_quiesce_queue(mq->queue);
> +
> +       /*
> +        * The host remains claimed while there are outstanding requests, so
> +        * simply claiming and releasing here ensures there are none.
> +        */
> +       mmc_claim_host(mq->card->host);
> +       mmc_release_host(mq->card->host);

I think just blk_mq_quiesce_queue() should be fine, as it should
make sure all requests have called .complete(), and there I think
you should also release the host lock.

If the MQ code is not doing this, we need to fix MQ to
do the right thing (or add a new callback such as
blk_mq_make_sure_queue_empty()), or at the very
least put a big fat FIXME or REVISIT comment on the above.

> +static void mmc_mq_queue_resume(struct mmc_queue *mq)
> +{
> +       blk_mq_unquiesce_queue(mq->queue);
> +}
> +
> +static void __mmc_queue_suspend(struct mmc_queue *mq)
> +{
> +       struct request_queue *q = mq->queue;
> +       unsigned long flags;
> +
> +       if (!mq->suspended) {
> +               mq->suspended |= true;
> +
> +               spin_lock_irqsave(q->queue_lock, flags);
> +               blk_stop_queue(q);
> +               spin_unlock_irqrestore(q->queue_lock, flags);
> +
> +               down(&mq->thread_sem);
> +       }
> +}
> +
> +static void __mmc_queue_resume(struct mmc_queue *mq)
> +{
> +       struct request_queue *q = mq->queue;
> +       unsigned long flags;
> +
> +       if (mq->suspended) {
> +               mq->suspended = false;
> +
> +               up(&mq->thread_sem);
> +
> +               spin_lock_irqsave(q->queue_lock, flags);
> +               blk_start_queue(q);
> +               spin_unlock_irqrestore(q->queue_lock, flags);
> +       }
> +}

One of the good reasons to delete the old block layer is to get
rid of this horrible semaphore construction. So I see it as necessary
to be able to focus development efforts on code that actually has
a future.

> +       if (q->mq_ops)
> +               mmc_mq_queue_suspend(mq);
> +       else
> +               __mmc_queue_suspend(mq);

And then there is the code duplication again.

>         int                     qcnt;
> +
> +       int                     in_flight[MMC_ISSUE_MAX];

So this is a two-element array holding counters for the number of
synchronous and asynchronous requests in flight at any
time.

But are there really synchronous and asynchronous requests
going on at the same time?

Maybe on the error path I guess.

I avoided this completely but I guess it may be necessary with
CQE, such that in_flight[0,1] is way more than 1 or 2 at times
when there are commands queued?

> +       bool                    rw_wait;
> +       bool                    waiting;
> +       wait_queue_head_t       wait;

As mentioned I think this is a reimplementation of
the completion abstraction.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 04/10] mmc: block: Add CQE support
  2017-11-03 13:20 ` [PATCH V13 04/10] mmc: block: Add CQE support Adrian Hunter
@ 2017-11-08  9:00   ` Linus Walleij
  2017-11-08 13:20     ` Adrian Hunter
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Walleij @ 2017-11-08  9:00 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:

> @@ -2188,11 +2327,18 @@ enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
>                         return MMC_REQ_FAILED_TO_START;
>                 }
>                 return MMC_REQ_FINISHED;
> +       case MMC_ISSUE_DCMD:
>         case MMC_ISSUE_ASYNC:
>                 switch (req_op(req)) {
> +               case REQ_OP_FLUSH:
> +                       ret = mmc_blk_cqe_issue_flush(mq, req);
> +                       break;
>                 case REQ_OP_READ:
>                 case REQ_OP_WRITE:
> -                       ret = mmc_blk_mq_issue_rw_rq(mq, req);
> +                       if (mq->use_cqe)
> +                               ret = mmc_blk_cqe_issue_rw_rq(mq, req);
> +                       else
> +                               ret = mmc_blk_mq_issue_rw_rq(mq, req);
>                         break;
>                 default:
>                         WARN_ON_ONCE(1);

This and other bits give me the feeling CQE is now actually ONLY
working on the MQ path.

That is good. We only add new functionality on the MQ path,
yay!

But this fact (only available iff MQ==true) should at least be
mentioned in the commit message I think?

So why not ditch the old block layer or at least make MQ default?

When you keep it like this people have to reconfigure
their kernel to enable MQ before they see the benefits of MQ+CQE
combined, I think that should rather be the default experience.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 05/10] mmc: cqhci: support for command queue enabled host
  2017-11-03 13:20 ` [PATCH V13 05/10] mmc: cqhci: support for command queue enabled host Adrian Hunter
@ 2017-11-08  9:22   ` Linus Walleij
  2017-11-08 14:14     ` Adrian Hunter
  2017-11-09 13:41   ` Ulf Hansson
  1 sibling, 1 reply; 55+ messages in thread
From: Linus Walleij @ 2017-11-08  9:22 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:

> From: Venkat Gopalakrishnan <venkatg@codeaurora.org>
>
> This patch adds CMDQ support for command-queue compatible
> hosts.
>
> Command queue is added in eMMC-5.1 specification. This
> enables the controller to process upto 32 requests at
> a time.
>
> Adrian Hunter contributed renaming to cqhci, recovery, suspend
> and resume, cqhci_off, cqhci_wait_for_idle, and external timeout
> handling.
>
> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
> Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
> Signed-off-by: Konstantin Dorfman <kdorfman@codeaurora.org>
> Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
> Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
> Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

(...)
> +config MMC_CQHCI
> +       tristate "Command Queue Host Controller Interface support"
> +       depends on HAS_DMA

If the CQE implementation depends on MQ, should this not
also select MMC_MQ_DEFAULT?

And if the CQE implementation depends on MQ, isn't it dangerous
to even have a boot option to disable it?

Again I push the point that things are easier to keep in line
if we just support one block request path (MQ), sorry for
hammering.

> +int cqhci_suspend(struct mmc_host *mmc)
> +{
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +
> +       if (cq_host->enabled)
> +               __cqhci_disable(cq_host);
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL(cqhci_suspend);
> +
> +int cqhci_resume(struct mmc_host *mmc)
> +{
> +       /* Re-enable is done upon first request */
> +       return 0;
> +}
> +EXPORT_SYMBOL(cqhci_resume);

Why would the CQE case require special suspend/resume
functionality?

This seems too much like on-the-side CQE-silo engineering,
just use the device .[runtime]_suspend/resume callbacks like
everyone else, make it possible for the host to figure out
if it is in CQE mode or not (I guess it should know already
since cqhci .enable() has been called?) and handle it
from there.

> +struct cqhci_host_ops {
> +       void (*dumpregs)(struct mmc_host *mmc);
> +       void (*write_l)(struct cqhci_host *host, u32 val, int reg);
> +       u32 (*read_l)(struct cqhci_host *host, int reg);
> +       void (*enable)(struct mmc_host *mmc);
> +       void (*disable)(struct mmc_host *mmc, bool recovery);
> +};
> +
> +static inline void cqhci_writel(struct cqhci_host *host, u32 val, int reg)
> +{
> +       if (unlikely(host->ops->write_l))
> +               host->ops->write_l(host, val, reg);
> +       else
> +               writel_relaxed(val, host->mmio + reg);
> +}

Why would CQE hosts need special accessors and the rest
of the host not need it?

This seems to be a bit hackish "just for CQE" approach,
abstract callbacks like ->dumpregs(), ->write_l() and
->read_l() seems to be something that should be in the
generic struct mmc_host_ops and not tied to CQE, if
needed at all.

->enable and ->disable() for just CQE seem reasonable.
But that leaves just two new ops.

So why not just put .cqe_enable() and .cqe_disable()
ops into mmc_host_ops as optional and be done with it?
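
Roughly what I have in mind, just as a sketch (the surrounding members
of mmc_host_ops are elided, and the real struct of course already
exists elsewhere):

struct mmc_host_ops {
        /* ... all the existing callbacks ... */

        /* Optional, only implemented by CQE-capable hosts */
        void    (*cqe_enable)(struct mmc_host *mmc);
        void    (*cqe_disable)(struct mmc_host *mmc, bool recovery);
};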

> +irqreturn_t cqhci_irq(struct mmc_host *mmc, u32 intmask, int cmd_error,
> +                     int data_error);
> +int cqhci_init(struct cqhci_host *cq_host, struct mmc_host *mmc, bool dma64);
> +struct cqhci_host *cqhci_pltfm_init(struct platform_device *pdev);
> +int cqhci_suspend(struct mmc_host *mmc);
> +int cqhci_resume(struct mmc_host *mmc);

This seems overall too much bolted on the side.

I think the above approach to put any CQE-specific callbacks
directly into the struct mmc_host_ops is way more viable.

If special CQE init is needed, why a special cqhci_init()
call? And cqhci_pltfm_init()? It's confusing. Can't
you just call this by default from the core if the host is
CQE capable? Add a .cqhci_init() callback into mmc_host_ops
if need be.

cqhci_irq() seems necessary though, I see something like this is
probably necessary.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 06/10] mmc: sdhci-pci: Add CQHCI support for Intel GLK
  2017-11-03 13:20 ` [PATCH V13 06/10] mmc: sdhci-pci: Add CQHCI support for Intel GLK Adrian Hunter
@ 2017-11-08  9:24   ` Linus Walleij
  2017-11-09  7:12     ` Adrian Hunter
  2017-11-09 13:37   ` Ulf Hansson
  1 sibling, 1 reply; 55+ messages in thread
From: Linus Walleij @ 2017-11-08  9:24 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:

> Add CQHCI initialization and implement CQHCI operations for Intel GLK.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

This patch seems OK in context, but it merely illustrates the
weirdness of .[runtime]_suspend/resume calling into CQE-specific
APIs rather than using generic host callbacks.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 07/10] mmc: block: blk-mq: Add support for direct completion
  2017-11-03 13:20 ` [PATCH V13 07/10] mmc: block: blk-mq: Add support for direct completion Adrian Hunter
@ 2017-11-08  9:28   ` Linus Walleij
  2017-11-09  7:27     ` Adrian Hunter
  2017-11-09 13:07   ` Ulf Hansson
  1 sibling, 1 reply; 55+ messages in thread
From: Linus Walleij @ 2017-11-08  9:28 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:

> For blk-mq, add support for completing requests directly in the ->done
> callback. That means that error handling and urgent background operations
> must be handled by recovery_work in that case.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

I tried enabling this on my MMC host (mmci) but I got weird
DMA error messages when I did.

I guess this has not been tested on a non-DMA-coherent
system?

I think I might be seeing this because the .pre and .post
callbacks need to be strictly sequenced, and this is
maybe not taken into account here? Isn't there a risk
that the .post callback of the next request is called before
the .post callback of the previous request has returned
for example?

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 08/10] mmc: block: blk-mq: Separate card polling from recovery
  2017-11-03 13:20 ` [PATCH V13 08/10] mmc: block: blk-mq: Separate card polling from recovery Adrian Hunter
@ 2017-11-08  9:30   ` Linus Walleij
  2017-11-09  7:56     ` Adrian Hunter
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Walleij @ 2017-11-08  9:30 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:

> Recovery is simpler to understand if it is only used for errors. Create a
> separate function for card polling.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

This looks good but I can't see why it's not folded into
patch 3 already. This error handling is introduced there.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 10/10] mmc: block: blk-mq: Stop using legacy recovery
  2017-11-03 13:20 ` [PATCH V13 10/10] mmc: block: blk-mq: Stop using legacy recovery Adrian Hunter
@ 2017-11-08  9:38   ` Linus Walleij
  2017-11-09  7:43     ` Adrian Hunter
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Walleij @ 2017-11-08  9:38 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:

> There are only a few things the recovery needs to do. Primarily, it just
> needs to:
>         Determine the number of bytes transferred
>         Get the card back to transfer state
>         Determine whether to retry
>
> There are also a couple of additional features:
>         Reset the card before the last retry
>         Read one sector at a time
>
> The legacy code spent much effort analyzing command errors, but commands
> fail fast, so it is simpler just to give all command errors the same number
> of retries.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

I have nothing against the patch as such. In fact something
like this makes a lot of sense (to me).

But this just makes mmc_blk_rw_recovery() look really nice.

And leaves a very ugly mmc_blk_issue_rw_rq() with the legacy
error handling in-tree.

The former function isn't even named with some *mq* infix
making it clear that the new recovery path only happens
in the MQ case.

If newcomers read this code in the MMC stack they will
just tear their hair, scream and run away. Even faster than
before.

How are they supposed to know which functions are used on
which path? Run ftrace?

This illustrates firmly why we need to refactor and/or kill off
the old block layer interface *first* then add MQ on top.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 04/10] mmc: block: Add CQE support
  2017-11-08  9:00   ` Linus Walleij
@ 2017-11-08 13:20     ` Adrian Hunter
  2017-11-09 12:04       ` Linus Walleij
  0 siblings, 1 reply; 55+ messages in thread
From: Adrian Hunter @ 2017-11-08 13:20 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 08/11/17 11:00, Linus Walleij wrote:
> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>> @@ -2188,11 +2327,18 @@ enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
>>                         return MMC_REQ_FAILED_TO_START;
>>                 }
>>                 return MMC_REQ_FINISHED;
>> +       case MMC_ISSUE_DCMD:
>>         case MMC_ISSUE_ASYNC:
>>                 switch (req_op(req)) {
>> +               case REQ_OP_FLUSH:
>> +                       ret = mmc_blk_cqe_issue_flush(mq, req);
>> +                       break;
>>                 case REQ_OP_READ:
>>                 case REQ_OP_WRITE:
>> -                       ret = mmc_blk_mq_issue_rw_rq(mq, req);
>> +                       if (mq->use_cqe)
>> +                               ret = mmc_blk_cqe_issue_rw_rq(mq, req);
>> +                       else
>> +                               ret = mmc_blk_mq_issue_rw_rq(mq, req);
>>                         break;
>>                 default:
>>                         WARN_ON_ONCE(1);
> 
> This and other bits give me the feeling CQE is now actually ONLY
> working on the MQ path.

I was not allowed to support non-mq.

> 
> That is good. We only add new functionality on the MQ path,
> yay!
> 
> But this fact (only available iff MQ==true) should at least be
> mentioned in the commit message I think?

Why?  CQE is MQ only.

> 
> So why not ditch the old block layer or at least make MQ default?

CQE is MQ only.

> 
> When you keep it like this people have to reconfigure
> their kernel to enable MQ before they see the benefits of MQ+CQE
> combined, I think that should rather be the default experience.

Not at all.  I guess you are confusing the legacy mmc with CQE.  CQE is not
a layer on top of legacy mmc.  It is an alternative to legacy mmc.  CQE
does not sit on top of the legacy mmc blk-mq support.  You don't have to
enable legacy mmc blk-mq support to use CQE.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 05/10] mmc: cqhci: support for command queue enabled host
  2017-11-08  9:22   ` Linus Walleij
@ 2017-11-08 14:14     ` Adrian Hunter
  2017-11-09 12:26       ` Linus Walleij
  0 siblings, 1 reply; 55+ messages in thread
From: Adrian Hunter @ 2017-11-08 14:14 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 08/11/17 11:22, Linus Walleij wrote:
> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>> From: Venkat Gopalakrishnan <venkatg@codeaurora.org>
>>
>> This patch adds CMDQ support for command-queue compatible
>> hosts.
>>
>> Command queue is added in eMMC-5.1 specification. This
>> enables the controller to process upto 32 requests at
>> a time.
>>
>> Adrian Hunter contributed renaming to cqhci, recovery, suspend
>> and resume, cqhci_off, cqhci_wait_for_idle, and external timeout
>> handling.
>>
>> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
>> Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
>> Signed-off-by: Konstantin Dorfman <kdorfman@codeaurora.org>
>> Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
>> Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
>> Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> 
> (...)
>> +config MMC_CQHCI
>> +       tristate "Command Queue Host Controller Interface support"
>> +       depends on HAS_DMA
> 
> If the CQE implementation depends on MQ, should this not
> also select MMC_MQ_DEFAULT?

No, there is no dependency on MMC_MQ_DEFAULT.

> 
> And if the CQE implementation depends on MQ, isn't it dangerous
> to even have a boot option to disable it?

No, there is no dependency on MMC_MQ_DEFAULT.

> 
> Again I push the point that things are easier to keep in line
> if we just support one block request path (MQ), sorry for
> hammering.
> 
>> +int cqhci_suspend(struct mmc_host *mmc)
>> +{
>> +       struct cqhci_host *cq_host = mmc->cqe_private;
>> +
>> +       if (cq_host->enabled)
>> +               __cqhci_disable(cq_host);
>> +
>> +       return 0;
>> +}
>> +EXPORT_SYMBOL(cqhci_suspend);
>> +
>> +int cqhci_resume(struct mmc_host *mmc)
>> +{
>> +       /* Re-enable is done upon first request */
>> +       return 0;
>> +}
>> +EXPORT_SYMBOL(cqhci_resume);
> 
> Why would the CQE case require special suspend/resume
> functionality?

Seems like a very strange question. Obviously CQHCI has to be reconfigured
after suspend.

Also please don't confuse CQE and CQHCI.  CQHCI is an implementation of a
CQE.  We currently do not expect to have another implementation, but it is
not impossible.

> 
> This seems too much like on-the-side CQE-silo engineering,
> just use the device .[runtime]_suspend/resume callbacks like
> everyone else, make it possible for the host to figure out
> if it is in CQE mode or not (I guess it should know already
> since cqhci .enable() has been called?) and handle it
> from there.

That is how it works!  The host controller has to decide how to handle
suspend / resume.

cqhci_suspend() / cqhci_resume() are helper functions that the host
controller can use, but doesn't have to.
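
E.g. a host driver that has nothing else to do can wire them straight
into its dev_pm_ops, something like this (the foo_* names are invented
for illustration, and a real driver would also suspend / resume the
controller itself):

static int foo_mmc_suspend(struct device *dev)
{
        struct mmc_host *mmc = dev_get_drvdata(dev);

        /* Halt and disable CQHCI before the controller goes down */
        return cqhci_suspend(mmc);
}

static int foo_mmc_resume(struct device *dev)
{
        struct mmc_host *mmc = dev_get_drvdata(dev);

        /* Nothing to restore - CQHCI is re-enabled on the first request */
        return cqhci_resume(mmc);
}

static SIMPLE_DEV_PM_OPS(foo_mmc_pm_ops, foo_mmc_suspend, foo_mmc_resume);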

> 
>> +struct cqhci_host_ops {
>> +       void (*dumpregs)(struct mmc_host *mmc);
>> +       void (*write_l)(struct cqhci_host *host, u32 val, int reg);
>> +       u32 (*read_l)(struct cqhci_host *host, int reg);
>> +       void (*enable)(struct mmc_host *mmc);
>> +       void (*disable)(struct mmc_host *mmc, bool recovery);
>> +};
>> +
>> +static inline void cqhci_writel(struct cqhci_host *host, u32 val, int reg)
>> +{
>> +       if (unlikely(host->ops->write_l))
>> +               host->ops->write_l(host, val, reg);
>> +       else
>> +               writel_relaxed(val, host->mmio + reg);
>> +}
> 
> Why would CQE hosts need special accessors and the rest
> of the host not need it?

Special accessors can be used to fix up registers that don't work exactly
the way the standard specified.
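
E.g. a controller that needs a quirk applied to one register can hook
write_l, something like this (the register offset and bit are invented
for illustration):

#define FOO_QUIRKY_REG          0x40            /* invented offset */
#define FOO_QUIRK_BIT           BIT(7)          /* invented quirk bit */

static void foo_cqhci_write_l(struct cqhci_host *host, u32 val, int reg)
{
        if (reg == FOO_QUIRKY_REG)
                val |= FOO_QUIRK_BIT;

        writel_relaxed(val, host->mmio + reg);
}

static const struct cqhci_host_ops foo_cqhci_ops = {
        .write_l = foo_cqhci_write_l,
};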

> 
> This seems to be a bit hackish "just for CQE" approach,
> abstract callbacks like ->dumpregs(), ->write_l() and
> ->read_l() seems to be something that should be in the
> generic struct mmc_host_ops and not tied to CQE, if
> needed at all.

They are nothing to do with CQE or mmc ops.  These are for hosts that have a
CQHCI compatible CQE.

> 
> ->enable and ->disable() for just CQE seem reasonable.
> But that leaves just two new ops.
> 
> So why not just put .cqe_enable() and .cqe_disable()
> ops into mmc_host_ops as optional and be done with it?

Ok so you are not understanding this at all.

As a CQE implementation, CQHCI interfaces with the upper layers through the
CQE ops etc.

But CQHCI also has to work with any host controller driver, so it needs an
interface for that, which is what cqhci_host_ops is for.  All the ops serve
useful purposes.

> 
>> +irqreturn_t cqhci_irq(struct mmc_host *mmc, u32 intmask, int cmd_error,
>> +                     int data_error);
>> +int cqhci_init(struct cqhci_host *cq_host, struct mmc_host *mmc, bool dma64);
>> +struct cqhci_host *cqhci_pltfm_init(struct platform_device *pdev);
>> +int cqhci_suspend(struct mmc_host *mmc);
>> +int cqhci_resume(struct mmc_host *mmc);
> 
> This seems overall too much bolted on the side.

The whole point is to provide a library that can work with any host controller
driver.  That means it must provide functions and callbacks.

> 
> I think the above approach to put any CQE-specific callbacks
> directly into the struct mmc_host_ops is way more viable.

Nothing to do with CQE.  This is CQHCI.  Please try to get the difference.

> 
> If special CQE init is needed, why a special cqhci_init()
> call? And cqhci_pltfm_init()? It's confusing. Can't
> you just call this by default from the core if the host is
> CQE capable? Add a .cqhci_init() callback into mmc_host_ops
> if need be.

Yeah, so CQHCI is just one of theoretically any number of CQE
implementations.  This has nothing to do with the core.  It is entirely up
to the host driver.  cqhci_pltfm_init() allows the mmio space to be defined
by platform resources, whereas cqhci_init() does all the rest of the
initialization.
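
So a platform host driver's probe uses them roughly like this (the
foo_* names are invented, error handling is trimmed, and the way the
ops pointer is hooked up is from memory rather than quoted):

static int foo_mmc_add_cqhci(struct platform_device *pdev,
                             struct mmc_host *mmc, bool dma64)
{
        struct cqhci_host *cq_host;

        /* Map the CQHCI register space from the platform resources */
        cq_host = cqhci_pltfm_init(pdev);
        if (IS_ERR(cq_host))
                return PTR_ERR(cq_host);

        /* Host glue - see struct cqhci_host_ops above */
        cq_host->ops = &foo_cqhci_ops;

        /* Descriptor setup and hooking into the mmc_host */
        return cqhci_init(cq_host, mmc, dma64);
}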

> 
> cqhci_irq() seems necessary though, I see something like this is
> probably necessary.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 06/10] mmc: sdhci-pci: Add CQHCI support for Intel GLK
  2017-11-08  9:24   ` Linus Walleij
@ 2017-11-09  7:12     ` Adrian Hunter
  2017-11-10  8:18       ` Linus Walleij
  0 siblings, 1 reply; 55+ messages in thread
From: Adrian Hunter @ 2017-11-09  7:12 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 08/11/17 11:24, Linus Walleij wrote:
> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>> Add CQHCI initialization and implement CQHCI operations for Intel GLK.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> 
> This patch seems OK in context, but it merely illustrates the
> weirdness of .[runtime]_suspend/resume calling into CQE-specific
> APIs rather than using generic host callbacks.

Your comment makes no sense at all.  The host driver has
[runtime]_suspend/resume callbacks and it is up to the host driver to decide
what to do.  CQHCI provides helpers since that is the whole point of having
a CQHCI library.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 07/10] mmc: block: blk-mq: Add support for direct completion
  2017-11-08  9:28   ` Linus Walleij
@ 2017-11-09  7:27     ` Adrian Hunter
  2017-11-09 12:34       ` Linus Walleij
  0 siblings, 1 reply; 55+ messages in thread
From: Adrian Hunter @ 2017-11-09  7:27 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 08/11/17 11:28, Linus Walleij wrote:
> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>> For blk-mq, add support for completing requests directly in the ->done
>> callback. That means that error handling and urgent background operations
>> must be handled by recovery_work in that case.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> 
> I tried enabling this on my MMC host (mmci) but I got weird
> DMA error messages when I did.
> 
> I guess this has not been tested on a non-DMA-coherent
> system?

I don't see what DMA-coherence has to do with anything.

Possibilities:
	- DMA unmapping doesn't work in an atomic context
	- requests' DMA operations have to be synchronized with each other

> I think I might be seeing this because the .pre and .post
> callbacks need to be strictly sequenced, and this is
> maybe not taken into account here?

I looked at mmci but that did not seem to be the case.

> Isn't there a risk
> that the .post callback of the next request is called before
> the .post callback of the previous request has returned
> for example?

Of course, the requests are treated as independent.  If the separate DMA
operations require synchronization, that is for the host driver to fix.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 10/10] mmc: block: blk-mq: Stop using legacy recovery
  2017-11-08  9:38   ` Linus Walleij
@ 2017-11-09  7:43     ` Adrian Hunter
  2017-11-09 12:45       ` Linus Walleij
  0 siblings, 1 reply; 55+ messages in thread
From: Adrian Hunter @ 2017-11-09  7:43 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 08/11/17 11:38, Linus Walleij wrote:
> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>> There are only a few things the recovery needs to do. Primarily, it just
>> needs to:
>>         Determine the number of bytes transferred
>>         Get the card back to transfer state
>>         Determine whether to retry
>>
>> There are also a couple of additional features:
>>         Reset the card before the last retry
>>         Read one sector at a time
>>
>> The legacy code spent much effort analyzing command errors, but commands
>> fail fast, so it is simpler just to give all command errors the same number
>> of retries.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> 
> I have nothing against the patch as such. In fact something
> like this makes a lot of sense (to me).
> 
> But this just makes mmc_blk_rw_recovery() look really nice.
> 
> And leaves a very ugly mmc_blk_issue_rw_rq() with the legacy
> error handling in-tree.
> 
> The former function isn't even named with some *mq* infix
> making it clear that the new recovery path only happens
> in the MQ case.
> 
> If newcomers read this code in the MMC stack they will
> just tear their hair, scream and run away. Even faster than
> before.
> 
> How are they supposed to know which functions are used on
> which path? Run ftrace?

You're kidding me, right?  You don't know how to find where a function is used?

> This illustrates firmly why we need to refactor and/or kill off
> the old block layer interface *first* then add MQ on top.

No it doesn't!  You are playing games!  One function could be named
differently, so that is evidence the whole patch set should be ignored.

The old code is rubbish.  There is nothing worth keeping.  Churning it
around is a waste of everybody's time.  Review and test the new code.
Delete the old code.  Much much simpler!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 08/10] mmc: block: blk-mq: Separate card polling from recovery
  2017-11-08  9:30   ` Linus Walleij
@ 2017-11-09  7:56     ` Adrian Hunter
  2017-11-09 12:52       ` Linus Walleij
  0 siblings, 1 reply; 55+ messages in thread
From: Adrian Hunter @ 2017-11-09  7:56 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 08/11/17 11:30, Linus Walleij wrote:
> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>> Recovery is simpler to understand if it is only used for errors. Create a
>> separate function for card polling.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> 
> This looks good but I can't see why it's not folded into
> patch 3 already. This error handling is introduced there.

What are you on about?  If we're going to split up the patches (which I
argued against - the new code is all new, so it could be read independently
from the old mess) then this is a logically distinct step.  Polling and
error-recovery are conceptually different things and it is important to
separate them to make the code easier to understand.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 03/10] mmc: block: Add blk-mq support
  2017-11-08  8:54   ` Linus Walleij
@ 2017-11-09 10:42     ` Adrian Hunter
  2017-11-09 15:52       ` Linus Walleij
  0 siblings, 1 reply; 55+ messages in thread
From: Adrian Hunter @ 2017-11-09 10:42 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 08/11/17 10:54, Linus Walleij wrote:
> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>> Define and use a blk-mq queue. Discards and flushes are processed
>> synchronously, but reads and writes asynchronously. In order to support
>> slow DMA unmapping, DMA unmapping is not done until after the next request
>> is started. That means the request is not completed until then. If there is
>> no next request then the completion is done by queued work.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> 
>> -       blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
>> +       if (req->mq_ctx)
>> +               blk_mq_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
>> +       else
>> +               blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK);
> 
> I think this quite obvious code duplication is unfortunate.
> 
> What my patches do is get rid of the old block layer in order
> to be able to focus on the new stuff using just MQ.
> 
> One reason is that the code is hairy already as it is, by just
> supporting MQ the above is still just one line of code, the same
> goes for the other instances below.
> 
> At least you could do what I did and break out a helper like
> this:
> 
> /*
>  * This reports status back to the block layer for a finished request.
>  */
> static void mmc_blk_complete(struct mmc_queue_req *mq_rq,
>                             blk_status_t status)
> {
>        struct request *req = mmc_queue_req_to_req(mq_rq);
> 
>        if (req->mq_ctx) {
>           blk_mq_end_request(req, status);
>        } else
>           blk_end_request_all(req, status);
> }

You are quibbling.  It makes next to no difference, especially as it all goes
away when the legacy code is deleted.  I will change it in the next
revision, but what a waste of everyone's time.  Please try to focus on
things that matter.

>> +/* Single sector read during recovery */
>> +static void mmc_blk_ss_read(struct mmc_queue *mq, struct request *req)
>> +{
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +       blk_status_t status;
>> +
>> +       while (1) {
>> +               mmc_blk_rw_rq_prep(mqrq, mq->card, 1, mq);
>> +
>> +               mmc_wait_for_req(mq->card->host, &mqrq->brq.mrq);
>> +
>> +               /*
>> +                * Not expecting command errors, so just give up in that case.
>> +                * If there are retries remaining, the request will get
>> +                * requeued.
>> +                */
>> +               if (mqrq->brq.cmd.error)
>> +                       return;
>> +
>> +               if (blk_rq_bytes(req) <= 512)
>> +                       break;
>> +
>> +               status = mqrq->brq.data.error ? BLK_STS_IOERR : BLK_STS_OK;
>> +
>> +               blk_update_request(req, status, 512);
>> +       }
>> +
>> +       mqrq->retries = MMC_NO_RETRIES;
>> +}
>> +
>> +static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
>> +{
>> +       int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +       struct mmc_blk_request *brq = &mqrq->brq;
>> +       struct mmc_blk_data *md = mq->blkdata;
>> +       struct mmc_card *card = mq->card;
>> +       static enum mmc_blk_status status;
>> +
>> +       brq->retune_retry_done = mqrq->retries;
>> +
>> +       status = __mmc_blk_err_check(card, mqrq);
>> +
>> +       mmc_retune_release(card->host);
>> +
>> +       /*
>> +        * Requests are completed by mmc_blk_mq_complete_rq() which sets simple
>> +        * policy:
>> +        * 1. A request that has transferred at least some data is considered
>> +        * successful and will be requeued if there is remaining data to
>> +        * transfer.
>> +        * 2. Otherwise the number of retries is incremented and the request
>> +        * will be requeued if there are remaining retries.
>> +        * 3. Otherwise the request will be errored out.
>> +        * That means mmc_blk_mq_complete_rq() is controlled by bytes_xfered and
>> +        * mqrq->retries. So there are only 4 possible actions here:
>> +        *      1. do not accept the bytes_xfered value i.e. set it to zero
>> +        *      2. change mqrq->retries to determine the number of retries
>> +        *      3. try to reset the card
>> +        *      4. read one sector at a time
>> +        */
>> +       switch (status) {
>> +       case MMC_BLK_SUCCESS:
>> +       case MMC_BLK_PARTIAL:
>> +               /* Reset success, and accept bytes_xfered */
>> +               mmc_blk_reset_success(md, type);
>> +               break;
>> +       case MMC_BLK_CMD_ERR:
>> +               /*
>> +                * For SD cards, get bytes written, but do not accept
>> +                * bytes_xfered if that fails. For MMC cards accept
>> +                * bytes_xfered. Then try to reset. If reset fails then
>> +                * error out the remaining request, otherwise retry
>> +                * once (N.B mmc_blk_reset() will not succeed twice in a
>> +                * row).
>> +                */
>> +               if (mmc_card_sd(card)) {
>> +                       u32 blocks;
>> +                       int err;
>> +
>> +                       err = mmc_sd_num_wr_blocks(card, &blocks);
>> +                       if (err)
>> +                               brq->data.bytes_xfered = 0;
>> +                       else
>> +                               brq->data.bytes_xfered = blocks << 9;
>> +               }
>> +               if (mmc_blk_reset(md, card->host, type))
>> +                       mqrq->retries = MMC_NO_RETRIES;
>> +               else
>> +                       mqrq->retries = MMC_MAX_RETRIES - 1;
>> +               break;
>> +       case MMC_BLK_RETRY:
>> +               /*
>> +                * Do not accept bytes_xfered, but retry up to 5 times,
>> +                * otherwise same as abort.
>> +                */
>> +               brq->data.bytes_xfered = 0;
>> +               if (mqrq->retries < MMC_MAX_RETRIES)
>> +                       break;
>> +               /* Fall through */
>> +       case MMC_BLK_ABORT:
>> +               /*
>> +                * Do not accept bytes_xfered, but try to reset. If
>> +                * reset succeeds, try once more, otherwise error out
>> +                * the request.
>> +                */
>> +               brq->data.bytes_xfered = 0;
>> +               if (mmc_blk_reset(md, card->host, type))
>> +                       mqrq->retries = MMC_NO_RETRIES;
>> +               else
>> +                       mqrq->retries = MMC_MAX_RETRIES - 1;
>> +               break;
>> +       case MMC_BLK_DATA_ERR: {
>> +               int err;
>> +
>> +               /*
>> +                * Do not accept bytes_xfered, but try to reset. If
>> +                * reset succeeds, try once more. If reset fails with
>> +                * ENODEV which means the partition is wrong, then error
>> +                * out the request. Otherwise attempt to read one sector
>> +                * at a time.
>> +                */
>> +               brq->data.bytes_xfered = 0;
>> +               err = mmc_blk_reset(md, card->host, type);
>> +               if (!err) {
>> +                       mqrq->retries = MMC_MAX_RETRIES - 1;
>> +                       break;
>> +               }
>> +               if (err == -ENODEV) {
>> +                       mqrq->retries = MMC_NO_RETRIES;
>> +                       break;
>> +               }
>> +               /* Fall through */
>> +       }
>> +       case MMC_BLK_ECC_ERR:
>> +               /*
>> +                * Do not accept bytes_xfered. If reading more than one
>> +                * sector, try reading one sector at a time.
>> +                */
>> +               brq->data.bytes_xfered = 0;
>> +               /* FIXME: Missing single sector read for large sector size */
>> +               if (brq->data.blocks > 1 && !mmc_large_sector(card)) {
>> +                       /* Redo read one sector at a time */
>> +                       pr_warn("%s: retrying using single block read\n",
>> +                               req->rq_disk->disk_name);
>> +                       mmc_blk_ss_read(mq, req);
>> +               } else {
>> +                       mqrq->retries = MMC_NO_RETRIES;
>> +               }
>> +               break;
>> +       case MMC_BLK_NOMEDIUM:
>> +               /* Do not accept bytes_xfered. Error out the request */
>> +               brq->data.bytes_xfered = 0;
>> +               mqrq->retries = MMC_NO_RETRIES;
>> +               break;
>> +       default:
>> +               /* Do not accept bytes_xfered. Error out the request */
>> +               brq->data.bytes_xfered = 0;
>> +               mqrq->retries = MMC_NO_RETRIES;
>> +               pr_err("%s: Unhandled return value (%d)",
>> +                      req->rq_disk->disk_name, status);
>> +               break;
>> +       }
>> +}
>> +
>> +static void mmc_blk_mq_complete_rq(struct mmc_queue *mq, struct request *req)
>> +{
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +       unsigned int nr_bytes = mqrq->brq.data.bytes_xfered;
>> +
>> +       if (nr_bytes) {
>> +               if (blk_update_request(req, BLK_STS_OK, nr_bytes))
>> +                       blk_mq_requeue_request(req, true);
>> +               else
>> +                       __blk_mq_end_request(req, BLK_STS_OK);
>> +       } else if (mqrq->retries++ < MMC_MAX_RETRIES) {
>> +               blk_mq_requeue_request(req, true);
>> +       } else {
>> +               if (mmc_card_removed(mq->card))
>> +                       req->rq_flags |= RQF_QUIET;
>> +               blk_mq_end_request(req, BLK_STS_IOERR);
>> +       }
>> +}
> 
> This retry and error handling using requeue is very elegant.
> I really like this.
> 
> If we could also go for MQ-only, only this nice code
> remains in the tree.

No one has ever suggested that the legacy API will remain.  Once blk-mq is
ready the old code gets deleted.

> 
> The problem: you have just reimplemented the whole error
> handling we had in the old block layer and now we have to
> maintain two copies and keep them in sync.
> 
> This is not OK IMO, we will inevitable screw it up, so we
> need to get *one* error path.

Wow, you really didn't read the code at all.  As I have repeatedly pointed
out, the new code is all new.  There is no overlap and there is nothing to keep
in sync.  It may not look like it in this patch, but that is only because of
the ridiculous idea of splitting up the patch.

> 
>> +static bool mmc_blk_urgent_bkops_needed(struct mmc_queue *mq,
>> +                                       struct mmc_queue_req *mqrq)
>> +{
>> +       return mmc_card_mmc(mq->card) &&
>> +              (mqrq->brq.cmd.resp[0] & R1_EXCEPTION_EVENT ||
>> +               mqrq->brq.stop.resp[0] & R1_EXCEPTION_EVENT);
>> +}
>> +
>> +static void mmc_blk_urgent_bkops(struct mmc_queue *mq,
>> +                                struct mmc_queue_req *mqrq)
>> +{
>> +       if (mmc_blk_urgent_bkops_needed(mq, mqrq))
>> +               mmc_start_bkops(mq->card, true);
>> +}
>> +
>> +void mmc_blk_mq_complete(struct request *req)
>> +{
>> +       struct mmc_queue *mq = req->q->queuedata;
>> +
>> +       mmc_blk_mq_complete_rq(mq, req);
>> +}
> 
> So this is called from the struct blk_mq_ops .complete()
> callback. And this calls blk_mq_end_request().
> 
> So the semantic order needs to be complete -> end.
> 
> I see this pattern in newer MQ code, I got it wrong in
> my patch set so I try to fix it up.
> 
>> +static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
>> +                                      struct request *req)
>> +{
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +
>> +       mmc_blk_rw_recovery(mq, req);
>> +
>> +       mmc_blk_urgent_bkops(mq, mqrq);
>> +}
> 
> This looks nice.
> 
>> +static void mmc_blk_mq_acct_req_done(struct mmc_queue *mq, struct request *req)
> 
> What does "acct" mean in the above function name?
> Accounting? Actual? I'm lost.

Does "actual" have two "c"'s.  You are just making things up.  Of course it
is "account".  It is counting the number of requests in flight - which is
pretty obvious from the code.  We use that to support important features
like CQE re-tuning and avoiding getting / putting the card all the time.
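
For reference, mmc_tot_in_flight() is nothing more than the sum of the
per-issue-type counters, roughly (paraphrasing rather than quoting the
patch):

static inline int mmc_tot_in_flight(struct mmc_queue *mq)
{
        return mq->in_flight[MMC_ISSUE_SYNC] +
               mq->in_flight[MMC_ISSUE_ASYNC];
}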

> 
>> +{
>> +       struct request_queue *q = req->q;
>> +       unsigned long flags;
>> +       bool put_card;
>> +
>> +       spin_lock_irqsave(q->queue_lock, flags);
>> +
>> +       mq->in_flight[mmc_issue_type(mq, req)] -= 1;
>> +
>> +       put_card = mmc_tot_in_flight(mq) == 0;
> 
> This in_flight[] business seems a bit kludgy, but I
> don't really understand it fully. Magic numbers like
> -1 to mark that something is not going on etc, not
> super-elegant.

You are misreading.  It subtracts 1 from the number of requests in flight.

> 
> I believe it is necessary for CQE though as you need
> to keep track of outstanding requests?

We have always avoided getting / putting the card when there is another
request in flight.  Yes it is also used for CQE.

> 
>> +
>> +       spin_unlock_irqrestore(q->queue_lock, flags);
>> +
>> +       if (put_card)
>> +               mmc_put_card(mq->card, &mq->ctx);
> 
> I think you should try not to sprinkle mmc_put_card() inside
> the different functions but instead you can put this in the
> .complete callback I guess mmc_blk_mq_complete() in your
> patch set.

This is the *only* block.c function in the blk-mq code that calls
mmc_put_card().  The queue also does it for requests that didn't even
start.  That is it.

> 
> Also you do not need to avoid calling it several times with
> that put_card variable. It's fully reentrant thanks to your
> own code in the lock and all calls come from the same block
> layer process if you call it in .complete() I think?

We have always avoided unnecessary gets / puts.  Since the result is better,
why on earth take it out?

> 
>> +static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req)
>> +{
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +       struct mmc_request *mrq = &mqrq->brq.mrq;
>> +       struct mmc_host *host = mq->card->host;
>> +
>> +       if (host->ops->post_req)
>> +               host->ops->post_req(host, mrq, 0);
>> +
>> +       blk_mq_complete_request(req);
>> +
>> +       mmc_blk_mq_acct_req_done(mq, req);
>> +}
> 
> Now the problem Ulf has pointed out starts to creep out in the
> patch: a lot of code duplication on the MQ path compared to
> the ordinary block layer path.
> 
> My approach was structured partly to avoid this: first refactor
> the old path, then switch to (only) MQ to avoid code duplication.

The old code is rubbish.  There is nothing of value there.  You have to make
the case why you are wasting everyone's time churning crappy code.  The idea
that nice new code is wrong because it doesn't churn the old rubbish code,
is ridiculous.

> 
>> +static void mmc_blk_mq_complete_prev_req(struct mmc_queue *mq,
>> +                                        struct request **prev_req)
>> +{
>> +       mutex_lock(&mq->complete_lock);
>> +
>> +       if (!mq->complete_req)
>> +               goto out_unlock;
>> +
>> +       mmc_blk_mq_poll_completion(mq, mq->complete_req);
>> +
>> +       if (prev_req)
>> +               *prev_req = mq->complete_req;
>> +       else
>> +               mmc_blk_mq_post_req(mq, mq->complete_req);
>> +
>> +       mq->complete_req = NULL;
>> +
>> +out_unlock:
>> +       mutex_unlock(&mq->complete_lock);
>> +}
> 
> This looks a bit like it is reimplementing the kernel
> completion abstraction using a mutex and a variable named
> .complete_req?
> 
> We were using a completion in the old block layer so
> why did you not use it for MQ?

Doesn't seem like you pay much attention to this stuff.  The previous
request has to be completed even if there is no next request.  That means
scheduling work, which then races with the dispatch path.  So mutual exclusion
is necessary.

> 
>> +static void mmc_blk_mq_req_done(struct mmc_request *mrq)
>> +{
>> +       struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req,
>> +                                                 brq.mrq);
>> +       struct request *req = mmc_queue_req_to_req(mqrq);
>> +       struct request_queue *q = req->q;
>> +       struct mmc_queue *mq = q->queuedata;
>> +       unsigned long flags;
>> +       bool waiting;
>> +
>> +       spin_lock_irqsave(q->queue_lock, flags);
>> +       mq->complete_req = req;
>> +       mq->rw_wait = false;
>> +       waiting = mq->waiting;
>> +       spin_unlock_irqrestore(q->queue_lock, flags);
>> +
>> +       if (waiting)
>> +               wake_up(&mq->wait);
> 
> I would contest using a waitqueue for this. The name even says
> "complete_req" so why is a completion not the right thing to
> hang on rather than a waitqueue?

I explained that above.

> 
> The completion already contains a waitqueue, so I think you're
> just essentially reimplementing it.
> 
> Just complete(&mq->mq_req_complete) or something should do
> the trick.

Nope.

> 
>> +       else
>> +               kblockd_schedule_work(&mq->complete_work);
> 
> I did not use the kblockd workqueue for this, out of fear
> that it would interfere and disturb the block layer work items.
> My intuitive idea was that the MMC layer needed its own
> worker (like in the past it used a thread) in order to avoid
> congestion in the block layer queue leading to unnecessary
> delays.

The complete work races with the dispatch of the next request, so putting
them in the same workqueue makes sense. i.e. the one that gets processed
first would anyway delay the one that gets processed second.

> 
> On the other hand, this likely avoids a context switch if there
> is no congestion on the queue.
> 
> I am uncertain when it is advisible to use the block layer
> queue for subsystems like MMC/SD.
> 
> Would be nice to see some direction from the block layer
> folks here, it is indeed exposed to us...
> 
> My performance tests show no problems with this approach
> though.

As I already wrote, the CPU-bound block layer dispatch work queue has
negative consequences for mmc performance.  So there are 2 aspects to that:
	1. Is the choice of CPU right to start with?  I suspect it is better for
the dispatch to run on the same CPU as the interrupt.
	2. Does the dispatch work need to be able to be migrated to a different
CPU? i.e. unbound work queue.  That helped in my tests, but it could just be
a side-effect of 1.

Of course we can't start looking at these real issues, while you are

> 
>> +static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err)
>> +{
>> +       struct request_queue *q = mq->queue;
>> +       unsigned long flags;
>> +       bool done;
>> +
>> +       spin_lock_irqsave(q->queue_lock, flags);
>> +       done = !mq->rw_wait;
>> +       mq->waiting = !done;
>> +       spin_unlock_irqrestore(q->queue_lock, flags);
> 
> This makes it look like a reimplementation of completion_done()
> so I think you should use the completion abstraction again. The
> struct completion even contains a variable named "done".

This just serves as an example of why splitting up the patch was such a bad
idea.  For direct completion, the wait can result in recovery being needed
for the previous request, so the current request gets requeued.

> 
>> +static int mmc_blk_mq_issue_rw_rq(struct mmc_queue *mq,
>> +                                 struct request *req)
>> +{
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +       struct mmc_host *host = mq->card->host;
>> +       struct request *prev_req = NULL;
>> +       int err = 0;
>> +
>> +       mmc_blk_rw_rq_prep(mqrq, mq->card, 0, mq);
>> +
>> +       mqrq->brq.mrq.done = mmc_blk_mq_req_done;
>> +
>> +       if (host->ops->pre_req)
>> +               host->ops->pre_req(host, &mqrq->brq.mrq);
>> +
>> +       err = mmc_blk_rw_wait(mq, &prev_req);
>> +       if (err)
>> +               goto out_post_req;
>> +
>> +       mq->rw_wait = true;
>> +
>> +       err = mmc_start_request(host, &mqrq->brq.mrq);
>> +
>> +       if (prev_req)
>> +               mmc_blk_mq_post_req(mq, prev_req);
>> +
>> +       if (err)
>> +               mq->rw_wait = false;
>> +
>> +out_post_req:
>> +       if (err && host->ops->post_req)
>> +               host->ops->post_req(host, &mqrq->brq.mrq, err);
>> +
>> +       return err;
>> +}
> 
> This is pretty straight-forward (pending the comments above).
> Again it has the downside of duplicating the same code for the
> old block layer instead of refactoring.

No, the old code is rubbish.  Churning it is a waste of time.

> 
>> +enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
>> +{
>> +       struct mmc_blk_data *md = mq->blkdata;
>> +       struct mmc_card *card = md->queue.card;
>> +       struct mmc_host *host = card->host;
>> +       int ret;
>> +
>> +       ret = mmc_blk_part_switch(card, md->part_type);
>> +       if (ret)
>> +               return MMC_REQ_FAILED_TO_START;
>> +
>> +       switch (mmc_issue_type(mq, req)) {
>> +       case MMC_ISSUE_SYNC:
>> +               ret = mmc_blk_wait_for_idle(mq, host);
>> +               if (ret)
>> +                       return MMC_REQ_BUSY;
>> +               switch (req_op(req)) {
>> +               case REQ_OP_DRV_IN:
>> +               case REQ_OP_DRV_OUT:
>> +                       mmc_blk_issue_drv_op(mq, req);
>> +                       break;
>> +               case REQ_OP_DISCARD:
>> +                       mmc_blk_issue_discard_rq(mq, req);
>> +                       break;
>> +               case REQ_OP_SECURE_ERASE:
>> +                       mmc_blk_issue_secdiscard_rq(mq, req);
>> +                       break;
>> +               case REQ_OP_FLUSH:
>> +                       mmc_blk_issue_flush(mq, req);
>> +                       break;
>> +               default:
>> +                       WARN_ON_ONCE(1);
>> +                       return MMC_REQ_FAILED_TO_START;
>> +               }
>> +               return MMC_REQ_FINISHED;
>> +       case MMC_ISSUE_ASYNC:
>> +               switch (req_op(req)) {
>> +               case REQ_OP_READ:
>> +               case REQ_OP_WRITE:
>> +                       ret = mmc_blk_mq_issue_rw_rq(mq, req);
>> +                       break;
>> +               default:
>> +                       WARN_ON_ONCE(1);
>> +                       ret = -EINVAL;
>> +               }
>> +               if (!ret)
>> +                       return MMC_REQ_STARTED;
>> +               return ret == -EBUSY ? MMC_REQ_BUSY : MMC_REQ_FAILED_TO_START;
>> +       default:
>> +               WARN_ON_ONCE(1);
>> +               return MMC_REQ_FAILED_TO_START;
>> +       }
>> +}
> 
> Again looks fine, again duplicates code. In this case I don't even
> see why the MQ code needs its own copy of the issue funtion.

Because it has to support CQE.  This attitude against CQE is very disappointing!

> 
>> +enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req)
>> +{
>> +       if (req_op(req) == REQ_OP_READ || req_op(req) == REQ_OP_WRITE)
>> +               return MMC_ISSUE_ASYNC;
>> +
>> +       return MMC_ISSUE_SYNC;
>> +}
> 
> Distinguishing between SYNC and ASYNC operations and using
> that as abstraction is nice.
> 
> But you only do this in the new MQ code.
> 
> Instead, make this a separate patch and first refactor the old
> code to use this distinction between SYNC and ASYNC.

That is a non-starter.  The old code is rubbish.  Point to something worth
saving.  There isn't anything.

> 
> Unfortunately I think Ulf's earlier criticism that you're rewriting
> the world instead of refactoring what we have still stands on many
> accounts here.

Nope.  It is just an excuse to delay the patches.  You guys are playing
games and it is embarrassing for Linux.  What is actually wrong with this
technically?  It is not good because it doesn't churn the old code?  That is
ridiculous.

> 
> It makes it even harder to understand your persistance in keeping
> the old block layer around. If you're introducing new concepts and
> cleaner code in the MQ path and kind of discarding the old
> block layer path, why keep it around at all?

Wow, you really like making things up.  Never have I suggested keeping the
old code.  It is rubbish.  As soon as blk-mq is ready and tested, delete
the old crap.

I was expecting CQE to be applied 6 months ago, supporting the legacy blk
layer until blk-mq was ready.  But you never delivered on blk-mq, which is
why I had to do it.  And now you are making up excuses about why we can't
move forward.

> 
> I would have a much easier time accepting this patch if it
> deleted as much as it was adding, i.e. introduce all this new
> nice MQ code, but also tossing out the old block layer and error
> handling code. Even if it is a massive rewrite, at least there
> is just one body of code to maintain going forward.

How can you possibly call a few hundred lines massive?  The kernel has
millions of lines.  Your sense of scale is out of whack.

> 
> That said, I would strongly prefer a refactoring of the old block
> layer leading up to transitioning to MQ. But I am indeed biased
> since I took that approach myself.

Well stop it.  We have nice working code.  Get it applied and tested, and
then we can delete the old crap.

> 
>> +static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
>> +                                                bool reserved)
>> +{
>> +       return BLK_EH_RESET_TIMER;
>> +}
> 
> This timeout looks like something I need to pick up in my patch
> set as well. It seems good for stability to support this. But what happened
> here? Did you experience a bunch of timeouts during development,
> or let's say how was this engineered, I guess it is for the case when
> something randomly locks up for a long time and we don't really know
> what has happened, like a watchdog?

We presently don't have the host APIs to support external timeouts.  CQE
uses them though.

> 
>> +static int mmc_init_request(struct request_queue *q, struct request *req,
>> +                           gfp_t gfp)
>> +{
>> +       return __mmc_init_request(q->queuedata, req, gfp);
>> +}
>> +
> (...)
>> +static int mmc_mq_init_request(struct blk_mq_tag_set *set, struct request *req,
>> +                              unsigned int hctx_idx, unsigned int numa_node)
>> +{
>> +       return __mmc_init_request(set->driver_data, req, GFP_KERNEL);
>> +}
>> +
>> +static void mmc_mq_exit_request(struct blk_mq_tag_set *set, struct request *req,
>> +                               unsigned int hctx_idx)
>> +{
>> +       struct mmc_queue *mq = set->driver_data;
>> +
>> +       mmc_exit_request(mq->queue, req);
>> +}
> 
> Here is more code duplication just to keep both the old block layer
> and MQ around. Including introducing another inner __foo function
> which I have something strongly against personally (I might be
> crazily picky, because I see many people do this).

In this case, it is not code duplication; it is re-using the same code but
called from the blk-mq API.

> 
>> +static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
>> +                                   const struct blk_mq_queue_data *bd)
>> +{
>> +       struct request *req = bd->rq;
>> +       struct request_queue *q = req->q;
>> +       struct mmc_queue *mq = q->queuedata;
>> +       struct mmc_card *card = mq->card;
>> +       enum mmc_issue_type issue_type;
>> +       enum mmc_issued issued;
>> +       bool get_card;
>> +       int ret;
>> +
>> +       if (mmc_card_removed(mq->card)) {
>> +               req->rq_flags |= RQF_QUIET;
>> +               return BLK_STS_IOERR;
>> +       }
>> +
>> +       issue_type = mmc_issue_type(mq, req);
>> +
>> +       spin_lock_irq(q->queue_lock);
>> +
>> +       switch (issue_type) {
>> +       case MMC_ISSUE_ASYNC:
>> +               break;
>> +       default:
>> +               /*
>> +                * Timeouts are handled by mmc core, so set a large value to
>> +                * avoid races.
>> +                */
>> +               req->timeout = 600 * HZ;
>> +               break;
>> +       }
> 
> These timeouts again, does this mean we have competing timeout
> code in the block layer and MMC?

Yes - the host controller provides hardware timeout interrupts in most
cases.  The core provides software timeouts in other cases.

> 
> This mentions timeouts in the MMC core, but they are actually
> coming from the *MMC* core, when below you set:
> blk_queue_rq_timeout(mq->queue, 60 * HZ);?
> 
> Isn't the actual case that the per-queue timeout is set up to
> occur before the per-request timeout, and that you are hacking
> around the block layer core having two different timeouts?

There is no per-queue timeout.  The request timeout has a default value
given by the queue.  It can be changed for different requests.
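
Putting the two knobs from the patch side by side (not real driver
code, just an illustration of which is which):

static void timeout_knobs_illustration(struct mmc_queue *mq, struct request *req)
{
        /* Queue-wide default, set once at init time (mmc_mq_init above) */
        blk_queue_rq_timeout(mq->queue, 60 * HZ);

        /*
         * Per-request override, done in .queue_rq() for the issue types
         * that the mmc core times out itself (mmc_mq_queue_rq above)
         */
        req->timeout = 600 * HZ;
}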

> 
> It's a bit confusing so I'd really like to know what's going on...

I don't expect to have to teach you the block layer.

> 
>> +       mq->in_flight[issue_type] += 1;
>> +       get_card = mmc_tot_in_flight(mq) == 1;
> 
> Parenthesis around the logical expression preferred I guess
> get_card = (mmc_tot_in_flight(mq) == 1);
> (Isn't checkpatch complaining about this?)

Nope

> 
> Then:
> (...)
>> +       if (get_card)
>> +               mmc_get_card(card, &mq->ctx);
> 
> I simply took the card on every request. Since the context is the
> same for all block layer business and the lock is now fully
> reentrant this if (get_card) is not necessary. Just take it for
> every request and release it in the .complete() callback.

As I have written elsewhere, we have always avoided getting / putting
unnecessarily.  It is better that way, so no point in taking it out.

> 
>> +#define MMC_QUEUE_DEPTH 64
>> +
>> +static int mmc_mq_init(struct mmc_queue *mq, struct mmc_card *card,
>> +                        spinlock_t *lock)
>> +{
>> +       int q_depth;
>> +       int ret;
>> +
>> +       q_depth = MMC_QUEUE_DEPTH;
>> +
>> +       ret = mmc_mq_init_queue(mq, q_depth, &mmc_mq_ops, lock);
> 
> Apart from using a define, then assigning the define to a
> variable and then passing that variable instead of just
> passing the define: why 64? Is that the depth of the CQE
> queue? In that case we need an if (cqe) and set it down
> to 2 for non-CQE.

Are you ever going to learn about the block layer?  The default number of
requests is 128 for the legacy block layer.  For blk-mq it is queue depth
times 2.  So 64 gives the same number of requests as before.
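
A rough sketch of the arithmetic being referred to (block-core defaults of
this era, not code from the patch):

	/* Legacy request queue */
	q->nr_requests = BLKDEV_MAX_RQ;		/* 128 by default */

	/* blk-mq, once an I/O scheduler is attached */
	q->nr_requests = 2 * set->queue_depth;	/* 2 * 64 = 128 */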

> 
>> +       if (ret)
>> +               return ret;
>> +
>> +       blk_queue_rq_timeout(mq->queue, 60 * HZ);
> 
> And requests timeout after 1 minute I take it.
> 
> I suspect both of these have some relation to CQE, so that is where
> you find these long execution times etc?

For legacy mmc, the core takes care of timeouts.  For CQE we expect reliable
devices, and I would interpret a timeout as meaning the device is broken.
However it is sensible to have one anyway.  For CQE, a request might have to
wait for the entire rest of the queue to be processed first, or maybe the
request somehow gets stuck and there are other requests constantly
overtaking it.  The max queue depth is 32, so 60 seconds seems OK.

> 
>> +static void mmc_mq_queue_suspend(struct mmc_queue *mq)
>> +{
>> +       blk_mq_quiesce_queue(mq->queue);
>> +
>> +       /*
>> +        * The host remains claimed while there are outstanding requests, so
>> +        * simply claiming and releasing here ensures there are none.
>> +        */
>> +       mmc_claim_host(mq->card->host);
>> +       mmc_release_host(mq->card->host);
> 
> I think just blk_mq_quiesce_queue() should be fine as is, and it
> should make sure all requests have called .complete(), and there
> I think you should also release the host lock.
> 
> If the MQ code is not doing this, we need to fix MQ to
> do the right thing (or add a new callback such as
> blk_mq_make_sure_queue_empty()) so at the very
> least put a big fat FIXME or REVISIT comment on the above.

blk_mq_quiesce_queue() prevents dispatches, not completions.  So we still
have to wait for outstanding requests, which the claim / release does.

> 
>> +static void mmc_mq_queue_resume(struct mmc_queue *mq)
>> +{
>> +       blk_mq_unquiesce_queue(mq->queue);
>> +}
>> +
>> +static void __mmc_queue_suspend(struct mmc_queue *mq)
>> +{
>> +       struct request_queue *q = mq->queue;
>> +       unsigned long flags;
>> +
>> +       if (!mq->suspended) {
>> +               mq->suspended |= true;
>> +
>> +               spin_lock_irqsave(q->queue_lock, flags);
>> +               blk_stop_queue(q);
>> +               spin_unlock_irqrestore(q->queue_lock, flags);
>> +
>> +               down(&mq->thread_sem);
>> +       }
>> +}
>> +
>> +static void __mmc_queue_resume(struct mmc_queue *mq)
>> +{
>> +       struct request_queue *q = mq->queue;
>> +       unsigned long flags;
>> +
>> +       if (mq->suspended) {
>> +               mq->suspended = false;
>> +
>> +               up(&mq->thread_sem);
>> +
>> +               spin_lock_irqsave(q->queue_lock, flags);
>> +               blk_start_queue(q);
>> +               spin_unlock_irqrestore(q->queue_lock, flags);
>> +       }
>> +}
> 
> One of the good reasons to delete the old block layer is to get
> rid of this horrible semaphore construction. So I see it as necessary
> to be able to focus development efforts on code that actually has
> a future.

The old crap will get deleted when blk-mq is ready.

> 
>> +       if (q->mq_ops)
>> +               mmc_mq_queue_suspend(mq);
>> +       else
>> +               __mmc_queue_suspend(mq);
> 
> And then there is the code duplication again.

The code is not duplicated; the blk-mq code is completely different.  The old
crap will get deleted when blk-mq is ready.

> 
>>         int                     qcnt;
>> +
>> +       int                     in_flight[MMC_ISSUE_MAX];
> 
> So this is a [2] containing a counter for the number of
> synchronous and asynchronous requests in flight at any
> time.
> 
> But are there really synchronous and asynchronous requests
> going on at the same time?
> 
> Maybe on the error path I guess.
> 
> I avoided this completely but I guess it may be necessary with
> CQE, such that in_flight[0,1] is way more than 1 or 2 at times
> when there are commands queued?

CQE needs to count DCMD separately from reads / writes.  Counting by issue
type is a simple way to do that.
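
For reference, a minimal sketch of that counting (the names follow the
quoted patches; the exact definitions in the series may differ slightly):

	enum mmc_issue_type {
		MMC_ISSUE_SYNC,
		MMC_ISSUE_DCMD,
		MMC_ISSUE_ASYNC,
		MMC_ISSUE_MAX,
	};

	static inline int mmc_tot_in_flight(struct mmc_queue *mq)
	{
		return mq->in_flight[MMC_ISSUE_SYNC] +
		       mq->in_flight[MMC_ISSUE_DCMD] +
		       mq->in_flight[MMC_ISSUE_ASYNC];
	}

	/* CQE: DCMD and read / write (async) requests counted separately */
	static inline int mmc_cqe_qcnt(struct mmc_queue *mq)
	{
		return mq->in_flight[MMC_ISSUE_DCMD] +
		       mq->in_flight[MMC_ISSUE_ASYNC];
	}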

I already pointed out that the code makes more sense together than split up.

> 
>> +       bool                    rw_wait;
>> +       bool                    waiting;
>> +       wait_queue_head_t       wait;
> 
> As mentioned I think this is a reimplementation of
> the completion abstraction.

I pointed out why that wouldn't work.  Another case of why the code makes
more sense together than split up.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 04/10] mmc: block: Add CQE support
  2017-11-08 13:20     ` Adrian Hunter
@ 2017-11-09 12:04       ` Linus Walleij
  2017-11-09 12:39         ` Adrian Hunter
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Walleij @ 2017-11-09 12:04 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Wed, Nov 8, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 08/11/17 11:00, Linus Walleij wrote:

>> This and other bits give me the feeling CQE is now actually ONLY
>> working on the MQ path.
>
> I was not allowed to support non-mq.

Fair enough.

>> That is good. We only add new functionality on the MQ path,
>> yay!
>>
>> But this fact (only available iff MQ==true) should at least be
>> mentioned in the commit message I think?
>
> Why?  CQE is MQ only.

So if you read what I say, I think the commit message should
say that CQE is MQ only so that people know that CQE is
MQ only.

>> So why not ditch the old block layer or at least make MQ default?
>
> CQE is MQ only.

Yeah? So why keep it around for everything else?

>> When you keep it like this people have to reconfigure
>> their kernel to enable MQ before they see the benefits of MQ+CQE
>> combined, I think that should rather be the default experience.
>
> Not at all.  I guess you are confusing the legacy mmc with CQE.  CQE is not
> a layer on top of legacy mmc.  It is an alternative to legacy mmc.  CQE
> does not sit on top of the legacy mmc blk-mq support.  You don't have to
> enable legacy mmc blk-mq support to use CQE.

Now I am confused. I can't parse the last sentence. There is no
such thing as legacy blk-mq?

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 05/10] mmc: cqhci: support for command queue enabled host
  2017-11-08 14:14     ` Adrian Hunter
@ 2017-11-09 12:26       ` Linus Walleij
  2017-11-09 12:55         ` Adrian Hunter
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Walleij @ 2017-11-09 12:26 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Wed, Nov 8, 2017 at 3:14 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 08/11/17 11:22, Linus Walleij wrote:
>> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:

>> (...)

>>> +EXPORT_SYMBOL(cqhci_resume);
>>
>> Why would the CQE case require special suspend/resume
>> functionality?
>
> Seems like a very strange question.

Please realize that patch review is partly about education.

Educating me or anyone about your patch set involves
being humble and not seeing your peers as lesser.

Making your reviewer feel stupid by saying they ask
"strange" or outright stupid questions is not helping
your cause.

> Obviously CQHCI has to be configured
> after suspend.

Yeah. I think what I misunderstood is such that:

> Also please don't confuse CQE and CQHCI.  CQHCI is an implementation of a
> CQE.  We currently do not expect to have another implementation, but it is
> not impossible.

OK now you educated me, see it's not that hard without
using belittling language.

>>> This seems too much like on-the-side CQE-silo engineering,
>> just use the device .[runtime]_suspend/resume callbacks like
>> everyone else, make it possible for the host to figure out
>> if it is in CQE mode or not (I guess it should know already
>> since cqhci .enable() has been called?) and handle it
>> from there.
>
> That is how it works!  The host controller has to decide how to handle
> suspend / resume.

OK.

> cqhci_suspend() / cqhci_resume() are helper functions that the host
> controller can use, but doesn't have to.

OK.

>> Why would CQE hosts need special accessors and the rest
>> of the host not need it?
>
> Special accessors can be used to fix up registers that don't work exactly
> the way the standard specified.

Yeah this is fine as it is for CQHCI, I didn't get that
part :)

>> ->enable and ->disable() for just CQE seem reasonable.
>> But that leaves just two new ops.
>>
>> So why not just put .cqe_enable() and .cqe_disable()
>> ops into mmc_host_ops as optional and be done with it?
>
> Ok so you are not understanding this at all.

No I did not get it. But I do now (I think).

> As a CQE implementation, CQHCI interfaces with the upper layers through the
> CQE ops etc.
>
> But CQHCI also has to work with any host controller driver, so it needs an
> interface for that, which is what cqhci_host_ops is for.  All the ops serve
> useful purposes.
(...)
> The whole point is to provide a library that can work with any host controller
> driver.  That means it must provide functions and callbacks.

OK

>> I think the above approach to put any CQE-specific callbacks
>> directly into the struct mmc_host_ops is way more viable.
>
> Nothing to do with CQE.  This is CQHCI.  Please try to get the difference.

I am trying, please try to think about your language.

>> If special CQE init is needed, why a special cqhci_init()
>> call? And cqhci_pltfm_init()? It's confusing. Can't
>> you just call this by default from the core if the host is
>> CQE capable? Add a .cqhci_init() callback into mmc_host_ops
>> if need be.
>
> Yeah, so CQHCI is just one of theoretically any number of CQE
> implementations.  This has nothing to do with the core.  It is entirely up
> to the host driver.  cqhci_pltfm_init() allows the mmio space to be defined
> by platform resources, whereas cqhci_init() does all the rest of the
> initialization.

It's fair.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 07/10] mmc: block: blk-mq: Add support for direct completion
  2017-11-09  7:27     ` Adrian Hunter
@ 2017-11-09 12:34       ` Linus Walleij
  2017-11-09 15:33         ` Adrian Hunter
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Walleij @ 2017-11-09 12:34 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Thu, Nov 9, 2017 at 8:27 AM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 08/11/17 11:28, Linus Walleij wrote:
>> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>>> For blk-mq, add support for completing requests directly in the ->done
>>> callback. That means that error handling and urgent background operations
>>> must be handled by recovery_work in that case.
>>>
>>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>>
>> I tried enabling this on my MMC host (mmci) but I got weird
>> DMA error messages when I did.
>>
>> I guess this has not been tested on a non-DMA-coherent
>> system?
>
> I don't see what DMA-coherence has to do with anything.
>
> Possibilities:
>         - DMA unmapping doesn't work in an atomic context
>         - requests' DMA operations have to be synchronized with each other

So since MMCI needs the post_req() hook called with
an error code to properly tear down any DMA operations,
I was worried that maybe your error path is not doing this
(passing an error code or calling in the right order).

I had a bunch of fallouts in my own patch set relating
to that.

>> I think I might be seeing this because the .pre and .post
>> callbacks need to be strictly sequenced, and this is
>> maybe not taken into account here?
>
> I looked at mmci but that did not seem to be the case.
>
>> Isn't there as risk
>> that the .post callback of the next request is called before
>> the .post callback of the previous request has returned
>> for example?
>
> Of course, the requests are treated as independent.  If the separate DMA
> operations require synchronization, that is for the host driver to fix.

They are treated as independent by the block layer, but
it is the subsystem's duty to serialize them for the hardware.

MMCI strictly requires that pre/post hooks per request
happen in the right order, so if you have prepared a second
request after submitting the first, and the first fails, you have
to back out by unpreparing the second one before unpreparing
the first. It is also the only host driver that requires being passed
an error code in the last parameter to the post hook in
order to work properly.
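
For illustration only (a hypothetical core-side sequence, with made-up
variable names, not MMCI code): if request A has been issued and request B
already prepared when A fails, the back-out has to look like

	/* unprepare the later, never-started request first */
	host->ops->post_req(host, &mqrq_b->brq.mrq, -EINVAL);
	/* then unprepare the failed request, passing its error along */
	host->ops->post_req(host, &mqrq_a->brq.mrq, err);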

I think your patch set handles that nicely though, because I
haven't seen any errors; it's just when we do this direct
completion that I see problems.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 04/10] mmc: block: Add CQE support
  2017-11-09 12:04       ` Linus Walleij
@ 2017-11-09 12:39         ` Adrian Hunter
  0 siblings, 0 replies; 55+ messages in thread
From: Adrian Hunter @ 2017-11-09 12:39 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 09/11/17 14:04, Linus Walleij wrote:
> On Wed, Nov 8, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> On 08/11/17 11:00, Linus Walleij wrote:
> 
>>> This and other bits give me the feeling CQE is now actually ONLY
>>> working on the MQ path.
>>
>> I was not allowed to support non-mq.
> 
> Fair enough.
> 
>>> That is good. We only add new functionality on the MQ path,
>>> yay!
>>>
>>> But this fact (only available iff MQ==true) should at least be
>>> mentioned in the commit message I think?
>>
>> Why?  CQE is MQ only.
> 
> So if you read what I say, I think the commit message should
> say that CQE is MQ only so that people know that CQE is
> MQ only.

Alright

> 
>>> So why not ditch the old block layer or at least make MQ default?
>>
>> CQE is MQ only.
> 
> Yeah? So why keep it around for everything else?

Never said we should keep it around.  As soon as blk-mq is ready and tested,
delete it.

> 
>>> When you keep it like this people have to reconfigure
>>> their kernel to enable MQ before they see the benefits of MQ+CQE
>>> combined, I think that should rather be the default experience.
>>
>> Not at all.  I guess you are confusing the legacy mmc with CQE.  CQE is not
>> a layer on top of legacy mmc.  It is an alternative to legacy mmc.  CQE
>> does not sit on top of the legacy mmc blk-mq support.  You don't have to
>> enable legacy mmc blk-mq support to use CQE.
> 
> Now I am confused. I can't parse the last sentence. There is no
> such thing as legacy blk-mq?

Don't need non-CQE mmc blk-mq support for CQE support.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 10/10] mmc: block: blk-mq: Stop using legacy recovery
  2017-11-09  7:43     ` Adrian Hunter
@ 2017-11-09 12:45       ` Linus Walleij
  0 siblings, 0 replies; 55+ messages in thread
From: Linus Walleij @ 2017-11-09 12:45 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Thu, Nov 9, 2017 at 8:43 AM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 08/11/17 11:38, Linus Walleij wrote:
>> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>>> There are only a few things the recovery needs to do. Primarily, it just
>>> needs to:
>>>         Determine the number of bytes transferred
>>>         Get the card back to transfer state
>>>         Determine whether to retry
>>>
>>> There are also a couple of additional features:
>>>         Reset the card before the last retry
>>>         Read one sector at a time
>>>
>>> The legacy code spent much effort analyzing command errors, but commands
>>> fail fast, so it is simpler just to give all command errors the same number
>>> of retries.
>>>
>>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>>
>> I have nothing against the patch as such. In fact something
>> like this makes a lot of sense (to me).
>>
>> But this just makes mmc_blk_rw_recovery() look really nice.
>>
>> And leaves a very ugly mmc_blk_issue_rw_rq() with the legacy
>> error handling in-tree.
>>
>> The former function isn't even named with some *mq* infix
>> making it clear that the new recovery path only happens
>> in the MQ case.
>>
>> If newcomers read this code in the MMC stack they will
>> just tear their hair, scream and run away. Even faster than
>> before.
>>
>> How are they supposed to know which functions are used on
>> which path? Run ftrace?
>
> You're kidding me, right?  You don't know how to find where a function is used?

What I mean is that there are now several functions in the same
file doing similar things, and it is pretty hard for a newcomer
already (IMO) to understand how the MMC/SD stack works.

This phenomenon of code complexity is frustrating not just me
but also you, because you get annoyed that people
like me don't "just get it".

>> This illustrates firmly why we need to refactor and/or kill off
>> the old block layer interface *first* then add MQ on top.
>
> No it doesn't!  You are playing games!

You need to stop your snarky and unfriendly attitude.

Look Adrian: you have one (1) person reviewing your patches.
That is me.

I think I need to quote Documentation/process/6.Followthrough.rst
verbatim (it's an awesome piece of text!)

----8<--------8<-------8<------

Working with reviewers
----------------------

A patch of any significance will result in a number of comments from other
developers as they review the code.  Working with reviewers can be, for
many developers, the most intimidating part of the kernel development
process.  Life can be made much easier, though, if you keep a few things in
mind:

 - If you have explained your patch well, reviewers will understand its
   value and why you went to the trouble of writing it.  But that value
   will not keep them from asking a fundamental question: what will it be
   like to maintain a kernel with this code in it five or ten years later?
   Many of the changes you may be asked to make - from coding style tweaks
   to substantial rewrites - come from the understanding that Linux will
   still be around and under development a decade from now.

 - Code review is hard work, and it is a relatively thankless occupation;
   people remember who wrote kernel code, but there is little lasting fame
   for those who reviewed it.  So reviewers can get grumpy, especially when
   they see the same mistakes being made over and over again.  If you get a
   review which seems angry, insulting, or outright offensive, resist the
   impulse to respond in kind.  Code review is about the code, not about
   the people, and code reviewers are not attacking you personally.

 - Similarly, code reviewers are not trying to promote their employers'
   agendas at the expense of your own.  Kernel developers often expect to
   be working on the kernel years from now, but they understand that their
   employer could change.  They truly are, almost without exception,
   working toward the creation of the best kernel they can; they are not
   trying to create discomfort for their employers' competitors.

What all of this comes down to is that, when reviewers send you comments,
you need to pay attention to the technical observations that they are
making.  Do not let their form of expression or your own pride keep that
from happening.  When you get review comments on a patch, take the time to
understand what the reviewer is trying to say.  If possible, fix the things
that the reviewer is asking you to fix.  And respond back to the reviewer:
thank them, and describe how you will answer their questions.

----8<--------8<-------8<------

> One function could be named
> differently, so that is evidence the whole patch set should be ignored.

No I was not pointing to that at all.

> The old code is rubbish.  There is nothing worth keeping.  Churning it
> around is a waste of everybody's time.  Review and test the new code.
> Delete the old code.  Much much simpler!

Yeah, but you do not delete the old code in your patch
set, so why are you saying this?

The essence of my response to patch 3 was exactly that if
you have this "my way or the highway" (i.e. delete all the old
code paths) attitude to the code (which I kind of understand),
then go all in and actually delete it.

As it is, here, at the end of the patch set, the old legacy block
path and cumbersome error handling still exist.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 08/10] mmc: block: blk-mq: Separate card polling from recovery
  2017-11-09  7:56     ` Adrian Hunter
@ 2017-11-09 12:52       ` Linus Walleij
  2017-11-09 13:02         ` Adrian Hunter
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Walleij @ 2017-11-09 12:52 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Thu, Nov 9, 2017 at 8:56 AM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 08/11/17 11:30, Linus Walleij wrote:
>> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>>> Recovery is simpler to understand if it is only used for errors. Create a
>>> separate function for card polling.
>>>
>>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>>
>> This looks good but I can't see why it's not folded into
>> patch 3 already. This error handling is introduced there.
>
> What are you on about?

You are attacking your most valuable resource, a reviewer.

And I even said the patch looks good.

The only thing you attain with this kind of language is to alienate
me and discourage others from reviewing your patch set. You also
give your employer a bad name, since you are representing
them.

> If we're going to split up the patches (which I
> argued against - the new code is all new, so it could be read independently
> from the old mess) then this is a logically distinct step.  Polling and
> error-recovery are conceptually different things and it is important to
> separate them to make the code easier to understand.

I understand it can be tough to deal with review comments
and it can make you lose your temper when people (sometimes
even the same person!) say contradictory things.

But in hindsight, don't you think these last 5 lines of your message
would have been enough without that first line?

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 05/10] mmc: cqhci: support for command queue enabled host
  2017-11-09 12:26       ` Linus Walleij
@ 2017-11-09 12:55         ` Adrian Hunter
  2017-11-10  8:29           ` Linus Walleij
  0 siblings, 1 reply; 55+ messages in thread
From: Adrian Hunter @ 2017-11-09 12:55 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 09/11/17 14:26, Linus Walleij wrote:
> On Wed, Nov 8, 2017 at 3:14 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> On 08/11/17 11:22, Linus Walleij wrote:
>>> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>>> (...)
> 
>>>> +EXPORT_SYMBOL(cqhci_resume);
>>>
>>> Why would the CQE case require special suspend/resume
>>> functionality?
>>
>> Seems like a very strange question.
> 
> Please realize that patch review is partly about education.
> 
> Educating me or anyone about your patch set involves
> being humble and not seeing your peers as lesser.
> 
> Making your reviewer feel stupid by saying they ask
> "strange" or outright stupid questions is not helping
> your cause.

Please try to forgive me for being rude; it is just frustration.

> 
>> Obviously CQHCI has to be configured
>> after suspend.
> 
> Yeah. I think what I misunderstood is such that:
> 
>> Also please don't confuse CQE and CQHCI.  CQHCI is an implementation of a
>> CQE.  We currently do not expect to have another implementation, but it is
>> not impossible.
> 
> OK now you educated me, see it's not that hard without
> using belittling language.
> 
>>> This seems too much like on-the-side CQE-silo engineering,
>>> just use the device .[runtime]_suspend/resume callbacks like
>>> everyone else, make it possible for the host to figure out
>>> if it is in CQE mode or not (I guess it should know already
>>> since cqhci .enable() has been called?) and handle it
>>> from there.
>>
>> That is how it works!  The host controller has to decide how to handle
>> suspend / resume.
> 
> OK.
> 
>> cqhci_suspend() / cqhci_resume() are helper functions that the host
>> controller can use, but doesn't have to.
> 
> OK.
> 
>>> Why would CQE hosts need special accessors and the rest
>>> of the host not need it?
>>
>> Special accessors can be used to fix up registers that don't work exactly
>> the way the standard specified.
> 
> Yeah this is fine as it is for CQHCI, I didn't get that
> part :)
> 
>>> ->enable and ->disable() for just CQE seem reasonable.
>>> But that leaves just two new ops.
>>>
>>> So why not just put .cqe_enable() and .cqe_disable()
>>> ops into mmc_host_ops as optional and be done with it?
>>
>> Ok so you are not understanding this at all.
> 
> No I did not get it. But I do now (I think).
> 
>> As a CQE implementation, CQHCI interfaces with the upper layers through the
>> CQE ops etc.
>>
>> But CQHCI also has to work with any host controller driver, so it needs an
>> interface for that, which is what cqhci_host_ops is for.  All the ops serve
>> useful purposes.
> (...)
>> The whole point is to provide a library that can work with any host controller
>> driver.  That means it must provide functions and callbacks.
> 
> OK
> 
>>> I think the above approach to put any CQE-specific callbacks
>>> directly into the struct mmc_host_ops is way more viable.
>>
>> Nothing to do with CQE.  This is CQHCI.  Please try to get the difference.
> 
> I am trying, please try to think about your language.

I strongly disapprove of being rude but sadly it seems to get results.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 08/10] mmc: block: blk-mq: Separate card polling from recovery
  2017-11-09 12:52       ` Linus Walleij
@ 2017-11-09 13:02         ` Adrian Hunter
  2017-11-10  8:25           ` Linus Walleij
  0 siblings, 1 reply; 55+ messages in thread
From: Adrian Hunter @ 2017-11-09 13:02 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 09/11/17 14:52, Linus Walleij wrote:
> On Thu, Nov 9, 2017 at 8:56 AM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> On 08/11/17 11:30, Linus Walleij wrote:
>>> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>>
>>>> Recovery is simpler to understand if it is only used for errors. Create a
>>>> separate function for card polling.
>>>>
>>>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>>>
>>> This looks good but I can't see why it's not folded into
>>> patch 3 already. This error handling is introduced there.
>>
>> What are you on about?
> 
> You are attacking your most valuable resource, a reviewer.
> 
> And I even said the patch looks good.
> 
> The only thing you attain with this kind of language is to alienate
> me and discourage others from reviewing your patch set. You also
> give your employer a bad name, since you are representing
> them.

6 months of being messed around will do that.

>> If we're going to split up the patches (which I
>> argued against - the new code is all new, so it could be read independently
>> from the old mess) then this is a logically distinct step.  Polling and
>> error-recovery are conceptually different things and it is important to
>> separate them to make the code easier to understand.
> 
> I understand it can be tough to deal with review comments
> and it can make you lose your temper when people (sometimes
> even the same person!) say contradictory things.
> 
> But in hindsight, don't you think these last 5 lines of your message
> would have been enough without that first line?

Very true.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 07/10] mmc: block: blk-mq: Add support for direct completion
  2017-11-03 13:20 ` [PATCH V13 07/10] mmc: block: blk-mq: Add support for direct completion Adrian Hunter
  2017-11-08  9:28   ` Linus Walleij
@ 2017-11-09 13:07   ` Ulf Hansson
  2017-11-09 13:15     ` Adrian Hunter
  1 sibling, 1 reply; 55+ messages in thread
From: Ulf Hansson @ 2017-11-09 13:07 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

On 3 November 2017 at 14:20, Adrian Hunter <adrian.hunter@intel.com> wrote:
> For blk-mq, add support for completing requests directly in the ->done
> callback. That means that error handling and urgent background operations
> must be handled by recovery_work in that case.

As the mmc docs suck, I think it's important that we elaborate a bit
more on the constraints this has on the host driver, here in the
changelog.

Something along the lines, "Using MMC_CAP_DIRECT_COMPLETE requires the
host driver, when calling mmc_request_done(), to cope with that its
->post_req() callback may be called immediately from the same context,
etc.."

Otherwise this looks good to me.

Kind regards
Uffe

>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  drivers/mmc/core/block.c | 100 +++++++++++++++++++++++++++++++++++++++++------
>  drivers/mmc/core/block.h |   1 +
>  drivers/mmc/core/queue.c |   5 ++-
>  drivers/mmc/core/queue.h |   6 +++
>  include/linux/mmc/host.h |   1 +
>  5 files changed, 101 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index e8be17152884..cbb4b35a592d 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -2086,6 +2086,22 @@ static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
>         }
>  }
>
> +static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq)
> +{
> +       mmc_blk_eval_resp_error(brq);
> +
> +       return brq->sbc.error || brq->cmd.error || brq->stop.error ||
> +              brq->data.error || brq->cmd.resp[0] & CMD_ERRORS;
> +}
> +
> +static inline void mmc_blk_rw_reset_success(struct mmc_queue *mq,
> +                                           struct request *req)
> +{
> +       int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
> +
> +       mmc_blk_reset_success(mq->blkdata, type);
> +}
> +
>  static void mmc_blk_mq_complete_rq(struct mmc_queue *mq, struct request *req)
>  {
>         struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> @@ -2167,14 +2183,43 @@ static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req)
>         if (host->ops->post_req)
>                 host->ops->post_req(host, mrq, 0);
>
> -       blk_mq_complete_request(req);
> +       /*
> +        * Block layer timeouts race with completions which means the normal
> +        * completion path cannot be used during recovery.
> +        */
> +       if (mq->in_recovery)
> +               mmc_blk_mq_complete_rq(mq, req);
> +       else
> +               blk_mq_complete_request(req);
>
>         mmc_blk_mq_acct_req_done(mq, req);
>  }
>
> +void mmc_blk_mq_recovery(struct mmc_queue *mq)
> +{
> +       struct request *req = mq->recovery_req;
> +       struct mmc_host *host = mq->card->host;
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +
> +       mq->recovery_req = NULL;
> +       mq->rw_wait = false;
> +
> +       if (mmc_blk_rq_error(&mqrq->brq)) {
> +               mmc_retune_hold_now(host);
> +               mmc_blk_rw_recovery(mq, req);
> +       }
> +
> +       mmc_blk_urgent_bkops(mq, mqrq);
> +
> +       mmc_blk_mq_post_req(mq, req);
> +}
> +
>  static void mmc_blk_mq_complete_prev_req(struct mmc_queue *mq,
>                                          struct request **prev_req)
>  {
> +       if (mmc_queue_direct_complete(mq->card->host))
> +               return;
> +
>         mutex_lock(&mq->complete_lock);
>
>         if (!mq->complete_req)
> @@ -2208,19 +2253,43 @@ static void mmc_blk_mq_req_done(struct mmc_request *mrq)
>         struct request *req = mmc_queue_req_to_req(mqrq);
>         struct request_queue *q = req->q;
>         struct mmc_queue *mq = q->queuedata;
> +       struct mmc_host *host = mq->card->host;
>         unsigned long flags;
> -       bool waiting;
>
> -       spin_lock_irqsave(q->queue_lock, flags);
> -       mq->complete_req = req;
> -       mq->rw_wait = false;
> -       waiting = mq->waiting;
> -       spin_unlock_irqrestore(q->queue_lock, flags);
> +       if (!mmc_queue_direct_complete(host)) {
> +               bool waiting;
> +
> +               spin_lock_irqsave(q->queue_lock, flags);
> +               mq->complete_req = req;
> +               mq->rw_wait = false;
> +               waiting = mq->waiting;
> +               spin_unlock_irqrestore(q->queue_lock, flags);
> +
> +               if (waiting)
> +                       wake_up(&mq->wait);
> +               else
> +                       kblockd_schedule_work(&mq->complete_work);
>
> -       if (waiting)
> +               return;
> +       }
> +
> +       if (mmc_blk_rq_error(&mqrq->brq) ||
> +           mmc_blk_urgent_bkops_needed(mq, mqrq)) {
> +               spin_lock_irqsave(q->queue_lock, flags);
> +               mq->recovery_needed = true;
> +               mq->recovery_req = req;
> +               spin_unlock_irqrestore(q->queue_lock, flags);
>                 wake_up(&mq->wait);
> -       else
> -               kblockd_schedule_work(&mq->complete_work);
> +               schedule_work(&mq->recovery_work);
> +               return;
> +       }
> +
> +       mmc_blk_rw_reset_success(mq, req);
> +
> +       mq->rw_wait = false;
> +       wake_up(&mq->wait);
> +
> +       mmc_blk_mq_post_req(mq, req);
>  }
>
>  static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err)
> @@ -2230,7 +2299,12 @@ static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err)
>         bool done;
>
>         spin_lock_irqsave(q->queue_lock, flags);
> -       done = !mq->rw_wait;
> +       if (mq->recovery_needed) {
> +               *err = -EBUSY;
> +               done = true;
> +       } else {
> +               done = !mq->rw_wait;
> +       }
>         mq->waiting = !done;
>         spin_unlock_irqrestore(q->queue_lock, flags);
>
> @@ -2277,6 +2351,10 @@ static int mmc_blk_mq_issue_rw_rq(struct mmc_queue *mq,
>         if (err)
>                 mq->rw_wait = false;
>
> +       /* Release re-tuning here where there is no synchronization required */
> +       if (mmc_queue_direct_complete(host))
> +               mmc_retune_release(host);
> +
>  out_post_req:
>         if (err && host->ops->post_req)
>                 host->ops->post_req(host, &mqrq->brq.mrq, err);
> diff --git a/drivers/mmc/core/block.h b/drivers/mmc/core/block.h
> index 6c0e98c1af71..5ad22c1c0318 100644
> --- a/drivers/mmc/core/block.h
> +++ b/drivers/mmc/core/block.h
> @@ -12,6 +12,7 @@
>
>  enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req);
>  void mmc_blk_mq_complete(struct request *req);
> +void mmc_blk_mq_recovery(struct mmc_queue *mq);
>
>  struct work_struct;
>
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 971f97698866..bcba2995c767 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -165,7 +165,10 @@ static void mmc_mq_recovery_handler(struct work_struct *work)
>
>         mq->in_recovery = true;
>
> -       mmc_blk_cqe_recovery(mq);
> +       if (mq->use_cqe)
> +               mmc_blk_cqe_recovery(mq);
> +       else
> +               mmc_blk_mq_recovery(mq);
>
>         mq->in_recovery = false;
>
> diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
> index f05b5a9d2f87..9bbfbb1fad7b 100644
> --- a/drivers/mmc/core/queue.h
> +++ b/drivers/mmc/core/queue.h
> @@ -102,6 +102,7 @@ struct mmc_queue {
>         bool                    waiting;
>         struct work_struct      recovery_work;
>         wait_queue_head_t       wait;
> +       struct request          *recovery_req;
>         struct request          *complete_req;
>         struct mutex            complete_lock;
>         struct work_struct      complete_work;
> @@ -133,4 +134,9 @@ static inline int mmc_cqe_qcnt(struct mmc_queue *mq)
>                mq->in_flight[MMC_ISSUE_ASYNC];
>  }
>
> +static inline bool mmc_queue_direct_complete(struct mmc_host *host)
> +{
> +       return host->caps & MMC_CAP_DIRECT_COMPLETE;
> +}
> +
>  #endif
> diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
> index ce2075d6f429..4b68a95a8818 100644
> --- a/include/linux/mmc/host.h
> +++ b/include/linux/mmc/host.h
> @@ -324,6 +324,7 @@ struct mmc_host {
>  #define MMC_CAP_DRIVER_TYPE_A  (1 << 23)       /* Host supports Driver Type A */
>  #define MMC_CAP_DRIVER_TYPE_C  (1 << 24)       /* Host supports Driver Type C */
>  #define MMC_CAP_DRIVER_TYPE_D  (1 << 25)       /* Host supports Driver Type D */
> +#define MMC_CAP_DIRECT_COMPLETE        (1 << 27)       /* RW reqs can be completed within mmc_request_done() */
>  #define MMC_CAP_CD_WAKE                (1 << 28)       /* Enable card detect wake */
>  #define MMC_CAP_CMD_DURING_TFR (1 << 29)       /* Commands during data transfer */
>  #define MMC_CAP_CMD23          (1 << 30)       /* CMD23 supported. */
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 07/10] mmc: block: blk-mq: Add support for direct completion
  2017-11-09 13:07   ` Ulf Hansson
@ 2017-11-09 13:15     ` Adrian Hunter
  0 siblings, 0 replies; 55+ messages in thread
From: Adrian Hunter @ 2017-11-09 13:15 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

On 09/11/17 15:07, Ulf Hansson wrote:
> On 3 November 2017 at 14:20, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> For blk-mq, add support for completing requests directly in the ->done
>> callback. That means that error handling and urgent background operations
>> must be handled by recovery_work in that case.
> 
> As the mmc docs suck, I think it's important that we elaborate a bit
> more on the constraints this has on the host driver, here in the
> changelog.
> 
> Something along the lines, "Using MMC_CAP_DIRECT_COMPLETE requires the
> host driver, when calling mmc_request_done(), to cope with that its
> ->post_req() callback may be called immediately from the same context,
> etc.."

Yes, I will expand it.  It is also a stepping stone towards supporting
issuing the next request from the ->done() callback.

> 
> Otherwise this looks good to me.
> 
> Kind regards
> Uffe
> 
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  drivers/mmc/core/block.c | 100 +++++++++++++++++++++++++++++++++++++++++------
>>  drivers/mmc/core/block.h |   1 +
>>  drivers/mmc/core/queue.c |   5 ++-
>>  drivers/mmc/core/queue.h |   6 +++
>>  include/linux/mmc/host.h |   1 +
>>  5 files changed, 101 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
>> index e8be17152884..cbb4b35a592d 100644
>> --- a/drivers/mmc/core/block.c
>> +++ b/drivers/mmc/core/block.c
>> @@ -2086,6 +2086,22 @@ static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
>>         }
>>  }
>>
>> +static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq)
>> +{
>> +       mmc_blk_eval_resp_error(brq);
>> +
>> +       return brq->sbc.error || brq->cmd.error || brq->stop.error ||
>> +              brq->data.error || brq->cmd.resp[0] & CMD_ERRORS;
>> +}
>> +
>> +static inline void mmc_blk_rw_reset_success(struct mmc_queue *mq,
>> +                                           struct request *req)
>> +{
>> +       int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
>> +
>> +       mmc_blk_reset_success(mq->blkdata, type);
>> +}
>> +
>>  static void mmc_blk_mq_complete_rq(struct mmc_queue *mq, struct request *req)
>>  {
>>         struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> @@ -2167,14 +2183,43 @@ static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req)
>>         if (host->ops->post_req)
>>                 host->ops->post_req(host, mrq, 0);
>>
>> -       blk_mq_complete_request(req);
>> +       /*
>> +        * Block layer timeouts race with completions which means the normal
>> +        * completion path cannot be used during recovery.
>> +        */
>> +       if (mq->in_recovery)
>> +               mmc_blk_mq_complete_rq(mq, req);
>> +       else
>> +               blk_mq_complete_request(req);
>>
>>         mmc_blk_mq_acct_req_done(mq, req);
>>  }
>>
>> +void mmc_blk_mq_recovery(struct mmc_queue *mq)
>> +{
>> +       struct request *req = mq->recovery_req;
>> +       struct mmc_host *host = mq->card->host;
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +
>> +       mq->recovery_req = NULL;
>> +       mq->rw_wait = false;
>> +
>> +       if (mmc_blk_rq_error(&mqrq->brq)) {
>> +               mmc_retune_hold_now(host);
>> +               mmc_blk_rw_recovery(mq, req);
>> +       }
>> +
>> +       mmc_blk_urgent_bkops(mq, mqrq);
>> +
>> +       mmc_blk_mq_post_req(mq, req);
>> +}
>> +
>>  static void mmc_blk_mq_complete_prev_req(struct mmc_queue *mq,
>>                                          struct request **prev_req)
>>  {
>> +       if (mmc_queue_direct_complete(mq->card->host))
>> +               return;
>> +
>>         mutex_lock(&mq->complete_lock);
>>
>>         if (!mq->complete_req)
>> @@ -2208,19 +2253,43 @@ static void mmc_blk_mq_req_done(struct mmc_request *mrq)
>>         struct request *req = mmc_queue_req_to_req(mqrq);
>>         struct request_queue *q = req->q;
>>         struct mmc_queue *mq = q->queuedata;
>> +       struct mmc_host *host = mq->card->host;
>>         unsigned long flags;
>> -       bool waiting;
>>
>> -       spin_lock_irqsave(q->queue_lock, flags);
>> -       mq->complete_req = req;
>> -       mq->rw_wait = false;
>> -       waiting = mq->waiting;
>> -       spin_unlock_irqrestore(q->queue_lock, flags);
>> +       if (!mmc_queue_direct_complete(host)) {
>> +               bool waiting;
>> +
>> +               spin_lock_irqsave(q->queue_lock, flags);
>> +               mq->complete_req = req;
>> +               mq->rw_wait = false;
>> +               waiting = mq->waiting;
>> +               spin_unlock_irqrestore(q->queue_lock, flags);
>> +
>> +               if (waiting)
>> +                       wake_up(&mq->wait);
>> +               else
>> +                       kblockd_schedule_work(&mq->complete_work);
>>
>> -       if (waiting)
>> +               return;
>> +       }
>> +
>> +       if (mmc_blk_rq_error(&mqrq->brq) ||
>> +           mmc_blk_urgent_bkops_needed(mq, mqrq)) {
>> +               spin_lock_irqsave(q->queue_lock, flags);
>> +               mq->recovery_needed = true;
>> +               mq->recovery_req = req;
>> +               spin_unlock_irqrestore(q->queue_lock, flags);
>>                 wake_up(&mq->wait);
>> -       else
>> -               kblockd_schedule_work(&mq->complete_work);
>> +               schedule_work(&mq->recovery_work);
>> +               return;
>> +       }
>> +
>> +       mmc_blk_rw_reset_success(mq, req);
>> +
>> +       mq->rw_wait = false;
>> +       wake_up(&mq->wait);
>> +
>> +       mmc_blk_mq_post_req(mq, req);
>>  }
>>
>>  static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err)
>> @@ -2230,7 +2299,12 @@ static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err)
>>         bool done;
>>
>>         spin_lock_irqsave(q->queue_lock, flags);
>> -       done = !mq->rw_wait;
>> +       if (mq->recovery_needed) {
>> +               *err = -EBUSY;
>> +               done = true;
>> +       } else {
>> +               done = !mq->rw_wait;
>> +       }
>>         mq->waiting = !done;
>>         spin_unlock_irqrestore(q->queue_lock, flags);
>>
>> @@ -2277,6 +2351,10 @@ static int mmc_blk_mq_issue_rw_rq(struct mmc_queue *mq,
>>         if (err)
>>                 mq->rw_wait = false;
>>
>> +       /* Release re-tuning here where there is no synchronization required */
>> +       if (mmc_queue_direct_complete(host))
>> +               mmc_retune_release(host);
>> +
>>  out_post_req:
>>         if (err && host->ops->post_req)
>>                 host->ops->post_req(host, &mqrq->brq.mrq, err);
>> diff --git a/drivers/mmc/core/block.h b/drivers/mmc/core/block.h
>> index 6c0e98c1af71..5ad22c1c0318 100644
>> --- a/drivers/mmc/core/block.h
>> +++ b/drivers/mmc/core/block.h
>> @@ -12,6 +12,7 @@
>>
>>  enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req);
>>  void mmc_blk_mq_complete(struct request *req);
>> +void mmc_blk_mq_recovery(struct mmc_queue *mq);
>>
>>  struct work_struct;
>>
>> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
>> index 971f97698866..bcba2995c767 100644
>> --- a/drivers/mmc/core/queue.c
>> +++ b/drivers/mmc/core/queue.c
>> @@ -165,7 +165,10 @@ static void mmc_mq_recovery_handler(struct work_struct *work)
>>
>>         mq->in_recovery = true;
>>
>> -       mmc_blk_cqe_recovery(mq);
>> +       if (mq->use_cqe)
>> +               mmc_blk_cqe_recovery(mq);
>> +       else
>> +               mmc_blk_mq_recovery(mq);
>>
>>         mq->in_recovery = false;
>>
>> diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
>> index f05b5a9d2f87..9bbfbb1fad7b 100644
>> --- a/drivers/mmc/core/queue.h
>> +++ b/drivers/mmc/core/queue.h
>> @@ -102,6 +102,7 @@ struct mmc_queue {
>>         bool                    waiting;
>>         struct work_struct      recovery_work;
>>         wait_queue_head_t       wait;
>> +       struct request          *recovery_req;
>>         struct request          *complete_req;
>>         struct mutex            complete_lock;
>>         struct work_struct      complete_work;
>> @@ -133,4 +134,9 @@ static inline int mmc_cqe_qcnt(struct mmc_queue *mq)
>>                mq->in_flight[MMC_ISSUE_ASYNC];
>>  }
>>
>> +static inline bool mmc_queue_direct_complete(struct mmc_host *host)
>> +{
>> +       return host->caps & MMC_CAP_DIRECT_COMPLETE;
>> +}
>> +
>>  #endif
>> diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
>> index ce2075d6f429..4b68a95a8818 100644
>> --- a/include/linux/mmc/host.h
>> +++ b/include/linux/mmc/host.h
>> @@ -324,6 +324,7 @@ struct mmc_host {
>>  #define MMC_CAP_DRIVER_TYPE_A  (1 << 23)       /* Host supports Driver Type A */
>>  #define MMC_CAP_DRIVER_TYPE_C  (1 << 24)       /* Host supports Driver Type C */
>>  #define MMC_CAP_DRIVER_TYPE_D  (1 << 25)       /* Host supports Driver Type D */
>> +#define MMC_CAP_DIRECT_COMPLETE        (1 << 27)       /* RW reqs can be completed within mmc_request_done() */
>>  #define MMC_CAP_CD_WAKE                (1 << 28)       /* Enable card detect wake */
>>  #define MMC_CAP_CMD_DURING_TFR (1 << 29)       /* Commands during data transfer */
>>  #define MMC_CAP_CMD23          (1 << 30)       /* CMD23 supported. */
>> --
>> 1.9.1
>>
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 09/10] mmc: block: blk-mq: Stop using card_busy_detect()
  2017-11-03 13:20 ` [PATCH V13 09/10] mmc: block: blk-mq: Stop using card_busy_detect() Adrian Hunter
@ 2017-11-09 13:36   ` Ulf Hansson
  2017-11-09 15:24     ` Adrian Hunter
  0 siblings, 1 reply; 55+ messages in thread
From: Ulf Hansson @ 2017-11-09 13:36 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

On 3 November 2017 at 14:20, Adrian Hunter <adrian.hunter@intel.com> wrote:
> card_busy_detect() doesn't set a correct timeout, and it doesn't take care
> of error status bits. Stop using it for blk-mq.

I think this changelog isn't very descriptive. Could you please work
on that for the next version?

>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  drivers/mmc/core/block.c | 117 +++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 109 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index 0c29b1d8d545..5c5ff3c34313 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -1426,15 +1426,18 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq,
>         }
>  }
>
> -#define CMD_ERRORS                                                     \
> -       (R1_OUT_OF_RANGE |      /* Command argument out of range */     \
> -        R1_ADDRESS_ERROR |     /* Misaligned address */                \
> +#define CMD_ERRORS_EXCL_OOR                                            \
> +       (R1_ADDRESS_ERROR |     /* Misaligned address */                \
>          R1_BLOCK_LEN_ERROR |   /* Transferred block length incorrect */\
>          R1_WP_VIOLATION |      /* Tried to write to protected block */ \
>          R1_CARD_ECC_FAILED |   /* Card ECC failed */                   \
>          R1_CC_ERROR |          /* Card controller error */             \
>          R1_ERROR)              /* General/unknown error */
>
> +#define CMD_ERRORS                                                     \
> +       (CMD_ERRORS_EXCL_OOR |                                          \
> +        R1_OUT_OF_RANGE)       /* Command argument out of range */     \
> +
>  static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq)
>  {
>         u32 val;
> @@ -1951,6 +1954,95 @@ static void mmc_blk_ss_read(struct mmc_queue *mq, struct request *req)
>         mqrq->retries = MMC_NO_RETRIES;
>  }
>
> +static inline bool mmc_blk_oor_valid(struct mmc_blk_request *brq)
> +{
> +       return !!brq->mrq.sbc;
> +}
> +
> +static inline u32 mmc_blk_stop_err_bits(struct mmc_blk_request *brq)
> +{
> +       return mmc_blk_oor_valid(brq) ? CMD_ERRORS : CMD_ERRORS_EXCL_OOR;
> +}
> +
> +static inline bool mmc_blk_in_tran_state(u32 status)
> +{
> +       /*
> +        * Some cards mishandle the status bits, so make sure to check both the
> +        * busy indication and the card state.
> +        */
> +       return status & R1_READY_FOR_DATA &&
> +              (R1_CURRENT_STATE(status) == R1_STATE_TRAN);
> +}
> +
> +static unsigned int mmc_blk_clock_khz(struct mmc_host *host)
> +{
> +       if (host->actual_clock)
> +               return host->actual_clock / 1000;
> +
> +       /* Clock may be subject to a divisor, fudge it by a factor of 2. */
> +       if (host->ios.clock)
> +               return host->ios.clock / 2000;
> +
> +       /* How can there be no clock */
> +       WARN_ON_ONCE(1);
> +       return 100; /* 100 kHz is minimum possible value */
> +}
> +
> +static unsigned long mmc_blk_data_timeout_jiffies(struct mmc_host *host,
> +                                                 struct mmc_data *data)
> +{
> +       unsigned int ms = DIV_ROUND_UP(data->timeout_ns, 1000000);
> +       unsigned int khz;
> +
> +       if (data->timeout_clks) {
> +               khz = mmc_blk_clock_khz(host);
> +               ms += DIV_ROUND_UP(data->timeout_clks, khz);
> +       }
> +
> +       return msecs_to_jiffies(ms);
> +}
> +
> +static int mmc_blk_card_stuck(struct mmc_card *card, struct request *req,
> +                             u32 *resp_errs)
> +{
> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> +       struct mmc_data *data = &mqrq->brq.data;
> +       unsigned long timeout;
> +       u32 status;
> +       int err;
> +
> +       timeout = jiffies + mmc_blk_data_timeout_jiffies(card->host, data);
> +
> +       while (1) {
> +               bool done = time_after(jiffies, timeout);
> +
> +               err = __mmc_send_status(card, &status, 5);
> +               if (err) {
> +                       pr_err("%s: error %d requesting status\n",
> +                              req->rq_disk->disk_name, err);
> +                       break;
> +               }
> +
> +               /* Accumulate any response error bits seen */
> +               if (resp_errs)
> +                       *resp_errs |= status;
> +
> +               if (mmc_blk_in_tran_state(status))
> +                       break;
> +
> +               /* Timeout if the device never becomes ready */
> +               if (done) {
> +                       pr_err("%s: Card stuck in wrong state! %s %s\n",
> +                               mmc_hostname(card->host),
> +                               req->rq_disk->disk_name, __func__);
> +                       err = -ETIMEDOUT;
> +                       break;
> +               }
> +       }
> +
> +       return err;
> +}

The new function here, mmc_blk_card_stuck() looks very similar to
card_busy_detect().

Why can't you instead fixup card_busy_detect() so it behaves like the
new mmc_blk_card_stuck(), rather than re-implementing most of it?

> +
>  static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
>  {
>         int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
> @@ -2097,17 +2189,26 @@ static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq)
>  static int mmc_blk_card_busy(struct mmc_card *card, struct request *req)
>  {
>         struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> -       bool gen_err = false;
> +       u32 status = 0;
>         int err;
>
>         if (mmc_host_is_spi(card->host) || rq_data_dir(req) == READ)
>                 return 0;
>
> -       err = card_busy_detect(card, MMC_BLK_TIMEOUT_MS, false, req, &gen_err);
> +       err = mmc_blk_card_stuck(card, req, &status);
> +
> +       /*
> +        * Do not assume data transferred correctly if there are any error bits
> +        * set.
> +        */
> +       if (!err && status & mmc_blk_stop_err_bits(&mqrq->brq)) {
> +               mqrq->brq.data.bytes_xfered = 0;
> +               err = -EIO;
> +       }
>
> -       /* Copy the general error bit so it will be seen later on */
> -       if (gen_err)
> -               mqrq->brq.stop.resp[0] |= R1_ERROR;
> +       /* Copy the exception bit so it will be seen later on */
> +       if (mmc_card_mmc(card) && status & R1_EXCEPTION_EVENT)
> +               mqrq->brq.cmd.resp[0] |= R1_EXCEPTION_EVENT;
>
>         return err;
>  }
> --
> 1.9.1
>

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 06/10] mmc: sdhci-pci: Add CQHCI support for Intel GLK
  2017-11-03 13:20 ` [PATCH V13 06/10] mmc: sdhci-pci: Add CQHCI support for Intel GLK Adrian Hunter
  2017-11-08  9:24   ` Linus Walleij
@ 2017-11-09 13:37   ` Ulf Hansson
  1 sibling, 0 replies; 55+ messages in thread
From: Ulf Hansson @ 2017-11-09 13:37 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

On 3 November 2017 at 14:20, Adrian Hunter <adrian.hunter@intel.com> wrote:
> Add CQHCI initialization and implement CQHCI operations for Intel GLK.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

This looks good to me!

Kind regards
Uffe

> ---
>  drivers/mmc/host/Kconfig          |   1 +
>  drivers/mmc/host/sdhci-pci-core.c | 155 +++++++++++++++++++++++++++++++++++++-
>  2 files changed, 155 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
> index 3092b7085cb5..2b02a9788bb6 100644
> --- a/drivers/mmc/host/Kconfig
> +++ b/drivers/mmc/host/Kconfig
> @@ -81,6 +81,7 @@ config MMC_SDHCI_BIG_ENDIAN_32BIT_BYTE_SWAPPER
>  config MMC_SDHCI_PCI
>         tristate "SDHCI support on PCI bus"
>         depends on MMC_SDHCI && PCI
> +       select MMC_CQHCI
>         help
>           This selects the PCI Secure Digital Host Controller Interface.
>           Most controllers found today are PCI devices.
> diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
> index 3e4f04fd5175..110c634cfb43 100644
> --- a/drivers/mmc/host/sdhci-pci-core.c
> +++ b/drivers/mmc/host/sdhci-pci-core.c
> @@ -30,6 +30,8 @@
>  #include <linux/mmc/sdhci-pci-data.h>
>  #include <linux/acpi.h>
>
> +#include "cqhci.h"
> +
>  #include "sdhci.h"
>  #include "sdhci-pci.h"
>
> @@ -116,6 +118,28 @@ int sdhci_pci_resume_host(struct sdhci_pci_chip *chip)
>
>         return 0;
>  }
> +
> +static int sdhci_cqhci_suspend(struct sdhci_pci_chip *chip)
> +{
> +       int ret;
> +
> +       ret = cqhci_suspend(chip->slots[0]->host->mmc);
> +       if (ret)
> +               return ret;
> +
> +       return sdhci_pci_suspend_host(chip);
> +}
> +
> +static int sdhci_cqhci_resume(struct sdhci_pci_chip *chip)
> +{
> +       int ret;
> +
> +       ret = sdhci_pci_resume_host(chip);
> +       if (ret)
> +               return ret;
> +
> +       return cqhci_resume(chip->slots[0]->host->mmc);
> +}
>  #endif
>
>  #ifdef CONFIG_PM
> @@ -166,8 +190,48 @@ static int sdhci_pci_runtime_resume_host(struct sdhci_pci_chip *chip)
>
>         return 0;
>  }
> +
> +static int sdhci_cqhci_runtime_suspend(struct sdhci_pci_chip *chip)
> +{
> +       int ret;
> +
> +       ret = cqhci_suspend(chip->slots[0]->host->mmc);
> +       if (ret)
> +               return ret;
> +
> +       return sdhci_pci_runtime_suspend_host(chip);
> +}
> +
> +static int sdhci_cqhci_runtime_resume(struct sdhci_pci_chip *chip)
> +{
> +       int ret;
> +
> +       ret = sdhci_pci_runtime_resume_host(chip);
> +       if (ret)
> +               return ret;
> +
> +       return cqhci_resume(chip->slots[0]->host->mmc);
> +}
>  #endif
>
> +static u32 sdhci_cqhci_irq(struct sdhci_host *host, u32 intmask)
> +{
> +       int cmd_error = 0;
> +       int data_error = 0;
> +
> +       if (!sdhci_cqe_irq(host, intmask, &cmd_error, &data_error))
> +               return intmask;
> +
> +       cqhci_irq(host->mmc, intmask, cmd_error, data_error);
> +
> +       return 0;
> +}
> +
> +static void sdhci_pci_dumpregs(struct mmc_host *mmc)
> +{
> +       sdhci_dumpregs(mmc_priv(mmc));
> +}
> +
>  /*****************************************************************************\
>   *                                                                           *
>   * Hardware specific quirk handling                                          *
> @@ -583,6 +647,18 @@ static void sdhci_intel_voltage_switch(struct sdhci_host *host)
>         .voltage_switch         = sdhci_intel_voltage_switch,
>  };
>
> +static const struct sdhci_ops sdhci_intel_glk_ops = {
> +       .set_clock              = sdhci_set_clock,
> +       .set_power              = sdhci_intel_set_power,
> +       .enable_dma             = sdhci_pci_enable_dma,
> +       .set_bus_width          = sdhci_set_bus_width,
> +       .reset                  = sdhci_reset,
> +       .set_uhs_signaling      = sdhci_set_uhs_signaling,
> +       .hw_reset               = sdhci_pci_hw_reset,
> +       .voltage_switch         = sdhci_intel_voltage_switch,
> +       .irq                    = sdhci_cqhci_irq,
> +};
> +
>  static void byt_read_dsm(struct sdhci_pci_slot *slot)
>  {
>         struct intel_host *intel_host = sdhci_pci_priv(slot);
> @@ -612,12 +688,80 @@ static int glk_emmc_probe_slot(struct sdhci_pci_slot *slot)
>  {
>         int ret = byt_emmc_probe_slot(slot);
>
> +       slot->host->mmc->caps2 |= MMC_CAP2_CQE;
> +
>         if (slot->chip->pdev->device != PCI_DEVICE_ID_INTEL_GLK_EMMC) {
>                 slot->host->mmc->caps2 |= MMC_CAP2_HS400_ES,
>                 slot->host->mmc_host_ops.hs400_enhanced_strobe =
>                                                 intel_hs400_enhanced_strobe;
> +               slot->host->mmc->caps2 |= MMC_CAP2_CQE_DCMD;
> +       }
> +
> +       return ret;
> +}
> +
> +static void glk_cqe_enable(struct mmc_host *mmc)
> +{
> +       struct sdhci_host *host = mmc_priv(mmc);
> +       u32 reg;
> +
> +       /*
> +        * CQE gets stuck if it sees Buffer Read Enable bit set, which can be
> +        * the case after tuning, so ensure the buffer is drained.
> +        */
> +       reg = sdhci_readl(host, SDHCI_PRESENT_STATE);
> +       while (reg & SDHCI_DATA_AVAILABLE) {
> +               sdhci_readl(host, SDHCI_BUFFER);
> +               reg = sdhci_readl(host, SDHCI_PRESENT_STATE);
> +       }
> +
> +       sdhci_cqe_enable(mmc);
> +}
> +
> +static const struct cqhci_host_ops glk_cqhci_ops = {
> +       .enable         = glk_cqe_enable,
> +       .disable        = sdhci_cqe_disable,
> +       .dumpregs       = sdhci_pci_dumpregs,
> +};
> +
> +static int glk_emmc_add_host(struct sdhci_pci_slot *slot)
> +{
> +       struct device *dev = &slot->chip->pdev->dev;
> +       struct sdhci_host *host = slot->host;
> +       struct cqhci_host *cq_host;
> +       bool dma64;
> +       int ret;
> +
> +       ret = sdhci_setup_host(host);
> +       if (ret)
> +               return ret;
> +
> +       cq_host = devm_kzalloc(dev, sizeof(*cq_host), GFP_KERNEL);
> +       if (!cq_host) {
> +               ret = -ENOMEM;
> +               goto cleanup;
>         }
>
> +       cq_host->mmio = host->ioaddr + 0x200;
> +       cq_host->quirks |= CQHCI_QUIRK_SHORT_TXFR_DESC_SZ;
> +       cq_host->ops = &glk_cqhci_ops;
> +
> +       dma64 = host->flags & SDHCI_USE_64_BIT_DMA;
> +       if (dma64)
> +               cq_host->caps |= CQHCI_TASK_DESC_SZ_128;
> +
> +       ret = cqhci_init(cq_host, host->mmc, dma64);
> +       if (ret)
> +               goto cleanup;
> +
> +       ret = __sdhci_add_host(host);
> +       if (ret)
> +               goto cleanup;
> +
> +       return 0;
> +
> +cleanup:
> +       sdhci_cleanup_host(host);
>         return ret;
>  }
>
> @@ -699,11 +843,20 @@ static int byt_sd_probe_slot(struct sdhci_pci_slot *slot)
>  static const struct sdhci_pci_fixes sdhci_intel_glk_emmc = {
>         .allow_runtime_pm       = true,
>         .probe_slot             = glk_emmc_probe_slot,
> +       .add_host               = glk_emmc_add_host,
> +#ifdef CONFIG_PM_SLEEP
> +       .suspend                = sdhci_cqhci_suspend,
> +       .resume                 = sdhci_cqhci_resume,
> +#endif
> +#ifdef CONFIG_PM
> +       .runtime_suspend        = sdhci_cqhci_runtime_suspend,
> +       .runtime_resume         = sdhci_cqhci_runtime_resume,
> +#endif
>         .quirks                 = SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC,
>         .quirks2                = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
>                                   SDHCI_QUIRK2_CAPS_BIT63_FOR_HS400 |
>                                   SDHCI_QUIRK2_STOP_WITH_TC,
> -       .ops                    = &sdhci_intel_byt_ops,
> +       .ops                    = &sdhci_intel_glk_ops,
>         .priv_size              = sizeof(struct intel_host),
>  };
>
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 05/10] mmc: cqhci: support for command queue enabled host
  2017-11-03 13:20 ` [PATCH V13 05/10] mmc: cqhci: support for command queue enabled host Adrian Hunter
  2017-11-08  9:22   ` Linus Walleij
@ 2017-11-09 13:41   ` Ulf Hansson
  2017-11-09 14:20     ` Adrian Hunter
  1 sibling, 1 reply; 55+ messages in thread
From: Ulf Hansson @ 2017-11-09 13:41 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

On 3 November 2017 at 14:20, Adrian Hunter <adrian.hunter@intel.com> wrote:
> From: Venkat Gopalakrishnan <venkatg@codeaurora.org>
>
> This patch adds CMDQ support for command-queue compatible
> hosts.
>
> Command queuing was added in the eMMC 5.1 specification. It
> enables the controller to process up to 32 requests at
> a time.
>
> Adrian Hunter contributed renaming to cqhci, recovery, suspend
> and resume, cqhci_off, cqhci_wait_for_idle, and external timeout
> handling.
>
> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
> Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
> Signed-off-by: Konstantin Dorfman <kdorfman@codeaurora.org>
> Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
> Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
> Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Overall this looks good to me.

However, I didn't see MAINTAINERS being updated. Is anybody above
volunteering to maintain cqhci.*?
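
Just for illustration, an entry could look something like the below; the
maintainer name is only a placeholder, since who takes it is exactly the
open question:

MMC CQHCI (COMMAND QUEUE HOST CONTROLLER INTERFACE) DRIVER
M:	<maintainer name> <address>
L:	linux-mmc@vger.kernel.org
S:	Maintained
F:	drivers/mmc/host/cqhci*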

Kind regards
Uffe

> ---
>  drivers/mmc/host/Kconfig  |   13 +
>  drivers/mmc/host/Makefile |    1 +
>  drivers/mmc/host/cqhci.c  | 1150 +++++++++++++++++++++++++++++++++++++++++++++
>  drivers/mmc/host/cqhci.h  |  240 ++++++++++
>  4 files changed, 1404 insertions(+)
>  create mode 100644 drivers/mmc/host/cqhci.c
>  create mode 100644 drivers/mmc/host/cqhci.h
>
> diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
> index 567028c9219a..3092b7085cb5 100644
> --- a/drivers/mmc/host/Kconfig
> +++ b/drivers/mmc/host/Kconfig
> @@ -857,6 +857,19 @@ config MMC_SUNXI
>           This selects support for the SD/MMC Host Controller on
>           Allwinner sunxi SoCs.
>
> +config MMC_CQHCI
> +       tristate "Command Queue Host Controller Interface support"
> +       depends on HAS_DMA
> +       help
> +         This selects the Command Queue Host Controller Interface (CQHCI)
> +         support present in host controllers of Qualcomm Technologies, Inc
> +         amongst others.
> +         This controller supports eMMC devices with command queue support.
> +
> +         If you have a controller with this interface, say Y or M here.
> +
> +         If unsure, say N.
> +
>  config MMC_TOSHIBA_PCI
>         tristate "Toshiba Type A SD/MMC Card Interface Driver"
>         depends on PCI
> diff --git a/drivers/mmc/host/Makefile b/drivers/mmc/host/Makefile
> index ab61a3e39c0b..de140e3ef402 100644
> --- a/drivers/mmc/host/Makefile
> +++ b/drivers/mmc/host/Makefile
> @@ -91,6 +91,7 @@ obj-$(CONFIG_MMC_SDHCI_ST)            += sdhci-st.o
>  obj-$(CONFIG_MMC_SDHCI_MICROCHIP_PIC32)        += sdhci-pic32.o
>  obj-$(CONFIG_MMC_SDHCI_BRCMSTB)                += sdhci-brcmstb.o
>  obj-$(CONFIG_MMC_SDHCI_OMAP)           += sdhci-omap.o
> +obj-$(CONFIG_MMC_CQHCI)                        += cqhci.o
>
>  ifeq ($(CONFIG_CB710_DEBUG),y)
>         CFLAGS-cb710-mmc        += -DDEBUG
> diff --git a/drivers/mmc/host/cqhci.c b/drivers/mmc/host/cqhci.c
> new file mode 100644
> index 000000000000..159270e947cf
> --- /dev/null
> +++ b/drivers/mmc/host/cqhci.c
> @@ -0,0 +1,1150 @@
> +/* Copyright (c) 2015, The Linux Foundation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 and
> + * only version 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/delay.h>
> +#include <linux/highmem.h>
> +#include <linux/io.h>
> +#include <linux/module.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/slab.h>
> +#include <linux/scatterlist.h>
> +#include <linux/platform_device.h>
> +#include <linux/ktime.h>
> +
> +#include <linux/mmc/mmc.h>
> +#include <linux/mmc/host.h>
> +#include <linux/mmc/card.h>
> +
> +#include "cqhci.h"
> +
> +#define DCMD_SLOT 31
> +#define NUM_SLOTS 32
> +
> +struct cqhci_slot {
> +       struct mmc_request *mrq;
> +       unsigned int flags;
> +#define CQHCI_EXTERNAL_TIMEOUT BIT(0)
> +#define CQHCI_COMPLETED                BIT(1)
> +#define CQHCI_HOST_CRC         BIT(2)
> +#define CQHCI_HOST_TIMEOUT     BIT(3)
> +#define CQHCI_HOST_OTHER       BIT(4)
> +};
> +
> +static inline u8 *get_desc(struct cqhci_host *cq_host, u8 tag)
> +{
> +       return cq_host->desc_base + (tag * cq_host->slot_sz);
> +}
> +
> +static inline u8 *get_link_desc(struct cqhci_host *cq_host, u8 tag)
> +{
> +       u8 *desc = get_desc(cq_host, tag);
> +
> +       return desc + cq_host->task_desc_len;
> +}
> +
> +static inline dma_addr_t get_trans_desc_dma(struct cqhci_host *cq_host, u8 tag)
> +{
> +       return cq_host->trans_desc_dma_base +
> +               (cq_host->mmc->max_segs * tag *
> +                cq_host->trans_desc_len);
> +}
> +
> +static inline u8 *get_trans_desc(struct cqhci_host *cq_host, u8 tag)
> +{
> +       return cq_host->trans_desc_base +
> +               (cq_host->trans_desc_len * cq_host->mmc->max_segs * tag);
> +}
> +
> +static void setup_trans_desc(struct cqhci_host *cq_host, u8 tag)
> +{
> +       u8 *link_temp;
> +       dma_addr_t trans_temp;
> +
> +       link_temp = get_link_desc(cq_host, tag);
> +       trans_temp = get_trans_desc_dma(cq_host, tag);
> +
> +       memset(link_temp, 0, cq_host->link_desc_len);
> +       if (cq_host->link_desc_len > 8)
> +               *(link_temp + 8) = 0;
> +
> +       if (tag == DCMD_SLOT && (cq_host->mmc->caps2 & MMC_CAP2_CQE_DCMD)) {
> +               *link_temp = CQHCI_VALID(0) | CQHCI_ACT(0) | CQHCI_END(1);
> +               return;
> +       }
> +
> +       *link_temp = CQHCI_VALID(1) | CQHCI_ACT(0x6) | CQHCI_END(0);
> +
> +       if (cq_host->dma64) {
> +               __le64 *data_addr = (__le64 __force *)(link_temp + 4);
> +
> +               data_addr[0] = cpu_to_le64(trans_temp);
> +       } else {
> +               __le32 *data_addr = (__le32 __force *)(link_temp + 4);
> +
> +               data_addr[0] = cpu_to_le32(trans_temp);
> +       }
> +}
> +
> +static void cqhci_set_irqs(struct cqhci_host *cq_host, u32 set)
> +{
> +       cqhci_writel(cq_host, set, CQHCI_ISTE);
> +       cqhci_writel(cq_host, set, CQHCI_ISGE);
> +}
> +
> +#define DRV_NAME "cqhci"
> +
> +#define CQHCI_DUMP(f, x...) \
> +       pr_err("%s: " DRV_NAME ": " f, mmc_hostname(mmc), ## x)
> +
> +static void cqhci_dumpregs(struct cqhci_host *cq_host)
> +{
> +       struct mmc_host *mmc = cq_host->mmc;
> +
> +       CQHCI_DUMP("============ CQHCI REGISTER DUMP ===========\n");
> +
> +       CQHCI_DUMP("Caps:      0x%08x | Version:  0x%08x\n",
> +                  cqhci_readl(cq_host, CQHCI_CAP),
> +                  cqhci_readl(cq_host, CQHCI_VER));
> +       CQHCI_DUMP("Config:    0x%08x | Control:  0x%08x\n",
> +                  cqhci_readl(cq_host, CQHCI_CFG),
> +                  cqhci_readl(cq_host, CQHCI_CTL));
> +       CQHCI_DUMP("Int stat:  0x%08x | Int enab: 0x%08x\n",
> +                  cqhci_readl(cq_host, CQHCI_IS),
> +                  cqhci_readl(cq_host, CQHCI_ISTE));
> +       CQHCI_DUMP("Int sig:   0x%08x | Int Coal: 0x%08x\n",
> +                  cqhci_readl(cq_host, CQHCI_ISGE),
> +                  cqhci_readl(cq_host, CQHCI_IC));
> +       CQHCI_DUMP("TDL base:  0x%08x | TDL up32: 0x%08x\n",
> +                  cqhci_readl(cq_host, CQHCI_TDLBA),
> +                  cqhci_readl(cq_host, CQHCI_TDLBAU));
> +       CQHCI_DUMP("Doorbell:  0x%08x | TCN:      0x%08x\n",
> +                  cqhci_readl(cq_host, CQHCI_TDBR),
> +                  cqhci_readl(cq_host, CQHCI_TCN));
> +       CQHCI_DUMP("Dev queue: 0x%08x | Dev Pend: 0x%08x\n",
> +                  cqhci_readl(cq_host, CQHCI_DQS),
> +                  cqhci_readl(cq_host, CQHCI_DPT));
> +       CQHCI_DUMP("Task clr:  0x%08x | SSC1:     0x%08x\n",
> +                  cqhci_readl(cq_host, CQHCI_TCLR),
> +                  cqhci_readl(cq_host, CQHCI_SSC1));
> +       CQHCI_DUMP("SSC2:      0x%08x | DCMD rsp: 0x%08x\n",
> +                  cqhci_readl(cq_host, CQHCI_SSC2),
> +                  cqhci_readl(cq_host, CQHCI_CRDCT));
> +       CQHCI_DUMP("RED mask:  0x%08x | TERRI:    0x%08x\n",
> +                  cqhci_readl(cq_host, CQHCI_RMEM),
> +                  cqhci_readl(cq_host, CQHCI_TERRI));
> +       CQHCI_DUMP("Resp idx:  0x%08x | Resp arg: 0x%08x\n",
> +                  cqhci_readl(cq_host, CQHCI_CRI),
> +                  cqhci_readl(cq_host, CQHCI_CRA));
> +
> +       if (cq_host->ops->dumpregs)
> +               cq_host->ops->dumpregs(mmc);
> +       else
> +               CQHCI_DUMP(": ===========================================\n");
> +}
> +
> +/**
> + * The allocated descriptor table for task, link & transfer descriptors
> + * looks like:
> + * |----------|
> + * |task desc |  |->|----------|
> + * |----------|  |  |trans desc|
> + * |link desc-|->|  |----------|
> + * |----------|          .
> + *      .                .
> + *  no. of slots      max-segs
> + *      .           |----------|
> + * |----------|
> + * The idea here is to create the [task+trans] table and mark & point the
> + * link desc to the transfer desc table on a per slot basis.
> + */
> +static int cqhci_host_alloc_tdl(struct cqhci_host *cq_host)
> +{
> +       int i = 0;
> +
> +       /* task descriptor can be 64/128 bit irrespective of arch */
> +       if (cq_host->caps & CQHCI_TASK_DESC_SZ_128) {
> +               cqhci_writel(cq_host, cqhci_readl(cq_host, CQHCI_CFG) |
> +                              CQHCI_TASK_DESC_SZ, CQHCI_CFG);
> +               cq_host->task_desc_len = 16;
> +       } else {
> +               cq_host->task_desc_len = 8;
> +       }
> +
> +       /*
> +        * The transfer descriptor may be 96 bits long instead of 128 bits,
> +        * which means ADMA would expect the next valid descriptor at the
> +        * 96th bit or the 128th bit.
> +        */
> +       if (cq_host->dma64) {
> +               if (cq_host->quirks & CQHCI_QUIRK_SHORT_TXFR_DESC_SZ)
> +                       cq_host->trans_desc_len = 12;
> +               else
> +                       cq_host->trans_desc_len = 16;
> +               cq_host->link_desc_len = 16;
> +       } else {
> +               cq_host->trans_desc_len = 8;
> +               cq_host->link_desc_len = 8;
> +       }
> +
> +       /* total size of a slot: 1 task & 1 transfer (link) */
> +       cq_host->slot_sz = cq_host->task_desc_len + cq_host->link_desc_len;
> +
> +       cq_host->desc_size = cq_host->slot_sz * cq_host->num_slots;
> +
> +       cq_host->data_size = cq_host->trans_desc_len * cq_host->mmc->max_segs *
> +               (cq_host->num_slots - 1);
> +
> +       pr_debug("%s: cqhci: desc_size: %zu data_sz: %zu slot-sz: %d\n",
> +                mmc_hostname(cq_host->mmc), cq_host->desc_size, cq_host->data_size,
> +                cq_host->slot_sz);
> +
> +       /*
> +        * allocate a dma-mapped chunk of memory for the descriptors
> +        * allocate a dma-mapped chunk of memory for link descriptors
> +        * setup each link-desc memory offset per slot-number to
> +        * the descriptor table.
> +        */
> +       cq_host->desc_base = dmam_alloc_coherent(mmc_dev(cq_host->mmc),
> +                                                cq_host->desc_size,
> +                                                &cq_host->desc_dma_base,
> +                                                GFP_KERNEL);
> +       cq_host->trans_desc_base = dmam_alloc_coherent(mmc_dev(cq_host->mmc),
> +                                             cq_host->data_size,
> +                                             &cq_host->trans_desc_dma_base,
> +                                             GFP_KERNEL);
> +       if (!cq_host->desc_base || !cq_host->trans_desc_base)
> +               return -ENOMEM;
> +
> +       pr_debug("%s: cqhci: desc-base: 0x%p trans-base: 0x%p\n desc_dma 0x%llx trans_dma: 0x%llx\n",
> +                mmc_hostname(cq_host->mmc), cq_host->desc_base, cq_host->trans_desc_base,
> +               (unsigned long long)cq_host->desc_dma_base,
> +               (unsigned long long)cq_host->trans_desc_dma_base);
> +
> +       for (; i < (cq_host->num_slots); i++)
> +               setup_trans_desc(cq_host, i);
> +
> +       return 0;
> +}
> +
> +static void __cqhci_enable(struct cqhci_host *cq_host)
> +{
> +       struct mmc_host *mmc = cq_host->mmc;
> +       u32 cqcfg;
> +
> +       cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
> +
> +       /* Configuration must not be changed while enabled */
> +       if (cqcfg & CQHCI_ENABLE) {
> +               cqcfg &= ~CQHCI_ENABLE;
> +               cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
> +       }
> +
> +       cqcfg &= ~(CQHCI_DCMD | CQHCI_TASK_DESC_SZ);
> +
> +       if (mmc->caps2 & MMC_CAP2_CQE_DCMD)
> +               cqcfg |= CQHCI_DCMD;
> +
> +       if (cq_host->caps & CQHCI_TASK_DESC_SZ_128)
> +               cqcfg |= CQHCI_TASK_DESC_SZ;
> +
> +       cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
> +
> +       cqhci_writel(cq_host, lower_32_bits(cq_host->desc_dma_base),
> +                    CQHCI_TDLBA);
> +       cqhci_writel(cq_host, upper_32_bits(cq_host->desc_dma_base),
> +                    CQHCI_TDLBAU);
> +
> +       cqhci_writel(cq_host, cq_host->rca, CQHCI_SSC2);
> +
> +       cqhci_set_irqs(cq_host, 0);
> +
> +       cqcfg |= CQHCI_ENABLE;
> +
> +       cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
> +
> +       mmc->cqe_on = true;
> +
> +       if (cq_host->ops->enable)
> +               cq_host->ops->enable(mmc);
> +
> +       /* Ensure all writes are done before interrupts are enabled */
> +       wmb();
> +
> +       cqhci_set_irqs(cq_host, CQHCI_IS_MASK);
> +
> +       cq_host->activated = true;
> +}
> +
> +static void __cqhci_disable(struct cqhci_host *cq_host)
> +{
> +       u32 cqcfg;
> +
> +       cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
> +       cqcfg &= ~CQHCI_ENABLE;
> +       cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
> +
> +       cq_host->mmc->cqe_on = false;
> +
> +       cq_host->activated = false;
> +}
> +
> +int cqhci_suspend(struct mmc_host *mmc)
> +{
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +
> +       if (cq_host->enabled)
> +               __cqhci_disable(cq_host);
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL(cqhci_suspend);
> +
> +int cqhci_resume(struct mmc_host *mmc)
> +{
> +       /* Re-enable is done upon first request */
> +       return 0;
> +}
> +EXPORT_SYMBOL(cqhci_resume);
> +
> +static int cqhci_enable(struct mmc_host *mmc, struct mmc_card *card)
> +{
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +       int err;
> +
> +       if (cq_host->enabled)
> +               return 0;
> +
> +       cq_host->rca = card->rca;
> +
> +       err = cqhci_host_alloc_tdl(cq_host);
> +       if (err)
> +               return err;
> +
> +       __cqhci_enable(cq_host);
> +
> +       cq_host->enabled = true;
> +
> +#ifdef DEBUG
> +       cqhci_dumpregs(cq_host);
> +#endif
> +       return 0;
> +}
> +
> +/* CQHCI is idle and should halt immediately, so set a small timeout */
> +#define CQHCI_OFF_TIMEOUT 100
> +
> +static void cqhci_off(struct mmc_host *mmc)
> +{
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +       ktime_t timeout;
> +       bool timed_out;
> +       u32 reg;
> +
> +       if (!cq_host->enabled || !mmc->cqe_on || cq_host->recovery_halt)
> +               return;
> +
> +       if (cq_host->ops->disable)
> +               cq_host->ops->disable(mmc, false);
> +
> +       cqhci_writel(cq_host, CQHCI_HALT, CQHCI_CTL);
> +
> +       timeout = ktime_add_us(ktime_get(), CQHCI_OFF_TIMEOUT);
> +       while (1) {
> +               timed_out = ktime_compare(ktime_get(), timeout) > 0;
> +               reg = cqhci_readl(cq_host, CQHCI_CTL);
> +               if ((reg & CQHCI_HALT) || timed_out)
> +                       break;
> +       }
> +
> +       if (timed_out)
> +               pr_err("%s: cqhci: CQE stuck on\n", mmc_hostname(mmc));
> +       else
> +               pr_debug("%s: cqhci: CQE off\n", mmc_hostname(mmc));
> +
> +       mmc->cqe_on = false;
> +}
> +
> +static void cqhci_disable(struct mmc_host *mmc)
> +{
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +
> +       if (!cq_host->enabled)
> +               return;
> +
> +       cqhci_off(mmc);
> +
> +       __cqhci_disable(cq_host);
> +
> +       dmam_free_coherent(mmc_dev(mmc), cq_host->data_size,
> +                          cq_host->trans_desc_base,
> +                          cq_host->trans_desc_dma_base);
> +
> +       dmam_free_coherent(mmc_dev(mmc), cq_host->desc_size,
> +                          cq_host->desc_base,
> +                          cq_host->desc_dma_base);
> +
> +       cq_host->trans_desc_base = NULL;
> +       cq_host->desc_base = NULL;
> +
> +       cq_host->enabled = false;
> +}
> +
> +static void cqhci_prep_task_desc(struct mmc_request *mrq,
> +                                       u64 *data, bool intr)
> +{
> +       u32 req_flags = mrq->data->flags;
> +
> +       *data = CQHCI_VALID(1) |
> +               CQHCI_END(1) |
> +               CQHCI_INT(intr) |
> +               CQHCI_ACT(0x5) |
> +               CQHCI_FORCED_PROG(!!(req_flags & MMC_DATA_FORCED_PRG)) |
> +               CQHCI_DATA_TAG(!!(req_flags & MMC_DATA_DAT_TAG)) |
> +               CQHCI_DATA_DIR(!!(req_flags & MMC_DATA_READ)) |
> +               CQHCI_PRIORITY(!!(req_flags & MMC_DATA_PRIO)) |
> +               CQHCI_QBAR(!!(req_flags & MMC_DATA_QBR)) |
> +               CQHCI_REL_WRITE(!!(req_flags & MMC_DATA_REL_WR)) |
> +               CQHCI_BLK_COUNT(mrq->data->blocks) |
> +               CQHCI_BLK_ADDR((u64)mrq->data->blk_addr);
> +
> +       pr_debug("%s: cqhci: tag %d task descriptor 0x%016llx\n",
> +                mmc_hostname(mrq->host), mrq->tag, (unsigned long long)*data);
> +}
> +
> +static int cqhci_dma_map(struct mmc_host *host, struct mmc_request *mrq)
> +{
> +       int sg_count;
> +       struct mmc_data *data = mrq->data;
> +
> +       if (!data)
> +               return -EINVAL;
> +
> +       sg_count = dma_map_sg(mmc_dev(host), data->sg,
> +                             data->sg_len,
> +                             (data->flags & MMC_DATA_WRITE) ?
> +                             DMA_TO_DEVICE : DMA_FROM_DEVICE);
> +       if (!sg_count) {
> +               pr_err("%s: sg-len: %d\n", __func__, data->sg_len);
> +               return -ENOMEM;
> +       }
> +
> +       return sg_count;
> +}
> +
> +static void cqhci_set_tran_desc(u8 *desc, dma_addr_t addr, int len, bool end,
> +                               bool dma64)
> +{
> +       __le32 *attr = (__le32 __force *)desc;
> +
> +       *attr = (CQHCI_VALID(1) |
> +                CQHCI_END(end ? 1 : 0) |
> +                CQHCI_INT(0) |
> +                CQHCI_ACT(0x4) |
> +                CQHCI_DAT_LENGTH(len));
> +
> +       if (dma64) {
> +               __le64 *dataddr = (__le64 __force *)(desc + 4);
> +
> +               dataddr[0] = cpu_to_le64(addr);
> +       } else {
> +               __le32 *dataddr = (__le32 __force *)(desc + 4);
> +
> +               dataddr[0] = cpu_to_le32(addr);
> +       }
> +}
> +
> +static int cqhci_prep_tran_desc(struct mmc_request *mrq,
> +                              struct cqhci_host *cq_host, int tag)
> +{
> +       struct mmc_data *data = mrq->data;
> +       int i, sg_count, len;
> +       bool end = false;
> +       bool dma64 = cq_host->dma64;
> +       dma_addr_t addr;
> +       u8 *desc;
> +       struct scatterlist *sg;
> +
> +       sg_count = cqhci_dma_map(mrq->host, mrq);
> +       if (sg_count < 0) {
> +               pr_err("%s: %s: unable to map sg lists, %d\n",
> +                               mmc_hostname(mrq->host), __func__, sg_count);
> +               return sg_count;
> +       }
> +
> +       desc = get_trans_desc(cq_host, tag);
> +
> +       for_each_sg(data->sg, sg, sg_count, i) {
> +               addr = sg_dma_address(sg);
> +               len = sg_dma_len(sg);
> +
> +               if ((i+1) == sg_count)
> +                       end = true;
> +               cqhci_set_tran_desc(desc, addr, len, end, dma64);
> +               desc += cq_host->trans_desc_len;
> +       }
> +
> +       return 0;
> +}
> +
> +static void cqhci_prep_dcmd_desc(struct mmc_host *mmc,
> +                                  struct mmc_request *mrq)
> +{
> +       u64 *task_desc = NULL;
> +       u64 data = 0;
> +       u8 resp_type;
> +       u8 *desc;
> +       __le64 *dataddr;
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +       u8 timing;
> +
> +       if (!(mrq->cmd->flags & MMC_RSP_PRESENT)) {
> +               resp_type = 0x0;
> +               timing = 0x1;
> +       } else {
> +               if (mrq->cmd->flags & MMC_RSP_R1B) {
> +                       resp_type = 0x3;
> +                       timing = 0x0;
> +               } else {
> +                       resp_type = 0x2;
> +                       timing = 0x1;
> +               }
> +       }
> +
> +       task_desc = (__le64 __force *)get_desc(cq_host, cq_host->dcmd_slot);
> +       memset(task_desc, 0, cq_host->task_desc_len);
> +       data |= (CQHCI_VALID(1) |
> +                CQHCI_END(1) |
> +                CQHCI_INT(1) |
> +                CQHCI_QBAR(1) |
> +                CQHCI_ACT(0x5) |
> +                CQHCI_CMD_INDEX(mrq->cmd->opcode) |
> +                CQHCI_CMD_TIMING(timing) | CQHCI_RESP_TYPE(resp_type));
> +       *task_desc |= data;
> +       desc = (u8 *)task_desc;
> +       pr_debug("%s: cqhci: dcmd: cmd: %d timing: %d resp: %d\n",
> +                mmc_hostname(mmc), mrq->cmd->opcode, timing, resp_type);
> +       dataddr = (__le64 __force *)(desc + 4);
> +       dataddr[0] = cpu_to_le64((u64)mrq->cmd->arg);
> +
> +}
> +
> +static void cqhci_post_req(struct mmc_host *host, struct mmc_request *mrq)
> +{
> +       struct mmc_data *data = mrq->data;
> +
> +       if (data) {
> +               dma_unmap_sg(mmc_dev(host), data->sg, data->sg_len,
> +                            (data->flags & MMC_DATA_READ) ?
> +                            DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +       }
> +}
> +
> +static inline int cqhci_tag(struct mmc_request *mrq)
> +{
> +       return mrq->cmd ? DCMD_SLOT : mrq->tag;
> +}
> +
> +static int cqhci_request(struct mmc_host *mmc, struct mmc_request *mrq)
> +{
> +       int err = 0;
> +       u64 data = 0;
> +       u64 *task_desc = NULL;
> +       int tag = cqhci_tag(mrq);
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +       unsigned long flags;
> +
> +       if (!cq_host->enabled) {
> +               pr_err("%s: cqhci: not enabled\n", mmc_hostname(mmc));
> +               return -EINVAL;
> +       }
> +
> +       /* First request after resume has to re-enable */
> +       if (!cq_host->activated)
> +               __cqhci_enable(cq_host);
> +
> +       if (!mmc->cqe_on) {
> +               cqhci_writel(cq_host, 0, CQHCI_CTL);
> +               mmc->cqe_on = true;
> +               pr_debug("%s: cqhci: CQE on\n", mmc_hostname(mmc));
> +               if (cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_HALT) {
> +                       pr_err("%s: cqhci: CQE failed to exit halt state\n",
> +                              mmc_hostname(mmc));
> +               }
> +               if (cq_host->ops->enable)
> +                       cq_host->ops->enable(mmc);
> +       }
> +
> +       if (mrq->data) {
> +               task_desc = (__le64 __force *)get_desc(cq_host, tag);
> +               cqhci_prep_task_desc(mrq, &data, 1);
> +               *task_desc = cpu_to_le64(data);
> +               err = cqhci_prep_tran_desc(mrq, cq_host, tag);
> +               if (err) {
> +                       pr_err("%s: cqhci: failed to setup tx desc: %d\n",
> +                              mmc_hostname(mmc), err);
> +                       return err;
> +               }
> +       } else {
> +               cqhci_prep_dcmd_desc(mmc, mrq);
> +       }
> +
> +       spin_lock_irqsave(&cq_host->lock, flags);
> +
> +       if (cq_host->recovery_halt) {
> +               err = -EBUSY;
> +               goto out_unlock;
> +       }
> +
> +       cq_host->slot[tag].mrq = mrq;
> +       cq_host->slot[tag].flags = 0;
> +
> +       cq_host->qcnt += 1;
> +
> +       cqhci_writel(cq_host, 1 << tag, CQHCI_TDBR);
> +       if (!(cqhci_readl(cq_host, CQHCI_TDBR) & (1 << tag)))
> +               pr_debug("%s: cqhci: doorbell not set for tag %d\n",
> +                        mmc_hostname(mmc), tag);
> +out_unlock:
> +       spin_unlock_irqrestore(&cq_host->lock, flags);
> +
> +       if (err)
> +               cqhci_post_req(mmc, mrq);
> +
> +       return err;
> +}
> +
> +static void cqhci_recovery_needed(struct mmc_host *mmc, struct mmc_request *mrq,
> +                                 bool notify)
> +{
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +
> +       if (!cq_host->recovery_halt) {
> +               cq_host->recovery_halt = true;
> +               pr_debug("%s: cqhci: recovery needed\n", mmc_hostname(mmc));
> +               wake_up(&cq_host->wait_queue);
> +               if (notify && mrq->recovery_notifier)
> +                       mrq->recovery_notifier(mrq);
> +       }
> +}
> +
> +static unsigned int cqhci_error_flags(int error1, int error2)
> +{
> +       int error = error1 ? error1 : error2;
> +
> +       switch (error) {
> +       case -EILSEQ:
> +               return CQHCI_HOST_CRC;
> +       case -ETIMEDOUT:
> +               return CQHCI_HOST_TIMEOUT;
> +       default:
> +               return CQHCI_HOST_OTHER;
> +       }
> +}
> +
> +static void cqhci_error_irq(struct mmc_host *mmc, u32 status, int cmd_error,
> +                           int data_error)
> +{
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +       struct cqhci_slot *slot;
> +       u32 terri;
> +       int tag;
> +
> +       spin_lock(&cq_host->lock);
> +
> +       terri = cqhci_readl(cq_host, CQHCI_TERRI);
> +
> +       pr_debug("%s: cqhci: error IRQ status: 0x%08x cmd error %d data error %d TERRI: 0x%08x\n",
> +                mmc_hostname(mmc), status, cmd_error, data_error, terri);
> +
> +       /* Forget about errors when recovery has already been triggered */
> +       if (cq_host->recovery_halt)
> +               goto out_unlock;
> +
> +       if (!cq_host->qcnt) {
> +               WARN_ONCE(1, "%s: cqhci: error when idle. IRQ status: 0x%08x cmd error %d data error %d TERRI: 0x%08x\n",
> +                         mmc_hostname(mmc), status, cmd_error, data_error,
> +                         terri);
> +               goto out_unlock;
> +       }
> +
> +       if (CQHCI_TERRI_C_VALID(terri)) {
> +               tag = CQHCI_TERRI_C_TASK(terri);
> +               slot = &cq_host->slot[tag];
> +               if (slot->mrq) {
> +                       slot->flags = cqhci_error_flags(cmd_error, data_error);
> +                       cqhci_recovery_needed(mmc, slot->mrq, true);
> +               }
> +       }
> +
> +       if (CQHCI_TERRI_D_VALID(terri)) {
> +               tag = CQHCI_TERRI_D_TASK(terri);
> +               slot = &cq_host->slot[tag];
> +               if (slot->mrq) {
> +                       slot->flags = cqhci_error_flags(data_error, cmd_error);
> +                       cqhci_recovery_needed(mmc, slot->mrq, true);
> +               }
> +       }
> +
> +       if (!cq_host->recovery_halt) {
> +               /*
> +                * The only way to guarantee forward progress is to mark at
> +                * least one task in error, so if none is indicated, pick one.
> +                */
> +               for (tag = 0; tag < NUM_SLOTS; tag++) {
> +                       slot = &cq_host->slot[tag];
> +                       if (!slot->mrq)
> +                               continue;
> +                       slot->flags = cqhci_error_flags(data_error, cmd_error);
> +                       cqhci_recovery_needed(mmc, slot->mrq, true);
> +                       break;
> +               }
> +       }
> +
> +out_unlock:
> +       spin_unlock(&cq_host->lock);
> +}
> +
> +static void cqhci_finish_mrq(struct mmc_host *mmc, unsigned int tag)
> +{
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +       struct cqhci_slot *slot = &cq_host->slot[tag];
> +       struct mmc_request *mrq = slot->mrq;
> +       struct mmc_data *data;
> +
> +       if (!mrq) {
> +               WARN_ONCE(1, "%s: cqhci: spurious TCN for tag %d\n",
> +                         mmc_hostname(mmc), tag);
> +               return;
> +       }
> +
> +       /* No completions allowed during recovery */
> +       if (cq_host->recovery_halt) {
> +               slot->flags |= CQHCI_COMPLETED;
> +               return;
> +       }
> +
> +       slot->mrq = NULL;
> +
> +       cq_host->qcnt -= 1;
> +
> +       data = mrq->data;
> +       if (data) {
> +               if (data->error)
> +                       data->bytes_xfered = 0;
> +               else
> +                       data->bytes_xfered = data->blksz * data->blocks;
> +       }
> +
> +       mmc_cqe_request_done(mmc, mrq);
> +}
> +
> +irqreturn_t cqhci_irq(struct mmc_host *mmc, u32 intmask, int cmd_error,
> +                     int data_error)
> +{
> +       u32 status;
> +       unsigned long tag = 0, comp_status;
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +
> +       status = cqhci_readl(cq_host, CQHCI_IS);
> +       cqhci_writel(cq_host, status, CQHCI_IS);
> +
> +       pr_debug("%s: cqhci: IRQ status: 0x%08x\n", mmc_hostname(mmc), status);
> +
> +       if ((status & CQHCI_IS_RED) || cmd_error || data_error)
> +               cqhci_error_irq(mmc, status, cmd_error, data_error);
> +
> +       if (status & CQHCI_IS_TCC) {
> +               /* read TCN and complete the request */
> +               comp_status = cqhci_readl(cq_host, CQHCI_TCN);
> +               cqhci_writel(cq_host, comp_status, CQHCI_TCN);
> +               pr_debug("%s: cqhci: TCN: 0x%08lx\n",
> +                        mmc_hostname(mmc), comp_status);
> +
> +               spin_lock(&cq_host->lock);
> +
> +               for_each_set_bit(tag, &comp_status, cq_host->num_slots) {
> +                       /* complete the corresponding mrq */
> +                       pr_debug("%s: cqhci: completing tag %lu\n",
> +                                mmc_hostname(mmc), tag);
> +                       cqhci_finish_mrq(mmc, tag);
> +               }
> +
> +               if (cq_host->waiting_for_idle && !cq_host->qcnt) {
> +                       cq_host->waiting_for_idle = false;
> +                       wake_up(&cq_host->wait_queue);
> +               }
> +
> +               spin_unlock(&cq_host->lock);
> +       }
> +
> +       if (status & CQHCI_IS_TCL)
> +               wake_up(&cq_host->wait_queue);
> +
> +       if (status & CQHCI_IS_HAC)
> +               wake_up(&cq_host->wait_queue);
> +
> +       return IRQ_HANDLED;
> +}
> +EXPORT_SYMBOL(cqhci_irq);
> +
> +static bool cqhci_is_idle(struct cqhci_host *cq_host, int *ret)
> +{
> +       unsigned long flags;
> +       bool is_idle;
> +
> +       spin_lock_irqsave(&cq_host->lock, flags);
> +       is_idle = !cq_host->qcnt || cq_host->recovery_halt;
> +       *ret = cq_host->recovery_halt ? -EBUSY : 0;
> +       cq_host->waiting_for_idle = !is_idle;
> +       spin_unlock_irqrestore(&cq_host->lock, flags);
> +
> +       return is_idle;
> +}
> +
> +static int cqhci_wait_for_idle(struct mmc_host *mmc)
> +{
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +       int ret;
> +
> +       wait_event(cq_host->wait_queue, cqhci_is_idle(cq_host, &ret));
> +
> +       return ret;
> +}
> +
> +static bool cqhci_timeout(struct mmc_host *mmc, struct mmc_request *mrq,
> +                         bool *recovery_needed)
> +{
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +       int tag = cqhci_tag(mrq);
> +       struct cqhci_slot *slot = &cq_host->slot[tag];
> +       unsigned long flags;
> +       bool timed_out;
> +
> +       spin_lock_irqsave(&cq_host->lock, flags);
> +       timed_out = slot->mrq == mrq;
> +       if (timed_out) {
> +               slot->flags |= CQHCI_EXTERNAL_TIMEOUT;
> +               cqhci_recovery_needed(mmc, mrq, false);
> +               *recovery_needed = cq_host->recovery_halt;
> +       }
> +       spin_unlock_irqrestore(&cq_host->lock, flags);
> +
> +       if (timed_out) {
> +               pr_err("%s: cqhci: timeout for tag %d\n",
> +                      mmc_hostname(mmc), tag);
> +               cqhci_dumpregs(cq_host);
> +       }
> +
> +       return timed_out;
> +}
> +
> +static bool cqhci_tasks_cleared(struct cqhci_host *cq_host)
> +{
> +       return !(cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_CLEAR_ALL_TASKS);
> +}
> +
> +static bool cqhci_clear_all_tasks(struct mmc_host *mmc, unsigned int timeout)
> +{
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +       bool ret;
> +       u32 ctl;
> +
> +       cqhci_set_irqs(cq_host, CQHCI_IS_TCL);
> +
> +       ctl = cqhci_readl(cq_host, CQHCI_CTL);
> +       ctl |= CQHCI_CLEAR_ALL_TASKS;
> +       cqhci_writel(cq_host, ctl, CQHCI_CTL);
> +
> +       wait_event_timeout(cq_host->wait_queue, cqhci_tasks_cleared(cq_host),
> +                          msecs_to_jiffies(timeout) + 1);
> +
> +       cqhci_set_irqs(cq_host, 0);
> +
> +       ret = cqhci_tasks_cleared(cq_host);
> +
> +       if (!ret)
> +               pr_debug("%s: cqhci: Failed to clear tasks\n",
> +                        mmc_hostname(mmc));
> +
> +       return ret;
> +}
> +
> +static bool cqhci_halted(struct cqhci_host *cq_host)
> +{
> +       return cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_HALT;
> +}
> +
> +static bool cqhci_halt(struct mmc_host *mmc, unsigned int timeout)
> +{
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +       bool ret;
> +       u32 ctl;
> +
> +       if (cqhci_halted(cq_host))
> +               return true;
> +
> +       cqhci_set_irqs(cq_host, CQHCI_IS_HAC);
> +
> +       ctl = cqhci_readl(cq_host, CQHCI_CTL);
> +       ctl |= CQHCI_HALT;
> +       cqhci_writel(cq_host, ctl, CQHCI_CTL);
> +
> +       wait_event_timeout(cq_host->wait_queue, cqhci_halted(cq_host),
> +                          msecs_to_jiffies(timeout) + 1);
> +
> +       cqhci_set_irqs(cq_host, 0);
> +
> +       ret = cqhci_halted(cq_host);
> +
> +       if (!ret)
> +               pr_debug("%s: cqhci: Failed to halt\n", mmc_hostname(mmc));
> +
> +       return ret;
> +}
> +
> +/*
> + * After halting we expect to be able to use the command line. We interpret the
> + * failure to halt to mean the data lines might still be in use (and the upper
> + * layers will need to send a STOP command), so we set the timeout based on a
> + * generous command timeout.
> + */
> +#define CQHCI_START_HALT_TIMEOUT       5
> +
> +static void cqhci_recovery_start(struct mmc_host *mmc)
> +{
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +
> +       pr_debug("%s: cqhci: %s\n", mmc_hostname(mmc), __func__);
> +
> +       WARN_ON(!cq_host->recovery_halt);
> +
> +       cqhci_halt(mmc, CQHCI_START_HALT_TIMEOUT);
> +
> +       if (cq_host->ops->disable)
> +               cq_host->ops->disable(mmc, true);
> +
> +       mmc->cqe_on = false;
> +}
> +
> +static int cqhci_error_from_flags(unsigned int flags)
> +{
> +       if (!flags)
> +               return 0;
> +
> +       /* CRC errors might indicate re-tuning so prefer to report that */
> +       if (flags & CQHCI_HOST_CRC)
> +               return -EILSEQ;
> +
> +       if (flags & (CQHCI_EXTERNAL_TIMEOUT | CQHCI_HOST_TIMEOUT))
> +               return -ETIMEDOUT;
> +
> +       return -EIO;
> +}
> +
> +static void cqhci_recover_mrq(struct cqhci_host *cq_host, unsigned int tag)
> +{
> +       struct cqhci_slot *slot = &cq_host->slot[tag];
> +       struct mmc_request *mrq = slot->mrq;
> +       struct mmc_data *data;
> +
> +       if (!mrq)
> +               return;
> +
> +       slot->mrq = NULL;
> +
> +       cq_host->qcnt -= 1;
> +
> +       data = mrq->data;
> +       if (data) {
> +               data->bytes_xfered = 0;
> +               data->error = cqhci_error_from_flags(slot->flags);
> +       } else {
> +               mrq->cmd->error = cqhci_error_from_flags(slot->flags);
> +       }
> +
> +       mmc_cqe_request_done(cq_host->mmc, mrq);
> +}
> +
> +static void cqhci_recover_mrqs(struct cqhci_host *cq_host)
> +{
> +       int i;
> +
> +       for (i = 0; i < cq_host->num_slots; i++)
> +               cqhci_recover_mrq(cq_host, i);
> +}
> +
> +/*
> + * By now the command and data lines should be unused so there is no reason for
> + * CQHCI to take a long time to halt, but if it doesn't halt there could be
> + * problems clearing tasks, so be generous.
> + */
> +#define CQHCI_FINISH_HALT_TIMEOUT      20
> +
> +/* CQHCI could be expected to clear its internal state pretty quickly */
> +#define CQHCI_CLEAR_TIMEOUT            20
> +
> +static void cqhci_recovery_finish(struct mmc_host *mmc)
> +{
> +       struct cqhci_host *cq_host = mmc->cqe_private;
> +       unsigned long flags;
> +       u32 cqcfg;
> +       bool ok;
> +
> +       pr_debug("%s: cqhci: %s\n", mmc_hostname(mmc), __func__);
> +
> +       WARN_ON(!cq_host->recovery_halt);
> +
> +       ok = cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);
> +
> +       if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
> +               ok = false;
> +
> +       /*
> +        * The specification contradicts itself: it says that tasks cannot be
> +        * cleared if CQHCI does not halt, but also that if CQHCI does not
> +        * halt it should be disabled and re-enabled, without being disabled
> +        * before the tasks are cleared. Have a go anyway.
> +        */
> +       if (!ok) {
> +               pr_debug("%s: cqhci: disable / re-enable\n", mmc_hostname(mmc));
> +               cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
> +               cqcfg &= ~CQHCI_ENABLE;
> +               cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
> +               cqcfg |= CQHCI_ENABLE;
> +               cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
> +               /* Be sure that there are no tasks */
> +               ok = cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);
> +               if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
> +                       ok = false;
> +               WARN_ON(!ok);
> +       }
> +
> +       cqhci_recover_mrqs(cq_host);
> +
> +       WARN_ON(cq_host->qcnt);
> +
> +       spin_lock_irqsave(&cq_host->lock, flags);
> +       cq_host->qcnt = 0;
> +       cq_host->recovery_halt = false;
> +       mmc->cqe_on = false;
> +       spin_unlock_irqrestore(&cq_host->lock, flags);
> +
> +       /* Ensure all writes are done before interrupts are re-enabled */
> +       wmb();
> +
> +       cqhci_writel(cq_host, CQHCI_IS_HAC | CQHCI_IS_TCL, CQHCI_IS);
> +
> +       cqhci_set_irqs(cq_host, CQHCI_IS_MASK);
> +
> +       pr_debug("%s: cqhci: recovery done\n", mmc_hostname(mmc));
> +}
> +
> +static const struct mmc_cqe_ops cqhci_cqe_ops = {
> +       .cqe_enable = cqhci_enable,
> +       .cqe_disable = cqhci_disable,
> +       .cqe_request = cqhci_request,
> +       .cqe_post_req = cqhci_post_req,
> +       .cqe_off = cqhci_off,
> +       .cqe_wait_for_idle = cqhci_wait_for_idle,
> +       .cqe_timeout = cqhci_timeout,
> +       .cqe_recovery_start = cqhci_recovery_start,
> +       .cqe_recovery_finish = cqhci_recovery_finish,
> +};
> +
> +struct cqhci_host *cqhci_pltfm_init(struct platform_device *pdev)
> +{
> +       struct cqhci_host *cq_host;
> +       struct resource *cqhci_memres = NULL;
> +
> +       /* check and setup CMDQ interface */
> +       cqhci_memres = platform_get_resource_byname(pdev, IORESOURCE_MEM,
> +                                                  "cqhci_mem");
> +       if (!cqhci_memres) {
> +               dev_dbg(&pdev->dev, "CMDQ not supported\n");
> +               return ERR_PTR(-EINVAL);
> +       }
> +
> +       cq_host = devm_kzalloc(&pdev->dev, sizeof(*cq_host), GFP_KERNEL);
> +       if (!cq_host)
> +               return ERR_PTR(-ENOMEM);
> +       cq_host->mmio = devm_ioremap(&pdev->dev,
> +                                    cqhci_memres->start,
> +                                    resource_size(cqhci_memres));
> +       if (!cq_host->mmio) {
> +               dev_err(&pdev->dev, "failed to remap cqhci regs\n");
> +               return ERR_PTR(-EBUSY);
> +       }
> +       dev_dbg(&pdev->dev, "CMDQ ioremap: done\n");
> +
> +       return cq_host;
> +}
> +EXPORT_SYMBOL(cqhci_pltfm_init);
> +
> +static unsigned int cqhci_ver_major(struct cqhci_host *cq_host)
> +{
> +       return CQHCI_VER_MAJOR(cqhci_readl(cq_host, CQHCI_VER));
> +}
> +
> +static unsigned int cqhci_ver_minor(struct cqhci_host *cq_host)
> +{
> +       u32 ver = cqhci_readl(cq_host, CQHCI_VER);
> +
> +       return CQHCI_VER_MINOR1(ver) * 10 + CQHCI_VER_MINOR2(ver);
> +}
> +
> +int cqhci_init(struct cqhci_host *cq_host, struct mmc_host *mmc,
> +             bool dma64)
> +{
> +       int err;
> +
> +       cq_host->dma64 = dma64;
> +       cq_host->mmc = mmc;
> +       cq_host->mmc->cqe_private = cq_host;
> +
> +       cq_host->num_slots = NUM_SLOTS;
> +       cq_host->dcmd_slot = DCMD_SLOT;
> +
> +       mmc->cqe_ops = &cqhci_cqe_ops;
> +
> +       mmc->cqe_qdepth = NUM_SLOTS;
> +       if (mmc->caps2 & MMC_CAP2_CQE_DCMD)
> +               mmc->cqe_qdepth -= 1;
> +
> +       cq_host->slot = devm_kcalloc(mmc_dev(mmc), cq_host->num_slots,
> +                                    sizeof(*cq_host->slot), GFP_KERNEL);
> +       if (!cq_host->slot) {
> +               err = -ENOMEM;
> +               goto out_err;
> +       }
> +
> +       spin_lock_init(&cq_host->lock);
> +
> +       init_completion(&cq_host->halt_comp);
> +       init_waitqueue_head(&cq_host->wait_queue);
> +
> +       pr_info("%s: CQHCI version %u.%02u\n",
> +               mmc_hostname(mmc), cqhci_ver_major(cq_host),
> +               cqhci_ver_minor(cq_host));
> +
> +       return 0;
> +
> +out_err:
> +       pr_err("%s: CQHCI version %u.%02u failed to initialize, error %d\n",
> +              mmc_hostname(mmc), cqhci_ver_major(cq_host),
> +              cqhci_ver_minor(cq_host), err);
> +       return err;
> +}
> +EXPORT_SYMBOL(cqhci_init);
> +
> +MODULE_AUTHOR("Venkat Gopalakrishnan <venkatg@codeaurora.org>");
> +MODULE_DESCRIPTION("Command Queue Host Controller Interface driver");
> +MODULE_LICENSE("GPL v2");
> diff --git a/drivers/mmc/host/cqhci.h b/drivers/mmc/host/cqhci.h
> new file mode 100644
> index 000000000000..2d39d361b322
> --- /dev/null
> +++ b/drivers/mmc/host/cqhci.h
> @@ -0,0 +1,240 @@
> +/* Copyright (c) 2015, The Linux Foundation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 and
> + * only version 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +#ifndef LINUX_MMC_CQHCI_H
> +#define LINUX_MMC_CQHCI_H
> +
> +#include <linux/compiler.h>
> +#include <linux/bitops.h>
> +#include <linux/spinlock_types.h>
> +#include <linux/types.h>
> +#include <linux/completion.h>
> +#include <linux/wait.h>
> +#include <linux/irqreturn.h>
> +#include <asm/io.h>
> +
> +/* registers */
> +/* version */
> +#define CQHCI_VER                      0x00
> +#define CQHCI_VER_MAJOR(x)             (((x) & GENMASK(11, 8)) >> 8)
> +#define CQHCI_VER_MINOR1(x)            (((x) & GENMASK(7, 4)) >> 4)
> +#define CQHCI_VER_MINOR2(x)            ((x) & GENMASK(3, 0))
> +
> +/* capabilities */
> +#define CQHCI_CAP                      0x04
> +/* configuration */
> +#define CQHCI_CFG                      0x08
> +#define CQHCI_DCMD                     0x00001000
> +#define CQHCI_TASK_DESC_SZ             0x00000100
> +#define CQHCI_ENABLE                   0x00000001
> +
> +/* control */
> +#define CQHCI_CTL                      0x0C
> +#define CQHCI_CLEAR_ALL_TASKS          0x00000100
> +#define CQHCI_HALT                     0x00000001
> +
> +/* interrupt status */
> +#define CQHCI_IS                       0x10
> +#define CQHCI_IS_HAC                   BIT(0)
> +#define CQHCI_IS_TCC                   BIT(1)
> +#define CQHCI_IS_RED                   BIT(2)
> +#define CQHCI_IS_TCL                   BIT(3)
> +
> +#define CQHCI_IS_MASK (CQHCI_IS_TCC | CQHCI_IS_RED)
> +
> +/* interrupt status enable */
> +#define CQHCI_ISTE                     0x14
> +
> +/* interrupt signal enable */
> +#define CQHCI_ISGE                     0x18
> +
> +/* interrupt coalescing */
> +#define CQHCI_IC                       0x1C
> +#define CQHCI_IC_ENABLE                        BIT(31)
> +#define CQHCI_IC_RESET                 BIT(16)
> +#define CQHCI_IC_ICCTHWEN              BIT(15)
> +#define CQHCI_IC_ICCTH(x)              ((x & 0x1F) << 8)
> +#define CQHCI_IC_ICTOVALWEN            BIT(7)
> +#define CQHCI_IC_ICTOVAL(x)            (x & 0x7F)
> +
> +/* task list base address */
> +#define CQHCI_TDLBA                    0x20
> +
> +/* task list base address upper */
> +#define CQHCI_TDLBAU                   0x24
> +
> +/* door-bell */
> +#define CQHCI_TDBR                     0x28
> +
> +/* task completion notification */
> +#define CQHCI_TCN                      0x2C
> +
> +/* device queue status */
> +#define CQHCI_DQS                      0x30
> +
> +/* device pending tasks */
> +#define CQHCI_DPT                      0x34
> +
> +/* task clear */
> +#define CQHCI_TCLR                     0x38
> +
> +/* send status config 1 */
> +#define CQHCI_SSC1                     0x40
> +
> +/* send status config 2 */
> +#define CQHCI_SSC2                     0x44
> +
> +/* response for dcmd */
> +#define CQHCI_CRDCT                    0x48
> +
> +/* response mode error mask */
> +#define CQHCI_RMEM                     0x50
> +
> +/* task error info */
> +#define CQHCI_TERRI                    0x54
> +
> +#define CQHCI_TERRI_C_INDEX(x)         ((x) & GENMASK(5, 0))
> +#define CQHCI_TERRI_C_TASK(x)          (((x) & GENMASK(12, 8)) >> 8)
> +#define CQHCI_TERRI_C_VALID(x)         ((x) & BIT(15))
> +#define CQHCI_TERRI_D_INDEX(x)         (((x) & GENMASK(21, 16)) >> 16)
> +#define CQHCI_TERRI_D_TASK(x)          (((x) & GENMASK(28, 24)) >> 24)
> +#define CQHCI_TERRI_D_VALID(x)         ((x) & BIT(31))
> +
> +/* command response index */
> +#define CQHCI_CRI                      0x58
> +
> +/* command response argument */
> +#define CQHCI_CRA                      0x5C
> +
> +#define CQHCI_INT_ALL                  0xF
> +#define CQHCI_IC_DEFAULT_ICCTH         31
> +#define CQHCI_IC_DEFAULT_ICTOVAL       1
> +
> +/* attribute fields */
> +#define CQHCI_VALID(x)                 ((x & 1) << 0)
> +#define CQHCI_END(x)                   ((x & 1) << 1)
> +#define CQHCI_INT(x)                   ((x & 1) << 2)
> +#define CQHCI_ACT(x)                   ((x & 0x7) << 3)
> +
> +/* data command task descriptor fields */
> +#define CQHCI_FORCED_PROG(x)           ((x & 1) << 6)
> +#define CQHCI_CONTEXT(x)               ((x & 0xF) << 7)
> +#define CQHCI_DATA_TAG(x)              ((x & 1) << 11)
> +#define CQHCI_DATA_DIR(x)              ((x & 1) << 12)
> +#define CQHCI_PRIORITY(x)              ((x & 1) << 13)
> +#define CQHCI_QBAR(x)                  ((x & 1) << 14)
> +#define CQHCI_REL_WRITE(x)             ((x & 1) << 15)
> +#define CQHCI_BLK_COUNT(x)             ((x & 0xFFFF) << 16)
> +#define CQHCI_BLK_ADDR(x)              ((x & 0xFFFFFFFF) << 32)
> +
> +/* direct command task descriptor fields */
> +#define CQHCI_CMD_INDEX(x)             ((x & 0x3F) << 16)
> +#define CQHCI_CMD_TIMING(x)            ((x & 1) << 22)
> +#define CQHCI_RESP_TYPE(x)             ((x & 0x3) << 23)
> +
> +/* transfer descriptor fields */
> +#define CQHCI_DAT_LENGTH(x)            ((x & 0xFFFF) << 16)
> +#define CQHCI_DAT_ADDR_LO(x)           ((x & 0xFFFFFFFF) << 32)
> +#define CQHCI_DAT_ADDR_HI(x)           ((x & 0xFFFFFFFF) << 0)
> +
> +struct cqhci_host_ops;
> +struct mmc_host;
> +struct cqhci_slot;
> +
> +struct cqhci_host {
> +       const struct cqhci_host_ops *ops;
> +       void __iomem *mmio;
> +       struct mmc_host *mmc;
> +
> +       spinlock_t lock;
> +
> +       /* relative card address of device */
> +       unsigned int rca;
> +
> +       /* 64 bit DMA */
> +       bool dma64;
> +       int num_slots;
> +       int qcnt;
> +
> +       u32 dcmd_slot;
> +       u32 caps;
> +#define CQHCI_TASK_DESC_SZ_128         0x1
> +
> +       u32 quirks;
> +#define CQHCI_QUIRK_SHORT_TXFR_DESC_SZ 0x1
> +
> +       bool enabled;
> +       bool halted;
> +       bool init_done;
> +       bool activated;
> +       bool waiting_for_idle;
> +       bool recovery_halt;
> +
> +       size_t desc_size;
> +       size_t data_size;
> +
> +       u8 *desc_base;
> +
> +       /* total descriptor size */
> +       u8 slot_sz;
> +
> +       /* 64/128 bit depends on CQHCI_CFG */
> +       u8 task_desc_len;
> +
> +       /* 64 bit on 32-bit arch, 128 bit on 64-bit */
> +       u8 link_desc_len;
> +
> +       u8 *trans_desc_base;
> +       /* same length as transfer descriptor */
> +       u8 trans_desc_len;
> +
> +       dma_addr_t desc_dma_base;
> +       dma_addr_t trans_desc_dma_base;
> +
> +       struct completion halt_comp;
> +       wait_queue_head_t wait_queue;
> +       struct cqhci_slot *slot;
> +};
> +
> +struct cqhci_host_ops {
> +       void (*dumpregs)(struct mmc_host *mmc);
> +       void (*write_l)(struct cqhci_host *host, u32 val, int reg);
> +       u32 (*read_l)(struct cqhci_host *host, int reg);
> +       void (*enable)(struct mmc_host *mmc);
> +       void (*disable)(struct mmc_host *mmc, bool recovery);
> +};
> +
> +static inline void cqhci_writel(struct cqhci_host *host, u32 val, int reg)
> +{
> +       if (unlikely(host->ops->write_l))
> +               host->ops->write_l(host, val, reg);
> +       else
> +               writel_relaxed(val, host->mmio + reg);
> +}
> +
> +static inline u32 cqhci_readl(struct cqhci_host *host, int reg)
> +{
> +       if (unlikely(host->ops->read_l))
> +               return host->ops->read_l(host, reg);
> +       else
> +               return readl_relaxed(host->mmio + reg);
> +}
> +
> +struct platform_device;
> +
> +irqreturn_t cqhci_irq(struct mmc_host *mmc, u32 intmask, int cmd_error,
> +                     int data_error);
> +int cqhci_init(struct cqhci_host *cq_host, struct mmc_host *mmc, bool dma64);
> +struct cqhci_host *cqhci_pltfm_init(struct platform_device *pdev);
> +int cqhci_suspend(struct mmc_host *mmc);
> +int cqhci_resume(struct mmc_host *mmc);
> +
> +#endif
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 05/10] mmc: cqhci: support for command queue enabled host
  2017-11-09 13:41   ` Ulf Hansson
@ 2017-11-09 14:20     ` Adrian Hunter
  0 siblings, 0 replies; 55+ messages in thread
From: Adrian Hunter @ 2017-11-09 14:20 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

On 09/11/17 15:41, Ulf Hansson wrote:
> On 3 November 2017 at 14:20, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> From: Venkat Gopalakrishnan <venkatg@codeaurora.org>
>>
>> This patch adds CMDQ support for command-queue compatible
>> hosts.
>>
>> Command queue is added in eMMC-5.1 specification. This
>> enables the controller to process upto 32 requests at
>> a time.
>>
>> Adrian Hunter contributed renaming to cqhci, recovery, suspend
>> and resume, cqhci_off, cqhci_wait_for_idle, and external timeout
>> handling.
>>
>> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
>> Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
>> Signed-off-by: Konstantin Dorfman <kdorfman@codeaurora.org>
>> Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
>> Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
>> Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> 
> Overall this looks good to me.
> 
> However, I didn't see MAINTAINERS being updated. Is anybody above
> volunteering to maintain cqhci.*?

I have hardware that I can use for testing, so I can maintain it if no one
else wants to; however, if anyone else is keen, please feel free to offer too.
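
If it helps, I would imagine the entry looking something like the below (the
file pattern is just my guess at where the files will land):

CQHCI (COMMAND QUEUE HOST CONTROLLER INTERFACE)
M:	Adrian Hunter <adrian.hunter@intel.com>
L:	linux-mmc@vger.kernel.org
S:	Maintained
F:	drivers/mmc/host/cqhci*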

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 09/10] mmc: block: blk-mq: Stop using card_busy_detect()
  2017-11-09 13:36   ` Ulf Hansson
@ 2017-11-09 15:24     ` Adrian Hunter
  0 siblings, 0 replies; 55+ messages in thread
From: Adrian Hunter @ 2017-11-09 15:24 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Linus Walleij, Shawn Lin, Christoph Hellwig

On 09/11/17 15:36, Ulf Hansson wrote:
> On 3 November 2017 at 14:20, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> card_busy_detect() doesn't set a correct timeout, and it doesn't take care
>> of error status bits. Stop using it for blk-mq.
> 
> I think this changelog isn't very descriptive. Could you please work
> on that for the next version.

Ok

> 
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  drivers/mmc/core/block.c | 117 +++++++++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 109 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
>> index 0c29b1d8d545..5c5ff3c34313 100644
>> --- a/drivers/mmc/core/block.c
>> +++ b/drivers/mmc/core/block.c
>> @@ -1426,15 +1426,18 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq,
>>         }
>>  }
>>
>> -#define CMD_ERRORS                                                     \
>> -       (R1_OUT_OF_RANGE |      /* Command argument out of range */     \
>> -        R1_ADDRESS_ERROR |     /* Misaligned address */                \
>> +#define CMD_ERRORS_EXCL_OOR                                            \
>> +       (R1_ADDRESS_ERROR |     /* Misaligned address */                \
>>          R1_BLOCK_LEN_ERROR |   /* Transferred block length incorrect */\
>>          R1_WP_VIOLATION |      /* Tried to write to protected block */ \
>>          R1_CARD_ECC_FAILED |   /* Card ECC failed */                   \
>>          R1_CC_ERROR |          /* Card controller error */             \
>>          R1_ERROR)              /* General/unknown error */
>>
>> +#define CMD_ERRORS                                                     \
>> +       (CMD_ERRORS_EXCL_OOR |                                          \
>> +        R1_OUT_OF_RANGE)       /* Command argument out of range */     \
>> +
>>  static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq)
>>  {
>>         u32 val;
>> @@ -1951,6 +1954,95 @@ static void mmc_blk_ss_read(struct mmc_queue *mq, struct request *req)
>>         mqrq->retries = MMC_NO_RETRIES;
>>  }
>>
>> +static inline bool mmc_blk_oor_valid(struct mmc_blk_request *brq)
>> +{
>> +       return !!brq->mrq.sbc;
>> +}
>> +
>> +static inline u32 mmc_blk_stop_err_bits(struct mmc_blk_request *brq)
>> +{
>> +       return mmc_blk_oor_valid(brq) ? CMD_ERRORS : CMD_ERRORS_EXCL_OOR;
>> +}
>> +
>> +static inline bool mmc_blk_in_tran_state(u32 status)
>> +{
>> +       /*
>> +        * Some cards mishandle the status bits, so make sure to check both the
>> +        * busy indication and the card state.
>> +        */
>> +       return status & R1_READY_FOR_DATA &&
>> +              (R1_CURRENT_STATE(status) == R1_STATE_TRAN);
>> +}
>> +
>> +static unsigned int mmc_blk_clock_khz(struct mmc_host *host)
>> +{
>> +       if (host->actual_clock)
>> +               return host->actual_clock / 1000;
>> +
>> +       /* Clock may be subject to a divisor, fudge it by a factor of 2. */
>> +       if (host->ios.clock)
>> +               return host->ios.clock / 2000;
>> +
>> +       /* How can there be no clock */
>> +       WARN_ON_ONCE(1);
>> +       return 100; /* 100 kHz is minimum possible value */
>> +}
>> +
>> +static unsigned long mmc_blk_data_timeout_jiffies(struct mmc_host *host,
>> +                                                 struct mmc_data *data)
>> +{
>> +       unsigned int ms = DIV_ROUND_UP(data->timeout_ns, 1000000);
>> +       unsigned int khz;
>> +
>> +       if (data->timeout_clks) {
>> +               khz = mmc_blk_clock_khz(host);
>> +               ms += DIV_ROUND_UP(data->timeout_clks, khz);
>> +       }
>> +
>> +       return msecs_to_jiffies(ms);
>> +}
>> +
>> +static int mmc_blk_card_stuck(struct mmc_card *card, struct request *req,
>> +                             u32 *resp_errs)
>> +{
>> +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> +       struct mmc_data *data = &mqrq->brq.data;
>> +       unsigned long timeout;
>> +       u32 status;
>> +       int err;
>> +
>> +       timeout = jiffies + mmc_blk_data_timeout_jiffies(card->host, data);
>> +
>> +       while (1) {
>> +               bool done = time_after(jiffies, timeout);
>> +
>> +               err = __mmc_send_status(card, &status, 5);
>> +               if (err) {
>> +                       pr_err("%s: error %d requesting status\n",
>> +                              req->rq_disk->disk_name, err);
>> +                       break;
>> +               }
>> +
>> +               /* Accumulate any response error bits seen */
>> +               if (resp_errs)
>> +                       *resp_errs |= status;
>> +
>> +               if (mmc_blk_in_tran_state(status))
>> +                       break;
>> +
>> +               /* Timeout if the device never becomes ready */
>> +               if (done) {
>> +                       pr_err("%s: Card stuck in wrong state! %s %s\n",
>> +                               mmc_hostname(card->host),
>> +                               req->rq_disk->disk_name, __func__);
>> +                       err = -ETIMEDOUT;
>> +                       break;
>> +               }
>> +       }
>> +
>> +       return err;
>> +}
> 
> The new function here, mmc_blk_card_stuck() looks very similar to
> card_busy_detect().
> 
> Why can't you instead fixup card_busy_detect() so it behaves like the
> new mmc_blk_card_stuck(), rather than re-implementing most of it?

I saw an advantage in keeping the legacy code separate so that it didn't
then also need testing.

I guess it doesn't hurt to try to fix up the old code.

> 
>> +
>>  static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
>>  {
>>         int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
>> @@ -2097,17 +2189,26 @@ static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq)
>>  static int mmc_blk_card_busy(struct mmc_card *card, struct request *req)
>>  {
>>         struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>> -       bool gen_err = false;
>> +       u32 status = 0;
>>         int err;
>>
>>         if (mmc_host_is_spi(card->host) || rq_data_dir(req) == READ)
>>                 return 0;
>>
>> -       err = card_busy_detect(card, MMC_BLK_TIMEOUT_MS, false, req, &gen_err);
>> +       err = mmc_blk_card_stuck(card, req, &status);
>> +
>> +       /*
>> +        * Do not assume data transferred correctly if there are any error bits
>> +        * set.
>> +        */
>> +       if (!err && status & mmc_blk_stop_err_bits(&mqrq->brq)) {
>> +               mqrq->brq.data.bytes_xfered = 0;
>> +               err = -EIO;
>> +       }
>>
>> -       /* Copy the general error bit so it will be seen later on */
>> -       if (gen_err)
>> -               mqrq->brq.stop.resp[0] |= R1_ERROR;
>> +       /* Copy the exception bit so it will be seen later on */
>> +       if (mmc_card_mmc(card) && status & R1_EXCEPTION_EVENT)
>> +               mqrq->brq.cmd.resp[0] |= R1_EXCEPTION_EVENT;
>>
>>         return err;
>>  }
>> --
>> 1.9.1
>>
> 
> Kind regards
> Uffe
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 07/10] mmc: block: blk-mq: Add support for direct completion
  2017-11-09 12:34       ` Linus Walleij
@ 2017-11-09 15:33         ` Adrian Hunter
  0 siblings, 0 replies; 55+ messages in thread
From: Adrian Hunter @ 2017-11-09 15:33 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 09/11/17 14:34, Linus Walleij wrote:
> On Thu, Nov 9, 2017 at 8:27 AM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> On 08/11/17 11:28, Linus Walleij wrote:
>>> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>>
>>>> For blk-mq, add support for completing requests directly in the ->done
>>>> callback. That means that error handling and urgent background operations
>>>> must be handled by recovery_work in that case.
>>>>
>>>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>>>
>>> I tried enabling this on my MMC host (mmci) but I got weird
>>> DMA error messages when I did.
>>>
>>> I guess this has not been tested on a non-DMA-coherent
>>> system?
>>
>> I don't see what DMA-coherence has to do with anything.
>>
>> Possibilities:
>>         - DMA unmapping doesn't work in an atomic context
>>         - requests' DMA operations have to be synchronized with each other
> 
> So since MMCI needs the post_req() hook called with
> an error code to properly tear down any DMA operations,
> I was worried that maybe your error path is not doing this
> (passing an error code or calling in the right order).
> 
> I had a bunch of fallouts in my own patch set relating
> to that.
> 
>>> I think I might be seeing this because the .pre and .post
>>> callbacks need to be strictly sequenced, and this is
>>> maybe not taken into account here?
>>
>> I looked at mmci but that did not seem to be the case.
>>
>>> Isn't there a risk
>>> that the .post callback of the next request is called before
>>> the .post callback of the previous request has returned
>>> for example?
>>
>> Of course, the requests are treated as independent.  If the separate DMA
>> operations require synchronization, that is for the host driver to fix.
> 
> They are treated as independent by the block layer but
> it is the subsystems duty to serialize them for the hardware,
> 
> MMCI strictly requires that pre/post hooks per request
> happen in the right order, so if you have prepared a second
> request after submitting the first, and the first fails, you have
> to back out by unpreparing the second one before unpreparing
> the first. It is also the only host driver requiring to be passed
> an error code in the last parameter to the post hook in
> order to work properly.
> 
> I think your patch set handles that nicely though, because I
> haven't seen any errors, it's just when we do this direct
> completion I see problems.

If a request gets an error, then we always do the post_req before starting
another request, so the driver can assume that the first request finished
successfully if it is asked to do post_req on the second request.  So the
driver does have enough information to keep the DMA unmapping in order if
necessary.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 03/10] mmc: block: Add blk-mq support
  2017-11-09 10:42     ` Adrian Hunter
@ 2017-11-09 15:52       ` Linus Walleij
  2017-11-10 10:19         ` Linus Walleij
  2017-11-14 13:10           ` Adrian Hunter
  0 siblings, 2 replies; 55+ messages in thread
From: Linus Walleij @ 2017-11-09 15:52 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Thu, Nov 9, 2017 at 11:42 AM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 08/11/17 10:54, Linus Walleij wrote:
>> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:

>> At least you could do what I did and break out a helper like
>> this:
>>
>> /*
>>  * This reports status back to the block layer for a finished request.
>>  */
>> static void mmc_blk_complete(struct mmc_queue_req *mq_rq,
>>                             blk_status_t status)
>> {
>>        struct request *req = mmc_queue_req_to_req(mq_rq);
>>
>>        if (req->mq_ctx) {
>>           blk_mq_end_request(req, status);
>>        } else
>>           blk_end_request_all(req, status);
>> }
>
> You are quibbling.

Hm wiktionary "quibble" points to:
https://en.wikipedia.org/wiki/Trivial_objections

Sorry for not being a native speaker in English, I think that
maybe you were again trying to be snarky to a reviewer but
I honestly don't know.

> It makes next to no difference especially as it all goes
> away when the legacy code is deleted.

Which is something your patch set doesn't do. Nor did you write
anywhere that deleting the legacy code was your intention.
But I'm happy to hear it.

>  I will change it in the next
> revision, but what a waste of everyone's time.
>  Please try to focus on
> things that matter.

As Jean-Paul Sartre said "hell is other people".
Can you try to be friendlier anyway?

>> This retry and error handling using requeue is very elegant.
>> I really like this.
>>
>> If we could also go for MQ-only, only this nice code
>> remains in the tree.
>
> No one has ever suggested that the legacy API will remain.  Once blk-mq is
> ready the old code gets deleted.

The block layer maintainers most definitely think MQ is ready
but you seem to disagree. Why?

In the recent LWN article from Kernel Recipes:
https://lwn.net/Articles/735275/
the only obstacle I can see is a mention that SCSI was not
switched over by default because of some problems with
slow rotating media. "This change had to be backed out
because of some performance and scalability issues that
arose for rotating storage."

Christoph mentions that he's gonna try again for v4.14.
But it's true, I do not see that happening in linux-next yet.

But that is specifically rotating media, which MMC/SD is not.
And UBI has been using it for ages without complaints. And
that is a sign of something.

Have you seen any problems when deploying it on MMC/SD,
really? If there are problems I bet the block layer people want
us to find them, diagnose them and ultimately fix them rather
than wait for them to do it. I haven't seen any performance or
scalability issues. At one point I had processes running
on two eMMC and one SD-card and apart from maxing out
the CPUs and DMA backplane, as could be expected, all was
fine.

>> The problem: you have just reimplemented the whole error
>> handling we had in the old block layer and now we have to
>> maintain two copies and keep them in sync.
>>
>> This is not OK IMO, we will inevitable screw it up, so we
>> need to get *one* error path.
>
> Wow, you really didn't read the code at all.

Just quit this snarky attitude.

> As I have repeatedly pointed
> out, the new code is all new.  There is no overlap and there nothing to keep
> in sync.

The problem is that you have not clearly communicated your
vision to delete the old code. The best way to communicate that
would be to include a patch actually deleting it.

>  It may not look like it in this patch, but that is only because of
> the ridiculous idea of splitting up the patch.

Naming other people's review feedback as "ridiculous" is not very
helpful. I think the patch set became much easier to review
like this so I am happy with this format.

>>> +static void mmc_blk_mq_acct_req_done(struct mmc_queue *mq, struct request *req)
>>
>> What does "acct" mean in the above function name?
>> Accounting? Actual? I'm lost.
>
> Does "actual" have two "c"'s.  You are just making things up.

Please stop your snarky and belittling attitude.

I am not a native English speaker and I am not making up
my review comments in order to annoy you.

Consider Hanlon's razor:
https://en.wikipedia.org/wiki/Hanlon%27s_razor
"Never attribute to malice that which is adequately explained by
stupidity".

It's bad enough that you repeatedly point out how stupid
you think I am. I am sorry about that; in my opinion sometimes
other people are really stupid, but more often than not the problem
is really on your side, like being impatient and unwilling to educate
others about how your code actually works because it seems
so evident to yourself that you think it should be evident to everyone
else as well. I don't know what to say about that; it seems a bit
unempathetic.

But you even go and think I am making stuff up on purpose.
That's pretty offensive.

> Of course it is "account".

This is maybe obvious for you but it was not to me.

>  It is counting the number of requests in flight - which is
> pretty obvious from the code.  We use that to support important features
> like CQE re-tuning and avoiding getting / putting the card all the time.

What is so hard with getting this point across without insults?

>>> +{
>>> +       struct request_queue *q = req->q;
>>> +       unsigned long flags;
>>> +       bool put_card;
>>> +
>>> +       spin_lock_irqsave(q->queue_lock, flags);
>>> +
>>> +       mq->in_flight[mmc_issue_type(mq, req)] -= 1;
>>> +
>>> +       put_card = mmc_tot_in_flight(mq) == 0;
>>
>> This in_flight[] business seems a bit kludgy, but I
>> don't really understand it fully. Magic numbers like
>> -1 to mark that something is not going on etc, not
>> super-elegant.
>
> You are misreading.  It subtracts 1 from the number of requests in flight.

Right! Sorry I misread it.

>> I believe it is necessary for CQE though as you need
>> to keep track of outstanding requests?
>
> We have always avoided getting / putting the card when there is another
> request in flight.  Yes it is also used for CQE.

Yeah I took the approach to get/put the card around each
request instead. It might make things a bit simpler.

>>> +       spin_unlock_irqrestore(q->queue_lock, flags);
>>> +
>>> +       if (put_card)
>>> +               mmc_put_card(mq->card, &mq->ctx);
>>
>> I think you should try not to sprinkle mmc_put_card() inside
>> the different functions but instead you can put this in the
>> .complete callback I guess mmc_blk_mq_complete() in your
>> patch set.
>
> This is the *only* block.c function in the blk-mq code that calls
> mmc_put_card().  The queue also does it for requests that didn't even
> start.  That is it.

Yeah that's correct. If we also delete the old code it will not be
too bad.

>> Also you do not need to avoid calling it several times with
>> that put_card variable. It's fully reentrant thanks to your
>> own code in the lock and all calls come from the same block
>> layer process if you call it in .complete() I think?
>
> We have always avoided unnecessary gets / puts.  Since the result is better,
> why on earth take it out?

It's not a showstopper.

But what I think is nice in doing it around
each request is that mmc_put_card() calls mmc_release_host(), which
contains this:

if (--host->claim_cnt) { (...)

So there is a counter inside the claim/release host functions
and now there is another counter keeping track of the in-flight
requests as well and it gives me the feeling we are maintaining
two counters when we only need one.

But maybe it is actually the host->claim_cnt that is overengineered
and should just be a bool, because with this construction
that you only call down and claim the host once, the
host->claim_cnt will only be 0 or 1, right?

>> Now the problem Ulf has pointed out starts to creep out in the
>> patch: a lot of code duplication on the MQ path compared to
>> the ordinary block layer path.
>>
>> My approach was structured partly to avoid this: first refactor
>> the old path, then switch to (only) MQ to avoid code duplication.
>
> The old code is rubbish.  There is nothing of value there.  You have to make
> the case why you are wasting everyone's time churning crappy code.

I kind of agree that the old code is rubbish, actually.

I guess I could turn that around and ask: if it is so crappy, why
is your patch set not deleting it?

And I guess the only reasonable answer to that would be what
you were saying above, something along the lines of "MQ is not
ready yet". But it can be argued that MQ is ready.

>  The idea
> that nice new code is wrong because it doesn't churn the old rubbish code,
> is ridiculous.

As I have said in the review comments I like your new
code, especially the new error path.

I took the approach of refactoring because I was afraid I would
break something I guess. But rewriting from scratch is also an
option.

But I think I would prefer that if a big slew of new code is
introduced it needs to be put to wide use and any bugs
smoked out during the -rc phase, and we are now
hiding it behind a Kconfig option so it's unlikely to see testing
until that option is turned on, and that is not good.

>> This looks a bit like it is reimplementing the kernel
>> completion abstraction using a mutex and a variable named
>> .complete_req?
>>
>> We were using a completion in the old block layer so
>> why did you not use it for MQ?
>
> Doesn't seem like you pay much attention to this stuff.

You really have to stop your snarky and belittling attitude
to your fellow kernel developers.

From 6.Followthrough.rst again:

----------8<-------------------8<-------------------8<-----------

Andrew Morton has suggested that every review comment which does not result
in a code change should result in an additional code comment instead; that
can help future reviewers avoid the questions which came up the first time
around.

----------8<-------------------8<-------------------8<-----------

At least take a moment to consider the option that maybe so few people
are reviewing your code because this is complicated stuff and really
hard to grasp; maybe the problem isn't on my side, nor on yours,
it may be that the subject matter is the problem.

>  The previous
> request has to be completed even if there is no next request.  That means
> scheduling work, that then races with the dispatch path.  So mutual exclusion
> is necessary.

I would say a synchronization primitive is needed, indeed.
I have a different approach to this, which uses completion
as the synchronization primitive and I think that would be possible
to use also here.

I have this code:

        /*
         * Here we postprocess the request differently depending on if
         * we go on the success path or error path. The success path will
         * immediately let new requests hit the host, whereas the error
         * path will hold off new requests until we have retried and
         * succeeded or failed the current asynchronous request.
         */
        if (status == MMC_BLK_SUCCESS) {
                /*
                 * This immediately opens the gate for the next request
                 * to start on the host while we perform post-processing
                 * and report back to the block layer.
                 */
                host->areq = NULL;
                complete(&areq->complete);
                mmc_post_req(host, areq->mrq, 0);
                if (areq->report_done_status)
                        areq->report_done_status(areq, MMC_BLK_SUCCESS);
        } else {
                /*
                 * Post-process this request. Then, if
                 * another request was already prepared, back that out
                 * so we can handle the errors without anything prepared
                 * on the host.
                 */
                if (host->areq_pending)
                        mmc_post_req(host, host->areq_pending->mrq, -EINVAL);
                /*
                 * Call back with error status, this will trigger retry
                 * etc if needed
                 */
                if (areq->report_done_status) {
                        if (areq->report_done_status(areq, status)) {
                                /*
                                 * This happens when we finally give up after
                                 * a few retries or on unrecoverable errors.
                                 */
                                mmc_post_req(host, areq->mrq, 0);
                                host->areq = NULL;
                                /* Re-prepare the next request */
                                if (host->areq_pending)
                                        mmc_pre_req(host,
host->areq_pending->mrq);
                                complete(&areq->complete);
                        }
                }
        }

As you can see it is a bit elaborate about some
of this stuff and quite a lot of comments were added to make
clear what is going on. This is in my head so that is why I ask:
completion worked fine as a synchronization primitive here
so it is maybe a good alternative in your (similar) code
as well?

>> I would contest using a waitqueue for this. The name even says
>> "complete_req" so why is a completion not the right thing to
>> hang on rather than a waitqueue?
>
> I explained that above.

And I explained what I meant.

>> The completion already contains a waitqueue, so I think you're
>> just essentially reimplementing it.
>>
>> Just complete(&mq->mq_req_complete) or something should do
>> the trick.
>
> Nope.

That's a very terse answer to an honest review comment.

>>> +       else
>>> +               kblockd_schedule_work(&mq->complete_work);
>>
>> I did not use the kblockd workqueue for this, out of fear
>> that it would interfere and disturb the block layer work items.
>> My intuitive idea was that the MMC layer needed its own
>> worker (like in the past it used a thread) in order to avoid
>> congestion in the block layer queue leading to unnecessary
>> delays.
>
> The complete work races with the dispatch of the next request, so putting
> them in the same workqueue makes sense. i.e. the one that gets processed
> first would anyway delay the one that gets processed second.

So what we want to attain is that the next dispatch happens as soon
as possible after completing the previous request. They only race
insofar as they go through post/preprocessing before getting to
the synchronization primitive though?

>> On the other hand, this likely avoids a context switch if there
>> is no congestion on the queue.
>>
>> I am uncertain when it is advisible to use the block layer
>> queue for subsystems like MMC/SD.
>>
>> Would be nice to see some direction from the block layer
>> folks here, it is indeed exposed to us...
>>
>> My performance tests show no problems with this approach
>> though.
>
> As I already wrote, the CPU-bound block layer dispatch work queue has
> negative consequences for mmc performance.  So there are 2 aspects to that:
>         1. Is the choice of CPU right to start with?  I suspect it is better for
> the dispatch to run on the same CPU as the interrupt.
>         2. Does the dispatch work need to be able to be migrated to a different
> CPU? i.e. unbound work queue.  That helped in my tests, but it could just be
> a side-effect of 1.

Hm! What you are saying sounds correct, and we really need to
consider the multi-CPU aspects of this, maybe not now. I am
happy as long as we have equal performance as before and
maintainable code.

> Of course we can't start looking at these real issues, while you are

You seem to have dropped the mic there, but it looks like
what you were going to say was not nice anyway so I guess
it was for the better.

>>> +static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err)
>>> +{
>>> +       struct request_queue *q = mq->queue;
>>> +       unsigned long flags;
>>> +       bool done;
>>> +
>>> +       spin_lock_irqsave(q->queue_lock, flags);
>>> +       done = !mq->rw_wait;
>>> +       mq->waiting = !done;
>>> +       spin_unlock_irqrestore(q->queue_lock, flags);
>>
>> This makes it look like a reimplementation of completion_done()
>> so I think you should use the completion abstraction again. The
>> struct completion even contains a variable named "done".
>
> This just serves as an example of why splitting up the patch was such a bad
> idea.  For direct completion, the wait can result in recovery being needed
> for the previous request, so the current request gets requeued.

I guess this falls back to the previous comment on the
synchronization primitives.

>> This is pretty straight-forward (pending the comments above).
>> Again it has the downside of duplicating the same code for the
>> old block layer instead of refactoring.
>
> No, the old code is rubbish.  Churning it is a waste of time.

I think we have addressed this recurring theme.

>> Again looks fine, again duplicates code. In this case I don't even
>> see why the MQ code needs its own copy of the issue funtion.
>
> Because it has to support CQE.  This attitude against CQE is very disappointing!

Please understand that there is no conspiracy against CQE out
there.

I admit I personally think getting MQ in place and making the code
long-term maintainable is more important than supporting
CQE. But that does not mean I think CQE is unimportant.

Or rather: just because someone is FOR something else,
doesn't mean they are AGAINST this.

I am pretty sure both sets of functionality can be attained in the next
merge window, but it requires good faith.

>>> +enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req)
>>> +{
>>> +       if (req_op(req) == REQ_OP_READ || req_op(req) == REQ_OP_WRITE)
>>> +               return MMC_ISSUE_ASYNC;
>>> +
>>> +       return MMC_ISSUE_SYNC;
>>> +}
>>
>> Distinguishing between SYNC and ASYNC operations and using
>> that as abstraction is nice.
>>
>> But you only do this in the new MQ code.
>>
>> Instead, make this a separate patch and first refactor the old
>> code to use this distinction between SYNC and ASYNC.
>
> That is a non-starter.  The old code is rubbish.  Point to something
> work-saving.  There isn't anything.

I think we have addressed this.

>> Unfortunately I think Ulf's earlier criticism that you're rewriting
>> the world instead of refactoring what we have still stands on many
>> accounts here.
>
> Nope.  It is just an excuse to delay the patches.  You guys are playing
> games and it is embarrassing for Linux.  What is actually wrong with this
> technically?  It is not good because it doesn't churn the old code?  That is
> ridiculous.

It's not good because it does not make MQ mandatory and does
not delete the interfaces to the old block layer.

>> It makes it even harder to understand your persistance in keeping
>> the old block layer around. If you're introducing new concepts and
>> cleaner code in the MQ path and kind of discarding the old
>> block layer path, why keep it around at all?
>
> Wow, you really like making things up.  Never have I suggested keeping the
> old code.  It is rubbish.

So why are you not just deleting it?

>  As soon as blk-mq is ready and tested, delete
> the old crap.

I disagree. No "ready and tested" is needed, the code is ready,
and I have tested it. It performs. Delete the old code now.

> I was expecting CQE to be applied 6 months ago, supporting the legacy blk
> layer until blk-mq was ready.  But you never delivered on blk-mq, which is
> why I had to do it.  And now you are making up excuses about why we can't
> move forward.

Don't be so conspiracist. I think I have answered this several
times over.

Migrating to MQ and getting rid of the old block layer interface is
paramount in my view. That is the essence of all my review feedback.
The other comments are minor, and even if you don't take my
comments into consideration it is stuff I can work on fixing after
these patches are merged.

If you just make a series also deleting the old block layer
I will test it so it doesn't break anything and then you can
probably expect a big Acked-by on the whole shebang.

>> I would have a much easier time accepting this patch if it
>> deleted as much as it was adding, i.e. introduce all this new
>> nice MQ code, but also tossing out the old block layer and error
>> handling code. Even if it is a massive rewrite, at least there
>> is just one body of code to maintain going forward.
>
> How can you possibly call a few hundred lines massive.  The kernel has
> millions of lines.  Your sense of scale is out of whack.

Arguably you are speaking against yourself, since the old code
was described by yourself as "rubbish", and it is very terse and
unintuitive to read even a few lines of rubbish.

I had to create a (quite popular) Googledoc breaking down the
MMC stack in words before I could even start hacking at it.
So it is not massive in the sense of number of lines but in the
sense of intelligibility; it's so terse that it feels like eating
too big a piece of mudcake or something.

Just delete the rubbish and I'm happy.

>> That said, I would strongly prefer a refactoring of the old block
>> layer leading up to transitioning to MQ. But I am indeed biased
>> since I took that approach myself.
>
> Well stop it.  We have nice working code.  Get it applied and tested, and
> then we can delete the old crap.

Just get the old code deleted so there is just one thing to
test and not a matrix of old and new code paths.

>> This timeout looks like something I need to pick up in my patch
>> set as well. It seems good for stability to support this. But what happened
>> here? Did you experience a bunch of timeouts during development,
>> or let's say how was this engineered, I guess it is for the case when
>> something randomly locks up for a long time and we don't really know
>> what has happened, like a watchdog?
>
> We presently don't have the host APIs to support external timeouts.  CQE
> uses them though.

OK

>>> +static int mmc_init_request(struct request_queue *q, struct request *req,
>>> +                           gfp_t gfp)
>>> +{
>>> +       return __mmc_init_request(q->queuedata, req, gfp);
>>> +}
>>> +
>> (...)
>>> +static int mmc_mq_init_request(struct blk_mq_tag_set *set, struct request *req,
>>> +                              unsigned int hctx_idx, unsigned int numa_node)
>>> +{
>>> +       return __mmc_init_request(set->driver_data, req, GFP_KERNEL);
>>> +}
>>> +
>>> +static void mmc_mq_exit_request(struct blk_mq_tag_set *set, struct request *req,
>>> +                               unsigned int hctx_idx)
>>> +{
>>> +       struct mmc_queue *mq = set->driver_data;
>>> +
>>> +       mmc_exit_request(mq->queue, req);
>>> +}
>>
>> Here is more code duplication just to keep both the old block layer
>> and MQ around. Including introducing another inner __foo function
>> which I have something strongly against personally (I might be
>> crazily picky, because I see many people do this).
>
> In this case, it is not code duplication it is re-using the same code but
> called from the blk-mq API.

Right, OK.

>>> +               /*
>>> +                * Timeouts are handled by mmc core, so set a large value to
>>> +                * avoid races.
>>> +                */
>>> +               req->timeout = 600 * HZ;
>>> +               break;
>>> +       }
>>
>> These timeouts again, does this mean we have competing timeout
>> code in the block layer and MMC?
>
> Yes - the host controller provides hardware timeout interrupts in most
> cases.  The core provides software timeouts in other cases.

OK, maybe we can extend the comment above to explain this
situation so it is clear what is racing with what.

>> This mentions timeouts in the MMC core, but they are actually
>> coming from the *MMC* core, when below you set:
>> blk_queue_rq_timeout(mq->queue, 60 * HZ);?
>>
>> Isn't the actual case that the per-queue timeout is set up to
>> occur before the per-request timeout, and that you are hacking
>> around the block layer core having two different timeouts?
>
> There is no per-queue timeout.  The request timeout has a default value
> given by the queue.  It can be changed for different requests.

Aha.
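
So if I now have it right, the whole relationship is just this (a sketch to
check my understanding, helper name made up):

#include <linux/blkdev.h>

/*
 * Sketch: the queue only provides the default timeout, and an
 * individual request can still override it before it is started,
 * which is what the 600 * HZ above is doing.
 */
static void sketch_set_timeouts(struct request_queue *q, struct request *req)
{
	blk_queue_rq_timeout(q, 60 * HZ);	/* default for the queue */
	req->timeout = 600 * HZ;		/* per-request override */
}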

>> It's a bit confusing so I'd really like to know what's going on...
>
> I don't expect to have to teach you the block layer.

But I will tell you to stop snarking around every time you
write things like that.


>>> +       mq->in_flight[issue_type] += 1;
>>> +       get_card = mmc_tot_in_flight(mq) == 1;
>>
>> Parenthesis around the logical expression preferred I guess
>> get_card = (mmc_tot_in_flight(mq) == 1);
>> (Isn't checkpatch complaining about this?)
>
> Nope

Too bad. OK, I think it's nice with a parenthesis anyway.
Aids perception or something.

>> Then:
>> (...)
>>> +       if (get_card)
>>> +               mmc_get_card(card, &mq->ctx);
>>
>> I simply took the card on every request. Since the context is the
>> same for all block layer business and the lock is now fully
>> reentrant this if (get_card) is not necessary. Just take it for
>> every request and release it in the .complete() callback.
>
> As I have written elsewhere, we have always avoided getting / putting
> unnecessarily.  It is better that way, so no point in taking it out.

I explained above about the double counters.

>>> +#define MMC_QUEUE_DEPTH 64
>>> +
>>> +static int mmc_mq_init(struct mmc_queue *mq, struct mmc_card *card,
>>> +                        spinlock_t *lock)
>>> +{
>>> +       int q_depth;
>>> +       int ret;
>>> +
>>> +       q_depth = MMC_QUEUE_DEPTH;
>>> +
>>> +       ret = mmc_mq_init_queue(mq, q_depth, &mmc_mq_ops, lock);
>>
>> Apart from using a define, then assigning the define to a
>> variable and then passing that variable instead of just
>> passing the define: why 64? Is that the depth of the CQE
>> queue? In that case we need an if (cqe) and set it down
>> to 2 for non-CQE.
>
> Are you ever going to learn about the block layer.

Are you ever going to treat your fellow community peers as
equals?

It is actually your job as a patch submitter to teach others about
the details of what you are doing. If you think all your fellow programmers
suck then elevate them to your own level instead of pushing them down.

> The number of requests
> is default 128 for the legacy block layer.

I had no clue.

> For blk-mq it is queue depth
> times 2.  So 64 gives the same number of requests as before.

I see. But by just setting it to 2 as I do in my patch set, it still
performs, I guess just with fewer buffers allocated.

This makes some kind of sense though and I guess it explains
why we got out of memory with the bounce buffer thing, as that
then resulted in 128*64KB of allocated memory for the request
queue. I thought it would be ... a few requests.
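
OK, so for my own understanding the setup boils down to something like this
(a sketch with made-up names, cmd_size and friends elided):

#include <linux/blk-mq.h>
#include <linux/string.h>

#define SKETCH_MMC_QUEUE_DEPTH	64

/*
 * Sketch: a tag set with depth 64, which per the explanation above
 * gives roughly 2 * 64 = 128 requests, i.e. the same number as the
 * legacy request layer's default.
 */
static int sketch_init_tag_set(struct blk_mq_tag_set *set,
			       const struct blk_mq_ops *ops, void *driver_data)
{
	memset(set, 0, sizeof(*set));
	set->ops = ops;
	set->queue_depth = SKETCH_MMC_QUEUE_DEPTH;
	set->nr_hw_queues = 1;
	set->numa_node = NUMA_NO_NODE;
	set->flags = BLK_MQ_F_SHOULD_MERGE;
	set->driver_data = driver_data;

	return blk_mq_alloc_tag_set(set);
}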

>>> +       if (ret)
>>> +               return ret;
>>> +
>>> +       blk_queue_rq_timeout(mq->queue, 60 * HZ);
>>
>> And requests timeout after 1 minute I take it.
>>
>> I suspect both of these have some relation to CQE, so that is where
>> you find these long execution times etc?
>
> For legacy mmc, the core takes care of timeouts.  For CQE we expect reliable
> devices and I would interpret a timeout as meaning the device is broken.

OK.

> However it is sensible to have anyway.  For CQE, a request might have to
> wait for the entire rest of the queue to be processed first, or maybe the
> request somehow gets stuck and there are other requests constantly
> overtaking it.  The max queue depth is 32 so 60 seconds seems ok.

OK we will get to see as we see more and more devices of this type.
It's fine.

>>> +static void mmc_mq_queue_suspend(struct mmc_queue *mq)
>>> +{
>>> +       blk_mq_quiesce_queue(mq->queue);
>>> +
>>> +       /*
>>> +        * The host remains claimed while there are outstanding requests, so
>>> +        * simply claiming and releasing here ensures there are none.
>>> +        */
>>> +       mmc_claim_host(mq->card->host);
>>> +       mmc_release_host(mq->card->host);
>>
>> I think just blk_mq_quiesce_queue() should be fine as and
>> should make sure all requests have called .complete() and there
>> I think you should also release the host lock.
>>
>> If the MQ code is not doing this, we need to fix MQ to
>> do the right thing (or add a new callback such as
>> blk_mq_make_sure_queue_empty()) so at the very
>> least put a big fat FIXME or REVISIT comment on the above.
>
> blk_mq_quiesce_queue() prevents dispatches not completions.  So we wait for
> outstanding requests.

I guess the quiesce call is not really intended for PM suspend/resume
paths. Maybe we need to add a blk_mq_flush_queue() call then.
(No showstopper, this is OK for now.)

>> One of the good reasons to delete the old block layer is to get
>> rid of this horrible semaphore construction. So I see it as necessary
>> to be able to focus development efforts on code that actually has
>> a future.
>
> The old crap will get deleted when blk-mq is ready.

Which IMO is now.

>>> +
>>> +       int                     in_flight[MMC_ISSUE_MAX];
>>
>> So this is a [2] containing a counter for the number of
>> synchronous and asynchronous requests in flight at any
>> time.
>>
>> But are there really synchronous and asynchronous requests
>> going on at the same time?
>>
>> Maybe on the error path I guess.
>>
>> I avoided this completely but I guess it may be necessary with
>> CQE, such that in_flight[0,1] is way more than 1 or 2 at times
>> when there are commands queued?
>
> CQE needs to count DCMD separately from read / writes.  Counting by issue
> type is a simple way to do that.

OK
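
So conceptually the counting just grows another bucket, something like this
(names made up for illustration):

/*
 * Sketch: one in-flight counter per issue class, so DCMD can be
 * tracked separately from ordinary reads/writes.
 */
enum sketch_issue_type {
	SKETCH_ISSUE_SYNC,
	SKETCH_ISSUE_DCMD,
	SKETCH_ISSUE_ASYNC,
	SKETCH_ISSUE_MAX,
};

struct sketch_counts {
	int in_flight[SKETCH_ISSUE_MAX];
};

static int sketch_tot_in_flight(const struct sketch_counts *c)
{
	int i, tot = 0;

	for (i = 0; i < SKETCH_ISSUE_MAX; i++)
		tot += c->in_flight[i];

	return tot;
}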

>>> +       bool                    rw_wait;
>>> +       bool                    waiting;
>>> +       wait_queue_head_t       wait;
>>
>> As mentioned I think this is a reimplementation of
>> the completion abstraction.
>
> I pointed out why that wouldn't work.  Another case of why the code makes
> more sense together than split up.

I'm not entirely convinced about that but we'll see. It is a detail.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 06/10] mmc: sdhci-pci: Add CQHCI support for Intel GLK
  2017-11-09  7:12     ` Adrian Hunter
@ 2017-11-10  8:18       ` Linus Walleij
  0 siblings, 0 replies; 55+ messages in thread
From: Linus Walleij @ 2017-11-10  8:18 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Thu, Nov 9, 2017 at 8:12 AM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 08/11/17 11:24, Linus Walleij wrote:
>> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>>> Add CQHCI initialization and implement CQHCI operations for Intel GLK.
>>>
>>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>>
>> This patch seems OK in context, but it merely illustrates the
>> weirdness of .[runtime]_suspend/resume calling into CQE-specific
>> APIs rather than using generic host callbacks.
>
> Your comment makes no sense at all.  The host driver has
> [runtime]_suspend/resume callbacks and it is up to the host driver to decide
> what to do.  CQHCI provides helpers since that is the whole point of having
> a CQHCI library.

Yeah, I didn't see it in context; the CQHCI library for these hosts
makes a lot of sense.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 08/10] mmc: block: blk-mq: Separate card polling from recovery
  2017-11-09 13:02         ` Adrian Hunter
@ 2017-11-10  8:25           ` Linus Walleij
  0 siblings, 0 replies; 55+ messages in thread
From: Linus Walleij @ 2017-11-10  8:25 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Thu, Nov 9, 2017 at 2:02 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 09/11/17 14:52, Linus Walleij wrote:
>> On Thu, Nov 9, 2017 at 8:56 AM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>> On 08/11/17 11:30, Linus Walleij wrote:
>>>> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>>>
>>>>> Recovery is simpler to understand if it is only used for errors. Create a
>>>>> separate function for card polling.
>>>>>
>>>>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>>>>
>>>> This looks good but I can't see why it's not folded into
>>>> patch 3 already. This error handling is introduced there.
>>>
>>> What are you on about?
>>
>> You are attacking your most valuable resource, a reviewer.
>>
>> And I even said the patch looks good.
>>
>> The only thing you attain with this kind of language is to alienate
>> me and discourage others from reviewing your patch set. You also
>> give your employer a bad name, since you are representing
>> them.
>
> 6 months of being messed around will do that.

Blessed are the meek, for they will inherit the earth.

Nobody is after you; this is just hard stuff and too few people care to
help out with review etc., with this and with the block layer as well.
That makes things slow. It's no one's agenda to slow things down.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 05/10] mmc: cqhci: support for command queue enabled host
  2017-11-09 12:55         ` Adrian Hunter
@ 2017-11-10  8:29           ` Linus Walleij
  0 siblings, 0 replies; 55+ messages in thread
From: Linus Walleij @ 2017-11-10  8:29 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On Thu, Nov 9, 2017 at 1:55 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 09/11/17 14:26, Linus Walleij wrote:

>>>> I think the above approach to put any CQE-specific callbacks
>>>> directly into the struct mmc_host_ops is way more viable.
>>>
>>> Nothing to do with CQE.  This is CQHCI.  Please try to get the difference.
>>
>> I am trying, please try to think about your language.
>
> I strongly disapprove of being rude but sadly it seems to get results.

Conflict is a natural part of human existence. Using it deliberately
to get one's way is manipulation. Admitting to being manipulative
is losing one's face in public.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 03/10] mmc: block: Add blk-mq support
  2017-11-09 15:52       ` Linus Walleij
@ 2017-11-10 10:19         ` Linus Walleij
  2017-11-14 13:10           ` Adrian Hunter
  1 sibling, 0 replies; 55+ messages in thread
From: Linus Walleij @ 2017-11-10 10:19 UTC (permalink / raw)
  To: Adrian Hunter, Neil Brown, Ming Lei
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

Following up on this:

On Thu, Nov 9, 2017 at 4:52 PM, Linus Walleij <linus.walleij@linaro.org> wrote:

>> No one has ever suggested that the legacy API will remain.  Once blk-mq is
>> ready the old code gets deleted.
>
> The block layer maintainers most definately think MQ is ready
> but you seem to disagree. Why?
>
> In the recent LWN article from kernel recepies:
> https://lwn.net/Articles/735275/
> the only obstacle I can see is a mention that SCSI was not
> switched over by default because of some problems with
> slow rotating media. "This change had to be backed out
> because of some performance and scalability issues that
> arose for rotating storage."
>
> Christoph mentions that he's gonna try again for v4.14.
> But it's true, I do not see that happening in linux-next yet.

Neil Brown's article on LWN points to Ming Lei's patch
set:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1513023.html

as addressing the issue that held back blk-mq from
being the default for SCSI, and it is reportedly
queued for v4.15. Since this MMC/SD rework is
also targeted for v4.15 I think we can assume it is
pretty much ready for everything, and delete the
non-MQ block path.

I just haven't seen any of this problem in my tests
with MMC/SD so I do not think it would be affected,
but it anyway seems to be fixed.

OK maybe I am especially optimistic. But it's not just
based on that.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 03/10] mmc: block: Add blk-mq support
  2017-11-09 15:52       ` Linus Walleij
@ 2017-11-14 13:10           ` Adrian Hunter
  2017-11-14 13:10           ` Adrian Hunter
  1 sibling, 0 replies; 55+ messages in thread
From: Adrian Hunter @ 2017-11-14 13:10 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 09/11/17 17:52, Linus Walleij wrote:
> On Thu, Nov 9, 2017 at 11:42 AM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> On 08/11/17 10:54, Linus Walleij wrote:
>>> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>>> At least you could do what I did and break out a helper like
>>> this:
>>>
>>> /*
>>>  * This reports status back to the block layer for a finished request.
>>>  */
>>> static void mmc_blk_complete(struct mmc_queue_req *mq_rq,
>>>                             blk_status_t status)
>>> {
>>>        struct request *req = mmc_queue_req_to_req(mq_rq);
>>>
>>>        if (req->mq_ctx) {
>>>           blk_mq_end_request(req, status);
>>>        } else
>>>           blk_end_request_all(req, status);
>>> }
>>
>> You are quibbling.
> 
> Hm wiktionary "quibble" points to:
> https://en.wikipedia.org/wiki/Trivial_objections
> 
> Sorry for not being a native speaker in English, I think that
> maybe you were again trying to be snarky to a reviewer but
> I honestly don't know.
> 
>> It makes next to no difference especially as it all goes
>> away when the legacy code is deleted.
> 
> Which is something your patch set doesn't do. Nor did you write
> anywhere that deleting the legacy code was your intention.
> But I'm happy to hear it.
> 
>>  I will change it in the next
>> revision, but what a waste of everyone's time.
>>  Please try to focus on
>> things that matter.
> 
> As Jean-Paul Sartre said "hell is other people".
> Can you try to be friendlier anyway?
> 
>>> This retry and error handling using requeue is very elegant.
>>> I really like this.
>>>
>>> If we could also go for MQ-only, only this nice code
>>> remains in the tree.
>>
>> No one has ever suggested that the legacy API will remain.  Once blk-mq is
>> ready the old code gets deleted.
> 
> The block layer maintainers most definitely think MQ is ready
> but you seem to disagree. Why?

I meant when the mmc conversion to blk-mq is ready.  We don't need to
consider other sub-systems, just whether it is working for mmc.  For
example, the issue with the block layer workqueues not performing as well as
a thread.  That only came to light through testing mmc.

> 
> In the recent LWN article from Kernel Recipes:
> https://lwn.net/Articles/735275/
> the only obstacle I can see is a mention that SCSI was not
> switched over by default because of some problems with
> slow rotating media. "This change had to be backed out
> because of some performance and scalability issues that
> arose for rotating storage."
> 
> Christoph mentions that he's gonna try again for v4.14.
> But it's true, I do not see that happening in linux-next yet.
> 
> But that is specifically rotating media, which MMC/SD is not.
> And UBI has been using it for ages without complaints. And
> that is a sign of something.
> 
> Have you seen any problems when deploying it on MMC/SD,
> really? If there are problems I bet the block layer people want
> us to find them, diagnose them and ultimately fix them rather
> than wait for them to do it. I haven't seen any performance or
> scalability issues. At one point I had processes running
> on two eMMC and one SD-card and apart from maxing out
> the CPUs and DMA backplace as could be expected all was
> fine.
> 
>>> The problem: you have just reimplemented the whole error
>>> handling we had in the old block layer and now we have to
>>> maintain two copies and keep them in sync.
>>>
>>> This is not OK IMO, we will inevitably screw it up, so we
>>> need to get *one* error path.
>>
>> Wow, you really didn't read the code at all.
> 
> Just quit this snarky attitude.
> 
>> As I have repeatedly pointed
>> out, the new code is all new.  There is no overlap and there is nothing to keep
>> in sync.
> 
> The problem is that you have not clearly communicated your
> vision to delete the old code. The best way to communicate that
> would be to include a patch actually deleting it.
> 
>>  It may not look like it in this patch, but that is only because of
>> the ridiculous idea of splitting up the patch.
> 
> Naming other people's review feedback as "ridiculous" is not very
> helpful. I think the patch set became much easier to review
> like this so I am happy with this format.
> 
>>>> +static void mmc_blk_mq_acct_req_done(struct mmc_queue *mq, struct request *req)
>>>
>>> What does "acct" mean in the above function name?
>>> Accounting? Actual? I'm lost.
>>
>> Does "actual" have two "c"'s.  You are just making things up.
> 
> Please stop your snarky and belittling attitude.
> 
> I am not a native English speaker and I am not making up
> my review comments in order to annoy you.
> 
> Consider Hanlon's razor:
> https://en.wikipedia.org/wiki/Hanlon%27s_razor
> "Never attribute to malice that which is adequately explained by
> stupidity".
> 
> It's bad enough that you repeatedly point out how stupid
> you think I am

I have never called you names.  On the contrary, it is because I don't think
you are stupid that I get upset when you seem to spend so little time and
effort to understand the code.  Equally, you seem to make comments that just
assume the code is wrong without due consideration.

>                 I am sorry about that, in my opinion sometimes
> other people are really stupid but more often than not the problem
> is really on your side, like being impatient and unwilling to educate
> others about how your code actually works because it seems
> so evident to yourself that you think it should be evident to everyone
> else as well. I don't know what to say about that, it seems a bit
> unempathetic.
> 
> But you even go and think I am making stuff up on purpose.
> That's pretty offensive.
> 
>> Of course it is "account".
> 
> This is maybe obvious for you but it was not to me.
> 
>>  It is counting the number of requests in flight - which is
>> pretty obvious from the code.  We use that to support important features
>> like CQE re-tuning and avoiding getting / putting the card all the time.
> 
> What is so hard with getting this point across without insults?
> 
>>>> +{
>>>> +       struct request_queue *q = req->q;
>>>> +       unsigned long flags;
>>>> +       bool put_card;
>>>> +
>>>> +       spin_lock_irqsave(q->queue_lock, flags);
>>>> +
>>>> +       mq->in_flight[mmc_issue_type(mq, req)] -= 1;
>>>> +
>>>> +       put_card = mmc_tot_in_flight(mq) == 0;
>>>
>>> This in_flight[] business seems a bit kludgy, but I
>>> don't really understand it fully. Magic numbers like
>>> -1 to mark that something is not going on etc, not
>>> super-elegant.
>>
>> You are misreading.  It subtracts 1 from the number of requests in flight.
> 
> Right! Sorry I misread it.
> 
>>> I believe it is necessary for CQE though as you need
>>> to keep track of outstanding requests?
>>
>> We have always avoided getting / putting the card when there is another
>> request in flight.  Yes it is also used for CQE.
> 
> Yeah I took the approach to get/put the card around each
> request instead. It might make things a bit simpler.
> 
>>>> +       spin_unlock_irqrestore(q->queue_lock, flags);
>>>> +
>>>> +       if (put_card)
>>>> +               mmc_put_card(mq->card, &mq->ctx);
>>>
>>> I think you should try not to sprinkle mmc_put_card() inside
>>> the different functions but instead you can put this in the
>>> .complete callback I guess mmc_blk_mq_complete() in your
>>> patch set.
>>
>> This is the *only* block.c function in the blk-mq code that calls
>> mmc_put_card().  The queue also does it for requests that didn't even
>> start.  That is it.
> 
> Yeah that's correct. If we also delete the old code it will not be
> too bad.
> 
>>> Also you do not need to avoid calling it several times with
>>> that put_card variable. It's fully reentrant thanks to your
>>> own code in the lock and all calls come from the same block
>>> layer process if you call it in .complete() I think?
>>
>> We have always avoided unnecessary gets / puts.  Since the result is better,
>> why on earth take it out?
> 
> It's not a showstopper.
> 
> But what I think is nice in doing it around
> each request is that since mmc_put_card() calls mmc_release_host()
> contains this:
> 
> if (--host->claim_cnt) { (...)
> 
> So there is a counter inside the claim/release host functions
> and now there is another counter keeping track of the in-flight
> requests as well, and it gives me the feeling we are maintaining
> two counters when we only need one.

The gets / puts also get runtime PM for the card.  It is a lot of messing
around for the sake of a quick check of the number of requests in flight -
which is needed anyway for CQE.
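
(For readers following along: mmc_tot_in_flight() is referenced here but not
shown in the quoted hunks.  A minimal sketch, assuming it simply sums the
per-issue-type counters declared as in_flight[MMC_ISSUE_MAX] further down in
this mail - the real helper may differ:

static int mmc_tot_in_flight(struct mmc_queue *mq)
{
	int i, tot = 0;

	/* Sum sync and async (and, for CQE, DCMD) requests currently in flight */
	for (i = 0; i < MMC_ISSUE_MAX; i++)
		tot += mq->in_flight[i];

	return tot;
}

With that, "first request in flight" gates mmc_get_card() and "last request
completed" gates mmc_put_card(), which is also what pins runtime PM.)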

> 
> But maybe it is actually the host->claim_cnt that is overengineered
> and should just be a bool, because with this construction,
> where you only call down and claim the host once, the
> host->claim_cnt will only be 0 or 1, right?

The claim_cnt was introduced for nested claiming.

> 
>>> Now the problem Ulf has pointed out starts to creep out in the
>>> patch: a lot of code duplication on the MQ path compared to
>>> the ordinary block layer path.
>>>
>>> My approach was structured partly to avoid this: first refactor
>>> the old path, then switch to (only) MQ to avoid code duplication.
>>
>> The old code is rubbish.  There is nothing of value there.  You have to make
>> the case why you are wasting everyone's time churning crappy code.
> 
> I kind of agree that the old code is rubbish, actually.
> 
> I guess I could turn that around and ask: if it is so crappy, why
> is your patch set not deleting it?

To allow people time to test.

> 
> And I guess the only reasonable answer to that would be what
> you were saying above, something along the lines of "MQ is not
> ready yet". But it can be argued that MQ is ready.
> 
>>  The idea
>> that nice new code is wrong because it doesn't churn the old rubbish code,
>> is ridiculous.
> 
> As I have said in the review comments I like your new
> code, especially the new error path.
> 
> I took the approach of refactoring because I was afraid I would
> break something I guess. But rewriting from scratch is also an
> option.
> 
> But I think I would prefer that, if a big slew of new code is
> introduced, it be put to wide use and any bugs
> smoked out during the -rc phase; we are now
> hiding it behind a Kconfig option, so it's unlikely to see testing
> until that option is turned on, and that is not good.

And if you find a big problem in rc7, then what do you do?  At least with
the config option, the revert is trivial.

> 
>>> This looks a bit like it is reimplementing the kernel
>>> completion abstraction using a mutex and a variable named
>>> .complete_req?
>>>
>>> We were using a completion in the old block layer so
>>> why did you not use it for MQ?
>>
>> Doesn't seem like you pay much attention to this stuff.
> 
> You really have to stop your snarky and belitteling attitude
> to your fellow kernel developers.
> 
> From 6.Followthrough.rst again:
> 
> ----------8<-------------------8<-------------------8<-----------
> 
> Andrew Morton has suggested that every review comment which does not result
> in a code change should result in an additional code comment instead; that
> can help future reviewers avoid the questions which came up the first time
> around.
> 
> ----------8<-------------------8<-------------------8<-----------
> 
> At least take a moment to consider the option that maybe so few people
> are reviewing your code because this is complicated stuff and really
> hard to grasp; maybe the problem isn't on my side, nor on yours;
> it may be that the subject matter is the problem.

I would expect a review to take a number of days, perhaps longer.
However, it seems people are not willing to consider anything that takes
more than an hour or two.

> 
>>  The previous
>> request has to be completed even if there is no next request.  That means
>> scheduling work, which then races with the dispatch path.  So mutual exclusion
>> is necessary.
> 
> I would say a synchronization primitive is needed, indeed.
> I have a different approach to this, which uses a completion
> as the synchronization primitive, and I think that would be possible
> to use here as well.

Because they are both just wake-ups.  When waiting for multiple conditions,
wait_event() is simpler.  You are not considering the possibility of having
only 1 task switch between requests.
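
(As a sketch only - the wrapper name here is invented, while
mmc_blk_rw_wait_cond() is the helper quoted later in this mail - the point
about multiple conditions is that the predicate passed to wait_event() can
combine them:

static int mmc_blk_rw_wait(struct mmc_queue *mq)
{
	int err = 0;

	/*
	 * The condition helper checks mq->rw_wait under the queue lock and
	 * can also report that recovery is needed for the previous request,
	 * so one wait covers both "previous request done" and "requeue
	 * needed"; a bare completion only conveys a single wake-up event.
	 */
	wait_event(mq->wait, mmc_blk_rw_wait_cond(mq, &err));

	return err;
}
)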

> 
> I have this code:

Which runs as a work item, so there is already a task switch to get here.
i.e. there are 2 task switches between each request, instead of 1.

> 
>         /*
>          * Here we postprocess the request differently depending on if
>          * we go on the success path or error path. The success path will
>          * immediately let new requests hit the host, whereas the error
>          * path will hold off new requests until we have retried and
>          * succeeded or failed the current asynchronous request.
>          */
>         if (status == MMC_BLK_SUCCESS) {
>                 /*
>                  * This immediately opens the gate for the next request
>                  * to start on the host while we perform post-processing
>                  * and report back to the block layer.
>                  */
>                 host->areq = NULL;
>                 complete(&areq->complete);

So that is the second task switch.

>                 mmc_post_req(host, areq->mrq, 0);
>                 if (areq->report_done_status)
>                         areq->report_done_status(areq, MMC_BLK_SUCCESS);
>         } else {
>                 /*
>                  * Post-process this request. Then, if
>                  * another request was already prepared, back that out
>                  * so we can handle the errors without anything prepared
>                  * on the host.
>                  */
>                 if (host->areq_pending)
>                         mmc_post_req(host, host->areq_pending->mrq, -EINVAL);
>                 /*
>                  * Call back with error status, this will trigger retry
>                  * etc if needed
>                  */
>                 if (areq->report_done_status) {
>                         if (areq->report_done_status(areq, status)) {
>                                 /*
>                                  * This happens when we finally give up after
>                                  * a few retries or on unrecoverable errors.
>                                  */
>                                 mmc_post_req(host, areq->mrq, 0);
>                                 host->areq = NULL;
>                                 /* Re-prepare the next request */
>                                 if (host->areq_pending)
>                                         mmc_pre_req(host, host->areq_pending->mrq);
>                                 complete(&areq->complete);
>                         }
>                 }
>         }
> 
> As you can see, it is a bit elaborate about some
> of this stuff, and quite a lot of comments were added to make
> clear what is going on. This is in my head, so that is why I ask:
> a completion worked fine as a synchronization primitive here,
> so maybe it is a good alternative in your (similar) code
> as well?

In the case where there is another request already waiting, instead of
scheduling work to complete the request, the dispatch is woken to complete
the previous request and start the next.  That needs different synchronization.

> 
>>> I would contest using a waitqueue for this. The name even says
>>> "complete_req" so why is a completion not the right thing to
>>> hang on rather than a waitqueue?
>>
>> I explained that above.
> 
> And I explained what I meant.
> 
>>> The completion already contains a waitqueue, so I think you're
>>> just essentially reimplementing it.
>>>
>>> Just complete(&mq->mq_req_complete) or something should do
>>> the trick.
>>
>> Nope.
> 
> That's a very terse answer to an honest review comment.
> 
>>>> +       else
>>>> +               kblockd_schedule_work(&mq->complete_work);
>>>
>>> I did not use the kblockd workqueue for this, out of fear
>>> that it would interfere and disturb the block layer work items.
>>> My intuitive idea was that the MMC layer needed its own
>>> worker (like in the past it used a thread) in order to avoid
>>> congestion in the block layer queue leading to unnecessary
>>> delays.
>>
>> The complete work races with the dispatch of the next request, so putting
>> them in the same workqueue makes sense. i.e. the one that gets processed
>> first would anyway delay the one that gets processed second.
> 
> So what we want to attain is that the next dispatch happens as soon
> as possible after completing the previous request. They only race
> in the sense that they go through post-/pre-processing before getting to
> the synchronization primitive, though?

They only race when the next request is not already waiting.  Otherwise the
work is never scheduled.
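
(To spell that out with a sketch - the field names come from the hunks quoted
in this mail, but the function itself is illustrative only - the completion
side either wakes the already-waiting dispatcher or, only when nobody is
waiting, schedules the work:

static void mmc_blk_rw_done_sketch(struct mmc_queue *mq)
{
	struct request_queue *q = mq->queue;
	unsigned long flags;
	bool waiting;

	spin_lock_irqsave(q->queue_lock, flags);
	mq->rw_wait = false;		/* previous request is now done */
	waiting = mq->waiting;
	spin_unlock_irqrestore(q->queue_lock, flags);

	if (waiting)
		wake_up(&mq->wait);	/* dispatcher completes it: 1 task switch */
	else
		kblockd_schedule_work(&mq->complete_work);
}
)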

> 
>>> On the other hand, this likely avoids a context switch if there
>>> is no congestion on the queue.
>>>
>>> I am uncertain when it is advisible to use the block layer
>>> queue for subsystems like MMC/SD.
>>>
>>> Would be nice to see some direction from the block layer
>>> folks here, it is indeed exposed to us...
>>>
>>> My performance tests show no problems with this approach
>>> though.
>>
>> As I already wrote, the CPU-bound block layer dispatch work queue has
>> negative consequences for mmc performance.  So there are 2 aspects to that:
>>         1. Is the choice of CPU right to start with?  I suspect it is better for
>> the dispatch to run on the same CPU as the interrupt.
>>         2. Does the dispatch work need to be able to be migrated to a different
>> CPU? i.e. unbound work queue.  That helped in my tests, but it could just be
>> a side-effect of 1.
> 
> Hm! What you are saying sounds correct, and we really need to
> consider the multi-CPU aspects of this, maybe not now. I am
> happy as long as we have equal performance as before and
> maintainable code.

Well, I saw a 3% drop in sequential read performance, improving to 1% when an
unbound workqueue was used.
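
(For reference, the "unbound workqueue" experiment can be approximated like
this sketch; the names are invented, and the experiment referred to above
presumably modified the existing block layer workqueue rather than adding a
new one:

#include <linux/init.h>
#include <linux/workqueue.h>

static struct workqueue_struct *mmc_dispatch_wq;

static int __init mmc_dispatch_wq_init(void)
{
	/*
	 * WQ_UNBOUND lets the scheduler place the dispatch work on any CPU
	 * rather than the CPU that queued it, which is the behaviour that
	 * reduced the observed sequential-read penalty from ~3% to ~1%.
	 */
	mmc_dispatch_wq = alloc_workqueue("mmc_dispatch",
					  WQ_UNBOUND | WQ_HIGHPRI, 0);
	return mmc_dispatch_wq ? 0 : -ENOMEM;
}
)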

> 
>> Of course we can't start looking at these real issues, while you are
> 
> You seem to have dropped the mic there, but it looks like
> what you were going to say was not nice anyway so I guess
> it was for the better.
> 
>>>> +static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err)
>>>> +{
>>>> +       struct request_queue *q = mq->queue;
>>>> +       unsigned long flags;
>>>> +       bool done;
>>>> +
>>>> +       spin_lock_irqsave(q->queue_lock, flags);
>>>> +       done = !mq->rw_wait;
>>>> +       mq->waiting = !done;
>>>> +       spin_unlock_irqrestore(q->queue_lock, flags);
>>>
>>> This makes it look like a reimplementation of completion_done()
>>> so I think you should use the completion abstraction again. The
>>> struct completion even contains a variable named "done".
>>
>> This just serves as an example of why splitting up the patch was such a bad
>> idea.  For direct completion, the wait can result in recovery being needed
>> for the previous request, so the current request gets requeued.
> 
> I guess this falls back to the previous comment on the
> synchronization primitives.
> 
>>> This is pretty straight-forward (pending the comments above).
>>> Again it has the downside of duplicating the same code for the
>>> old block layer instead of refactoring.
>>
>> No, the old code is rubbish.  Churning it is a waste of time.
> 
> I think we have addressed this recurring theme.
> 
>>> Again looks fine, again duplicates code. In this case I don't even
>>> see why the MQ code needs its own copy of the issue function.
>>
>> Because it has to support CQE.  This attitude against CQE is very disappointing!
> 
> Please understand that there is no conspiracy against CQE out
> there.
> 
> I admit I personally think getting MQ in place and making the code
> long-term maintainable is more important than supporting
> CQE. But that does not mean I think CQE is unimportant.
> 
> Or rather: just because someone is FOR something else,
> doesn't mean they are AGAINST this.

But that is not what happened.  You blocked CQE even though it was ready.

> 
> I am pretty sure both sets of functionality can be attained in the next
> merge window, but it requires good faith.
> 
>>>> +enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req)
>>>> +{
>>>> +       if (req_op(req) == REQ_OP_READ || req_op(req) == REQ_OP_WRITE)
>>>> +               return MMC_ISSUE_ASYNC;
>>>> +
>>>> +       return MMC_ISSUE_SYNC;
>>>> +}
>>>
>>> Distinguishing between SYNC and ASYNC operations and using
>>> that as abstraction is nice.
>>>
>>> But you only do this in the new MQ code.
>>>
>>> Instead, make this a separate patch and first refactor the old
>>> code to use this distinction between SYNC and ASYNC.
>>
>> That is a non-starter.  The old code is rubbish.  Point to something
>> work-saving.  There isn't anything.
> 
> I think we have addressed this.
> 
>>> Unfortunately I think Ulf's earlier criticism that you're rewriting
>>> the world instead of refactoring what we have still stands on many
>>> accounts here.
>>
>> Nope.  It is just an excuse to delay the patches.  You guys are playing
>> games and it is embarrassing for Linux.  What is actually wrong with this
>> technically?  It is not good because it doesn't churn the old code?  That is
>> ridiculous.
> 
> It's not good because it does not make MQ mandatory and does
> not delete the interfaces to the old block layer.

That is a separate issue.

> 
>>> It makes it even harder to understand your persistence in keeping
>>> the old block layer around. If you're introducing new concepts and
>>> cleaner code in the MQ path and kind of discarding the old
>>> block layer path, why keep it around at all?
>>
>> Wow, you really like making things up.  Never have I suggested keeping the
>> old code.  It is rubbish.
> 
> So why are you not just deleting it?

Because it needs more testing first.  If the testing goes well, then switch
over the default.  If, after 1 or more release cycles, there are no
problems, delete the old code.

> 
>>  As soon as blk-mq is ready and tested, delete
>> the old crap.
> 
> I disagree. No "ready and tested" is needed, the code is ready,
> and I have tested it. It performs. Delete the old code now.

Not true.  My testing showed the block layer workqueue wasn't performing as
well as a thread.

> 
>> I was expecting CQE to be applied 6 months ago, supporting the legacy blk
>> layer until blk-mq was ready.  But you never delivered on blk-mq, which is
>> why I had to do it.  And now you are making up excuses about why we can't
>> move forward.
> 
> Don't be so conspiracist. I think I have answered this several
> times over.
> 
> Migrating to MQ and getting rid of the old block layer interface is
> paramount in my view. That is the essence of all my review feedback.
> The other comments are minor, and even if you don't take my
> comments into consideration it is stuff I can work on fixing after
> these patches are merged.
> 
> If you just make a series also deleting the old block layer
> I will test it so it doesn't break anything and then you can
> probably expect a big Acked-by on the whole shebang.

I will add patches for that - let's see what happens.

> 
>>> I would have a much easier time accepting this patch if it
>>> deleted as much as it was adding, i.e. introduce all this new
>>> nice MQ code, but also tossing out the old block layer and error
>>> handling code. Even if it is a massive rewrite, at least there
>>> is just one body of code to maintain going forward.
>>
>> How can you possibly call a few hundred lines massive?  The kernel has
>> millions of lines.  Your sense of scale is out of whack.
> 
> Arguably you are speaking against yourself, since the old code
> was described by yourself as "rubbish", and it is very terse and
> unintuitive to read even a few lines of rubbish.
> 
> I had to create a (quite popular) Googledoc breaking down the
> MMC stack in words before I could even start hacking at it.
> So it is not massive in the sense of number of lines but in the
> sense of intelligibility; it's so terse that it feels like eating
> too big a piece of mudcake or something.
> 
> Just delete the rubbish and I'm happy.
> 
>>> That said, I would strongly prefer a refactoring of the old block
>>> layer leading up to transitioning to MQ. But I am indeed biased
>>> since I took that approach myself.
>>
>> Well stop it.  We have nice working code.  Get it applied and tested, and
>> then we can delete the old crap.
> 
> Just get the old code deleted so there is just one thing to
> test and not a matrix of old and new code paths.
> 
>>> This timeout looks like something I need to pick up in my patch
>>> set as well. It seems good for stability to support this. But what happened
>>> here? Did you experience a bunch of timeouts during development,
>>> or, let's say, how was this engineered?  I guess it is for the case when
>>> something randomly locks up for a long time and we don't really know
>>> what has happened - like a watchdog?
>>
>> We presently don't have the host APIs to support external timeouts.  CQE
>> uses them though.
> 
> OK
> 
>>>> +static int mmc_init_request(struct request_queue *q, struct request *req,
>>>> +                           gfp_t gfp)
>>>> +{
>>>> +       return __mmc_init_request(q->queuedata, req, gfp);
>>>> +}
>>>> +
>>> (...)
>>>> +static int mmc_mq_init_request(struct blk_mq_tag_set *set, struct request *req,
>>>> +                              unsigned int hctx_idx, unsigned int numa_node)
>>>> +{
>>>> +       return __mmc_init_request(set->driver_data, req, GFP_KERNEL);
>>>> +}
>>>> +
>>>> +static void mmc_mq_exit_request(struct blk_mq_tag_set *set, struct request *req,
>>>> +                               unsigned int hctx_idx)
>>>> +{
>>>> +       struct mmc_queue *mq = set->driver_data;
>>>> +
>>>> +       mmc_exit_request(mq->queue, req);
>>>> +}
>>>
>>> Here is more code duplication just to keep both the old block layer
>>> and MQ around. Including introducing another inner __foo function
>>> which I personally have a strong objection to (I might be
>>> crazily picky, because I see many people do this).
>>
>> In this case, it is not code duplication; it is re-using the same code but
>> called from the blk-mq API.
> 
> Right, OK.
> 
>>>> +               /*
>>>> +                * Timeouts are handled by mmc core, so set a large value to
>>>> +                * avoid races.
>>>> +                */
>>>> +               req->timeout = 600 * HZ;
>>>> +               break;
>>>> +       }
>>>
>>> These timeouts again, does this mean we have competing timeout
>>> code in the block layer and MMC?
>>
>> Yes - the host controller provides hardware timeout interrupts in most
>> cases.  The core provides software timeouts in other cases.
> 
> OK maybe we can extend the comment above to explain this
> situation so it is clear what is racing with what.

Ok

> 
>>> This mentions timeouts in the MMC core, but they are actually
>>> coming from the *MMC* core, when below you set:
>>> blk_queue_rq_timeout(mq->queue, 60 * HZ);?
>>>
>>> Isn't the actual case that the per-queue timeout is set up to
>>> occur before the per-request timeout, and that you are hacking
>>> around the block layer core having two different timeouts?
>>
>> There is no per-queue timeout.  The request timeout has a default value
>> given by the queue.  It can be changed for different requests.
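
(A sketch of that relationship, combining the two values from the hunks
quoted above; the function is invented, purely for illustration:

static void mmc_sketch_set_timeouts(struct mmc_queue *mq, struct request *req)
{
	/* Queue-wide default, used by requests that do not set their own */
	blk_queue_rq_timeout(mq->queue, 60 * HZ);

	/*
	 * Per-request override: where the mmc core handles timeouts itself,
	 * make the block layer timeout long enough not to race with it.
	 */
	req->timeout = 600 * HZ;
}
)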
> 
> Aha.
> 
>>> It's a bit confusing so I'd really like to know what's going on...
>>
>> I don't expect to have to teach you the block layer.
> 
> But I will tell you to stop snarking around every time you
> write things like that.
> 
> 
>>>> +       mq->in_flight[issue_type] += 1;
>>>> +       get_card = mmc_tot_in_flight(mq) == 1;
>>>
>>> Parentheses around the logical expression preferred, I guess:
>>> get_card = (mmc_tot_in_flight(mq) == 1);
>>> (Isn't checkpatch complaining about this?)
>>
>> Nope
> 
> Too bad. OK, I think it's nice with parentheses anyway.
> Aids perception or something.

Ok

> 
>>> Then:
>>> (...)
>>>> +       if (get_card)
>>>> +               mmc_get_card(card, &mq->ctx);
>>>
>>> I simply took the card on every request. Since the context is the
>>> same for all block layer business and the lock is now fully
>>> reentrant this if (get_card) is not necessary. Just take it for
>>> every request and release it in the .complete() callback.
>>
>> As I have written elsewhere, we have always avoided getting / putting
>> unnecessarily.  It is better that way, so no point in taking it out.
> 
> I explained above about the double counters.
> 
>>>> +#define MMC_QUEUE_DEPTH 64
>>>> +
>>>> +static int mmc_mq_init(struct mmc_queue *mq, struct mmc_card *card,
>>>> +                        spinlock_t *lock)
>>>> +{
>>>> +       int q_depth;
>>>> +       int ret;
>>>> +
>>>> +       q_depth = MMC_QUEUE_DEPTH;
>>>> +
>>>> +       ret = mmc_mq_init_queue(mq, q_depth, &mmc_mq_ops, lock);
>>>
>>> Apart from using a define, then assigning the define to a
>>> variable and then passing that variable instead of just
>>> passing the define: why 64? Is that the depth of the CQE
>>> queue? In that case we need an if (cqe) and set it down
>>> to 2 for non-CQE.
>>
>> Are you ever going to learn about the block layer?
> 
> Are you ever going to treat your fellow community peers as
> equals?
> 
> It is actually your job as a patch submitter to teach others about
> the details of what you are doing. If you think all your fellow programmers
> suck then elevate them to your own level instead of pushing them down.
> 
>> The default number of requests
>> is 128 for the legacy block layer.
> 
> I had no clue.
> 
>> For blk-mq it is queue depth
>> times 2.  So 64 gives the same number of requests as before.
> 
> I see. But by just setting it to 2 as I do in my patch set, it still
> performs, I guess just with fewer buffers allocated.
> 
> This makes some kind of sense though and I guess it explains
> why we ran out of memory with the bounce buffer thing, as that
> then resulted in 128*64KB of allocated memory for the request
> queue. I thought it would be ... a few requests.
> 
>>>> +       if (ret)
>>>> +               return ret;
>>>> +
>>>> +       blk_queue_rq_timeout(mq->queue, 60 * HZ);
>>>
>>> And requests timeout after 1 minute I take it.
>>>
>>> I suspect both of these have some relation to CQE, so that is where
>>> you find these long execution times etc?
>>
>> For legacy mmc, the core takes care of timeouts.  For CQE we expect reliable
>> devices and I would interpret a timeout as meaning the device is broken.
> 
> OK.
> 
>> However it is sensible to have anyway.  For CQE, a request might have to
>> wait for the entire rest of the queue to be processed first, or maybe the
>> request somehow gets stuck and there are other requests constantly
>> overtaking it.  The max queue depth is 32 so 60 seconds seems ok.
> 
> OK we will get to see as we see more and more devices of this type.
> It's fine.
> 
>>>> +static void mmc_mq_queue_suspend(struct mmc_queue *mq)
>>>> +{
>>>> +       blk_mq_quiesce_queue(mq->queue);
>>>> +
>>>> +       /*
>>>> +        * The host remains claimed while there are outstanding requests, so
>>>> +        * simply claiming and releasing here ensures there are none.
>>>> +        */
>>>> +       mmc_claim_host(mq->card->host);
>>>> +       mmc_release_host(mq->card->host);
>>>
>>> I think just blk_mq_quiesce_queue() should be fine as is, and it
>>> should make sure all requests have called .complete(), and there
>>> I think you should also release the host lock.
>>>
>>> If the MQ code is not doing this, we need to fix MQ to
>>> do the right thing (or add a new callback such as
>>> blk_mq_make_sure_queue_empty()) so at the very
>>> least put a big fat FIXME or REVISIT comment on the above.
>>
>> blk_mq_quiesce_queue() prevents dispatches, not completions.  So we wait for
>> outstanding requests.
> 
> I guess the quiesce call is not really intended for PM suspend/resume
> paths. Maybe we need to add a blk_mq_flush_queue() call then.
> (No showstopper, this is OK for now.)
> 
>>> One of the good reasons to delete the old block layer is to get
>>> rid of this horrible semaphore construction. So I see it as necessary
>>> to be able to focus development efforts on code that actually has
>>> a future.
>>
>> The old crap will get deleted when blk-mq is ready.
> 
> Which IMO is now.
> 
>>>> +
>>>> +       int                     in_flight[MMC_ISSUE_MAX];
>>>
>>> So this is a [2] containing a counter for the number of
>>> synchronous and asynchronous requests in flight at any
>>> time.
>>>
>>> But are there really synchronous and asynchronous requests
>>> going on at the same time?
>>>
>>> Maybe on the error path I guess.
>>>
>>> I avoided this completely but I guess it may be necessary with
>>> CQE, such that in_flight[0,1] is way more than 1 or 2 at times
>>> when there are commands queued?
>>
>> CQE needs to count DCMD separately from read / writes.  Counting by issue
>> type is a simple way to do that.
> 
> OK
> 
>>>> +       bool                    rw_wait;
>>>> +       bool                    waiting;
>>>> +       wait_queue_head_t       wait;
>>>
>>> As mentioned I think this is a reimplementation of
>>> the completion abstraction.
>>
>> I pointed out why that wouldn't work.  Another case of why the code makes
>> more sense together than split up.
> 
> I'm not entirely convinced about that but we'll see. It is a detail.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 03/10] mmc: block: Add blk-mq support
@ 2017-11-14 13:10           ` Adrian Hunter
  0 siblings, 0 replies; 55+ messages in thread
From: Adrian Hunter @ 2017-11-14 13:10 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 09/11/17 17:52, Linus Walleij wrote:
> On Thu, Nov 9, 2017 at 11:42 AM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> On 08/11/17 10:54, Linus Walleij wrote:
>>> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>>> At least you could do what I did and break out a helper like
>>> this:
>>>
>>> /*
>>>  * This reports status back to the block layer for a finished request.
>>>  */
>>> static void mmc_blk_complete(struct mmc_queue_req *mq_rq,
>>>                             blk_status_t status)
>>> {
>>>        struct request *req = mmc_queue_req_to_req(mq_rq);
>>>
>>>        if (req->mq_ctx) {
>>>           blk_mq_end_request(req, status);
>>>        } else
>>>           blk_end_request_all(req, status);
>>> }
>>
>> You are quibbling.
> 
> Hm wiktionary "quibble" points to:
> https://en.wikipedia.org/wiki/Trivial_objections
> 
> Sorry for not being a native speaker in English, I think that
> maybe you were again trying to be snarky to a reviewer but
> I honestly don't know.
> 
>> It makes next to no difference especially as it all goes
>> away when the legacy code is deleted.
> 
> Which is something your patch set doesn't do. Nor did you write
> anywhere that deleting the legacy code was your intention.
> But I'm happy to hear it.
> 
>>  I will change it in the next
>> revision, but what a waste of everyone's time.
>>  Please try to focus on
>> things that matter.
> 
> As Jean-Paul Sartre said "hell is other people".
> Can you try to be friendlier anyway?
> 
>>> This retry and error handling using requeue is very elegant.
>>> I really like this.
>>>
>>> If we could also go for MQ-only, only this nice code
>>> remains in the tree.
>>
>> No one has ever suggested that the legacy API will remain.  Once blk-mq is
>> ready the old code gets deleted.
> 
> The block layer maintainers most definately think MQ is ready
> but you seem to disagree. Why?

I meant when the mmc conversion to blk-mq is ready.  We don't need to
consider other sub-systems, just whether it is working for mmc.  For
example, the issue with the block layer workqueues not performing as well as
a thread.  That only came to light through testing mmc.

> 
> In the recent LWN article from kernel recepies:
> https://lwn.net/Articles/735275/
> the only obstacle I can see is a mention that SCSI was not
> switched over by default because of some problems with
> slow rotating media. "This change had to be backed out
> because of some performance and scalability issues that
> arose for rotating storage."
> 
> Christoph mentions that he's gonna try again for v4.14.
> But it's true, I do not see that happening in linux-next yet.
> 
> But that is specifically rotating media, which MMC/SD is not.
> And UBI is using it since ages without complaints. And
> that is a sign of something.
> 
> Have you seen any problems when deploying it on MMC/SD,
> really? If there are problems I bet the block layer people want
> us to find them, diagnose them and ultimately fix them rather
> than wait for them to do it. I haven't seen any performance or
> scalability issues. At one point I had processes running
> on two eMMC and one SD-card and apart from maxing out
> the CPUs and DMA backplace as could be expected all was
> fine.
> 
>>> The problem: you have just reimplemented the whole error
>>> handling we had in the old block layer and now we have to
>>> maintain two copies and keep them in sync.
>>>
>>> This is not OK IMO, we will inevitable screw it up, so we
>>> need to get *one* error path.
>>
>> Wow, you really didn't read the code at all.
> 
> Just quit this snarky attitude.
> 
>> As I have repeatedly pointed
>> out, the new code is all new.  There is no overlap and there nothing to keep
>> in sync.
> 
> The problem is that you have not clearly communicated your
> vision to delete the old code. The best way to communicate that
> would be to include a patch actually deleteing it.
> 
>>  It may not look like it in this patch, but that is only because of
>> the ridiculous idea of splitting up the patch.
> 
> Naming other people's review feedback as "ridiculous" is not very
> helpful. I think the patch set became much easier to review
> like this so I am happy with this format.
> 
>>>> +static void mmc_blk_mq_acct_req_done(struct mmc_queue *mq, struct request *req)
>>>
>>> What does "acct" mean in the above function name?
>>> Accounting? Actual? I'm lost.
>>
>> Does "actual" have two "c"'s.  You are just making things up.
> 
> Please stop your snarky and belitteling attitude.
> 
> I am not a native English speaker and I am not making up
> my review comments in order to annoy you.
> 
> Consider Hanlon's razor:
> https://en.wikipedia.org/wiki/Hanlon%27s_razor
> "Never attribute to malice that which is adequately explained by
> stupidity".
> 
> It's bad enough that you repeatedly point out how stupid
> you think I am

I have never called you names.  On the contrary, it is because I don't think
you are stupid that I get upset when you seem spend so little time and
effort to understand the code.  Equally, you seem to make comments that just
assume the code is wrong without due consideration.

>                 I am sorry about that, in my opinion sometimes
> other people are really stupid but more often than not the problem
> is really on your side, like being impatient and unwilling to educate
> others about how your code actually works becuase it seems
> so evident to yourself that you think it should be evident to everyone
> else as well. I don't know what to say about that, it seems a bit
> unempatic.
> 
> But you even go and think I am making stuff up in purpose.
> That's pretty offensive.
> 
>> Of course it is "account".
> 
> This is maybe obvious for you but it was not to me.
> 
>>  It is counting the number of requests in flight - which is
>> pretty obvious from the code.  We use that to support important features
>> like CQE re-tuning and avoiding getting / putting the card all the time.
> 
> What is so hard with getting this point across without insults?
> 
>>>> +{
>>>> +       struct request_queue *q = req->q;
>>>> +       unsigned long flags;
>>>> +       bool put_card;
>>>> +
>>>> +       spin_lock_irqsave(q->queue_lock, flags);
>>>> +
>>>> +       mq->in_flight[mmc_issue_type(mq, req)] -= 1;
>>>> +
>>>> +       put_card = mmc_tot_in_flight(mq) == 0;
>>>
>>> This in_flight[] business seems a bit kludgy, but I
>>> don't really understand it fully. Magic numbers like
>>> -1 to mark that something is not going on etc, not
>>> super-elegant.
>>
>> You are misreading.  It subtracts 1 from the number of requests in flight.
> 
> Right! Sorry I misread it.
> 
>>> I believe it is necessary for CQE though as you need
>>> to keep track of outstanding requests?
>>
>> We have always avoided getting / putting the card when there is another
>> request in flight.  Yes it is also used for CQE.
> 
> Yeah I took the approach to get/put the card around each
> request instead. It might make things a bit simpler.
> 
>>>> +       spin_unlock_irqrestore(q->queue_lock, flags);
>>>> +
>>>> +       if (put_card)
>>>> +               mmc_put_card(mq->card, &mq->ctx);
>>>
>>> I think you should try not to sprinkle mmc_put_card() inside
>>> the different functions but instead you can put this in the
>>> .complete callback I guess mmc_blk_mq_complete() in your
>>> patch set.
>>
>> This is the *only* block.c function in the blk-mq code that calls
>> mmc_put_card().  The queue also has does it for requests that didn't even
>> start.  That is it.
> 
> Yeah that's correct. If we also delete the old code it will not be
> too bad.
> 
>>> Also you do not need to avoid calling it several times with
>>> that put_card variable. It's fully reentrant thanks to your
>>> own code in the lock and all calls come from the same block
>>> layer process if you call it in .complete() I think?
>>
>> We have always avoided unnecessary gets / puts.  Since the result is better,
>> why on earth take it out?
> 
> It's not a showstopper.
> 
> But what I think is nice in doing it around
> each request is that since mmc_put_card() calls mmc_release_host()
> contains this:
> 
> if (--host->claim_cnt) { (...)
> 
> So there is a counter inside the claim/release host functions
> and now there is another counter keeping track if the in-flight
> requests as well and it gives me the feeling we are maintaining
> two counters when we only need one.

The gets / puts also get runtime pm for the card.  It is a lot a messing
around for the sake of a quick check for the number of requests inflight -
which is needed anyway for CQE.

> 
> But maybe it is actually the host->claim_cnt that is overengineered
> and should just be a bool, becuase with this construction
> that you only call down and claim the host once, the
> host->claim_cnt will only be 0 or 1, right?

The claim_cnt was introduced for nested claiming.

> 
>>> Now the problem Ulf has pointed out starts to creep out in the
>>> patch: a lot of code duplication on the MQ path compared to
>>> the ordinary block layer path.
>>>
>>> My approach was structured partly to avoid this: first refactor
>>> the old path, then switch to (only) MQ to avoid code duplication.
>>
>> The old code is rubbish.  There is nothing of value there.  You have to make
>> the case why you are wasting everyone's time churning crappy code.
> 
> I kind of agree that the old code is rubbish, actually.
> 
> I guess I could turn that around and ask: if it is so crappy, why
> is your patch set not deleting it?

To allow people time to test.

> 
> And I guess the only reasonable answer to that would be what
> you were saying above, something along the lines of "MQ is not
> ready yet". But it can be argued that MQ is ready.
> 
>>  The idea
>> that nice new code is wrong because it doesn't churn the old rubbish code,
>> is ridiculous.
> 
> As I have said in the review comments I like your new
> code, especially the new error path.
> 
> I took the approach of refactoring because I was afraid I would
> break something I guess. But rewriting from scratch is also an
> option.
> 
> But I think I would prefer that if a big slew of new code is
> introduced it needs to be put to wide use and any bugs
> smoked out during the -rc phase, and we are now
> hiding it behind a Kconfig option so it's unlikely to see testing
> until that option is turned on, and that is not good.

And if you find a big problem in rc7, then what do you do?  At least with
the config option, the revert is trivial.

> 
>>> This looks a bit like it is reimplementing the kernel
>>> completion abstraction using a mutex and a variable named
>>> .complete_req?
>>>
>>> We were using a completion in the old block layer so
>>> why did you not use it for MQ?
>>
>> Doesn't seem like you pay much attention to this stuff.
> 
> You really have to stop your snarky and belitteling attitude
> to your fellow kernel developers.
> 
>>From 6.Followthrough.rst again:
> 
> ----------8<-------------------8<-------------------8<-----------
> 
> Andrew Morton has suggested that every review comment which does not result
> in a code change should result in an additional code comment instead; that
> can help future reviewers avoid the questions which came up the first time
> around.
> 
> ----------8<-------------------8<-------------------8<-----------
> 
> At least take a moment to consider the option that maybe so few people
> are reviwing your code because this is complicated stuff and really
> hard to grasp, maybe the problem isn't on my side, neither on yours,
> it may be that the subject matter is the problem.

I would expect a review would take a number of days, perhaps longer.
However, it seems people are not willing to consider anything that takes
more than a an hour or two.

> 
>>  The previous
>> request has to be completed even if there is no next request.  That means
>> schduling work, that then races with the dispatch path.  So mutual exclusion
>> is necessary.
> 
> I would say a synchronization primitive is needed, indeed.
> I have a different approach to this, which uses completion
> as the synchronization primitive and I think that would be possible
> to use also here.

Because they are both just wake-ups.  When waiting for multiple conditions,
wait_event() is simpler.  You are not considering the possibility for having
only 1 task switch between requests.

> 
> I have this code:

Which runs as a work item, so there is already a task switch to get here.
i.e. there are 2 task switches between each request, instead of 1.

> 
>         /*
>          * Here we postprocess the request differently depending on if
>          * we go on the success path or error path. The success path will
>          * immediately let new requests hit the host, whereas the error
>          * path will hold off new requests until we have retried and
>          * succeeded or failed the current asynchronous request.
>          */
>         if (status == MMC_BLK_SUCCESS) {
>                 /*
>                  * This immediately opens the gate for the next request
>                  * to start on the host while we perform post-processing
>                  * and report back to the block layer.
>                  */
>                 host->areq = NULL;
>                 complete(&areq->complete);

So that is the second task switch.

>                 mmc_post_req(host, areq->mrq, 0);
>                 if (areq->report_done_status)
>                         areq->report_done_status(areq, MMC_BLK_SUCCESS);
>         } else {
>                 /*
>                  * Post-process this request. Then, if
>                  * another request was already prepared, back that out
>                  * so we can handle the errors without anything prepared
>                  * on the host.
>                  */
>                 if (host->areq_pending)
>                         mmc_post_req(host, host->areq_pending->mrq, -EINVAL);
>                 /*
>                  * Call back with error status, this will trigger retry
>                  * etc if needed
>                  */
>                 if (areq->report_done_status) {
>                         if (areq->report_done_status(areq, status)) {
>                                 /*
>                                  * This happens when we finally give up after
>                                  * a few retries or on unrecoverable errors.
>                                  */
>                                 mmc_post_req(host, areq->mrq, 0);
>                                 host->areq = NULL;
>                                 /* Re-prepare the next request */
>                                 if (host->areq_pending)
>                                         mmc_pre_req(host,
> host->areq_pending->mrq);
>                                 complete(&areq->complete);
>                         }
>                 }
>         }
> 
> As you can see it is a bit elaborate about some
> of this stuff and quite a lot of comments were added to make
> clear what is going on. This is in my head so that is why I ask:
> completion worked fine as a synchronization primitive here
> so it is maybe a good alternative in your (similar) code
> as well?

In the case there is another request already waiting, then instead of
scheduling work to complete the request, the dispatch is woken to complete
the previous and start the next.  That needs different synchronization.

> 
>>> I would contest using a waitqueue for this. The name even says
>>> "complete_req" so why is a completion not the right thing to
>>> hang on rather than a waitqueue?
>>
>> I explained that above.
> 
> And I explained what I meant.
> 
>>> The completion already contains a waitqueue, so I think you're
>>> just essentially reimplementing it.
>>>
>>> Just complete(&mq->mq_req_complete) or something should do
>>> the trick.
>>
>> Nope.
> 
> That's a very terse answer to an honest review comment.
> 
>>>> +       else
>>>> +               kblockd_schedule_work(&mq->complete_work);
>>>
>>> I did not use the kblockd workqueue for this, out of fear
>>> that it would interfere and disturb the block layer work items.
>>> My intuitive idea was that the MMC layer needed its own
>>> worker (like in the past it used a thread) in order to avoid
>>> congestion in the block layer queue leading to unnecessary
>>> delays.
>>
>> The complete work races with the dispatch of the next request, so putting
>> them in the same workqueue makes sense. i.e. the one that gets processed
>> first would anyway delay the one that gets processed second.
> 
> So what we want to attain is that the next dispatch happens as soon
> as possible after completing the previous request. They only race
> as far as that they go through post/preprocessing before getting to
> the synchronization primitive though?

They only race when the next request is not already waiting.  Otherwise the
work is never scheduled.

> 
>>> On the other hand, this likely avoids a context switch if there
>>> is no congestion on the queue.
>>>
>>> I am uncertain when it is advisible to use the block layer
>>> queue for subsystems like MMC/SD.
>>>
>>> Would be nice to see some direction from the block layer
>>> folks here, it is indeed exposed to us...
>>>
>>> My performance tests show no problems with this approach
>>> though.
>>
>> As I already wrote, the CPU-bound block layer dispatch work queue has
>> negative consequeuences for mmc performance.  So there are 2 aspects to that:
>>         1. Is the choice of CPU right to start with?  I suspect it is better for
>> the dispatch to run on the same CPU as the interrupt.
>>         2. Does the dispatch work need to be able to be migrated to a different
>> CPU? i.e. unbound work queue.  That helped in my tests, but it could just be
>> a side-effect of 1.
> 
> Hm! What you are saying sounds correct, and we really need to
> consider the multi-CPU aspects of this, maybe not now. I am
> happy as long as we have equal performance as before and
> maintainable code.

Well I saw 3% drop in sequential read performance, improving to 1% when an
unbound workqueue was used.

> 
>> Of course we can't start looking at these real issues, while you are
> 
> You seem to have dropped the mic there, but it looks like
> what you were going to say was not nice anyway so I guess
> it was for better.
> 
>>>> +static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err)
>>>> +{
>>>> +       struct request_queue *q = mq->queue;
>>>> +       unsigned long flags;
>>>> +       bool done;
>>>> +
>>>> +       spin_lock_irqsave(q->queue_lock, flags);
>>>> +       done = !mq->rw_wait;
>>>> +       mq->waiting = !done;
>>>> +       spin_unlock_irqrestore(q->queue_lock, flags);
>>>
>>> This makes it look like a reimplementation of completion_done()
>>> so I think you should use the completion abstraction again. The
>>> struct completion even contains a variable named "done".
>>
>> This just serves as an example of why splitting up the patch was such a bad
>> idea.  For direct completion, the wait can result in recovery being needed
>> for the previous request, so the current request gets requeued.
> 
> I guess this falls back to the previous comment on the
> synchronization primitives.
> 
>>> This is pretty straight-forward (pending the comments above).
>>> Again it has the downside of duplicating the same code for the
>>> old block layer instead of refactoring.
>>
>> No, the old code is rubbish.  Churning it is a waste of time.
> 
> I think we have addressed this recurring theme.
> 
>>> Again looks fine, again duplicates code. In this case I don't even
>>> see why the MQ code needs its own copy of the issue funtion.
>>
>> Because it has to support CQE.  This attitude against CQE is very disappointing!
> 
> Please understand that there is no conspiracy against CQE out
> there.
> 
> I admit I personally think getting MQ in place and making the code
> long-term maintainable is more important than supporting
> CQE. But that does not mean I think CQE is unimportant.
> 
> Or rather: just because someone is FOR something else,
> doesn't mean they are AGAINST this.

But that is not what happened.  You blocked CQE even though it was ready.

> 
> I am pretty sure both sets of functionality can be attained in the next
> merge window, but it requires good faith.
> 
>>>> +enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req)
>>>> +{
>>>> +       if (req_op(req) == REQ_OP_READ || req_op(req) == REQ_OP_WRITE)
>>>> +               return MMC_ISSUE_ASYNC;
>>>> +
>>>> +       return MMC_ISSUE_SYNC;
>>>> +}
>>>
>>> Distinguishing between SYNC and ASYNC operations and using
>>> that as abstraction is nice.
>>>
>>> But you only do this in the new MQ code.
>>>
>>> Instead, make this a separate patch and first refactor the old
>>> code to use this distinction between SYNC and ASYNC.
>>
>> That is a non-starter.  The old code is rubbish.  Point to something work
>> saving.  There isn't anything.
> 
> I think we have addressed this.
> 
>>> Unfortunately I think Ulf's earlier criticism that you're rewriting
>>> the world instead of refactoring what we have still stands on many
>>> accounts here.
>>
>> Nope.  It is just an excuse to delay the patches.  You guys are playing
>> games and it is embarassing for linux.  What is actually wrong with this
>> technically?  It is not good because it doesn't churn the old code?  That is
>> ridiculous.
> 
> It's not good because it does not make MQ mandatory and does
> not delete the interfaces to the old block layer.

That is a separate issue.

> 
>>> It makes it even harder to understand your persistance in keeping
>>> the old block layer around. If you're introducing new concepts and
>>> cleaner code in the MQ path and kind of discarding the old
>>> block layer path, why keep it around at all?
>>
>> Wow, you really like making things up.  Never have I suggested keeping the
>> old code.  It is rubbish.
> 
> So why are you not just deleting it?

Because it needs more testing first.  If the testing goes well, then switch
over the default.  If, after 1 or more release cycles, there are no
problems, delete the old code.

> 
>>  As soon and blk-mq is ready and tested, delete
>> the old crap.
> 
> I disagree. No "ready and tested" is needed, the code is ready,
> and I have tested it. It performs. Delete the old code now.

Not true.  My testing showed the block layer workqueue wasn't performing as
well as a thread.

> 
>> I was expecting CQE to be applied 6 months ago, supporting the legacy blk
>> layer until blk-mq was ready.  But you never delivered on blk-mq, which is
>> why I had to do it.  And now you are making up excuses about why we can't
>> move forward.
> 
> Don't be so conspiracist. I think I have answered this several
> times over.
> 
> Migrating to MQ and getting rid of the old block layer interface is
> paramount in my view. That is the essence of all my review feedback.
> The other comments are minor, and even if you don't take my
> comments into consideration it is stuff I can work on fixing after
> these patches are merged.
> 
> If you just make a series also deleteing the old block layer
> I will test it so it doesn't break anything and then you can
> probably expect a big Acked-by on the whole shebang.

I will add patches for that - let's see what happens.

> 
>>> I would have a much easier time accepting this patch if it
>>> deleted as much as it was adding, i.e. introduce all this new
>>> nice MQ code, but also tossing out the old block layer and error
>>> handling code. Even if it is a massive rewrite, at least there
>>> is just one body of code to maintain going forward.
>>
>> How can you pssibly call a few hundred lines massive.  The kernel has
>> millions of lines.  Your sense of scale is out of whack.
> 
> Arguably you are speaking against yourself, since the old code
> was described by yourself as "rubbish", and it is very terse and
> uninituitive to read even a few lines of rubbish.
> 
> I had to create a (quite popular) Googledoc breaking down the
> MMC stack in words before I could even start hacking at it.
> So it is not massive in the sense of number of lines but in the
> sense of intelligibility, it's so terse that it feels like eating a
> too big piece of mudcake or something.
> 
> Just delete the rubbish and I'm happy.
> 
>>> That said, I would strongly prefer a refactoring of the old block
>>> layer leading up to transitioning to MQ. But I am indeed biased
>>> since I took that approach myself.
>>
>> Well stop it.  We have nice working code.  Get it applied and tested, and
>> then we can delete the old crap.
> 
> Just get the old code deleted so there is just one thing to
> test and not a matrix of old and new code paths.
> 
>>> This timeout looks like something I need to pick up in my patch
>>> set as well. It seems good for stability to support this. But what happened
>>> here? Did you experience a bunch of timeouts during development,
>>> or let's say how was this engineered, I guess it is for the case when
>>> something randomly locks up for a long time and we don't really know
>>> what has happened, like a watchdog?
>>
>> We presently don't have the host APIs to support external timeouts.  CQE
>> uses them though.
> 
> OK
> 
>>>> +static int mmc_init_request(struct request_queue *q, struct request *req,
>>>> +                           gfp_t gfp)
>>>> +{
>>>> +       return __mmc_init_request(q->queuedata, req, gfp);
>>>> +}
>>>> +
>>> (...)
>>>> +static int mmc_mq_init_request(struct blk_mq_tag_set *set, struct request *req,
>>>> +                              unsigned int hctx_idx, unsigned int numa_node)
>>>> +{
>>>> +       return __mmc_init_request(set->driver_data, req, GFP_KERNEL);
>>>> +}
>>>> +
>>>> +static void mmc_mq_exit_request(struct blk_mq_tag_set *set, struct request *req,
>>>> +                               unsigned int hctx_idx)
>>>> +{
>>>> +       struct mmc_queue *mq = set->driver_data;
>>>> +
>>>> +       mmc_exit_request(mq->queue, req);
>>>> +}
>>>
>>> Here is more code duplication just to keep both the old block layer
>>> and MQ around. Including introducing another inner __foo function
>>> which I have something strongly against personally (I might be
>>> crazily picky, because I see many people do this).
>>
>> In this case, it is not code duplication it is re-using the same code but
>> called from the blk-mq API.
> 
> Right, OK.
> 
>>>> +               /*
>>>> +                * Timeouts are handled by mmc core, so set a large value to
>>>> +                * avoid races.
>>>> +                */
>>>> +               req->timeout = 600 * HZ;
>>>> +               break;
>>>> +       }
>>>
>>> These timeouts again, does this mean we have competing timeout
>>> code in the block layer and MMC?
>>
>> Yes - the host controller provides hardware timeout interrupts in most
>> cases.  The core provides software timeouts in other cases.
> 
> OK maybe we can extend the comment above to explain this
> situation so it is clear what is racing with what.

Ok
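
Something like this perhaps - just a sketch, the exact wording can be
refined when the patch is respun:

		/*
		 * Timeouts are handled by mmc core: the host controller
		 * provides a hardware timeout interrupt for most data
		 * transfers and the core adds a software timeout where
		 * it cannot.  The block layer timeout is only a
		 * last-resort safety net, so set a large value to avoid
		 * racing with the mmc timeouts.
		 */
		req->timeout = 600 * HZ;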

> 
>>> This mentions timeouts in the MMC core, but they are actually
>>> coming from the *block layer* core, when below you set:
>>> blk_queue_rq_timeout(mq->queue, 60 * HZ);?
>>>
>>> Isn't the actual case that the per-queue timeout is set up to
>>> occur before the per-request timeout, and that you are hacking
>>> around the block layer core having two different timeouts?
>>
>> There is no per-queue timeout.  The request timeout has a default value
>> given by the queue.  It can be changed for different requests.
> 
> Aha.
> 
>>> It's a bit confusing so I'd really like to know what's going on...
>>
>> I don't expect to have to teach you the block layer.
> 
> But I will tell you to stop snarking around every time you
> write things like that.
> 
> 
>>>> +       mq->in_flight[issue_type] += 1;
>>>> +       get_card = mmc_tot_in_flight(mq) == 1;
>>>
>>> Parenthesis around the logical expression preferred I guess
>>> get_card = (mmc_tot_in_flight(mq) == 1);
>>> (Isn't checkpatch complaining about this?)
>>
>> Nope
> 
> Too bad. OK I think it's nice with a parenthesis anyway.
> Aids perception or something.

Ok

> 
>>> Then:
>>> (...)
>>>> +       if (get_card)
>>>> +               mmc_get_card(card, &mq->ctx);
>>>
>>> I simply took the card on every request. Since the context is the
>>> same for all block layer business and the lock is now fully
>>> reentrant this if (get_card) is not necessary. Just take it for
>>> every request and release it in the .complete() callback.
>>
>> As I have written elsewhere, we have always avoided getting / putting
>> unnecessarily.  It is better that way, so no point in taking it out.
> 
> I explained above about the double counters.
> 
>>>> +#define MMC_QUEUE_DEPTH 64
>>>> +
>>>> +static int mmc_mq_init(struct mmc_queue *mq, struct mmc_card *card,
>>>> +                        spinlock_t *lock)
>>>> +{
>>>> +       int q_depth;
>>>> +       int ret;
>>>> +
>>>> +       q_depth = MMC_QUEUE_DEPTH;
>>>> +
>>>> +       ret = mmc_mq_init_queue(mq, q_depth, &mmc_mq_ops, lock);
>>>
>>> Apart from using a define, then assigning the define to a
>>> variable and then passing that variable instead of just
>>> passing the define: why 64? Is that the depth of the CQE
>>> queue? In that case we need an if (cqe) and set it down
>>> to 2 for non-CQE.
>>
>> Are you ever going to learn about the block layer?
> 
> Are you ever going to treat your fellow community peers as
> equals?
> 
> It is actually your job as a patch submitter to teach others about
> the details of what you are doing. If you think all your fellow programmers
> suck then elevate them to your own level instead of pushing them down.
> 
>> The number of requests
>> defaults to 128 for the legacy block layer.
> 
> I had no clue.
> 
>> For blk-mq it is queue depth
>> times 2.  So 64 gives the same number of requests as before.
> 
> I see. But by just setting it to 2 as I do in my patch set, it still
> performs, I guess just with fewer buffers allocated.
> 
> This makes some kind of sense though and I guess it explains
> why we ran out of memory with the bounce buffer thing, as that
> then resulted in 128*64KB of allocated memory for the request
> queue. I thought it would be ... a few requests.
> 
>>>> +       if (ret)
>>>> +               return ret;
>>>> +
>>>> +       blk_queue_rq_timeout(mq->queue, 60 * HZ);
>>>
>>> And requests timeout after 1 minute I take it.
>>>
>>> I suspect both of these have some relation to CQE, so that is where
>>> you find these long execution times etc?
>>
>> For legacy mmc, the core takes care of timeouts.  For CQE we expect reliable
>> devices and I would interpret a timeout as meaning the device is broken.
> 
> OK.
> 
>> However it is sensible to have anyway.  For CQE, a request might have to
>> wait for the entire rest of the queue to be processed first, or maybe the
>> request somehow gets stuck and there are other requests constantly
>> overtaking it.  The max queue depth is 32 so 60 seconds seems ok.
> 
> OK we will get to see as we see more and more devices of this type.
> It's fine.
> 
>>>> +static void mmc_mq_queue_suspend(struct mmc_queue *mq)
>>>> +{
>>>> +       blk_mq_quiesce_queue(mq->queue);
>>>> +
>>>> +       /*
>>>> +        * The host remains claimed while there are outstanding requests, so
>>>> +        * simply claiming and releasing here ensures there are none.
>>>> +        */
>>>> +       mmc_claim_host(mq->card->host);
>>>> +       mmc_release_host(mq->card->host);
>>>
>>> I think just blk_mq_quiesce_queue() should be fine as is, and
>>> should make sure all requests have called .complete(), and there
>>> I think you should also release the host lock.
>>>
>>> If the MQ code is not doing this, we need to fix MQ to
>>> do the right thing (or add a new callback such as
>>> blk_mq_make_sure_queue_empty()) so at the very
>>> least put a big fat FIXME or REVISIT comment on the above.
>>
>> blk_mq_quiesce_queue() prevents dispatches not completions.  So we wait for
>> outstanding requests.
> 
> I guess the quiesce call is not really intended for PM suspend/resume
> paths. Maybe we need to add a blk_mq_flush_queue() call then.
> (No showstopper, this is OK for now.)
> 
>>> One of the good reasons to delete the old block layer is to get
>>> rid of this horrible semaphore construction. So I see it as necessary
>>> to be able to focus development efforts on code that actually has
>>> a future.
>>
>> The old crap will get deleted when blk-mq is ready.
> 
> Which IMO is now.
> 
>>>> +
>>>> +       int                     in_flight[MMC_ISSUE_MAX];
>>>
>>> So this is a [2] containing a counter for the number of
>>> synchronous and asynchronous requests in flight at any
>>> time.
>>>
>>> But are there really synchronous and asynchronous requests
>>> going on at the same time?
>>>
>>> Maybe on the error path I guess.
>>>
>>> I avoided this completely but I guess it may be necessary with
>>> CQE, such that in_flight[0,1] is way more than 1 or 2 at times
>>> when there are commands queued?
>>
>> CQE needs to count DCMD separately from read / writes.  Counting by issue
>> type is a simple way to do that.
> 
> OK
> 
>>>> +       bool                    rw_wait;
>>>> +       bool                    waiting;
>>>> +       wait_queue_head_t       wait;
>>>
>>> As mentioned I think this is a reimplementation of
>>> the completion abstraction.
>>
>> I pointed out why that wouldn't work.  Another case of why the code makes
>> more sense together than split up.
> 
> I'm not entirely convinced about that but we'll see. It is a detail.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 03/10] mmc: block: Add blk-mq support
  2017-11-14 13:10           ` Adrian Hunter
  (?)
@ 2017-11-14 14:50           ` Linus Walleij
  2017-11-15 10:55             ` Ulf Hansson
  -1 siblings, 1 reply; 55+ messages in thread
From: Linus Walleij @ 2017-11-14 14:50 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

Hi Adrian!

Thanks for the helpful and productive tone in the recent mail;
it improves my work environment.

>> The block layer maintainers most definitely think MQ is ready
>> but you seem to disagree. Why?
>
> I meant when the mmc conversion to blk-mq is ready.  We don't need to
> consider other sub-systems, just whether it is working for mmc.  For
> example, the issue with the block layer workqueues not performing as well as
> a thread.  That only came to light through testing mmc.

OK I see your point.

>> It's bad enough that you repeatedly point out how stupid
>> you think I am
>
> I have never called you names.  On the contrary, it is because I don't think
> you are stupid that I get upset when you seem to spend so little time and
> effort to understand the code.  Equally, you seem to make comments that just
> assume the code is wrong without due consideration.

Please work on your patience. I think so few people are reviewing the
code because it is massive and complicated and sits in a complicated
place. One can not be blamed for trying, right?

I have spent considerable time with your code, more than with my own.
Mostly testing it, and that is why I can say it does not regress performance
on my end.

I also ran several rounds of fault injection, which it handled without
problems.

Then I tried to eject the SD card while running dd from the card and
then it crashed.

But that is an extreme test case, so overall it is very robust code.
With the exception of card removal during I/O, feel free to add
my Tested-by: Linus Walleij <linus.walleij@linaro.org> on this series.

>> But what I think is nice in doing it around
>> each request is that since mmc_put_card() calls mmc_release_host()
>> contains this:
>>
>> if (--host->claim_cnt) { (...)
>>
>> So there is a counter inside the claim/release host functions
>> and now there is another counter keeping track of the in-flight
>> requests as well and it gives me the feeling we are maintaining
>> two counters when we only need one.
>
>> The gets / puts also get runtime pm for the card.  It is a lot of messing
> around for the sake of a quick check for the number of requests inflight -
> which is needed anyway for CQE.

OK I buy that.

>> But maybe it is actually the host->claim_cnt that is overengineered
>> and should just be a bool, because with this construction
>> that you only call down and claim the host once, the
>> host->claim_cnt will only be 0 or 1, right?
>
> The claim_cnt was introduced for nested claiming.

Yeah that sounds familiar. I'm not really sure it happens
anymore after this but I guess I can test that with some
practical experiments, let's leave it like this.

>> I guess I could turn that around and ask: if it is so crappy, why
>> is your patch set not deleting it?
>
> To allow people time to test.

OK I guess I'm more for forging ahead with such things. But we could
at least enable it by default so whoever checks out and builds
and tests linux-next with their MMC/SD controllers will then be
testing this for us in the next kernel cycle.

>> But I think I would prefer that if a big slew of new code is
>> introduced it needs to be put to wide use and any bugs
>> smoked out during the -rc phase, and we are now
>> hiding it behind a Kconfig option so it's unlikely to see testing
>> until that option is turned on, and that is not good.
>
> And if you find a big problem in rc7, then what do you do?  At least with
> the config option, the revert is trivial.

I see your point. I guess it is up to Ulf how he feels about
trust wrt the code. If a problem appears at -rc7 it's indeed
nice if we can leave the code in-tree and work on it.

>> At least take a moment to consider the option that maybe so few people
>> are reviewing your code because this is complicated stuff and really
>> hard to grasp, maybe the problem isn't on my side or on yours,
>> it may be that the subject matter is the problem.
>
> I would expect a review would take a number of days, perhaps longer.
> However, it seems people are not willing to consider anything that takes
> more than an hour or two.

Your criticism is something like "do it properly or not at all"
and I understand that feeling. I deserve some of the criticism
and I can accept it much more easily when put in these words.

It would be nice if the community would step in here and I have
seen so many people talk about this code that I really think they
should invest in proper review. More people than me and Ulf
need to help out with review and test.

>> I would say a synchronization primitive is needed, indeed.
>> I have a different approach to this, which uses completion
>> as the synchronization primitive and I think that would be possible
>> to use also here.
>
> Because they are both just wake-ups.  When waiting for multiple conditions,
> wait_event() is simpler.  You are not considering the possibility of having
> only 1 task switch between requests.

That is indeed a nice feature.
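
For the record (so others following the thread get the point): a
struct completion is a single event you wait for, while wait_event()
takes an arbitrary condition, so one waiter can cover several wake-up
reasons, something like:

	wait_event(mq->wait, !mq->rw_wait || mmc_tot_in_flight(mq) == 0);

where the field names are from your struct but the condition itself is
only illustrative, not the actual code in the patch.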

>>
>> I have this code:
>
> Which runs as a work item, so there is already a task switch to get here.
> i.e. there are 2 task switches between each request, instead of 1.

Yes. This is a limitation in my patch set.

>>         /*
>>          * Here we postprocess the request differently depending on if
>>          * we go on the success path or error path. The success path will
>>          * immediately let new requests hit the host, whereas the error
>>          * path will hold off new requests until we have retried and
>>          * succeeded or failed the current asynchronous request.
>>          */
>>         if (status == MMC_BLK_SUCCESS) {
>>                 /*
>>                  * This immediately opens the gate for the next request
>>                  * to start on the host while we perform post-processing
>>                  * and report back to the block layer.
>>                  */
>>                 host->areq = NULL;
>>                 complete(&areq->complete);
>
> So that is the second task switch.

Yes.

Interestingly, it does not affect performance in my tests.
I thought it would but it doesn't.

>> As you can see it is a bit elaborate about some
>> of this stuff and quite a lot of comments were added to make
>> clear what is going on. This is in my head so that is why I ask:
>> completion worked fine as a synchronization primitive here
>> so it is maybe a good alternative in your (similar) code
>> as well?
>
> In the case there is another request already waiting, then instead of
> scheduling work to complete the request, the dispatch is woken to complete
> the previous and start the next.  That needs different synchronization.

OK.

>> So what we want to attain is that the next dispatch happens as soon
>> as possible after completing the previous request. They only race
>> as far as that they go through post/preprocessing before getting to
>> the synchronization primitive though?
>
> They only race when the next request is not already waiting.  Otherwise the
> work is never scheduled.

OK I see better which race you were referring to and it makes
a lot of sense. This should work optimally with the kblockd queue
then. (I should have used that in my patch set as well.)
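
So if I read it right, the completion side boils down to something
like this (a sketch - mq->waiting and mq->wait are from your patch,
"complete_work" is just my name for whatever work item gets scheduled):

	if (mq->waiting)
		wake_up(&mq->wait);
	else
		kblockd_schedule_work(&mq->complete_work);

i.e. the extra task switch only happens when nobody is already waiting
for the request.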

>>> As I already wrote, the CPU-bound block layer dispatch work queue has
>>> negative consequences for mmc performance.  So there are 2 aspects to that:
>>>         1. Is the choice of CPU right to start with?  I suspect it is better for
>>> the dispatch to run on the same CPU as the interrupt.
>>>         2. Does the dispatch work need to be able to be migrated to a different
>>> CPU? i.e. unbound work queue.  That helped in my tests, but it could just be
>>> a side-effect of 1.
>>
>> Hm! What you are saying sounds correct, and we really need to
>> consider the multi-CPU aspects of this, maybe not now. I am
>> happy as long as we have equal performance as before and
>> maintainable code.
>
> Well I saw 3% drop in sequential read performance, improving to 1% when an
> unbound workqueue was used.

I think that is acceptable, albeit I cannot see it on my host.
I wonder if people disagree strongly when I say "acceptable..."

In my tests any throughput loss is in the error margin. It may be
because of fuzz in this hardware or because of differences between
ARM and x86, caches or whatever.

>> Or rather: just because someone is FOR something else,
>> doesn't mean they are AGAINST this.
>
> But that is not what happened.  You blocked CQE even though it was ready.

That is an unfair accusation. I alone, not even being a
maintainer of MMC can't block your patch even if something is
seriously wrong with it.

The whole core of your conflict with me and the MMC maintainer is that
you think CQE is more important to get in than to get MMC converted
to MQ. You thought it was more important to support this new feature
than to evolve the MMC subsystem to support MQ. Please understand that.

When you are putting one thing over another and refusing to discuss
you are not a team player and you're not playing well with others.

With the recent patch set you have come a long way towards acceptance
from my side, because you do MQ and MQ first. As I have repeatedly
said, make it the default and kill the old code, then you're home in
my book.

>> It's not good because it does not make MQ mandatory and does
>> not delete the interfaces to the old block layer.
>
> That is a separate issue.

Not at all. This is the core of this entire conflict, my insistence on
transitioning to MQ above all. We need to turn this into a win-win
situation where we get both MQ and CQE in the same go in the same
merge window IMO.

>> So why are you not just deleting it?
>
> Because it needs more testing first.  If the testing goes well, then switch
> over the default.  If, after 1 or more release cycles, there are no
> problems, delete the old code.

OK I understand the conservative stance.

But it's just not my preferred stance :)

I feel a strong pressure from the block maintainers to move forward
with MQ and now it is happening. I'm very happy with that.

>> I disagree. No "ready and tested" is needed, the code is ready,
>> and I have tested it. It performs. Delete the old code now.
>
> Not true.  My testing showed the block layer workqueue wasn't performing as
> well as a thread.

Yeah good point. It's weird that my testing doesn't show anything
of the kind.

We can probably agree on one thing though: the MQ code should be
default-enabled on CQE-capable devices.

>> If you just make a series also deleting the old block layer
>> I will test it so it doesn't break anything and then you can
>> probably expect a big Acked-by on the whole shebang.
>
> I will add patches for that - let's see what happens.

Yay!

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 03/10] mmc: block: Add blk-mq support
  2017-11-14 14:50           ` Linus Walleij
@ 2017-11-15 10:55             ` Ulf Hansson
  2017-11-15 13:07               ` Adrian Hunter
  0 siblings, 1 reply; 55+ messages in thread
From: Ulf Hansson @ 2017-11-15 10:55 UTC (permalink / raw)
  To: Linus Walleij, Adrian Hunter
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Shawn Lin, Christoph Hellwig

Linus, Adrian,

Apologies for sidetracking the discussion, just wanted to add some
minor comments.

[...]

>
>>> But what I think is nice in doing it around
>>> each request is that since mmc_put_card() calls mmc_release_host()
>>> contains this:
>>>
>>> if (--host->claim_cnt) { (...)
>>>
>>> So there is a counter inside the claim/release host functions
>>> and now there is another counter keeping track of the in-flight
>>> requests as well and it gives me the feeling we are maintaining
>>> two counters when we only need one.
>>
>> The gets / puts also get runtime pm for the card.  It is a lot of messing
>> around for the sake of a quick check for the number of requests inflight -
>> which is needed anyway for CQE.

Actually the get / puts for runtime PM of the card are already done
taking the host->claim_cnt into account.

In other words, the additional in-flight counter does not provide an
additional improvement in this regard.

For that reason, perhaps the in-flight counter should be added in the
CQE patch instead, because it seems like that is where it really
belongs?

[...]

>
> OK I guess I'm more for forging ahead with such things. But we could
> at least enable it by default so whoever checks out and builds
> and tests linux-next with their MMC/SD controllers will then be
> testing this for us in the next kernel cycle.
>
>>> But I think I would prefer that if a big slew of new code is
>>> introduced it needs to be put to wide use and any bugs
>>> smoked out during the -rc phase, and we are now
>>> hiding it behind a Kconfig option so it's unlikely to see testing
>>> until that option is turned on, and that is not good.
>>
>> And if you find a big problem in rc7, then what do you do?  At least with
>> the config option, the revert is trivial.
>
> I see your point. I guess it is up to Ulf how he feels about
> trust wrt the code. If a problem appears at -rc7 it's indeed
> nice if we can leave the code in-tree and work on it.

Well, it's not an easy decision, simply because it puts the code in
an even worse situation maintenance-wise. So, if you guys just
suddenly have to move on to do something else, then it becomes my
problem to work out. :-)

However, as I trust both of you, and you seem to agree on the
path forward, I am fine keeping the old code for a while.

Although, please make sure the Kconfig option starts out by being
enabled by default, so we can get some test coverage of the mq path.

Of course, then we need to work on the card removal problem asap, and
hopefully we manage to fix it. If not, or if other strange errors
happen, we would need to change the default value of the Kconfig
to 'n'.

[...]

>>> Hm! What you are saying sounds correct, and we really need to
>>> consider the multi-CPU aspects of this, maybe not now. I am
>>> happy as long as we have equal performance as before and
>>> maintainable code.
>>
>> Well I saw 3% drop in sequential read performance, improving to 1% when an
>> unbound workqueue was used.

Can you please make this improvement as a standalone patch on top of
the mq patch?

I think that would be good, because it points out some generic
problems with mq, which we then, at least short term, try to address
locally in MMC.

[...]

>
>>> If you just make a series also deleting the old block layer
>>> I will test it so it doesn't break anything and then you can
>>> probably expect a big Acked-by on the whole shebang.
>>
>> I will add patches for that - let's see what happens.

Yes, please. However, I will, as stated, be withholding that change for
a while, until we have fixed any showstoppers in the mq path.

[...]

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 03/10] mmc: block: Add blk-mq support
  2017-11-15 10:55             ` Ulf Hansson
@ 2017-11-15 13:07               ` Adrian Hunter
  2017-11-16  7:19                 ` Ulf Hansson
  0 siblings, 1 reply; 55+ messages in thread
From: Adrian Hunter @ 2017-11-15 13:07 UTC (permalink / raw)
  To: Ulf Hansson, Linus Walleij
  Cc: linux-mmc, linux-block, linux-kernel, Bough Chen, Alex Lemberg,
	Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung, Dong Aisheng,
	Das Asutosh, Zhangfei Gao, Sahitya Tummala, Harjani Ritesh,
	Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 15/11/17 12:55, Ulf Hansson wrote:
> Linus, Adrian,
> 
> Apologies for sidetracking the discussion, just wanted to add some
> minor comments.
> 
> [...]
> 
>>
>>>> But what I think is nice in doing it around
>>>> each request is that since mmc_put_card() calls mmc_release_host()
>>>> contains this:
>>>>
>>>> if (--host->claim_cnt) { (...)
>>>>
>>>> So there is a counter inside the claim/release host functions
>>>> and now there is another counter keeping track of the in-flight
>>>> requests as well and it gives me the feeling we are maintaining
>>>> two counters when we only need one.
>>>
>>> The gets / puts also get runtime pm for the card.  It is a lot of messing
>>> around for the sake of a quick check for the number of requests inflight -
>>> which is needed anyway for CQE.
> 
> Actually the get / puts for runtime PM of the card are already done
> taking the host->claim_cnt into account.

We do pm_runtime_get_sync(&card->dev) always i.e.

void mmc_get_card(struct mmc_card *card, struct mmc_ctx *ctx)
{
	pm_runtime_get_sync(&card->dev);
	__mmc_claim_host(card->host, ctx, NULL);
}
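
and the put side similarly always does the runtime PM put for the card
(simplified):

void mmc_put_card(struct mmc_card *card, struct mmc_ctx *ctx)
{
	mmc_release_host(card->host);
	pm_runtime_mark_last_busy(&card->dev);
	pm_runtime_put_autosuspend(&card->dev);
}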

> 
> In other words, the additional in-flight counter does not provide an
> additional improvement in this regard.
> 
> For that reason, perhaps the in-flight counter should be added in the
> CQE patch instead, because it seems like that is where it really
> belongs?
> 
> [...]
> 
>>
>> OK I guess I'm more for forging ahead with such things. But we could
>> at least enable it by default so whoever checks out and builds
>> and tests linux-next with their MMC/SD controllers will then be
>> testing this for us in the next kernel cycle.
>>
>>>> But I think I would prefer that if a big slew of new code is
>>>> introduced it needs to be put to wide use and any bugs
>>>> smoked out during the -rc phase, and we are now
>>>> hiding it behind a Kconfig option so it's unlikely to see testing
>>>> until that option is turned on, and that is not good.
>>>
>>> And if you find a big problem in rc7, then what do you do?  At least with
>>> the config option, the revert is trivial.
>>
>> I see your point. I guess it is up to Ulf how he feels about
>> trust wrt the code. If a problem appears at -rc7 it's indeed
>> nice if we can leave the code in-tree and work on it.
> 
> Well, it's not an easy decision, simply because it puts the code in
> an even worse situation maintenance-wise. So, if you guys just
> suddenly have to move on to do something else, then it becomes my
> problem to work out. :-)
> 
> However, as I trust both of you, and you seem to agree on the
> path forward, I am fine keeping the old code for a while.
> 
> Although, please make sure the Kconfig option starts out by being
> enabled by default, so we can get some test coverage of the mq path.

Ok

> 
> Of course, then we need to work on the card removal problem asap, and
> hopefully we manage to fix it. If not, or if other strange errors
> happen, we would need to change the default value of the Kconfig
> to 'n'.
> 
> [...]
> 
>>>> Hm! What you are saying sounds correct, and we really need to
>>>> consider the multi-CPU aspects of this, maybe not now. I am
>>>> happy as long as we have equal performance as before and
>>>> maintainable code.
>>>
>>> Well I saw 3% drop in sequential read performance, improving to 1% when an
>>> unbound workqueue was used.
> 
> Can you please make this improvement as a standalone patch on top of
> the mq patch?

Unfortunately it was just a hack to the blk-mq core - the block layer does
not have an unbound workqueue.  I have not had time to consider a proper
fix.  It will have to wait, but I agree 3% is not enough to delay going forward.
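
For reference, the hack was essentially just making the workqueue that
runs the blk-mq dispatch work unbound, i.e. something along these lines
in blk_dev_init() (a sketch, not the actual diff):

	kblockd_workqueue = alloc_workqueue("kblockd",
					    WQ_MEM_RECLAIM | WQ_HIGHPRI |
					    WQ_UNBOUND, 0);

A proper fix would presumably have to make that selectable per-queue
rather than changing it globally.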

> 
> I think that would be good, because it points out some generic
> problems with mq, which we then, at least short term, try to address
> locally in MMC.
> 
> [...]
> 
>>
>>>> If you just make a series also deleting the old block layer
>>>> I will test it so it doesn't break anything and then you can
>>>> probably expect a big Acked-by on the whole shebang.
>>>
>>> I will add patches for that - let's see what happens.
> 
> Yes, please. However, I will, as stated, be withholding that change for
> a while, until we have fixed any showstoppers in the mq path.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 03/10] mmc: block: Add blk-mq support
  2017-11-15 13:07               ` Adrian Hunter
@ 2017-11-16  7:19                 ` Ulf Hansson
  0 siblings, 0 replies; 55+ messages in thread
From: Ulf Hansson @ 2017-11-16  7:19 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Linus Walleij, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Shawn Lin, Christoph Hellwig

On 15 November 2017 at 14:07, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 15/11/17 12:55, Ulf Hansson wrote:
>> Linus, Adrian,
>>
> Apologies for sidetracking the discussion, just wanted to add some
>> minor comments.
>>
>> [...]
>>
>>>
>>>>> But what I think is nice in doing it around
>>>>> each request is that since mmc_put_card() calls mmc_release_host()
>>>>> contains this:
>>>>>
>>>>> if (--host->claim_cnt) { (...)
>>>>>
>>>>> So there is a counter inside the claim/release host functions
>>>>> and now there is another counter keeping track of the in-flight
>>>>> requests as well and it gives me the feeling we are maintaining
>>>>> two counters when we only need one.
>>>>
>>>> The gets / puts also get runtime pm for the card.  It is a lot of messing
>>>> around for the sake of a quick check for the number of requests inflight -
>>>> which is needed anyway for CQE.
>>
>> Actually the get / puts for runtime PM of the card are already done
>> taking the host->claim_cnt into account.
>
> We do pm_runtime_get_sync(&card->dev) always i.e.
>
> void mmc_get_card(struct mmc_card *card, struct mmc_ctx *ctx)
> {
>         pm_runtime_get_sync(&card->dev);
>         __mmc_claim_host(card->host, ctx, NULL);
> }

You are absolutely correct, so let's leave the inflight counter in.
Apologies for the noise!

I was thinking about the runtime PM of the host dev, which is managed
by mmc_claim|release_host().

[...]

>>>> Well I saw 3% drop in sequential read performance, improving to 1% when an
>>>> unbound workqueue was used.
>>
>> Can you please make this improvement as a standalone patch on top of
>> the mq patch?
>
> Unfortunately it was just a hack to the blk-mq core - the block layer does
> not have an unbound workqueue.  I have not had time to consider a proper
> fix.  It will have to wait, but I agree 3% is not enough to delay going forward.

I see, thanks!

[...]

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH V13 00/10] mmc: Add Command Queue support
       [not found] ` <CGME20171116094642epcas1p14018cb1c475efa38942109dc24cd6da9@epcas1p1.samsung.com>
@ 2017-11-16  9:46   ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 55+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2017-11-16  9:46 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ulf Hansson, linux-mmc, linux-block, linux-kernel, Bough Chen,
	Alex Lemberg, Mateusz Nowak, Yuliy Izrailov, Jaehoon Chung,
	Dong Aisheng, Das Asutosh, Zhangfei Gao, Sahitya Tummala,
	Harjani Ritesh, Venu Byravarasu, Linus Walleij, Shawn Lin,
	Christoph Hellwig

On Friday, November 03, 2017 03:20:10 PM Adrian Hunter wrote:
> Hi
> 
> Here is V13 of the hardware command queue patches without the software
> command queue patches, now using blk-mq and now with blk-mq support for
> non-CQE I/O.
> 
> HW CMDQ offers 25% - 50% better random multi-threaded I/O.  I see a slight
> 2% drop in sequential read speed but no change to sequential write.
> 
> Non-CQE blk-mq showed a 3% decrease in sequential read performance.  This
> seemed to be coming from the inferior latency of running work items compared
> with a dedicated thread.  Hacking blk-mq workqueue to be unbound reduced the
> performance degradation from 3% to 1%.
> 
> While we should look at changing blk-mq to give better workqueue performance,
> a bigger gain is likely to be made by adding a new host API to enable the
> next already-prepared request to be issued directly from within ->done()
> callback of the current request.

Tested-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> [ for non-CQE changes ]

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread

Thread overview: 55+ messages
2017-11-03 13:20 [PATCH V13 00/10] mmc: Add Command Queue support Adrian Hunter
2017-11-03 13:20 ` [PATCH V13 01/10] mmc: core: Add parameter use_blk_mq Adrian Hunter
2017-11-06  8:38   ` Linus Walleij
2017-11-03 13:20 ` [PATCH V13 02/10] mmc: block: Add error-handling comments Adrian Hunter
2017-11-06  8:39   ` Linus Walleij
2017-11-03 13:20 ` [PATCH V13 03/10] mmc: block: Add blk-mq support Adrian Hunter
2017-11-08  8:54   ` Linus Walleij
2017-11-09 10:42     ` Adrian Hunter
2017-11-09 15:52       ` Linus Walleij
2017-11-10 10:19         ` Linus Walleij
2017-11-14 13:10         ` Adrian Hunter
2017-11-14 13:10           ` Adrian Hunter
2017-11-14 14:50           ` Linus Walleij
2017-11-15 10:55             ` Ulf Hansson
2017-11-15 13:07               ` Adrian Hunter
2017-11-16  7:19                 ` Ulf Hansson
2017-11-03 13:20 ` [PATCH V13 04/10] mmc: block: Add CQE support Adrian Hunter
2017-11-08  9:00   ` Linus Walleij
2017-11-08 13:20     ` Adrian Hunter
2017-11-09 12:04       ` Linus Walleij
2017-11-09 12:39         ` Adrian Hunter
2017-11-03 13:20 ` [PATCH V13 05/10] mmc: cqhci: support for command queue enabled host Adrian Hunter
2017-11-08  9:22   ` Linus Walleij
2017-11-08 14:14     ` Adrian Hunter
2017-11-09 12:26       ` Linus Walleij
2017-11-09 12:55         ` Adrian Hunter
2017-11-10  8:29           ` Linus Walleij
2017-11-09 13:41   ` Ulf Hansson
2017-11-09 14:20     ` Adrian Hunter
2017-11-03 13:20 ` [PATCH V13 06/10] mmc: sdhci-pci: Add CQHCI support for Intel GLK Adrian Hunter
2017-11-08  9:24   ` Linus Walleij
2017-11-09  7:12     ` Adrian Hunter
2017-11-10  8:18       ` Linus Walleij
2017-11-09 13:37   ` Ulf Hansson
2017-11-03 13:20 ` [PATCH V13 07/10] mmc: block: blk-mq: Add support for direct completion Adrian Hunter
2017-11-08  9:28   ` Linus Walleij
2017-11-09  7:27     ` Adrian Hunter
2017-11-09 12:34       ` Linus Walleij
2017-11-09 15:33         ` Adrian Hunter
2017-11-09 13:07   ` Ulf Hansson
2017-11-09 13:15     ` Adrian Hunter
2017-11-03 13:20 ` [PATCH V13 08/10] mmc: block: blk-mq: Separate card polling from recovery Adrian Hunter
2017-11-08  9:30   ` Linus Walleij
2017-11-09  7:56     ` Adrian Hunter
2017-11-09 12:52       ` Linus Walleij
2017-11-09 13:02         ` Adrian Hunter
2017-11-10  8:25           ` Linus Walleij
2017-11-03 13:20 ` [PATCH V13 09/10] mmc: block: blk-mq: Stop using card_busy_detect() Adrian Hunter
2017-11-09 13:36   ` Ulf Hansson
2017-11-09 15:24     ` Adrian Hunter
2017-11-03 13:20 ` [PATCH V13 10/10] mmc: block: blk-mq: Stop using legacy recovery Adrian Hunter
2017-11-08  9:38   ` Linus Walleij
2017-11-09  7:43     ` Adrian Hunter
2017-11-09 12:45       ` Linus Walleij
     [not found] ` <CGME20171116094642epcas1p14018cb1c475efa38942109dc24cd6da9@epcas1p1.samsung.com>
2017-11-16  9:46   ` [PATCH V13 00/10] mmc: Add Command Queue support Bartlomiej Zolnierkiewicz
