Linux-Block Archive on lore.kernel.org
* [RFC PATCH 0/7] Add MMC packed function
@ 2019-07-22 13:09 Baolin Wang
  2019-07-22 13:09 ` [RFC PATCH 1/7] blk-mq: Export blk_mq_hctx_has_pending() function Baolin Wang
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Baolin Wang @ 2019-07-22 13:09 UTC (permalink / raw)
  To: axboe, adrian.hunter, ulf.hansson
  Cc: zhang.lyra, orsonzhai, arnd, linus.walleij, baolin.wang,
	vincent.guittot, linux-mmc, linux-kernel, linux-block

Hi All,

Some SD/MMC host controllers support packed commands or packed requests,
meaning multiple requests can be submitted to the host controller and handled
at one time, which improves I/O performance. This patchset therefore adds an
MMC packed function to support packed requests or packed commands.

In this patch set, I implemented the SD host ADMA3 transfer mode to support
packed requests. The ADMA3 transfer mode can process a multi-block data
transfer by using pairs of command descriptors and ADMA2 descriptors. In the
future we can easily extend the MMC packed function to support packed commands.

Below is some comparison data between packed and non-packed requests, measured
with the fio tool. The fio command I used is shown below; I varied the '--rw'
parameter and enabled the direct I/O flag to measure the actual hardware
transfer speed.

./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=512M --group_reporting --numjobs=20 --name=test_read

My eMMC card working at HS400 Enhanced strobe mode:
[    2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001
[    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB 
[    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
[    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
[    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0)

1. Non-packed request
I ran each case 3 times and report the average speed.

1) Sequential read:
Speed: 28.9MiB/s, 26.4MiB/s, 30.9MiB/s
Average speed: 28.7MiB/s

2) Random read:
Speed: 18.2MiB/s, 8.9MiB/s, 15.8MiB/s
Average speed: 14.3MiB/s

3) Sequential write:
Speed: 21.1MiB/s, 27.9MiB/s, 25MiB/s
Average speed: 24.7MiB/s

4) Random write:
Speed: 21.5MiB/s, 18.1MiB/s, 18.1MiB/s
Average speed: 19.2MiB/s

2. Packed request
In packed request mode, I configured the host controller to package at most
10 requests at one time (this number can be increased), and I enabled packed
mode for both reads and writes. Again I ran each case 3 times and report the
average speed.

1) Sequential read:
Speed: 165MiB/s, 167MiB/s, 164MiB/s
Average speed: 165.3MiB/s

2) Random read:
Speed: 147MiB/s, 141MiB/s, 144MiB/s
Average speed: 144MiB/s

3) Sequential write:
Speed: 87.8MiB/s, 89.1MiB/s, 90.0MiB/s
Average speed: 89MiB/s

4) Random write:
Speed: 90.9MiB/s, 89.8MiB/s, 90.4MiB/s
Average speed: 90.4MiB/s

From the above data, we can see that packed requests improve performance greatly.
Any comments are welcome. Thanks a lot.

Baolin Wang (7):
  blk-mq: Export blk_mq_hctx_has_pending() function
  mmc: core: Add MMC packed request function
  mmc: host: sdhci: Introduce ADMA3 transfer mode
  mmc: host: sdhci: Factor out the command configuration
  mmc: host: sdhci: Remove redundant sg_count member of struct
    sdhci_host
  mmc: host: sdhci: Add MMC packed request support
  mmc: host: sdhci-sprd: Add MMC packed request support

 block/blk-mq.c                |    3 +-
 drivers/mmc/core/Kconfig      |    2 +
 drivers/mmc/core/Makefile     |    1 +
 drivers/mmc/core/block.c      |   71 +++++-
 drivers/mmc/core/block.h      |    3 +-
 drivers/mmc/core/core.c       |   51 ++++
 drivers/mmc/core/core.h       |    3 +
 drivers/mmc/core/packed.c     |  478 ++++++++++++++++++++++++++++++++++++++
 drivers/mmc/core/queue.c      |   28 ++-
 drivers/mmc/host/Kconfig      |    1 +
 drivers/mmc/host/sdhci-sprd.c |   22 +-
 drivers/mmc/host/sdhci.c      |  513 +++++++++++++++++++++++++++++++++++------
 drivers/mmc/host/sdhci.h      |   59 ++++-
 include/linux/blk-mq.h        |    1 +
 include/linux/mmc/core.h      |    1 +
 include/linux/mmc/host.h      |    3 +
 include/linux/mmc/packed.h    |  123 ++++++++++
 17 files changed, 1286 insertions(+), 77 deletions(-)
 create mode 100644 drivers/mmc/core/packed.c
 create mode 100644 include/linux/mmc/packed.h

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH 1/7] blk-mq: Export blk_mq_hctx_has_pending() function
  2019-07-22 13:09 [RFC PATCH 0/7] Add MMC packed function Baolin Wang
@ 2019-07-22 13:09 ` Baolin Wang
  2019-07-22 14:19   ` Ming Lei
  2019-07-22 13:09 ` [RFC PATCH 2/7] mmc: core: Add MMC packed request function Baolin Wang
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Baolin Wang @ 2019-07-22 13:09 UTC (permalink / raw)
  To: axboe, adrian.hunter, ulf.hansson
  Cc: zhang.lyra, orsonzhai, arnd, linus.walleij, baolin.wang,
	vincent.guittot, linux-mmc, linux-kernel, linux-block

Some SD/MMC host controllers support packed commands or packed requests,
which means several requests can be submitted to the host controller at one
time to improve performance. The following patches introduce an MMC packed
function to support this feature.

To support the MMC packed function, the MMC layer needs to know whether
requests are pending in the hardware queue, so that it can combine requests
as much as possible. If requests are pending in the hardware queue, we should
not dispatch requests to the host controller immediately; instead we should
collect more requests into the MMC packed queue and dispatch them to the host
controller as a packed request once the packing condition is met.

Thus export this function for the MMC packed function.

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
---
 block/blk-mq.c         |    3 ++-
 include/linux/blk-mq.h |    1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b038ec6..5bd4ef9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -63,12 +63,13 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
  * Check if any of the ctx, dispatch list or elevator
  * have pending work in this hardware queue.
  */
-static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
+bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
 {
 	return !list_empty_careful(&hctx->dispatch) ||
 		sbitmap_any_bit_set(&hctx->ctx_map) ||
 			blk_mq_sched_has_work(hctx);
 }
+EXPORT_SYMBOL_GPL(blk_mq_hctx_has_pending);
 
 /*
  * Mark this ctx as having pending work in this hardware queue
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 3fa1fa5..15a2b7b 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -334,6 +334,7 @@ int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 void blk_mq_quiesce_queue_nowait(struct request_queue *q);
 
 unsigned int blk_mq_rq_cpu(struct request *rq);
+bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
 
 /*
  * Driver command data is immediately after the request. So subtract request
-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH 2/7] mmc: core: Add MMC packed request function
  2019-07-22 13:09 [RFC PATCH 0/7] Add MMC packed function Baolin Wang
  2019-07-22 13:09 ` [RFC PATCH 1/7] blk-mq: Export blk_mq_hctx_has_pending() function Baolin Wang
@ 2019-07-22 13:09 ` Baolin Wang
  2019-07-22 13:09 ` [RFC PATCH 3/7] mmc: host: sdhci: Introduce ADMA3 transfer mode Baolin Wang
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2019-07-22 13:09 UTC (permalink / raw)
  To: axboe, adrian.hunter, ulf.hansson
  Cc: zhang.lyra, orsonzhai, arnd, linus.walleij, baolin.wang,
	vincent.guittot, linux-mmc, linux-kernel, linux-block

Some SD controllers support packed commands or packed requests, which means
several requests can be submitted to the host controller and handled at one
time; this reduces interrupts and improves DMA transfer efficiency. As a
result, I/O performance can be improved.

Thus this patch adds an MMC packed function to support packed requests or
packed commands.

The basic concept of this function is that we try to collect as many requests
from the block layer as possible and link them into the MMC packed queue via
mmc_blk_packed_issue_rw_rq(). When the last request in the hardware queue
arrives, or the number of collected requests exceeds 16, or a large request
arrives, we dispatch a packed request to the host controller. The MMC packed
function also supplies packed algorithm operations to help package qualified
requests. After a packed request finishes, the MMC packed function completes
each of its requests; at the same time, the MMC packed queue can collect more
requests from the block layer. After completing each request, the MMC packed
function can try to dispatch another packed request to the host controller
directly in the completion path, if there are enough requests in the MMC
packed queue or the request-pending flag is not set. If the pending flag is
set, we let mmc_blk_packed_issue_rw_rq() collect as many requests as possible.

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
---
 drivers/mmc/core/Kconfig   |    2 +
 drivers/mmc/core/Makefile  |    1 +
 drivers/mmc/core/block.c   |   71 ++++++-
 drivers/mmc/core/block.h   |    3 +-
 drivers/mmc/core/core.c    |   51 +++++
 drivers/mmc/core/core.h    |    3 +
 drivers/mmc/core/packed.c  |  478 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/mmc/core/queue.c   |   28 ++-
 include/linux/mmc/core.h   |    1 +
 include/linux/mmc/host.h   |    3 +
 include/linux/mmc/packed.h |  123 ++++++++++++
 11 files changed, 760 insertions(+), 4 deletions(-)
 create mode 100644 drivers/mmc/core/packed.c
 create mode 100644 include/linux/mmc/packed.h

diff --git a/drivers/mmc/core/Kconfig b/drivers/mmc/core/Kconfig
index c12fe13..50d1a2f 100644
--- a/drivers/mmc/core/Kconfig
+++ b/drivers/mmc/core/Kconfig
@@ -81,3 +81,5 @@ config MMC_TEST
 	  This driver is only of interest to those developing or
 	  testing a host driver. Most people should say N here.
 
+config MMC_PACKED
+	bool
diff --git a/drivers/mmc/core/Makefile b/drivers/mmc/core/Makefile
index 95ffe00..dd303d9 100644
--- a/drivers/mmc/core/Makefile
+++ b/drivers/mmc/core/Makefile
@@ -18,3 +18,4 @@ obj-$(CONFIG_MMC_BLOCK)		+= mmc_block.o
 mmc_block-objs			:= block.o queue.o
 obj-$(CONFIG_MMC_TEST)		+= mmc_test.o
 obj-$(CONFIG_SDIO_UART)		+= sdio_uart.o
+obj-$(CONFIG_MMC_PACKED)	+= packed.o
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 2c71a43..e7a8b2c 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -44,6 +44,7 @@
 #include <linux/mmc/host.h>
 #include <linux/mmc/mmc.h>
 #include <linux/mmc/sd.h>
+#include <linux/mmc/packed.h>
 
 #include <linux/uaccess.h>
 
@@ -2208,11 +2209,77 @@ static int mmc_blk_wait_for_idle(struct mmc_queue *mq, struct mmc_host *host)
 {
 	if (mq->use_cqe)
 		return host->cqe_ops->cqe_wait_for_idle(host);
+	else if (host->packed)
+		return mmc_packed_wait_for_idle(host->packed);
 
 	return mmc_blk_rw_wait(mq, NULL);
 }
 
-enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_packed_req_done(struct mmc_request *mrq)
+{
+	struct mmc_queue_req *mqrq =
+		container_of(mrq, struct mmc_queue_req, brq.mrq);
+	struct request *req = mmc_queue_req_to_req(mqrq);
+	struct request_queue *q = req->q;
+	struct mmc_queue *mq = q->queuedata;
+
+	mutex_lock(&mq->complete_lock);
+	mmc_blk_mq_poll_completion(mq, req);
+	mutex_unlock(&mq->complete_lock);
+
+	mmc_blk_mq_post_req(mq, req);
+}
+
+static int mmc_blk_packed_issue_rw_rq(struct mmc_queue *mq, struct request *req,
+				      bool last)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_host *host = mq->card->host;
+	unsigned long nr_rqs;
+	int err;
+
+	/*
+	 * If the packed queue has dispatched all requests, we should first
+	 * check whether retuning is needed.
+	 */
+	nr_rqs = mmc_packed_queue_length(host->packed);
+	if (!nr_rqs)
+		host->retune_now = host->need_retune && !host->hold_retune;
+
+	mutex_lock(&mq->complete_lock);
+	mmc_retune_hold(host);
+	mutex_unlock(&mq->complete_lock);
+
+	mmc_blk_rw_rq_prep(mqrq, mq->card, 0, mq);
+	mmc_pre_req(host, &mqrq->brq.mrq);
+	mqrq->brq.mrq.done = mmc_blk_packed_req_done;
+
+	err = mmc_packed_start_req(host, &mqrq->brq.mrq);
+	if (err) {
+		mutex_lock(&mq->complete_lock);
+		mmc_retune_release(host);
+		mutex_unlock(&mq->complete_lock);
+
+		mmc_post_req(host, &mqrq->brq.mrq, err);
+
+		return err;
+	}
+
+	/*
+	 * If this is the last request from the block layer, or a large
+	 * request, or the request count exceeds MMC_PACKED_MAX_REQUEST_COUNT,
+	 * pump the requests to the controller. Otherwise try to combine
+	 * requests as much as we can.
+	 */
+	if (last || blk_rq_bytes(req) > MMC_PACKED_FLUSH_SIZE ||
+	    nr_rqs > MMC_PACKED_MAX_REQUEST_COUNT)
+		mmc_packed_pump_requests(host->packed);
+
+	return 0;
+}
+
+enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req,
+				    bool last)
 {
 	struct mmc_blk_data *md = mq->blkdata;
 	struct mmc_card *card = md->queue.card;
@@ -2257,6 +2324,8 @@ enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
 		case REQ_OP_WRITE:
 			if (mq->use_cqe)
 				ret = mmc_blk_cqe_issue_rw_rq(mq, req);
+			else if (host->packed)
+				ret = mmc_blk_packed_issue_rw_rq(mq, req, last);
 			else
 				ret = mmc_blk_mq_issue_rw_rq(mq, req);
 			break;
diff --git a/drivers/mmc/core/block.h b/drivers/mmc/core/block.h
index 31153f6..8bfb89f 100644
--- a/drivers/mmc/core/block.h
+++ b/drivers/mmc/core/block.h
@@ -9,7 +9,8 @@
 
 enum mmc_issued;
 
-enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req);
+enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req,
+				    bool last);
 void mmc_blk_mq_complete(struct request *req);
 void mmc_blk_mq_recovery(struct mmc_queue *mq);
 
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 2211273..924e733 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -29,6 +29,7 @@
 #include <linux/mmc/card.h>
 #include <linux/mmc/host.h>
 #include <linux/mmc/mmc.h>
+#include <linux/mmc/packed.h>
 #include <linux/mmc/sd.h>
 #include <linux/mmc/slot-gpio.h>
 
@@ -329,6 +330,7 @@ static int mmc_mrq_prep(struct mmc_host *host, struct mmc_request *mrq)
 		}
 	}
 
+	INIT_LIST_HEAD(&mrq->packed_list);
 	return 0;
 }
 
@@ -487,6 +489,55 @@ int mmc_cqe_start_req(struct mmc_host *host, struct mmc_request *mrq)
 }
 EXPORT_SYMBOL(mmc_cqe_start_req);
 
+int mmc_packed_start_req(struct mmc_host *host, struct mmc_request *mrq)
+{
+	int err;
+
+	if (mmc_card_removed(host->card))
+		return -ENOMEDIUM;
+
+	err = mmc_retune(host);
+	if (err)
+		return err;
+
+	mrq->host = host;
+
+	mmc_mrq_pr_debug(host, mrq, true);
+
+	err = mmc_mrq_prep(host, mrq);
+	if (err)
+		return err;
+
+	err = mmc_packed_queue_request(host->packed, mrq);
+	if (err)
+		return err;
+
+	trace_mmc_request_start(host, mrq);
+
+	return 0;
+}
+EXPORT_SYMBOL(mmc_packed_start_req);
+
+void mmc_packed_request_done(struct mmc_host *host, struct mmc_request *mrq)
+{
+	mmc_should_fail_request(host, mrq);
+
+	/* Flag re-tuning needed on CRC errors */
+	if (mrq->data && mrq->data->error == -EILSEQ)
+		mmc_retune_needed(host);
+
+	trace_mmc_request_done(host, mrq);
+
+	if (mrq->data) {
+		pr_debug("%s:     %d bytes transferred: %d\n",
+			 mmc_hostname(host),
+			 mrq->data->bytes_xfered, mrq->data->error);
+	}
+
+	mrq->done(mrq);
+}
+EXPORT_SYMBOL(mmc_packed_request_done);
+
 /**
  *	mmc_cqe_request_done - CQE has finished processing an MMC request
  *	@host: MMC host which completed request
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index 328c78d..b88b3b3 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -138,6 +138,9 @@ static inline void mmc_claim_host(struct mmc_host *host)
 void mmc_cqe_post_req(struct mmc_host *host, struct mmc_request *mrq);
 int mmc_cqe_recovery(struct mmc_host *host);
 
+int mmc_packed_start_req(struct mmc_host *host, struct mmc_request *mrq);
+void mmc_packed_request_done(struct mmc_host *host, struct mmc_request *mrq);
+
 /**
  *	mmc_pre_req - Prepare for a new request
  *	@host: MMC host to prepare command
diff --git a/drivers/mmc/core/packed.c b/drivers/mmc/core/packed.c
new file mode 100644
index 0000000..91b7e9d
--- /dev/null
+++ b/drivers/mmc/core/packed.c
@@ -0,0 +1,478 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// MMC packed request support
+//
+// Copyright (C) 2019 Linaro, Inc.
+// Author: Baolin Wang <baolin.wang@linaro.org>
+
+#include <linux/slab.h>
+#include <linux/export.h>
+#include <linux/mmc/card.h>
+#include <linux/mmc/host.h>
+#include <linux/mmc/mmc.h>
+#include <linux/mmc/packed.h>
+
+#include "block.h"
+#include "card.h"
+#include "core.h"
+#include "host.h"
+#include "queue.h"
+
+#define MMC_PACKED_REQ_DIR(mrq)					\
+	(((mrq)->cmd->opcode == MMC_READ_MULTIPLE_BLOCK ||	\
+	  (mrq)->cmd->opcode == MMC_READ_SINGLE_BLOCK) ? READ : WRITE)
+
+static void mmc_packed_allow_pump(struct mmc_packed *packed)
+{
+	struct mmc_packed_request *prq = &packed->prq;
+	unsigned long flags, remains;
+	bool need_pump;
+
+	/* Allow requests to be pumped after completing previous requests. */
+	spin_lock_irqsave(&packed->lock, flags);
+	prq->nr_reqs = 0;
+	need_pump = !packed->rqs_pending;
+	remains = packed->rqs_len;
+
+	if (packed->waiting_for_idle && !remains) {
+		packed->waiting_for_idle = false;
+		wake_up(&packed->wait_queue);
+	}
+
+	spin_unlock_irqrestore(&packed->lock, flags);
+
+	/*
+	 * If there are not enough requests in queue and the request pending
+	 * flag was set, then do not pump requests here, and let the
+	 * mmc_blk_packed_issue_rw_rq() combine more requests and pump them.
+	 */
+	if ((need_pump && remains > 0) || remains >= packed->max_entries)
+		mmc_packed_pump_requests(packed);
+}
+
+static void mmc_packed_complete_work(struct work_struct *work)
+{
+	struct mmc_packed *packed =
+		container_of(work, struct mmc_packed, complete_work);
+	struct mmc_request *mrq, *t;
+	unsigned long flags;
+	LIST_HEAD(head);
+
+	spin_lock_irqsave(&packed->lock, flags);
+	list_splice_tail_init(&packed->complete_list, &head);
+	spin_unlock_irqrestore(&packed->lock, flags);
+
+	list_for_each_entry_safe(mrq, t, &head, packed_list) {
+		list_del(&mrq->packed_list);
+		mmc_packed_request_done(packed->host, mrq);
+	}
+
+	mmc_packed_allow_pump(packed);
+}
+
+/**
+ * mmc_packed_finalize_requests - finalize a completed packed request
+ * @host: the host controller
+ * @prq: the packed request to be finalized
+ *
+ * Complete all requests in @prq and allow more requests to be pumped.
+ */
+void mmc_packed_finalize_requests(struct mmc_host *host,
+				  struct mmc_packed_request *prq)
+{
+	struct mmc_packed *packed = host->packed;
+	struct mmc_request *mrq, *t;
+	LIST_HEAD(head);
+	unsigned long flags;
+
+	if (packed->ops->unprepare_hardware &&
+	    packed->ops->unprepare_hardware(packed))
+		pr_err("failed to unprepare hardware\n");
+
+	/*
+	 * Clear the busy flag to let more requests be linked into the MMC
+	 * packed queue, but do not pump them to the controller yet; we must
+	 * wait until all requests are completed. While completing requests,
+	 * we collect as many new requests from the block layer as possible.
+	 */
+	spin_lock_irqsave(&packed->lock, flags);
+	list_splice_tail_init(&prq->list, &head);
+	packed->busy = false;
+	spin_unlock_irqrestore(&packed->lock, flags);
+
+	list_for_each_entry_safe(mrq, t, &head, packed_list) {
+		if (mmc_host_done_complete(host)) {
+			list_del(&mrq->packed_list);
+
+			mmc_packed_request_done(host, mrq);
+		}
+	}
+
+	/*
+	 * If we cannot complete these requests in this context, queue a
+	 * work to do it.
+	 *
+	 * Note: we must make sure all requests are completed before
+	 * pumping new requests to the host controller.
+	 */
+	if (!mmc_host_done_complete(host)) {
+		spin_lock_irqsave(&packed->lock, flags);
+		list_splice_tail_init(&head, &packed->complete_list);
+		spin_unlock_irqrestore(&packed->lock, flags);
+
+		schedule_work(&packed->complete_work);
+		return;
+	}
+
+	mmc_packed_allow_pump(packed);
+}
+EXPORT_SYMBOL_GPL(mmc_packed_finalize_requests);
+
+/**
+ * mmc_packed_queue_length - return the number of queued requests
+ * @packed: the mmc_packed
+ */
+unsigned long mmc_packed_queue_length(struct mmc_packed *packed)
+{
+	unsigned long flags;
+	unsigned long len;
+
+	spin_lock_irqsave(&packed->lock, flags);
+	len = packed->rqs_len;
+	spin_unlock_irqrestore(&packed->lock, flags);
+
+	return len;
+}
+EXPORT_SYMBOL_GPL(mmc_packed_queue_length);
+
+/**
+ * mmc_packed_queue_is_busy - check whether the MMC packed queue is busy
+ * @packed: the mmc_packed
+ *
+ * If the MMC hardware is busy, we should not add more requests to the
+ * MMC packed queue; instead we return busy to the block layer, so that
+ * it will tell the MMC layer when more requests are coming.
+ */
+bool mmc_packed_queue_is_busy(struct mmc_packed *packed)
+{
+	unsigned long flags;
+	bool busy;
+
+	spin_lock_irqsave(&packed->lock, flags);
+	busy = packed->busy;
+	spin_unlock_irqrestore(&packed->lock, flags);
+
+	return busy;
+}
+EXPORT_SYMBOL_GPL(mmc_packed_queue_is_busy);
+
+/**
+ * mmc_packed_queue_commit_rqs - note that more requests will be coming
+ * @packed: the mmc_packed
+ */
+void mmc_packed_queue_commit_rqs(struct mmc_packed *packed)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&packed->lock, flags);
+
+	/* Set the pending flag to indicate more requests will be coming */
+	if (!packed->rqs_pending)
+		packed->rqs_pending = true;
+
+	spin_unlock_irqrestore(&packed->lock, flags);
+}
+EXPORT_SYMBOL_GPL(mmc_packed_queue_commit_rqs);
+
+/**
+ * mmc_packed_queue_request - add an MMC request to the packed list
+ * @packed: the mmc_packed
+ * @mrq: the MMC request
+ */
+int mmc_packed_queue_request(struct mmc_packed *packed,
+			     struct mmc_request *mrq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&packed->lock, flags);
+
+	if (!packed->running) {
+		spin_unlock_irqrestore(&packed->lock, flags);
+		return -ESHUTDOWN;
+	}
+
+	list_add_tail(&mrq->packed_list, &packed->list);
+
+	/* A new request has arrived, so clear the pending flag */
+	if (packed->rqs_pending)
+		packed->rqs_pending = false;
+
+	packed->rqs_len++;
+	spin_unlock_irqrestore(&packed->lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mmc_packed_queue_request);
+
+/**
+ * mmc_packed_queue_start - start the MMC packed queue
+ * @packed: the mmc_packed
+ */
+int mmc_packed_queue_start(struct mmc_packed *packed)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&packed->lock, flags);
+
+	if (packed->running || packed->busy) {
+		spin_unlock_irqrestore(&packed->lock, flags);
+		return -EBUSY;
+	}
+
+	packed->running = true;
+	spin_unlock_irqrestore(&packed->lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mmc_packed_queue_start);
+
+static bool mmc_packed_queue_is_idle(struct mmc_packed *packed)
+{
+	unsigned long flags;
+	bool is_idle;
+
+	spin_lock_irqsave(&packed->lock, flags);
+	is_idle = !packed->prq.nr_reqs && list_empty(&packed->list);
+
+	packed->waiting_for_idle = !is_idle;
+	spin_unlock_irqrestore(&packed->lock, flags);
+
+	return is_idle;
+}
+
+/**
+ * mmc_packed_queue_stop - stop the MMC packed queue
+ * @packed: the mmc_packed
+ */
+int mmc_packed_queue_stop(struct mmc_packed *packed)
+{
+	unsigned long flags;
+	u32 timeout = 500;
+	int ret;
+
+	ret = wait_event_timeout(packed->wait_queue,
+				 mmc_packed_queue_is_idle(packed),
+				 msecs_to_jiffies(timeout));
+	if (ret == 0) {
+		pr_warn("could not stop mmc packed queue\n");
+		return -ETIMEDOUT;
+	}
+
+	spin_lock_irqsave(&packed->lock, flags);
+	packed->running = false;
+	spin_unlock_irqrestore(&packed->lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mmc_packed_queue_stop);
+
+/**
+ * mmc_packed_wait_for_idle - wait until all requests are finished
+ * @packed: the mmc_packed
+ */
+int mmc_packed_wait_for_idle(struct mmc_packed *packed)
+{
+	wait_event(packed->wait_queue, mmc_packed_queue_is_idle(packed));
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mmc_packed_wait_for_idle);
+
+/**
+ * mmc_packed_algo_rw - the algorithm to pack read or write requests
+ * @packed: the mmc_packed
+ *
+ * TODO: we can add more conditions to decide whether to package a given
+ * request or not.
+ */
+void mmc_packed_algo_rw(struct mmc_packed *packed)
+{
+	struct mmc_packed_request *prq = &packed->prq;
+	struct mmc_request *mrq, *t;
+	u32 i = 0;
+
+	list_for_each_entry_safe(mrq, t, &packed->list, packed_list) {
+		if (++i > packed->max_entries)
+			break;
+
+		list_move_tail(&mrq->packed_list, &prq->list);
+		prq->nr_reqs++;
+	}
+}
+EXPORT_SYMBOL_GPL(mmc_packed_algo_rw);
+
+/**
+ * mmc_packed_algo_ro - the algorithm only to pack read requests
+ * @packed: the mmc_packed
+ *
+ * TODO: more conditions need to be considered
+ */
+void mmc_packed_algo_ro(struct mmc_packed *packed)
+{
+	struct mmc_packed_request *prq = &packed->prq;
+	struct mmc_request *mrq, *t;
+	u32 i = 0;
+
+	list_for_each_entry_safe(mrq, t, &packed->list, packed_list) {
+		if (++i > packed->max_entries)
+			break;
+
+		if (MMC_PACKED_REQ_DIR(mrq) != READ) {
+			if (!prq->nr_reqs) {
+				list_move_tail(&mrq->packed_list, &prq->list);
+				prq->nr_reqs = 1;
+			}
+
+			break;
+		}
+
+		list_move_tail(&mrq->packed_list, &prq->list);
+		prq->nr_reqs++;
+	}
+}
+EXPORT_SYMBOL_GPL(mmc_packed_algo_ro);
+
+/**
+ * mmc_packed_algo_wo - the algorithm only to pack write requests
+ * @packed: the mmc_packed
+ *
+ * TODO: more conditions need to be considered
+ */
+void mmc_packed_algo_wo(struct mmc_packed *packed)
+{
+	struct mmc_packed_request *prq = &packed->prq;
+	struct mmc_request *mrq, *t;
+	u32 i = 0;
+
+	list_for_each_entry_safe(mrq, t, &packed->list, packed_list) {
+		if (++i > packed->max_entries)
+			break;
+
+		if (MMC_PACKED_REQ_DIR(mrq) != WRITE) {
+			if (!prq->nr_reqs) {
+				list_move_tail(&mrq->packed_list, &prq->list);
+				prq->nr_reqs = 1;
+			}
+
+			break;
+		}
+
+		list_move_tail(&mrq->packed_list, &prq->list);
+		prq->nr_reqs++;
+	}
+}
+EXPORT_SYMBOL_GPL(mmc_packed_algo_wo);
+
+/**
+ * mmc_packed_pump_requests - pump a packed request to the host controller
+ * @packed: the mmc_packed
+ */
+void mmc_packed_pump_requests(struct mmc_packed *packed)
+{
+	struct mmc_packed_request *prq = &packed->prq;
+	struct mmc_host *host = packed->host;
+	struct mmc_request *mrq;
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&packed->lock, flags);
+
+	/* Make sure we are not already running a packed request */
+	if (packed->prq.nr_reqs) {
+		spin_unlock_irqrestore(&packed->lock, flags);
+		return;
+	}
+
+	/* Make sure there are remaining requests to pump */
+	if (list_empty(&packed->list) || !packed->running) {
+		spin_unlock_irqrestore(&packed->lock, flags);
+		return;
+	}
+
+	/* Try to package requests */
+	packed->ops->packed_algo(packed);
+
+	packed->rqs_len -= packed->prq.nr_reqs;
+	packed->busy = true;
+
+	spin_unlock_irqrestore(&packed->lock, flags);
+
+	if (packed->ops->prepare_hardware) {
+		ret = packed->ops->prepare_hardware(packed);
+		if (ret) {
+			pr_err("failed to prepare hardware\n");
+			goto error;
+		}
+	}
+
+	ret = packed->ops->packed_request(packed, prq);
+	if (ret) {
+		pr_err("failed to pack requests\n");
+		goto error;
+	}
+
+	return;
+
+error:
+	spin_lock_irqsave(&packed->lock, flags);
+
+	list_for_each_entry(mrq, &packed->prq.list, packed_list) {
+		struct mmc_data *data = mrq->data;
+
+		data->error = ret;
+		data->bytes_xfered = 0;
+	}
+
+	spin_unlock_irqrestore(&packed->lock, flags);
+
+	mmc_packed_finalize_requests(host, prq);
+}
+EXPORT_SYMBOL_GPL(mmc_packed_pump_requests);
+
+int mmc_packed_init(struct mmc_host *host, const struct mmc_packed_ops *ops,
+		    int max_packed)
+{
+	struct mmc_packed *packed;
+
+	packed = kzalloc(sizeof(struct mmc_packed), GFP_KERNEL);
+	if (!packed)
+		return -ENOMEM;
+
+	packed->max_entries = max_packed;
+	packed->ops = ops;
+	packed->host = host;
+	spin_lock_init(&packed->lock);
+	INIT_LIST_HEAD(&packed->list);
+	INIT_LIST_HEAD(&packed->complete_list);
+	INIT_LIST_HEAD(&packed->prq.list);
+	INIT_WORK(&packed->complete_work, mmc_packed_complete_work);
+	init_waitqueue_head(&packed->wait_queue);
+
+	host->packed = packed;
+	packed->running = true;
+
+	dev_info(host->parent, "Enable MMC packed requests, max packed = %d\n",
+		 packed->max_entries);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mmc_packed_init);
+
+void mmc_packed_exit(struct mmc_host *host)
+{
+	struct mmc_packed *packed = host->packed;
+
+	mmc_packed_queue_stop(packed);
+	kfree(packed);
+	host->packed = NULL;
+}
+EXPORT_SYMBOL_GPL(mmc_packed_exit);
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index e327f80..0a1782d 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -244,7 +244,7 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct mmc_host *host = card->host;
 	enum mmc_issue_type issue_type;
 	enum mmc_issued issued;
-	bool get_card, cqe_retune_ok;
+	bool get_card, cqe_retune_ok, last = false;
 	int ret;
 
 	if (mmc_card_removed(mq->card)) {
@@ -270,6 +270,15 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 		}
 		break;
 	case MMC_ISSUE_ASYNC:
+		/*
+		 * If the packed queue is busy now, we can return BLK_STS_RESOURCE
+		 * to tell the block layer to queue the request later; the MMC
+		 * packed layer will try to combine requests as much as possible.
+		 */
+		if (host->packed && mmc_packed_queue_is_busy(host->packed)) {
+			spin_unlock_irq(&mq->lock);
+			return BLK_STS_RESOURCE;
+		}
 		break;
 	default:
 		/*
@@ -305,9 +314,12 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 				   !host->hold_retune;
 	}
 
+	if (host->packed)
+		last = bd->last && !blk_mq_hctx_has_pending(hctx);
+
 	blk_mq_start_request(req);
 
-	issued = mmc_blk_mq_issue_rq(mq, req);
+	issued = mmc_blk_mq_issue_rq(mq, req, last);
 
 	switch (issued) {
 	case MMC_REQ_BUSY:
@@ -339,8 +351,20 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return ret;
 }
 
+static void mmc_mq_commit_rqs(struct blk_mq_hw_ctx *hctx)
+{
+	struct mmc_queue *mq = hctx->queue->queuedata;
+	struct mmc_card *card = mq->card;
+	struct mmc_host *host = card->host;
+
+	/* Tell the MMC packed layer that more requests will be coming */
+	if (host->packed)
+		mmc_packed_queue_commit_rqs(host->packed);
+}
+
 static const struct blk_mq_ops mmc_mq_ops = {
 	.queue_rq	= mmc_mq_queue_rq,
+	.commit_rqs	= mmc_mq_commit_rqs,
 	.init_request	= mmc_mq_init_request,
 	.exit_request	= mmc_mq_exit_request,
 	.complete	= mmc_blk_mq_complete,
diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h
index b7ba881..1602556 100644
--- a/include/linux/mmc/core.h
+++ b/include/linux/mmc/core.h
@@ -165,6 +165,7 @@ struct mmc_request {
 	bool			cap_cmd_during_tfr;
 
 	int			tag;
+	struct list_head	packed_list;
 };
 
 struct mmc_card;
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 4a351cb..8ecc244 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -13,6 +13,7 @@
 
 #include <linux/mmc/core.h>
 #include <linux/mmc/card.h>
+#include <linux/mmc/packed.h>
 #include <linux/mmc/pm.h>
 #include <linux/dma-direction.h>
 
@@ -441,6 +442,8 @@ struct mmc_host {
 	/* Ongoing data transfer that allows commands during transfer */
 	struct mmc_request	*ongoing_mrq;
 
+	struct mmc_packed	*packed;
+
 #ifdef CONFIG_FAIL_MMC_REQUEST
 	struct fault_attr	fail_mmc_request;
 #endif
diff --git a/include/linux/mmc/packed.h b/include/linux/mmc/packed.h
new file mode 100644
index 0000000..a952889
--- /dev/null
+++ b/include/linux/mmc/packed.h
@@ -0,0 +1,123 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2019 Linaro, Inc.
+// Author: Baolin Wang <baolin.wang@linaro.org>
+
+#ifndef MMC_PACKED_H
+#define MMC_PACKED_H
+
+#include <linux/list.h>
+#include <linux/mmc/core.h>
+#include <linux/wait.h>
+
+#define MMC_PACKED_MAX_REQUEST_COUNT	16
+#define MMC_PACKED_FLUSH_SIZE		(128 * 1024)
+
+struct mmc_packed;
+
+struct mmc_packed_request {
+	struct list_head list;
+	u32 nr_reqs;
+};
+
+struct mmc_packed_ops {
+	void (*packed_algo)(struct mmc_packed *packed);
+	int (*prepare_hardware)(struct mmc_packed *packed);
+	int (*unprepare_hardware)(struct mmc_packed *packed);
+	int (*packed_request)(struct mmc_packed *packed,
+			      struct mmc_packed_request *prq);
+};
+
+struct mmc_packed {
+	struct list_head list;
+	bool busy;
+	bool rqs_pending;
+	bool running;
+	bool waiting_for_idle;
+	spinlock_t lock;
+	u32 max_entries;
+	unsigned long rqs_len;
+
+	struct mmc_host *host;
+	struct mmc_packed_request prq;
+	const struct mmc_packed_ops *ops;
+
+	struct work_struct complete_work;
+	struct list_head complete_list;
+
+	wait_queue_head_t wait_queue;
+};
+
+#ifdef CONFIG_MMC_PACKED
+int mmc_packed_init(struct mmc_host *host, const struct mmc_packed_ops *ops,
+		    int max_packed);
+void mmc_packed_exit(struct mmc_host *host);
+void mmc_packed_finalize_requests(struct mmc_host *host,
+				  struct mmc_packed_request *prq);
+int mmc_packed_queue_request(struct mmc_packed *packed,
+			      struct mmc_request *mrq);
+void mmc_packed_pump_requests(struct mmc_packed *packed);
+bool mmc_packed_queue_is_busy(struct mmc_packed *packed);
+unsigned long mmc_packed_queue_length(struct mmc_packed *packed);
+void mmc_packed_queue_commit_rqs(struct mmc_packed *packed);
+int mmc_packed_wait_for_idle(struct mmc_packed *packed);
+
+int mmc_packed_queue_start(struct mmc_packed *packed);
+int mmc_packed_queue_stop(struct mmc_packed *packed);
+
+/* Some packed algorithm helpers */
+void mmc_packed_algo_rw(struct mmc_packed *packed);
+void mmc_packed_algo_ro(struct mmc_packed *packed);
+void mmc_packed_algo_wo(struct mmc_packed *packed);
+#else
+static inline int mmc_packed_init(struct mmc_host *host,
+				  const struct mmc_packed_ops *ops,
+				  int max_packed)
+{
+	return 0;
+}
+static inline void mmc_packed_exit(struct mmc_host *host)
+{ }
+static inline void mmc_packed_finalize_requests(struct mmc_host *host,
+						struct mmc_packed_request *prq)
+{ }
+static inline int mmc_packed_queue_request(struct mmc_packed *packed,
+					   struct mmc_request *mrq)
+{
+	return -EINVAL;
+}
+static inline void mmc_packed_pump_requests(struct mmc_packed *packed)
+{
+	/* no-op when CONFIG_MMC_PACKED is disabled */
+}
+static inline bool mmc_packed_queue_is_busy(struct mmc_packed *packed)
+{
+	return false;
+}
+static inline unsigned long mmc_packed_queue_length(struct mmc_packed *packed)
+{
+	return 0;
+}
+static inline void mmc_packed_queue_commit_rqs(struct mmc_packed *packed)
+{ }
+static inline int mmc_packed_wait_for_idle(struct mmc_packed *packed)
+{
+	return -EBUSY;
+}
+static inline int mmc_packed_queue_start(struct mmc_packed *packed)
+{
+	return -EINVAL;
+}
+static inline int mmc_packed_queue_stop(struct mmc_packed *packed)
+{
+	return -EINVAL;
+}
+static inline void mmc_packed_algo_rw(struct mmc_packed *packed)
+{ }
+static inline void mmc_packed_algo_ro(struct mmc_packed *packed)
+{ }
+static inline void mmc_packed_algo_wo(struct mmc_packed *packed)
+{ }
+
+#endif
+
+#endif
-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH 3/7] mmc: host: sdhci: Introduce ADMA3 transfer mode
  2019-07-22 13:09 [RFC PATCH 0/7] Add MMC packed function Baolin Wang
  2019-07-22 13:09 ` [RFC PATCH 1/7] blk-mq: Export blk_mq_hctx_has_pending() function Baolin Wang
  2019-07-22 13:09 ` [RFC PATCH 2/7] mmc: core: Add MMC packed request function Baolin Wang
@ 2019-07-22 13:09 ` Baolin Wang
  2019-07-22 13:09 ` [RFC PATCH 4/7] mmc: host: sdhci: Factor out the command configuration Baolin Wang
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2019-07-22 13:09 UTC (permalink / raw)
  To: axboe, adrian.hunter, ulf.hansson
  Cc: zhang.lyra, orsonzhai, arnd, linus.walleij, baolin.wang,
	vincent.guittot, linux-mmc, linux-kernel, linux-block

The standard SD host controller can optionally support the ADMA3 transfer
mode. ADMA3 uses a command descriptor to issue an SD command, and a
multi-block data transfer is programmed by using a pair of command
descriptor and ADMA2 descriptor. ADMA3 performs multiple multi-block data
transfers by using an integrated descriptor.

This is a preparation patch that adds the ADMA3 structures and expands the
ADMA buffer to support the ADMA3 transfer mode.

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
---
 drivers/mmc/host/sdhci.c |  105 ++++++++++++++++++++++++++++++++++++++--------
 drivers/mmc/host/sdhci.h |   48 +++++++++++++++++++++
 2 files changed, 136 insertions(+), 17 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 59acf8e..e57a5b7 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -240,7 +240,7 @@ static void sdhci_do_reset(struct sdhci_host *host, u8 mask)
 	host->ops->reset(host, mask);
 
 	if (mask & SDHCI_RESET_ALL) {
-		if (host->flags & (SDHCI_USE_SDMA | SDHCI_USE_ADMA)) {
+		if (host->flags & (SDHCI_USE_SDMA | SDHCI_USE_ADMA | SDHCI_USE_ADMA3)) {
 			if (host->ops->enable_dma)
 				host->ops->enable_dma(host);
 		}
@@ -3750,10 +3750,17 @@ int sdhci_setup_host(struct sdhci_host *host)
 		(host->caps & SDHCI_CAN_DO_ADMA2))
 		host->flags |= SDHCI_USE_ADMA;
 
+	if ((host->quirks2 & SDHCI_QUIRK2_USE_ADMA3_SUPPORT) &&
+	    (host->flags & SDHCI_USE_ADMA) &&
+	    (host->caps1 & SDHCI_CAN_DO_ADMA3)) {
+		DBG("Enable ADMA3 mode for data transfer\n");
+		host->flags |= SDHCI_USE_ADMA3;
+	}
+
 	if ((host->quirks & SDHCI_QUIRK_BROKEN_ADMA) &&
 		(host->flags & SDHCI_USE_ADMA)) {
 		DBG("Disabling ADMA as it is marked broken\n");
-		host->flags &= ~SDHCI_USE_ADMA;
+		host->flags &= ~(SDHCI_USE_ADMA | SDHCI_USE_ADMA3);
 	}
 
 	/*
@@ -3775,7 +3782,7 @@ int sdhci_setup_host(struct sdhci_host *host)
 		if (ret) {
 			pr_warn("%s: No suitable DMA available - falling back to PIO\n",
 				mmc_hostname(mmc));
-			host->flags &= ~(SDHCI_USE_SDMA | SDHCI_USE_ADMA);
+			host->flags &= ~(SDHCI_USE_SDMA | SDHCI_USE_ADMA | SDHCI_USE_ADMA3);
 
 			ret = 0;
 		}
@@ -3799,31 +3806,68 @@ int sdhci_setup_host(struct sdhci_host *host)
 			host->desc_sz = SDHCI_ADMA2_32_DESC_SZ;
 		}
 
+		host->adma3_table_cnt = 1;
+
+		if (host->flags & SDHCI_USE_ADMA3) {
+			/* We can pack a maximum of 16 requests at once */
+			host->adma3_table_cnt = SDHCI_MAX_ADMA3_ENTRIES;
+
+			if (host->flags & SDHCI_USE_64_BIT_DMA)
+				host->integr_desc_sz = SDHCI_INTEGR_64_DESC_SZ;
+			else
+				host->integr_desc_sz = SDHCI_INTEGR_32_DESC_SZ;
+
+			host->cmd_desc_sz = SDHCI_ADMA3_CMD_DESC_SZ;
+			host->cmd_table_sz = host->adma3_table_cnt *
+				SDHCI_ADMA3_CMD_DESC_SZ * SDHCI_ADMA3_CMD_DESC_ENTRIES;
+
+			buf = dma_alloc_coherent(mmc_dev(mmc),
+						 host->adma3_table_cnt *
+						 host->integr_desc_sz,
+						 &dma, GFP_KERNEL);
+			if (!buf) {
+				pr_warn("%s: Unable to allocate ADMA3 integrated buffers - falling back to ADMA\n",
+					mmc_hostname(mmc));
+				host->flags &= ~SDHCI_USE_ADMA3;
+				host->adma3_table_cnt = 1;
+			} else {
+				host->integr_table = buf;
+				host->integr_addr = dma;
+			}
+		}
+
 		host->align_buffer_sz = SDHCI_MAX_SEGS * SDHCI_ADMA2_ALIGN;
 		/*
 		 * Use zalloc to zero the reserved high 32-bits of 128-bit
 		 * descriptors so that they never need to be written.
 		 */
 		buf = dma_alloc_coherent(mmc_dev(mmc),
-					 host->align_buffer_sz + host->adma_table_sz,
+					 host->align_buffer_sz *
+					 host->adma3_table_cnt +
+					 host->cmd_table_sz +
+					 host->adma_table_sz *
+					 host->adma3_table_cnt,
 					 &dma, GFP_KERNEL);
 		if (!buf) {
 			pr_warn("%s: Unable to allocate ADMA buffers - falling back to standard DMA\n",
 				mmc_hostname(mmc));
-			host->flags &= ~SDHCI_USE_ADMA;
-		} else if ((dma + host->align_buffer_sz) &
+			host->flags &= ~(SDHCI_USE_ADMA | SDHCI_USE_ADMA3);
+		} else if ((dma + host->align_buffer_sz * host->adma3_table_cnt) &
 			   (SDHCI_ADMA2_DESC_ALIGN - 1)) {
 			pr_warn("%s: unable to allocate aligned ADMA descriptor\n",
 				mmc_hostname(mmc));
-			host->flags &= ~SDHCI_USE_ADMA;
-			dma_free_coherent(mmc_dev(mmc), host->align_buffer_sz +
-					  host->adma_table_sz, buf, dma);
+			host->flags &= ~(SDHCI_USE_ADMA | SDHCI_USE_ADMA3);
+			dma_free_coherent(mmc_dev(mmc), host->align_buffer_sz *
+					  host->adma3_table_cnt +
+					  host->cmd_table_sz +
+					  host->adma_table_sz *
+					  host->adma3_table_cnt, buf, dma);
 		} else {
 			host->align_buffer = buf;
 			host->align_addr = dma;
 
-			host->adma_table = buf + host->align_buffer_sz;
-			host->adma_addr = dma + host->align_buffer_sz;
+			host->adma_table = buf + host->align_buffer_sz * host->adma3_table_cnt;
+			host->adma_addr = dma + host->align_buffer_sz * host->adma3_table_cnt;
 		}
 	}
 
@@ -4222,12 +4266,21 @@ int sdhci_setup_host(struct sdhci_host *host)
 		regulator_disable(mmc->supply.vqmmc);
 undma:
 	if (host->align_buffer)
-		dma_free_coherent(mmc_dev(mmc), host->align_buffer_sz +
-				  host->adma_table_sz, host->align_buffer,
+		dma_free_coherent(mmc_dev(mmc),
+				  host->align_buffer_sz * host->adma3_table_cnt +
+				  host->cmd_table_sz +
+				  host->adma_table_sz * host->adma3_table_cnt,
+				  host->align_buffer,
 				  host->align_addr);
 	host->adma_table = NULL;
 	host->align_buffer = NULL;
 
+	if (host->integr_table)
+		dma_free_coherent(mmc_dev(mmc),
+				  host->adma3_table_cnt * host->integr_desc_sz,
+				  host->integr_table, host->integr_addr);
+	host->integr_table = NULL;
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(sdhci_setup_host);
@@ -4240,11 +4293,20 @@ void sdhci_cleanup_host(struct sdhci_host *host)
 		regulator_disable(mmc->supply.vqmmc);
 
 	if (host->align_buffer)
-		dma_free_coherent(mmc_dev(mmc), host->align_buffer_sz +
-				  host->adma_table_sz, host->align_buffer,
+		dma_free_coherent(mmc_dev(mmc),
+				  host->align_buffer_sz * host->adma3_table_cnt +
+				  host->cmd_table_sz +
+				  host->adma_table_sz * host->adma3_table_cnt,
+				  host->align_buffer,
 				  host->align_addr);
 	host->adma_table = NULL;
 	host->align_buffer = NULL;
+
+	if (host->integr_table)
+		dma_free_coherent(mmc_dev(mmc),
+				  host->adma3_table_cnt * host->integr_desc_sz,
+				  host->integr_table, host->integr_addr);
+	host->integr_table = NULL;
 }
 EXPORT_SYMBOL_GPL(sdhci_cleanup_host);
 
@@ -4372,12 +4434,21 @@ void sdhci_remove_host(struct sdhci_host *host, int dead)
 		regulator_disable(mmc->supply.vqmmc);
 
 	if (host->align_buffer)
-		dma_free_coherent(mmc_dev(mmc), host->align_buffer_sz +
-				  host->adma_table_sz, host->align_buffer,
+		dma_free_coherent(mmc_dev(mmc),
+				  host->align_buffer_sz * host->adma3_table_cnt +
+				  host->cmd_table_sz +
+				  host->adma_table_sz * host->adma3_table_cnt,
+				  host->align_buffer,
 				  host->align_addr);
 
 	host->adma_table = NULL;
 	host->align_buffer = NULL;
+
+	if (host->integr_table)
+		dma_free_coherent(mmc_dev(mmc),
+				  host->adma3_table_cnt * host->integr_desc_sz,
+				  host->integr_table, host->integr_addr);
+	host->integr_table = NULL;
 }
 
 EXPORT_SYMBOL_GPL(sdhci_remove_host);
diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index 89fd965..010cc29 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -273,6 +273,9 @@
 #define SDHCI_PRESET_SDCLK_FREQ_MASK   0x3FF
 #define SDHCI_PRESET_SDCLK_FREQ_SHIFT	0
 
+#define SDHCI_ADMA3_ADDRESS	0x78
+#define SDHCI_ADMA3_ADDRESS_HI	0x7c
+
 #define SDHCI_SLOT_INT_STATUS	0xFC
 
 #define SDHCI_HOST_VERSION	0xFE
@@ -345,6 +348,41 @@ struct sdhci_adma2_64_desc {
 #define ADMA2_NOP_END_VALID	0x3
 #define ADMA2_END		0x2
 
+#define SDHCI_MAX_ADMA3_ENTRIES	16
+
+/* ADMA3 command descriptor */
+struct sdhci_adma3_cmd_desc {
+	__le32	cmd;
+	__le32	reg;
+}  __packed __aligned(4);
+
+#define ADMA3_TRAN_VALID	0x9
+#define ADMA3_TRAN_END		0xb
+
+/* ADMA3 command descriptor size */
+#define SDHCI_ADMA3_CMD_DESC_ENTRIES	4
+#define SDHCI_ADMA3_CMD_DESC_SZ		8
+
+/* ADMA3 integrated 32-bit descriptor */
+struct sdhci_integr_32_desc {
+	__le32	cmd;
+	__le32	addr;
+}  __packed __aligned(4);
+
+#define SDHCI_INTEGR_32_DESC_SZ		8
+
+/* ADMA3 integrated 64-bit descriptor. */
+struct sdhci_integr_64_desc {
+	__le32	cmd;
+	__le32	addr_lo;
+	__le32	addr_hi;
+}  __packed __aligned(4);
+
+#define SDHCI_INTEGR_64_DESC_SZ		16
+
+#define ADMA3_INTEGR_TRAN_VALID		0x39
+#define ADMA3_INTEGR_TRAN_END		0x3b
+
 /*
  * Maximum segments assuming a 512KiB maximum requisition size and a minimum
  * 4KiB page size.
@@ -481,6 +519,8 @@ struct sdhci_host {
  * block count.
  */
 #define SDHCI_QUIRK2_USE_32BIT_BLK_CNT			(1<<18)
+/* use ADMA3 for data read/write if the hardware supports it */
+#define SDHCI_QUIRK2_USE_ADMA3_SUPPORT			(1<<19)
 
 	int irq;		/* Device IRQ */
 	void __iomem *ioaddr;	/* Mapped address */
@@ -517,6 +557,7 @@ struct sdhci_host {
 #define SDHCI_SIGNALING_330	(1<<14)	/* Host is capable of 3.3V signaling */
 #define SDHCI_SIGNALING_180	(1<<15)	/* Host is capable of 1.8V signaling */
 #define SDHCI_SIGNALING_120	(1<<16)	/* Host is capable of 1.2V signaling */
+#define SDHCI_USE_ADMA3		(1<<17)	/* Host is ADMA3 capable */
 
 	unsigned int version;	/* SDHCI spec. version */
 
@@ -547,14 +588,19 @@ struct sdhci_host {
 
 	void *adma_table;	/* ADMA descriptor table */
 	void *align_buffer;	/* Bounce buffer */
+	void *integr_table;	/* ADMA3 integrated descriptor table */
 
 	size_t adma_table_sz;	/* ADMA descriptor table size */
 	size_t align_buffer_sz;	/* Bounce buffer size */
+	size_t cmd_table_sz;	/* ADMA3 command descriptor table size */
 
 	dma_addr_t adma_addr;	/* Mapped ADMA descr. table */
 	dma_addr_t align_addr;	/* Mapped bounce buffer */
+	dma_addr_t integr_addr;	/* Mapped ADMA3 integrated descr. table */
 
 	unsigned int desc_sz;	/* ADMA descriptor size */
+	unsigned int cmd_desc_sz;	/* ADMA3 command descriptor size */
+	unsigned int integr_desc_sz;	/* ADMA3 integrated descriptor size */
 
 	struct workqueue_struct *complete_wq;	/* Request completion wq */
 	struct work_struct	complete_work;	/* Request completion work */
@@ -600,6 +646,8 @@ struct sdhci_host {
 
 	/* Host ADMA table count */
 	u32			adma_table_cnt;
+	/* Host ADMA3 table count */
+	u32			adma3_table_cnt;
 
 	u64			data_timeout;
 
-- 
1.7.9.5



* [RFC PATCH 4/7] mmc: host: sdhci: Factor out the command configuration
  2019-07-22 13:09 [RFC PATCH 0/7] Add MMC packed function Baolin Wang
                   ` (2 preceding siblings ...)
  2019-07-22 13:09 ` [RFC PATCH 3/7] mmc: host: sdhci: Introduce ADMA3 transfer mode Baolin Wang
@ 2019-07-22 13:09 ` Baolin Wang
  2019-07-22 13:09 ` [RFC PATCH 5/7] mmc: host: sdhci: Remove redundant sg_count member of struct sdhci_host Baolin Wang
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2019-07-22 13:09 UTC (permalink / raw)
  To: axboe, adrian.hunter, ulf.hansson
  Cc: zhang.lyra, orsonzhai, arnd, linus.walleij, baolin.wang,
	vincent.guittot, linux-mmc, linux-kernel, linux-block

Move the SD command configuration into a separate function to simplify
sdhci_send_command(). Moreover, this function can be reused to support
the ADMA3 transfer mode in the following patches.

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
---
 drivers/mmc/host/sdhci.c |   65 +++++++++++++++++++++++++++-------------------
 1 file changed, 38 insertions(+), 27 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index e57a5b7..5760b7c 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -1339,9 +1339,43 @@ static void sdhci_finish_data(struct sdhci_host *host)
 	}
 }
 
-void sdhci_send_command(struct sdhci_host *host, struct mmc_command *cmd)
+static int sdhci_get_command(struct sdhci_host *host, struct mmc_command *cmd)
 {
 	int flags;
+
+	if ((cmd->flags & MMC_RSP_136) && (cmd->flags & MMC_RSP_BUSY)) {
+		pr_err("%s: Unsupported response type!\n",
+			mmc_hostname(host->mmc));
+		cmd->error = -EINVAL;
+		sdhci_finish_mrq(host, cmd->mrq);
+		return -EINVAL;
+	}
+
+	if (!(cmd->flags & MMC_RSP_PRESENT))
+		flags = SDHCI_CMD_RESP_NONE;
+	else if (cmd->flags & MMC_RSP_136)
+		flags = SDHCI_CMD_RESP_LONG;
+	else if (cmd->flags & MMC_RSP_BUSY)
+		flags = SDHCI_CMD_RESP_SHORT_BUSY;
+	else
+		flags = SDHCI_CMD_RESP_SHORT;
+
+	if (cmd->flags & MMC_RSP_CRC)
+		flags |= SDHCI_CMD_CRC;
+	if (cmd->flags & MMC_RSP_OPCODE)
+		flags |= SDHCI_CMD_INDEX;
+
+	/* CMD19 is special in that the Data Present Select should be set */
+	if (cmd->data || cmd->opcode == MMC_SEND_TUNING_BLOCK ||
+	    cmd->opcode == MMC_SEND_TUNING_BLOCK_HS200)
+		flags |= SDHCI_CMD_DATA;
+
+	return SDHCI_MAKE_CMD(cmd->opcode, flags);
+}
+
+void sdhci_send_command(struct sdhci_host *host, struct mmc_command *cmd)
+{
+	int command;
 	u32 mask;
 	unsigned long timeout;
 
@@ -1391,32 +1425,9 @@ void sdhci_send_command(struct sdhci_host *host, struct mmc_command *cmd)
 
 	sdhci_set_transfer_mode(host, cmd);
 
-	if ((cmd->flags & MMC_RSP_136) && (cmd->flags & MMC_RSP_BUSY)) {
-		pr_err("%s: Unsupported response type!\n",
-			mmc_hostname(host->mmc));
-		cmd->error = -EINVAL;
-		sdhci_finish_mrq(host, cmd->mrq);
+	command = sdhci_get_command(host, cmd);
+	if (command < 0)
 		return;
-	}
-
-	if (!(cmd->flags & MMC_RSP_PRESENT))
-		flags = SDHCI_CMD_RESP_NONE;
-	else if (cmd->flags & MMC_RSP_136)
-		flags = SDHCI_CMD_RESP_LONG;
-	else if (cmd->flags & MMC_RSP_BUSY)
-		flags = SDHCI_CMD_RESP_SHORT_BUSY;
-	else
-		flags = SDHCI_CMD_RESP_SHORT;
-
-	if (cmd->flags & MMC_RSP_CRC)
-		flags |= SDHCI_CMD_CRC;
-	if (cmd->flags & MMC_RSP_OPCODE)
-		flags |= SDHCI_CMD_INDEX;
-
-	/* CMD19 is special in that the Data Present Select should be set */
-	if (cmd->data || cmd->opcode == MMC_SEND_TUNING_BLOCK ||
-	    cmd->opcode == MMC_SEND_TUNING_BLOCK_HS200)
-		flags |= SDHCI_CMD_DATA;
 
 	timeout = jiffies;
 	if (host->data_timeout)
@@ -1427,7 +1438,7 @@ void sdhci_send_command(struct sdhci_host *host, struct mmc_command *cmd)
 		timeout += 10 * HZ;
 	sdhci_mod_timer(host, cmd->mrq, timeout);
 
-	sdhci_writew(host, SDHCI_MAKE_CMD(cmd->opcode, flags), SDHCI_COMMAND);
+	sdhci_writew(host, command, SDHCI_COMMAND);
 }
 EXPORT_SYMBOL_GPL(sdhci_send_command);
 
-- 
1.7.9.5



* [RFC PATCH 5/7] mmc: host: sdhci: Remove redundant sg_count member of struct sdhci_host
  2019-07-22 13:09 [RFC PATCH 0/7] Add MMC packed function Baolin Wang
                   ` (3 preceding siblings ...)
  2019-07-22 13:09 ` [RFC PATCH 4/7] mmc: host: sdhci: Factor out the command configuration Baolin Wang
@ 2019-07-22 13:09 ` Baolin Wang
  2019-07-22 13:09 ` [RFC PATCH 6/7] mmc: host: sdhci: Add MMC packed request support Baolin Wang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2019-07-22 13:09 UTC (permalink / raw)
  To: axboe, adrian.hunter, ulf.hansson
  Cc: zhang.lyra, orsonzhai, arnd, linus.walleij, baolin.wang,
	vincent.guittot, linux-mmc, linux-kernel, linux-block

The mmc_data structure already has a member that saves the mapped sg
count, so there is no need for a redundant sg_count member in struct
sdhci_host; remove it. This is also a preparation patch to support the
ADMA3 transfer mode.

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
---
 drivers/mmc/host/sdhci.c |   12 +++++-------
 drivers/mmc/host/sdhci.h |    2 --
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 5760b7c..9fec82f 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -696,7 +696,7 @@ static void sdhci_adma_mark_end(void *desc)
 }
 
 static void sdhci_adma_table_pre(struct sdhci_host *host,
-	struct mmc_data *data, int sg_count)
+	struct mmc_data *data)
 {
 	struct scatterlist *sg;
 	unsigned long flags;
@@ -710,14 +710,12 @@ static void sdhci_adma_table_pre(struct sdhci_host *host,
 	 * We currently guess that it is LE.
 	 */
 
-	host->sg_count = sg_count;
-
 	desc = host->adma_table;
 	align = host->align_buffer;
 
 	align_addr = host->align_addr;
 
-	for_each_sg(data->sg, sg, host->sg_count, i) {
+	for_each_sg(data->sg, sg, data->sg_count, i) {
 		addr = sg_dma_address(sg);
 		len = sg_dma_len(sg);
 
@@ -788,7 +786,7 @@ static void sdhci_adma_table_post(struct sdhci_host *host,
 		bool has_unaligned = false;
 
 		/* Do a quick scan of the SG list for any unaligned mappings */
-		for_each_sg(data->sg, sg, host->sg_count, i)
+		for_each_sg(data->sg, sg, data->sg_count, i)
 			if (sg_dma_address(sg) & SDHCI_ADMA2_MASK) {
 				has_unaligned = true;
 				break;
@@ -800,7 +798,7 @@ static void sdhci_adma_table_post(struct sdhci_host *host,
 
 			align = host->align_buffer;
 
-			for_each_sg(data->sg, sg, host->sg_count, i) {
+			for_each_sg(data->sg, sg, data->sg_count, i) {
 				if (sg_dma_address(sg) & SDHCI_ADMA2_MASK) {
 					size = SDHCI_ADMA2_ALIGN -
 					       (sg_dma_address(sg) & SDHCI_ADMA2_MASK);
@@ -1094,7 +1092,7 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
 			WARN_ON(1);
 			host->flags &= ~SDHCI_REQ_USE_DMA;
 		} else if (host->flags & SDHCI_USE_ADMA) {
-			sdhci_adma_table_pre(host, data, sg_cnt);
+			sdhci_adma_table_pre(host, data);
 
 			sdhci_writel(host, host->adma_addr, SDHCI_ADMA_ADDRESS);
 			if (host->flags & SDHCI_USE_64_BIT_DMA)
diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index 010cc29..4548d9c 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -584,8 +584,6 @@ struct sdhci_host {
 	struct sg_mapping_iter sg_miter;	/* SG state for PIO */
 	unsigned int blocks;	/* remaining PIO blocks */
 
-	int sg_count;		/* Mapped sg entries */
-
 	void *adma_table;	/* ADMA descriptor table */
 	void *align_buffer;	/* Bounce buffer */
 	void *integr_table;	/* ADMA3 integrated descriptor table */
-- 
1.7.9.5



* [RFC PATCH 6/7] mmc: host: sdhci: Add MMC packed request support
  2019-07-22 13:09 [RFC PATCH 0/7] Add MMC packed function Baolin Wang
                   ` (4 preceding siblings ...)
  2019-07-22 13:09 ` [RFC PATCH 5/7] mmc: host: sdhci: Remove redundant sg_count member of struct sdhci_host Baolin Wang
@ 2019-07-22 13:09 ` Baolin Wang
  2019-07-22 13:09 ` [RFC PATCH 7/7] mmc: host: sdhci-sprd: " Baolin Wang
  2019-08-12  5:20 ` [RFC PATCH 0/7] Add MMC packed function Baolin Wang
  7 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2019-07-22 13:09 UTC (permalink / raw)
  To: axboe, adrian.hunter, ulf.hansson
  Cc: zhang.lyra, orsonzhai, arnd, linus.walleij, baolin.wang,
	vincent.guittot, linux-mmc, linux-kernel, linux-block

This patch adds the MMC packed operations to support packed requests,
and enables the ADMA3 transfer mode to implement this feature.

The ADMA3 transfer mode is enabled only for read and write commands.
In this mode the command complete interrupt and the data timeout
interrupt are disabled, and a software data timeout is used instead.
Other non-data commands still use the ADMA2 transfer mode, since ADMA3
brings no benefit for them.

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
---
 drivers/mmc/host/sdhci.c |  329 +++++++++++++++++++++++++++++++++++++++++++---
 drivers/mmc/host/sdhci.h |    9 ++
 2 files changed, 322 insertions(+), 16 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 9fec82f..3c4f701 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -109,6 +109,19 @@ void sdhci_dumpregs(struct sdhci_host *host)
 		}
 	}
 
+	if (host->adma3_enabled) {
+		if (host->flags & SDHCI_USE_64_BIT_DMA) {
+			SDHCI_DUMP("ADMA3 Err:  0x%08x | ADMA3 Ptr: 0x%08x%08x\n",
+				   sdhci_readl(host, SDHCI_ADMA_ERROR),
+				   sdhci_readl(host, SDHCI_ADMA3_ADDRESS_HI),
+				   sdhci_readl(host, SDHCI_ADMA3_ADDRESS));
+		} else {
+			SDHCI_DUMP("ADMA3 Err:  0x%08x | ADMA3 Ptr: 0x%08x\n",
+				   sdhci_readl(host, SDHCI_ADMA_ERROR),
+				   sdhci_readl(host, SDHCI_ADMA3_ADDRESS));
+		}
+	}
+
 	SDHCI_DUMP("============================================\n");
 }
 EXPORT_SYMBOL_GPL(sdhci_dumpregs);
@@ -286,7 +299,9 @@ static void sdhci_config_dma(struct sdhci_host *host)
 		goto out;
 
 	/* Note if DMA Select is zero then SDMA is selected */
-	if (host->flags & SDHCI_USE_ADMA)
+	if (host->adma3_enabled)
+		ctrl |= SDHCI_CTRL_ADMA3;
+	else if (host->flags & SDHCI_USE_ADMA)
 		ctrl |= SDHCI_CTRL_ADMA32;
 
 	if (host->flags & SDHCI_USE_64_BIT_DMA) {
@@ -445,7 +460,7 @@ static inline void sdhci_led_deactivate(struct sdhci_host *host)
 static void sdhci_mod_timer(struct sdhci_host *host, struct mmc_request *mrq,
 			    unsigned long timeout)
 {
-	if (sdhci_data_line_cmd(mrq->cmd))
+	if (host->prq || sdhci_data_line_cmd(mrq->cmd))
 		mod_timer(&host->data_timer, timeout);
 	else
 		mod_timer(&host->timer, timeout);
@@ -453,7 +468,7 @@ static void sdhci_mod_timer(struct sdhci_host *host, struct mmc_request *mrq,
 
 static void sdhci_del_timer(struct sdhci_host *host, struct mmc_request *mrq)
 {
-	if (sdhci_data_line_cmd(mrq->cmd))
+	if (host->prq || sdhci_data_line_cmd(mrq->cmd))
 		del_timer(&host->data_timer);
 	else
 		del_timer(&host->timer);
@@ -710,10 +725,16 @@ static void sdhci_adma_table_pre(struct sdhci_host *host,
 	 * We currently guess that it is LE.
 	 */
 
-	desc = host->adma_table;
-	align = host->align_buffer;
-
-	align_addr = host->align_addr;
+	if (host->adma3_enabled) {
+		desc = host->adma3_pos;
+		align = host->adma3_align_pos;
+		align_addr = host->align_addr +
+			host->adma3_align_pos - host->align_buffer;
+	} else {
+		desc = host->adma_table;
+		align = host->align_buffer;
+		align_addr = host->align_addr;
+	}
 
 	for_each_sg(data->sg, sg, data->sg_count, i) {
 		addr = sg_dma_address(sg);
@@ -771,6 +792,11 @@ static void sdhci_adma_table_pre(struct sdhci_host *host,
 		/* Add a terminating entry - nop, end, valid */
 		__sdhci_adma_write_desc(host, &desc, 0, 0, ADMA2_NOP_END_VALID);
 	}
+
+	if (host->adma3_enabled) {
+		host->adma3_pos = desc;
+		host->adma3_align_pos = align;
+	}
 }
 
 static void sdhci_adma_table_post(struct sdhci_host *host,
@@ -796,7 +822,10 @@ static void sdhci_adma_table_post(struct sdhci_host *host,
 			dma_sync_sg_for_cpu(mmc_dev(host->mmc), data->sg,
 					    data->sg_len, DMA_FROM_DEVICE);
 
-			align = host->align_buffer;
+			if (host->adma3_enabled)
+				align = host->adma3_align_pos;
+			else
+				align = host->align_buffer;
 
 			for_each_sg(data->sg, sg, data->sg_count, i) {
 				if (sg_dma_address(sg) & SDHCI_ADMA2_MASK) {
@@ -810,6 +839,9 @@ static void sdhci_adma_table_post(struct sdhci_host *host,
 					align += SDHCI_ADMA2_ALIGN;
 				}
 			}
+
+			if (host->adma3_enabled)
+				host->adma3_align_pos = align;
 		}
 	}
 }
@@ -1014,13 +1046,13 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
 
 	host->data_timeout = 0;
 
-	if (sdhci_data_line_cmd(cmd))
+	if (!host->prq && sdhci_data_line_cmd(cmd))
 		sdhci_set_timeout(host, cmd);
 
 	if (!data)
 		return;
 
-	WARN_ON(host->data);
+	WARN_ON(!host->prq && host->data);
 
 	/* Sanity checks */
 	BUG_ON(data->blksz * data->blocks > 524288);
@@ -1094,11 +1126,14 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
 		} else if (host->flags & SDHCI_USE_ADMA) {
 			sdhci_adma_table_pre(host, data);
 
-			sdhci_writel(host, host->adma_addr, SDHCI_ADMA_ADDRESS);
-			if (host->flags & SDHCI_USE_64_BIT_DMA)
-				sdhci_writel(host,
-					     (u64)host->adma_addr >> 32,
-					     SDHCI_ADMA_ADDRESS_HI);
+			if (!host->adma3_enabled) {
+				sdhci_writel(host, host->adma_addr,
+					     SDHCI_ADMA_ADDRESS);
+				if (host->flags & SDHCI_USE_64_BIT_DMA)
+					sdhci_writel(host,
+						     (u64)host->adma_addr >> 32,
+						     SDHCI_ADMA_ADDRESS_HI);
+			}
 		} else {
 			WARN_ON(sg_cnt != 1);
 			sdhci_set_sdma_addr(host, sdhci_sdma_address(host));
@@ -1121,6 +1156,9 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
 
 	sdhci_set_transfer_irqs(host);
 
+	if (host->adma3_enabled)
+		return;
+
 	/* Set the DMA boundary value and block size */
 	sdhci_writew(host, SDHCI_MAKE_BLKSZ(host->sdma_boundary, data->blksz),
 		     SDHCI_BLOCK_SIZE);
@@ -1278,6 +1316,36 @@ static void sdhci_finish_mrq(struct sdhci_host *host, struct mmc_request *mrq)
 	queue_work(host->complete_wq, &host->complete_work);
 }
 
+static void sdhci_finish_packed_data(struct sdhci_host *host, int error)
+{
+	struct mmc_request *mrq;
+
+	host->data = NULL;
+	/*
+	 * Reset the align buffer pointer address for unaligned mappings after
+	 * finishing the transfer.
+	 */
+	host->adma3_align_pos = host->align_buffer;
+
+	if (error)
+		sdhci_do_reset(host, SDHCI_RESET_DATA);
+
+	list_for_each_entry(mrq, &host->prq->list, packed_list) {
+		struct mmc_data *data = mrq->data;
+
+		sdhci_adma_table_post(host, data);
+		data->error = error;
+
+		if (data->error)
+			data->bytes_xfered = 0;
+		else
+			data->bytes_xfered = data->blksz * data->blocks;
+	}
+
+	sdhci_del_timer(host, NULL);
+	sdhci_led_deactivate(host);
+}
+
 static void sdhci_finish_data(struct sdhci_host *host)
 {
 	struct mmc_command *data_cmd = host->data_cmd;
@@ -1786,6 +1854,209 @@ void sdhci_set_power(struct sdhci_host *host, unsigned char mode,
  *                                                                           *
 \*****************************************************************************/
 
+static void sdhci_adma3_write_cmd_desc(struct sdhci_host *host,
+				       struct mmc_command *cmd)
+{
+	struct mmc_data *data = cmd->data;
+	struct sdhci_adma3_cmd_desc *cmd_desc = host->adma3_pos;
+	int blksz, command;
+	u16 mode = 0;
+
+	/* Set block count */
+	cmd_desc->cmd = cpu_to_le32(ADMA3_TRAN_VALID);
+	cmd_desc->reg = cpu_to_le32(data->blocks);
+	cmd_desc++;
+
+	/* Set block size */
+	cmd_desc->cmd = cpu_to_le32(ADMA3_TRAN_VALID);
+	blksz = SDHCI_MAKE_BLKSZ(host->sdma_boundary, data->blksz);
+	cmd_desc->reg = cpu_to_le32(blksz);
+	cmd_desc++;
+
+	/* Set argument */
+	cmd_desc->cmd = cpu_to_le32(ADMA3_TRAN_VALID);
+	cmd_desc->reg = cpu_to_le32(cmd->arg);
+	cmd_desc++;
+
+	/* set command and transfer mode */
+	if (data->flags & MMC_DATA_READ)
+		mode |= SDHCI_TRNS_READ;
+
+	if (!(host->quirks2 & SDHCI_QUIRK2_SUPPORT_SINGLE))
+		mode |= SDHCI_TRNS_BLK_CNT_EN;
+
+	if (mmc_op_multi(cmd->opcode) || data->blocks > 1)
+		mode |= SDHCI_TRNS_MULTI;
+
+	sdhci_auto_cmd_select(host, cmd, &mode);
+	mode |= SDHCI_TRNS_DMA;
+
+	command = sdhci_get_command(host, cmd);
+	command = (command << 16) | mode;
+	cmd_desc->cmd = cpu_to_le32(ADMA3_TRAN_END);
+	cmd_desc->reg = cpu_to_le32(command);
+
+	host->adma3_pos +=
+		SDHCI_ADMA3_CMD_DESC_SZ * SDHCI_ADMA3_CMD_DESC_ENTRIES;
+}
+
+static void sdhci_adma3_write_integr_desc(struct sdhci_host *host,
+					  dma_addr_t addr)
+{
+	struct sdhci_integr_64_desc *integr_desc = host->integr_table;
+
+	integr_desc->cmd = cpu_to_le32(ADMA3_INTEGR_TRAN_END);
+	integr_desc->addr_lo = cpu_to_le32((u32)addr);
+
+	if (host->flags & SDHCI_USE_64_BIT_DMA)
+		integr_desc->addr_hi = cpu_to_le32((u64)addr >> 32);
+}
+
+static void sdhci_set_adma3_addr(struct sdhci_host *host, dma_addr_t addr)
+{
+	sdhci_writel(host, addr, SDHCI_ADMA3_ADDRESS);
+	if (host->flags & SDHCI_USE_64_BIT_DMA)
+		sdhci_writel(host, (u64)addr >> 32, SDHCI_ADMA3_ADDRESS_HI);
+}
+
+int sdhci_prepare_packed(struct mmc_packed *packed)
+{
+	struct mmc_host *mmc = packed->host;
+	struct sdhci_host *host = mmc_priv(mmc);
+	unsigned long timeout, flags;
+	u32 mask;
+
+	spin_lock_irqsave(&host->lock, flags);
+
+	if (!(host->flags & SDHCI_USE_ADMA3) ||
+	    !(host->flags & (SDHCI_AUTO_CMD23 | SDHCI_AUTO_CMD12))) {
+		spin_unlock_irqrestore(&host->lock, flags);
+		pr_err("%s: Unsupported packed request\n",
+		       mmc_hostname(host->mmc));
+		return -EOPNOTSUPP;
+	}
+
+	/* Wait max 10 ms */
+	timeout = 10;
+	mask = SDHCI_CMD_INHIBIT | SDHCI_DATA_INHIBIT;
+
+	while (sdhci_readl(host, SDHCI_PRESENT_STATE) & mask) {
+		if (timeout == 0) {
+			sdhci_dumpregs(host);
+			spin_unlock_irqrestore(&host->lock, flags);
+
+			pr_err("%s: Controller never released inhibit bit(s).\n",
+			       mmc_hostname(host->mmc));
+			return -EIO;
+		}
+
+		timeout--;
+		mdelay(1);
+	}
+
+	/* Disable command complete event for ADMA3 mode */
+	host->ier &= ~SDHCI_INT_RESPONSE;
+	sdhci_writel(host, host->ier, SDHCI_INT_ENABLE);
+	sdhci_writel(host, host->ier, SDHCI_SIGNAL_ENABLE);
+
+	/*
+	 * Disable the data timeout interrupt; a software timeout is used
+	 * for packed requests instead.
+	 */
+	sdhci_set_data_timeout_irq(host, false);
+
+	/* Enable ADMA3 mode for packed request */
+	host->adma3_enabled = true;
+
+	spin_unlock_irqrestore(&host->lock, flags);
+
+	return 0;
+}
+
+int sdhci_unprepare_packed(struct mmc_packed *packed)
+{
+	struct mmc_host *mmc = packed->host;
+	struct sdhci_host *host = mmc_priv(mmc);
+	unsigned long flags;
+
+	spin_lock_irqsave(&host->lock, flags);
+
+	/* Disable ADMA3 mode after finishing packed request */
+	host->adma3_enabled = false;
+
+	/* Re-enable command complete event after ADMA3 mode */
+	host->ier |= SDHCI_INT_RESPONSE;
+
+	sdhci_writel(host, host->ier, SDHCI_INT_ENABLE);
+	sdhci_writel(host, host->ier, SDHCI_SIGNAL_ENABLE);
+	spin_unlock_irqrestore(&host->lock, flags);
+
+	return 0;
+}
+
+int sdhci_packed_request(struct mmc_packed *packed,
+			 struct mmc_packed_request *prq)
+{
+	struct mmc_host *mmc = packed->host;
+	struct sdhci_host *host = mmc_priv(mmc);
+	struct mmc_request *mrq;
+	unsigned long timeout, flags;
+	u64 data_timeout = 0;
+	dma_addr_t integr_addr;
+	int present;
+
+	/* Firstly check card presence */
+	present = mmc->ops->get_cd(mmc);
+
+	spin_lock_irqsave(&host->lock, flags);
+
+	sdhci_led_activate(host);
+
+	if (!present || host->flags & SDHCI_DEVICE_DEAD) {
+		spin_unlock_irqrestore(&host->lock, flags);
+		return -ENOMEDIUM;
+	}
+
+	host->prq = prq;
+	host->adma3_pos = host->adma_table;
+	host->adma3_align_pos = host->align_buffer;
+	integr_addr = host->adma_addr;
+
+	list_for_each_entry(mrq, &prq->list, packed_list) {
+		struct mmc_command *cmd = mrq->cmd;
+
+		/* Set command descriptor */
+		sdhci_adma3_write_cmd_desc(host, cmd);
+		/* Set ADMA2 descriptors */
+		sdhci_prepare_data(host, cmd);
+		/* Set integrated descriptor */
+		sdhci_adma3_write_integr_desc(host, integr_addr);
+
+		/* Update the integrated descriptor address */
+		integr_addr =
+			host->adma_addr + (host->adma3_pos - host->adma_table);
+
+		/* Calculate each command's data timeout */
+		sdhci_calc_sw_timeout(host, cmd);
+		data_timeout += host->data_timeout;
+	}
+
+	timeout = jiffies;
+	if (data_timeout)
+		timeout += nsecs_to_jiffies(data_timeout);
+	else
+		timeout += 10 * HZ * prq->nr_reqs;
+	sdhci_mod_timer(host, NULL, timeout);
+
+	/* Start ADMA3 transfer */
+	sdhci_set_adma3_addr(host, host->integr_addr);
+
+	spin_unlock_irqrestore(&host->lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sdhci_packed_request);
+
 void sdhci_request(struct mmc_host *mmc, struct mmc_request *mrq)
 {
 	struct sdhci_host *host;
@@ -2619,9 +2890,19 @@ static bool sdhci_request_done(struct sdhci_host *host)
 {
 	unsigned long flags;
 	struct mmc_request *mrq;
+	struct mmc_packed_request *prq;
 	int i;
 
 	spin_lock_irqsave(&host->lock, flags);
+	prq = host->prq;
+
+	if (prq) {
+		host->prq = NULL;
+		spin_unlock_irqrestore(&host->lock, flags);
+
+		mmc_packed_finalize_requests(host->mmc, prq);
+		return true;
+	}
 
 	for (i = 0; i < SDHCI_MAX_MRQS; i++) {
 		mrq = host->mrqs_done[i];
@@ -2763,6 +3044,17 @@ static void sdhci_timeout_data_timer(struct timer_list *t)
 
 	spin_lock_irqsave(&host->lock, flags);
 
+	if (host->prq) {
+		pr_err("%s: Packed request timeout waiting for hardware interrupt.\n",
+		       mmc_hostname(host->mmc));
+		sdhci_dumpregs(host);
+		sdhci_finish_packed_data(host, -ETIMEDOUT);
+		queue_work(host->complete_wq, &host->complete_work);
+		spin_unlock_irqrestore(&host->lock, flags);
+
+		return;
+	}
+
 	if (host->data || host->data_cmd ||
 	    (host->cmd && sdhci_data_line_cmd(host->cmd))) {
 		pr_err("%s: Timeout waiting for hardware interrupt.\n",
@@ -2965,7 +3257,9 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
 			host->ops->adma_workaround(host, intmask);
 	}
 
-	if (host->data->error)
+	if (host->prq)
+		sdhci_finish_packed_data(host, host->data->error);
+	else if (host->data->error)
 		sdhci_finish_data(host);
 	else {
 		if (intmask & (SDHCI_INT_DATA_AVAIL | SDHCI_INT_SPACE_AVAIL))
@@ -3137,6 +3431,9 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)
 			host->mrqs_done[i] = NULL;
 		}
 	}
+
+	if (host->prq)
+		result = IRQ_WAKE_THREAD;
 out:
 	spin_unlock(&host->lock);
 
diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index 4548d9c..59cfa5d 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -574,6 +574,7 @@ struct sdhci_host {
 	bool pending_reset;	/* Cmd/data reset is pending */
 	bool irq_wake_enabled;	/* IRQ wakeup is enabled */
 	bool v4_mode;		/* Host Version 4 Enable */
+	bool adma3_enabled;	/* ADMA3 mode enabled */
 
 	struct mmc_request *mrqs_done[SDHCI_MAX_MRQS];	/* Requests done */
 	struct mmc_command *cmd;	/* Current command */
@@ -581,12 +582,15 @@ struct sdhci_host {
 	struct mmc_data *data;	/* Current data request */
 	unsigned int data_early:1;	/* Data finished before cmd */
 
+	struct mmc_packed_request *prq;	/* Current packed request */
 	struct sg_mapping_iter sg_miter;	/* SG state for PIO */
 	unsigned int blocks;	/* remaining PIO blocks */
 
 	void *adma_table;	/* ADMA descriptor table */
 	void *align_buffer;	/* Bounce buffer */
 	void *integr_table;	/* ADMA3 intergrate descriptor table */
+	void *adma3_pos;	/* ADMA3 buffer position */
+	void *adma3_align_pos;	/* ADMA3 Bounce buffer position */
 
 	size_t adma_table_sz;	/* ADMA descriptor table size */
 	size_t align_buffer_sz;	/* Bounce buffer size */
@@ -843,4 +847,9 @@ bool sdhci_cqe_irq(struct sdhci_host *host, u32 intmask, int *cmd_error,
 void sdhci_reset_tuning(struct sdhci_host *host);
 void sdhci_send_tuning(struct sdhci_host *host, u32 opcode);
 
+int sdhci_prepare_packed(struct mmc_packed *packed);
+int sdhci_unprepare_packed(struct mmc_packed *packed);
+int sdhci_packed_request(struct mmc_packed *packed,
+			 struct mmc_packed_request *prq);
+
 #endif /* __SDHCI_HW_H */
-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH 7/7] mmc: host: sdhci-sprd: Add MMC packed request support
  2019-07-22 13:09 [RFC PATCH 0/7] Add MMC packed function Baolin Wang
                   ` (5 preceding siblings ...)
  2019-07-22 13:09 ` [RFC PATCH 6/7] mmc: host: sdhci: Add MMC packed request support Baolin Wang
@ 2019-07-22 13:09 ` " Baolin Wang
  2019-08-12  5:20 ` [RFC PATCH 0/7] Add MMC packed function Baolin Wang
  7 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2019-07-22 13:09 UTC (permalink / raw)
  To: axboe, adrian.hunter, ulf.hansson
  Cc: zhang.lyra, orsonzhai, arnd, linus.walleij, baolin.wang,
	vincent.guittot, linux-mmc, linux-kernel, linux-block

Enable the ADMA3 transfer mode and add packed operations to support
MMC packed requests, improving I/O performance.

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
---
 drivers/mmc/host/Kconfig      |    1 +
 drivers/mmc/host/sdhci-sprd.c |   22 ++++++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 14d89a1..44ea3cc 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -619,6 +619,7 @@ config MMC_SDHCI_SPRD
 	depends on ARCH_SPRD
 	depends on MMC_SDHCI_PLTFM
 	select MMC_SDHCI_IO_ACCESSORS
+	select MMC_PACKED
 	help
 	  This selects the SDIO Host Controller in Spreadtrum
 	  SoCs, this driver supports R11(IP version: R11P0).
diff --git a/drivers/mmc/host/sdhci-sprd.c b/drivers/mmc/host/sdhci-sprd.c
index 80a9055..e5651fd 100644
--- a/drivers/mmc/host/sdhci-sprd.c
+++ b/drivers/mmc/host/sdhci-sprd.c
@@ -524,10 +524,18 @@ static void sdhci_sprd_phy_param_parse(struct sdhci_sprd_host *sprd_host,
 static const struct sdhci_pltfm_data sdhci_sprd_pdata = {
 	.quirks = SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK,
 	.quirks2 = SDHCI_QUIRK2_BROKEN_HS200 |
-		   SDHCI_QUIRK2_USE_32BIT_BLK_CNT,
+		   SDHCI_QUIRK2_USE_32BIT_BLK_CNT |
+		   SDHCI_QUIRK2_USE_ADMA3_SUPPORT,
 	.ops = &sdhci_sprd_ops,
 };
 
+static const struct mmc_packed_ops packed_ops = {
+	.packed_algo = mmc_packed_algo_rw,
+	.prepare_hardware = sdhci_prepare_packed,
+	.unprepare_hardware = sdhci_unprepare_packed,
+	.packed_request = sdhci_packed_request,
+};
+
 static int sdhci_sprd_probe(struct platform_device *pdev)
 {
 	struct sdhci_host *host;
@@ -642,10 +650,14 @@ static int sdhci_sprd_probe(struct platform_device *pdev)
 
 	sprd_host->flags = host->flags;
 
-	ret = __sdhci_add_host(host);
+	ret = mmc_packed_init(host->mmc, &packed_ops, 10);
 	if (ret)
 		goto err_cleanup_host;
 
+	ret = __sdhci_add_host(host);
+	if (ret)
+		goto err_packed;
+
 	pm_runtime_mark_last_busy(&pdev->dev);
 	pm_runtime_put_autosuspend(&pdev->dev);
 
@@ -653,6 +665,9 @@ static int sdhci_sprd_probe(struct platform_device *pdev)
 		__func__, host->version);
 	return 0;
 
+err_packed:
+	mmc_packed_exit(host->mmc);
+
 err_cleanup_host:
 	sdhci_cleanup_host(host);
 
@@ -680,6 +695,7 @@ static int sdhci_sprd_remove(struct platform_device *pdev)
 	struct sdhci_sprd_host *sprd_host = TO_SPRD_HOST(host);
 	struct mmc_host *mmc = host->mmc;
 
+	mmc_packed_exit(mmc);
 	mmc_remove_host(mmc);
 	clk_disable_unprepare(sprd_host->clk_sdio);
 	clk_disable_unprepare(sprd_host->clk_enable);
@@ -702,6 +718,7 @@ static int sdhci_sprd_runtime_suspend(struct device *dev)
 	struct sdhci_host *host = dev_get_drvdata(dev);
 	struct sdhci_sprd_host *sprd_host = TO_SPRD_HOST(host);
 
+	mmc_packed_queue_stop(host->mmc->packed);
 	sdhci_runtime_suspend_host(host);
 
 	clk_disable_unprepare(sprd_host->clk_sdio);
@@ -730,6 +747,7 @@ static int sdhci_sprd_runtime_resume(struct device *dev)
 		goto clk_disable;
 
 	sdhci_runtime_resume_host(host);
+	mmc_packed_queue_start(host->mmc->packed);
 	return 0;
 
 clk_disable:
-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/7] blk-mq: Export blk_mq_hctx_has_pending() function
  2019-07-22 13:09 ` [RFC PATCH 1/7] blk-mq: Export blk_mq_hctx_has_pending() function Baolin Wang
@ 2019-07-22 14:19   ` Ming Lei
  2019-07-23  3:12     ` Baolin Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Ming Lei @ 2019-07-22 14:19 UTC (permalink / raw)
  To: Baolin Wang
  Cc: axboe, adrian.hunter, ulf.hansson, zhang.lyra, orsonzhai, arnd,
	linus.walleij, vincent.guittot, linux-mmc, linux-kernel,
	linux-block

On Mon, Jul 22, 2019 at 09:09:36PM +0800, Baolin Wang wrote:
> Some SD/MMC host controllers support packed commands or packed requests,
> which means we can send several requests to the host controller at one
> time to improve performance. The following patches introduce the MMC
> packed function to support this feature.
> 
> To support the MMC packed function, the MMC layer needs to know whether
> requests are currently pending in the hardware queue, to help combine
> requests as much as possible. If requests are pending in the hardware
> queue, we should not dispatch requests to the host controller immediately;
> instead we should collect more requests into the MMC packed queue and
> dispatch them together once the packed condition is met.
> 
> Thus export this function for the MMC packed function.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
> ---
>  block/blk-mq.c         |    3 ++-
>  include/linux/blk-mq.h |    1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index b038ec6..5bd4ef9 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -63,12 +63,13 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
>   * Check if any of the ctx, dispatch list or elevator
>   * have pending work in this hardware queue.
>   */
> -static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
> +bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
>  {
>  	return !list_empty_careful(&hctx->dispatch) ||
>  		sbitmap_any_bit_set(&hctx->ctx_map) ||
>  			blk_mq_sched_has_work(hctx);
>  }
> +EXPORT_SYMBOL_GPL(blk_mq_hctx_has_pending);

Just wondering why you don't use the 'last' field of 'struct blk_mq_queue_data',
which is passed to .queue_rq(), and supposed for implementing batch submission.
	

Thanks,
Ming

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/7] blk-mq: Export blk_mq_hctx_has_pending() function
  2019-07-22 14:19   ` Ming Lei
@ 2019-07-23  3:12     ` Baolin Wang
  2019-07-23  3:31       ` Ming Lei
  0 siblings, 1 reply; 18+ messages in thread
From: Baolin Wang @ 2019-07-23  3:12 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Adrian Hunter, Ulf Hansson, Chunyan Zhang,
	Orson Zhai, Arnd Bergmann, Linus Walleij, Vincent Guittot,
	linux-mmc, LKML, linux-block

Hi Ming,

On Mon, 22 Jul 2019 at 22:19, Ming Lei <ming.lei@redhat.com> wrote:
>
> On Mon, Jul 22, 2019 at 09:09:36PM +0800, Baolin Wang wrote:
> > Some SD/MMC host controllers can support packed command or packed request,
> > that means we can package several requests to host controller at one time
> > to improve performence. And this patch set will introduce MMC packed function
> > to support this feature by following patches.
> >
> > To support MMC packed function, the MMC layer need to know if there are
> > requests are pending now in hardware queue to help to combine requests
> > as much as possible. If we know there are requests pending in hardware
> > queue, then we should not package requests to host controller immediately,
> > instead we should collect more requests into MMC packed queue to be packed
> > to host controller with packed condition.
> >
> > Thus export this function for MMC packed function.
> >
> > Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
> > ---
> >  block/blk-mq.c         |    3 ++-
> >  include/linux/blk-mq.h |    1 +
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index b038ec6..5bd4ef9 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -63,12 +63,13 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
> >   * Check if any of the ctx, dispatch list or elevator
> >   * have pending work in this hardware queue.
> >   */
> > -static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
> > +bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
> >  {
> >       return !list_empty_careful(&hctx->dispatch) ||
> >               sbitmap_any_bit_set(&hctx->ctx_map) ||
> >                       blk_mq_sched_has_work(hctx);
> >  }
> > +EXPORT_SYMBOL_GPL(blk_mq_hctx_has_pending);
>
> Just wondering why you don't use the 'last' field of 'struct blk_mq_queue_data',
> which is passed to .queue_rq(), and supposed for implementing batch submission.

The 'last' field of 'struct blk_mq_queue_data' does not indicate the
last request in the hardware queue, since we want to collect as many
requests as possible from the block layer to be packed later.

And from blk_mq_do_dispatch_sched()--->blk_mq_dispatch_rq_list()--->
queue_rq(), I always get 'bd.last = true', which is not useful for
combining requests for the MMC packed queue. Maybe I missed something?

Thanks for your comments.

-- 
Baolin Wang
Best Regards

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/7] blk-mq: Export blk_mq_hctx_has_pending() function
  2019-07-23  3:12     ` Baolin Wang
@ 2019-07-23  3:31       ` Ming Lei
  2019-07-23  7:15         ` Baolin Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Ming Lei @ 2019-07-23  3:31 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Jens Axboe, Adrian Hunter, Ulf Hansson, Chunyan Zhang,
	Orson Zhai, Arnd Bergmann, Linus Walleij, Vincent Guittot,
	linux-mmc, LKML, linux-block

On Tue, Jul 23, 2019 at 11:12:57AM +0800, Baolin Wang wrote:
> Hi Ming,
> 
> On Mon, 22 Jul 2019 at 22:19, Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Mon, Jul 22, 2019 at 09:09:36PM +0800, Baolin Wang wrote:
> > > Some SD/MMC host controllers can support packed command or packed request,
> > > that means we can package several requests to host controller at one time
> > > to improve performence. And this patch set will introduce MMC packed function
> > > to support this feature by following patches.
> > >
> > > To support MMC packed function, the MMC layer need to know if there are
> > > requests are pending now in hardware queue to help to combine requests
> > > as much as possible. If we know there are requests pending in hardware
> > > queue, then we should not package requests to host controller immediately,
> > > instead we should collect more requests into MMC packed queue to be packed
> > > to host controller with packed condition.
> > >
> > > Thus export this function for MMC packed function.
> > >
> > > Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
> > > ---
> > >  block/blk-mq.c         |    3 ++-
> > >  include/linux/blk-mq.h |    1 +
> > >  2 files changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index b038ec6..5bd4ef9 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -63,12 +63,13 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
> > >   * Check if any of the ctx, dispatch list or elevator
> > >   * have pending work in this hardware queue.
> > >   */
> > > -static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
> > > +bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
> > >  {
> > >       return !list_empty_careful(&hctx->dispatch) ||
> > >               sbitmap_any_bit_set(&hctx->ctx_map) ||
> > >                       blk_mq_sched_has_work(hctx);
> > >  }
> > > +EXPORT_SYMBOL_GPL(blk_mq_hctx_has_pending);
> >
> > Just wondering why you don't use the 'last' field of 'struct blk_mq_queue_data',
> > which is passed to .queue_rq(), and supposed for implementing batch submission.
> 
> The 'last' field of 'struct blk_mq_queue_data' does not indicate the
> last request in the hardware queue, since we want to collect more
> requests from block layer as much as possible to be packed later.
> 
> And from blk_mq_do_dispatch_sched()--->blk_mq_dispatch_rq_list()--->
> queue_rq(), I always get 'bd.last = true', which is not useful to
> combine requests for MMC packed queue. Maybe I missed anything?

That is one flaw of current implementation, and we may improve it,
so other drivers(virtio-io, ...) can benefit from it too.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/7] blk-mq: Export blk_mq_hctx_has_pending() function
  2019-07-23  3:31       ` Ming Lei
@ 2019-07-23  7:15         ` Baolin Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2019-07-23  7:15 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Adrian Hunter, Ulf Hansson, Chunyan Zhang,
	Orson Zhai, Arnd Bergmann, Linus Walleij, Vincent Guittot,
	linux-mmc, LKML, linux-block

On 23/07/2019, Ming Lei <ming.lei@redhat.com> wrote:
> On Tue, Jul 23, 2019 at 11:12:57AM +0800, Baolin Wang wrote:
>> Hi Ming,
>>
>> On Mon, 22 Jul 2019 at 22:19, Ming Lei <ming.lei@redhat.com> wrote:
>> >
>> > On Mon, Jul 22, 2019 at 09:09:36PM +0800, Baolin Wang wrote:
>> > > Some SD/MMC host controllers can support packed command or packed
>> > > request,
>> > > that means we can package several requests to host controller at one
>> > > time
>> > > to improve performence. And this patch set will introduce MMC packed
>> > > function
>> > > to support this feature by following patches.
>> > >
>> > > To support MMC packed function, the MMC layer need to know if there
>> > > are
>> > > requests are pending now in hardware queue to help to combine
>> > > requests
>> > > as much as possible. If we know there are requests pending in
>> > > hardware
>> > > queue, then we should not package requests to host controller
>> > > immediately,
>> > > instead we should collect more requests into MMC packed queue to be
>> > > packed
>> > > to host controller with packed condition.
>> > >
>> > > Thus export this function for MMC packed function.
>> > >
>> > > Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
>> > > ---
>> > >  block/blk-mq.c         |    3 ++-
>> > >  include/linux/blk-mq.h |    1 +
>> > >  2 files changed, 3 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
>> > > index b038ec6..5bd4ef9 100644
>> > > --- a/block/blk-mq.c
>> > > +++ b/block/blk-mq.c
>> > > @@ -63,12 +63,13 @@ static int blk_mq_poll_stats_bkt(const struct
>> > > request *rq)
>> > >   * Check if any of the ctx, dispatch list or elevator
>> > >   * have pending work in this hardware queue.
>> > >   */
>> > > -static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
>> > > +bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
>> > >  {
>> > >       return !list_empty_careful(&hctx->dispatch) ||
>> > >               sbitmap_any_bit_set(&hctx->ctx_map) ||
>> > >                       blk_mq_sched_has_work(hctx);
>> > >  }
>> > > +EXPORT_SYMBOL_GPL(blk_mq_hctx_has_pending);
>> >
>> > Just wondering why you don't use the 'last' field of 'struct
>> > blk_mq_queue_data',
>> > which is passed to .queue_rq(), and supposed for implementing batch
>> > submission.
>>
>> The 'last' field of 'struct blk_mq_queue_data' does not indicate the
>> last request in the hardware queue, since we want to collect more
>> requests from block layer as much as possible to be packed later.
>>
>> And from blk_mq_do_dispatch_sched()--->blk_mq_dispatch_rq_list()--->
>> queue_rq(), I always get 'bd.last = true', which is not useful to
>> combine requests for MMC packed queue. Maybe I missed anything?
>
> That is one flaw of current implementation, and we may improve it,
> so other drivers(virtio-io, ...) can benefit from it too.
>

OK. I am not sure whether I can add a new flag to indicate that there
are requests pending in the hardware queue. That would help the MMC
driver combine more requests.

Or do you have any other suggestions? Thanks.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5bd4ef9..cb240f4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1257,6 +1257,8 @@ bool blk_mq_dispatch_rq_list(struct
request_queue *q, struct list_head *list,
                        bd.last = !blk_mq_get_driver_tag(nxt);
                }

+               bd.rq_pending = blk_mq_hctx_has_pending(hctx);
+
                ret = q->mq_ops->queue_rq(hctx, &bd);
                if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
                        /*
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 15a2b7b..9b06fe0 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -118,6 +118,7 @@ struct blk_mq_tag_set {
 struct blk_mq_queue_data {
        struct request *rq;
        bool last;
+       bool rq_pending;
 };

-- 
Baolin Wang
Best Regards

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] Add MMC packed function
  2019-07-22 13:09 [RFC PATCH 0/7] Add MMC packed function Baolin Wang
                   ` (6 preceding siblings ...)
  2019-07-22 13:09 ` [RFC PATCH 7/7] mmc: host: sdhci-sprd: " Baolin Wang
@ 2019-08-12  5:20 ` Baolin Wang
  2019-08-12  8:58   ` Adrian Hunter
  7 siblings, 1 reply; 18+ messages in thread
From: Baolin Wang @ 2019-08-12  5:20 UTC (permalink / raw)
  To: Jens Axboe, Adrian Hunter, Ulf Hansson
  Cc: Chunyan Zhang, Orson Zhai, Arnd Bergmann, Linus Walleij,
	Vincent Guittot, linux-mmc, LKML, linux-block

Hi,

On Mon, 22 Jul 2019 at 21:10, Baolin Wang <baolin.wang@linaro.org> wrote:
>
> Hi All,
>
> Now some SD/MMC controllers can support packed commands or packed requests,
> which means they can deliver multiple requests to the host controller to be
> handled at one time, improving I/O performance. Thus this patchset adds the
> MMC packed function to support packed requests or packed commands.
>
> In this patch set, I implemented the SD host ADMA3 transfer mode to support
> packed requests. The ADMA3 transfer mode can process a multi-block data
> transfer by using a pair of a command descriptor and an ADMA2 descriptor. In
> the future we can easily extend the MMC packed function to support packed
> commands.
>
> Below is some comparison data between packed requests and non-packed
> requests gathered with the fio tool. The fio command I used is shown below;
> I changed the '--rw' parameter and enabled the direct I/O flag to measure
> the actual hardware transfer speed.
>
> ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=512M --group_reporting --numjobs=20 --name=test_read
>
> My eMMC card working at HS400 Enhanced strobe mode:
> [    2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001
> [    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> [    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
> [    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
> [    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0)
>
> 1. Non-packed request
> I tested each case 3 times and report an average speed.
>
> 1) Sequential read:
> Speed: 28.9MiB/s, 26.4MiB/s, 30.9MiB/s
> Average speed: 28.7MiB/s
>
> 2) Random read:
> Speed: 18.2MiB/s, 8.9MiB/s, 15.8MiB/s
> Average speed: 14.3MiB/s
>
> 3) Sequential write:
> Speed: 21.1MiB/s, 27.9MiB/s, 25MiB/s
> Average speed: 24.7MiB/s
>
> 4) Random write:
> Speed: 21.5MiB/s, 18.1MiB/s, 18.1MiB/s
> Average speed: 19.2MiB/s
>
> 2. Packed request
> In packed request mode, I set the host controller can package maximum 10
> requests at one time (Actually I can increase the package number), and I
> enabled read/write packed request mode. Also I tested 3 times for each
> case and output a average speed.
>
> 1) Sequential read:
> Speed: 165MiB/s, 167MiB/s, 164MiB/s
> Average speed: 165.3MiB/s
>
> 2) Random read:
> Speed: 147MiB/s, 141MiB/s, 144MiB/s
> Average speed: 144MiB/s
>
> 3) Sequential write:
> Speed: 87.8MiB/s, 89.1MiB/s, 90.0MiB/s
> Average speed: 89MiB/s
>
> 4) Random write:
> Speed: 90.9MiB/s, 89.8MiB/s, 90.4MiB/s
> Average speed: 90.4MiB/s
>
> From the above data, we can see that packed requests can improve performance greatly.
> Any comments are welcome. Thanks a lot.

Any comments for this patch set? Thanks.

>
> Baolin Wang (7):
>   blk-mq: Export blk_mq_hctx_has_pending() function
>   mmc: core: Add MMC packed request function
>   mmc: host: sdhci: Introduce ADMA3 transfer mode
>   mmc: host: sdhci: Factor out the command configuration
>   mmc: host: sdhci: Remove redundant sg_count member of struct
>     sdhci_host
>   mmc: host: sdhci: Add MMC packed request support
>   mmc: host: sdhci-sprd: Add MMC packed request support
>
>  block/blk-mq.c                |    3 +-
>  drivers/mmc/core/Kconfig      |    2 +
>  drivers/mmc/core/Makefile     |    1 +
>  drivers/mmc/core/block.c      |   71 +++++-
>  drivers/mmc/core/block.h      |    3 +-
>  drivers/mmc/core/core.c       |   51 ++++
>  drivers/mmc/core/core.h       |    3 +
>  drivers/mmc/core/packed.c     |  478 ++++++++++++++++++++++++++++++++++++++
>  drivers/mmc/core/queue.c      |   28 ++-
>  drivers/mmc/host/Kconfig      |    1 +
>  drivers/mmc/host/sdhci-sprd.c |   22 +-
>  drivers/mmc/host/sdhci.c      |  513 +++++++++++++++++++++++++++++++++++------
>  drivers/mmc/host/sdhci.h      |   59 ++++-
>  include/linux/blk-mq.h        |    1 +
>  include/linux/mmc/core.h      |    1 +
>  include/linux/mmc/host.h      |    3 +
>  include/linux/mmc/packed.h    |  123 ++++++++++
>  17 files changed, 1286 insertions(+), 77 deletions(-)
>  create mode 100644 drivers/mmc/core/packed.c
>  create mode 100644 include/linux/mmc/packed.h
>
> --
> 1.7.9.5
>


-- 
Baolin Wang
Best Regards

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] Add MMC packed function
  2019-08-12  5:20 ` [RFC PATCH 0/7] Add MMC packed function Baolin Wang
@ 2019-08-12  8:58   ` Adrian Hunter
  2019-08-12  9:44     ` Baolin Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Adrian Hunter @ 2019-08-12  8:58 UTC (permalink / raw)
  To: Baolin Wang, Jens Axboe, Ulf Hansson
  Cc: Chunyan Zhang, Orson Zhai, Arnd Bergmann, Linus Walleij,
	Vincent Guittot, linux-mmc, LKML, linux-block

On 12/08/19 8:20 AM, Baolin Wang wrote:
> Hi,
> 
> On Mon, 22 Jul 2019 at 21:10, Baolin Wang <baolin.wang@linaro.org> wrote:
>>
>> Hi All,
>>
>> Now some SD/MMC controllers can support packed command or packed request,
>> that means it can package multiple requests to host controller to be handled
>> at one time, which can improve the I/O performence. Thus this patchset is
>> used to add the MMC packed function to support packed request or packed
>> command.
>>
>> In this patch set, I implemented the SD host ADMA3 transfer mode to support
>> packed request. The ADMA3 transfer mode can process a multi-block data transfer
>> by using a pair of command descriptor and ADMA2 descriptor. In future we can
>> easily expand the MMC packed function to support packed command.
>>
>> Below are some comparison data between packed request and non-packed request
>> with fio tool. The fio command I used is like below with changing the
>> '--rw' parameter and enabling the direct IO flag to measure the actual hardware
>> transfer speed.
>>
>> ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=512M --group_reporting --numjobs=20 --name=test_read
>>
>> My eMMC card working at HS400 Enhanced strobe mode:
>> [    2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001
>> [    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
>> [    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
>> [    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
>> [    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0)
>>
>> 1. Non-packed request
>> I tested 3 times for each case and output a average speed.
>>
>> 1) Sequential read:
>> Speed: 28.9MiB/s, 26.4MiB/s, 30.9MiB/s
>> Average speed: 28.7MiB/s

This seems surprisingly low for an HS400ES card.  Do you know why that is?

>>
>> 2) Random read:
>> Speed: 18.2MiB/s, 8.9MiB/s, 15.8MiB/s
>> Average speed: 14.3MiB/s
>>
>> 3) Sequential write:
>> Speed: 21.1MiB/s, 27.9MiB/s, 25MiB/s
>> Average speed: 24.7MiB/s
>>
>> 4) Random write:
>> Speed: 21.5MiB/s, 18.1MiB/s, 18.1MiB/s
>> Average speed: 19.2MiB/s
>>
>> 2. Packed request
>> In packed request mode, I set the host controller can package maximum 10
>> requests at one time (Actually I can increase the package number), and I
>> enabled read/write packed request mode. Also I tested 3 times for each
>> case and output a average speed.
>>
>> 1) Sequential read:
>> Speed: 165MiB/s, 167MiB/s, 164MiB/s
>> Average speed: 165.3MiB/s
>>
>> 2) Random read:
>> Speed: 147MiB/s, 141MiB/s, 144MiB/s
>> Average speed: 144MiB/s
>>
>> 3) Sequential write:
>> Speed: 87.8MiB/s, 89.1MiB/s, 90.0MiB/s
>> Average speed: 89MiB/s
>>
>> 4) Random write:
>> Speed: 90.9MiB/s, 89.8MiB/s, 90.4MiB/s
>> Average speed: 90.4MiB/s
>>
>> Form above data, we can see the packed request can improve the performance greatly.
>> Any comments are welcome. Thanks a lot.
> 
> Any comments for this patch set? Thanks.

Did you consider adapting the CQE interface?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] Add MMC packed function
  2019-08-12  8:58   ` Adrian Hunter
@ 2019-08-12  9:44     ` Baolin Wang
  2019-08-12 10:50       ` Adrian Hunter
  2019-08-16  2:09       ` Baolin Wang
  0 siblings, 2 replies; 18+ messages in thread
From: Baolin Wang @ 2019-08-12  9:44 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jens Axboe, Ulf Hansson, Chunyan Zhang, Orson Zhai,
	Arnd Bergmann, Linus Walleij, Vincent Guittot, linux-mmc, LKML,
	linux-block

Hi Adrian,

On Mon, 12 Aug 2019 at 16:59, Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 12/08/19 8:20 AM, Baolin Wang wrote:
> > Hi,
> >
> > On Mon, 22 Jul 2019 at 21:10, Baolin Wang <baolin.wang@linaro.org> wrote:
> >>
> >> Hi All,
> >>
> >> Now some SD/MMC controllers can support packed commands or packed requests,
> >> meaning multiple requests can be packed and sent to the host controller to
> >> be handled at one time, which can improve I/O performance. Thus this
> >> patchset adds the MMC packed function to support packed requests or packed
> >> commands.
> >>
> >> In this patch set, I implemented the SD host ADMA3 transfer mode to support
> >> packed requests. The ADMA3 transfer mode can process a multi-block data
> >> transfer by using a pair of a command descriptor and an ADMA2 descriptor. In
> >> the future we can easily expand the MMC packed function to support packed
> >> commands.
> >>
> >> Below are some comparison data between packed and non-packed requests,
> >> gathered with the fio tool. The fio command I used is shown below; I changed
> >> the '--rw' parameter and enabled the direct I/O flag to measure the actual
> >> hardware transfer speed.
> >>
> >> ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=512M --group_reporting --numjobs=20 --name=test_read
> >>
> >> My eMMC card working at HS400 Enhanced strobe mode:
> >> [    2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001
> >> [    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> >> [    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
> >> [    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
> >> [    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0)
> >>
> >> 1. Non-packed request
> >> I tested 3 times for each case and report an average speed.
> >>
> >> 1) Sequential read:
> >> Speed: 28.9MiB/s, 26.4MiB/s, 30.9MiB/s
> >> Average speed: 28.7MiB/s
>
> This seems surprisingly low for an HS400ES card.  Do you know why that is?

I've set the clock to 400MHz, but it seems the hardware did not output
the corresponding clock. I will check my hardware.

> >>
> >> 2) Random read:
> >> Speed: 18.2MiB/s, 8.9MiB/s, 15.8MiB/s
> >> Average speed: 14.3MiB/s
> >>
> >> 3) Sequential write:
> >> Speed: 21.1MiB/s, 27.9MiB/s, 25MiB/s
> >> Average speed: 24.7MiB/s
> >>
> >> 4) Random write:
> >> Speed: 21.5MiB/s, 18.1MiB/s, 18.1MiB/s
> >> Average speed: 19.2MiB/s
> >>
> >> 2. Packed request
> >> In packed request mode, I set the host controller to package a maximum of 10
> >> requests at one time (actually this number can be increased), and I enabled
> >> the read/write packed request mode. I also tested 3 times for each case and
> >> report an average speed.
> >>
> >> 1) Sequential read:
> >> Speed: 165MiB/s, 167MiB/s, 164MiB/s
> >> Average speed: 165.3MiB/s
> >>
> >> 2) Random read:
> >> Speed: 147MiB/s, 141MiB/s, 144MiB/s
> >> Average speed: 144MiB/s
> >>
> >> 3) Sequential write:
> >> Speed: 87.8MiB/s, 89.1MiB/s, 90.0MiB/s
> >> Average speed: 89MiB/s
> >>
> >> 4) Random write:
> >> Speed: 90.9MiB/s, 89.8MiB/s, 90.4MiB/s
> >> Average speed: 90.4MiB/s
> >>
> >> From the above data, we can see that packed requests can improve performance greatly.
> >> Any comments are welcome. Thanks a lot.
> >
> > Any comments for this patch set? Thanks.
>
> Did you consider adapting the CQE interface?

I am not very familiar with CQE, since my controller does not support
it. But the MMC packed function introduces some callbacks to help
different controllers implement packed requests, so I think it would be
easy to adapt to the CQE interface.

-- 
Baolin Wang
Best Regards


* Re: [RFC PATCH 0/7] Add MMC packed function
  2019-08-12  9:44     ` Baolin Wang
@ 2019-08-12 10:50       ` Adrian Hunter
  2019-08-12 11:29         ` Baolin Wang
  2019-08-16  2:09       ` Baolin Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Adrian Hunter @ 2019-08-12 10:50 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Jens Axboe, Ulf Hansson, Chunyan Zhang, Orson Zhai,
	Arnd Bergmann, Linus Walleij, Vincent Guittot, linux-mmc, LKML,
	linux-block

On 12/08/19 12:44 PM, Baolin Wang wrote:
> Hi Adrian,
> 
> On Mon, 12 Aug 2019 at 16:59, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> On 12/08/19 8:20 AM, Baolin Wang wrote:
>>> Hi,
>>>
>>> On Mon, 22 Jul 2019 at 21:10, Baolin Wang <baolin.wang@linaro.org> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> Now some SD/MMC controllers can support packed commands or packed requests,
>>>> meaning multiple requests can be packed and sent to the host controller to
>>>> be handled at one time, which can improve I/O performance. Thus this
>>>> patchset adds the MMC packed function to support packed requests or packed
>>>> commands.
>>>>
>>>> In this patch set, I implemented the SD host ADMA3 transfer mode to support
>>>> packed requests. The ADMA3 transfer mode can process a multi-block data
>>>> transfer by using a pair of a command descriptor and an ADMA2 descriptor. In
>>>> the future we can easily expand the MMC packed function to support packed
>>>> commands.
>>>>
>>>> Below are some comparison data between packed and non-packed requests,
>>>> gathered with the fio tool. The fio command I used is shown below; I changed
>>>> the '--rw' parameter and enabled the direct I/O flag to measure the actual
>>>> hardware transfer speed.
>>>>
>>>> ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=512M --group_reporting --numjobs=20 --name=test_read
>>>>
>>>> My eMMC card working at HS400 Enhanced strobe mode:
>>>> [    2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001
>>>> [    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
>>>> [    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
>>>> [    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
>>>> [    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0)
>>>>
>>>> 1. Non-packed request
>>>> I tested 3 times for each case and report an average speed.
>>>>
>>>> 1) Sequential read:
>>>> Speed: 28.9MiB/s, 26.4MiB/s, 30.9MiB/s
>>>> Average speed: 28.7MiB/s
>>
>> This seems surprisingly low for an HS400ES card.  Do you know why that is?
> 
> I've set the clock to 400MHz, but it seems the hardware did not output
> the corresponding clock. I will check my hardware.
> 
>>>>
>>>> 2) Random read:
>>>> Speed: 18.2MiB/s, 8.9MiB/s, 15.8MiB/s
>>>> Average speed: 14.3MiB/s
>>>>
>>>> 3) Sequential write:
>>>> Speed: 21.1MiB/s, 27.9MiB/s, 25MiB/s
>>>> Average speed: 24.7MiB/s
>>>>
>>>> 4) Random write:
>>>> Speed: 21.5MiB/s, 18.1MiB/s, 18.1MiB/s
>>>> Average speed: 19.2MiB/s
>>>>
>>>> 2. Packed request
>>>> In packed request mode, I set the host controller to package a maximum of 10
>>>> requests at one time (actually this number can be increased), and I enabled
>>>> the read/write packed request mode. I also tested 3 times for each case and
>>>> report an average speed.
>>>>
>>>> 1) Sequential read:
>>>> Speed: 165MiB/s, 167MiB/s, 164MiB/s
>>>> Average speed: 165.3MiB/s
>>>>
>>>> 2) Random read:
>>>> Speed: 147MiB/s, 141MiB/s, 144MiB/s
>>>> Average speed: 144MiB/s
>>>>
>>>> 3) Sequential write:
>>>> Speed: 87.8MiB/s, 89.1MiB/s, 90.0MiB/s
>>>> Average speed: 89MiB/s
>>>>
>>>> 4) Random write:
>>>> Speed: 90.9MiB/s, 89.8MiB/s, 90.4MiB/s
>>>> Average speed: 90.4MiB/s
>>>>
>>>> From the above data, we can see that packed requests can improve performance greatly.
>>>> Any comments are welcome. Thanks a lot.
>>>
>>> Any comments for this patch set? Thanks.
>>
>> Did you consider adapting the CQE interface?
> 
> I am not very familiar with CQE, since my controller does not support
> it. But the MMC packed function introduces some callbacks to help
> different controllers implement packed requests, so I think it would be
> easy to adapt to the CQE interface.
> 

I meant did you consider using the CQE interface instead of creating another
one?


* Re: [RFC PATCH 0/7] Add MMC packed function
  2019-08-12 10:50       ` Adrian Hunter
@ 2019-08-12 11:29         ` Baolin Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2019-08-12 11:29 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jens Axboe, Ulf Hansson, Chunyan Zhang, Orson Zhai,
	Arnd Bergmann, Linus Walleij, Vincent Guittot, linux-mmc, LKML,
	linux-block

On Mon, 12 Aug 2019 at 18:52, Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 12/08/19 12:44 PM, Baolin Wang wrote:
> > Hi Adrian,
> >
> > On Mon, 12 Aug 2019 at 16:59, Adrian Hunter <adrian.hunter@intel.com> wrote:
> >>
> >> On 12/08/19 8:20 AM, Baolin Wang wrote:
> >>> Hi,
> >>>
> >>> On Mon, 22 Jul 2019 at 21:10, Baolin Wang <baolin.wang@linaro.org> wrote:
> >>>>
> >>>> Hi All,
> >>>>
> >>>> Now some SD/MMC controllers can support packed commands or packed requests,
> >>>> meaning multiple requests can be packed and sent to the host controller to
> >>>> be handled at one time, which can improve I/O performance. Thus this
> >>>> patchset adds the MMC packed function to support packed requests or packed
> >>>> commands.
> >>>>
> >>>> In this patch set, I implemented the SD host ADMA3 transfer mode to support
> >>>> packed requests. The ADMA3 transfer mode can process a multi-block data
> >>>> transfer by using a pair of a command descriptor and an ADMA2 descriptor. In
> >>>> the future we can easily expand the MMC packed function to support packed
> >>>> commands.
> >>>>
> >>>> Below are some comparison data between packed and non-packed requests,
> >>>> gathered with the fio tool. The fio command I used is shown below; I changed
> >>>> the '--rw' parameter and enabled the direct I/O flag to measure the actual
> >>>> hardware transfer speed.
> >>>>
> >>>> ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=512M --group_reporting --numjobs=20 --name=test_read
> >>>>
> >>>> My eMMC card working at HS400 Enhanced strobe mode:
> >>>> [    2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001
> >>>> [    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> >>>> [    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
> >>>> [    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
> >>>> [    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0)
> >>>>
> >>>> 1. Non-packed request
> >>>> I tested 3 times for each case and report an average speed.
> >>>>
> >>>> 1) Sequential read:
> >>>> Speed: 28.9MiB/s, 26.4MiB/s, 30.9MiB/s
> >>>> Average speed: 28.7MiB/s
> >>
> >> This seems surprisingly low for an HS400ES card.  Do you know why that is?
> >
> > I've set the clock to 400MHz, but it seems the hardware did not output
> > the corresponding clock. I will check my hardware.
> >
> >>>>
> >>>> 2) Random read:
> >>>> Speed: 18.2MiB/s, 8.9MiB/s, 15.8MiB/s
> >>>> Average speed: 14.3MiB/s
> >>>>
> >>>> 3) Sequential write:
> >>>> Speed: 21.1MiB/s, 27.9MiB/s, 25MiB/s
> >>>> Average speed: 24.7MiB/s
> >>>>
> >>>> 4) Random write:
> >>>> Speed: 21.5MiB/s, 18.1MiB/s, 18.1MiB/s
> >>>> Average speed: 19.2MiB/s
> >>>>
> >>>> 2. Packed request
> >>>> In packed request mode, I set the host controller to package a maximum of 10
> >>>> requests at one time (actually this number can be increased), and I enabled
> >>>> the read/write packed request mode. I also tested 3 times for each case and
> >>>> report an average speed.
> >>>>
> >>>> 1) Sequential read:
> >>>> Speed: 165MiB/s, 167MiB/s, 164MiB/s
> >>>> Average speed: 165.3MiB/s
> >>>>
> >>>> 2) Random read:
> >>>> Speed: 147MiB/s, 141MiB/s, 144MiB/s
> >>>> Average speed: 144MiB/s
> >>>>
> >>>> 3) Sequential write:
> >>>> Speed: 87.8MiB/s, 89.1MiB/s, 90.0MiB/s
> >>>> Average speed: 89MiB/s
> >>>>
> >>>> 4) Random write:
> >>>> Speed: 90.9MiB/s, 89.8MiB/s, 90.4MiB/s
> >>>> Average speed: 90.4MiB/s
> >>>>
> >>>> From the above data, we can see that packed requests can improve performance greatly.
> >>>> Any comments are welcome. Thanks a lot.
> >>>
> >>> Any comments for this patch set? Thanks.
> >>
> >> Did you consider adapting the CQE interface?
> >
> > I am not very familiar with CQE, since my controller does not support
> > it. But the MMC packed function introduces some callbacks to help
> > different controllers implement packed requests, so I think it would be
> > easy to adapt to the CQE interface.
> >
>
> I meant did you consider using the CQE interface instead of creating another
> one?

Sorry for the misunderstanding. I think the core/core.c modification
could use the CQE interface, but there are some differences in
core/block.c, and I think they are different mechanisms. I also want to
avoid affecting CQE and normal transfers, so I think adding MMC packed
related interfaces will be easier to read and maintain.

-- 
Baolin Wang
Best Regards


* Re: [RFC PATCH 0/7] Add MMC packed function
  2019-08-12  9:44     ` Baolin Wang
  2019-08-12 10:50       ` Adrian Hunter
@ 2019-08-16  2:09       ` Baolin Wang
  1 sibling, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2019-08-16  2:09 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jens Axboe, Ulf Hansson, Chunyan Zhang, Orson Zhai,
	Arnd Bergmann, Linus Walleij, Vincent Guittot, linux-mmc, LKML,
	linux-block

Hi Adrian,

On Mon, 12 Aug 2019 at 17:44, Baolin Wang <baolin.wang@linaro.org> wrote:
>
> Hi Adrian,
>
> On Mon, 12 Aug 2019 at 16:59, Adrian Hunter <adrian.hunter@intel.com> wrote:
> >
> > On 12/08/19 8:20 AM, Baolin Wang wrote:
> > > Hi,
> > >
> > > On Mon, 22 Jul 2019 at 21:10, Baolin Wang <baolin.wang@linaro.org> wrote:
> > >>
> > >> Hi All,
> > >>
> > >> Now some SD/MMC controllers can support packed commands or packed requests,
> > >> meaning multiple requests can be packed and sent to the host controller to
> > >> be handled at one time, which can improve I/O performance. Thus this
> > >> patchset adds the MMC packed function to support packed requests or packed
> > >> commands.
> > >>
> > >> In this patch set, I implemented the SD host ADMA3 transfer mode to support
> > >> packed requests. The ADMA3 transfer mode can process a multi-block data
> > >> transfer by using a pair of a command descriptor and an ADMA2 descriptor. In
> > >> the future we can easily expand the MMC packed function to support packed
> > >> commands.
> > >>
> > >> Below are some comparison data between packed and non-packed requests,
> > >> gathered with the fio tool. The fio command I used is shown below; I changed
> > >> the '--rw' parameter and enabled the direct I/O flag to measure the actual
> > >> hardware transfer speed.
> > >>
> > >> ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=512M --group_reporting --numjobs=20 --name=test_read
> > >>
> > >> My eMMC card working at HS400 Enhanced strobe mode:
> > >> [    2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001
> > >> [    2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> > >> [    2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
> > >> [    2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
> > >> [    2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0)
> > >>
> > >> 1. Non-packed request
> > >> I tested 3 times for each case and report an average speed.
> > >>
> > >> 1) Sequential read:
> > >> Speed: 28.9MiB/s, 26.4MiB/s, 30.9MiB/s
> > >> Average speed: 28.7MiB/s
> >
> > This seems surprisingly low for an HS400ES card.  Do you know why that is?
>
> I've set the clock to 400MHz, but it seems the hardware did not output
> the corresponding clock. I will check my hardware.

I've checked my hardware and did not find any problem.

The reason for the low speed is that I set bs=4k; when I changed to
bs=1M, the speed went up to 251MiB/s.

./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read
--bs=1M --size=512M --group_reporting --numjobs=20 --name=test_read
READ: bw=251MiB/s (263MB/s), 251MiB/s-251MiB/s (263MB/s-263MB/s),
io=10.0GiB (10.7GB), run=40826-40826msec

-- 
Baolin Wang
Best Regards


end of thread, back to index

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-22 13:09 [RFC PATCH 0/7] Add MMC packed function Baolin Wang
2019-07-22 13:09 ` [RFC PATCH 1/7] blk-mq: Export blk_mq_hctx_has_pending() function Baolin Wang
2019-07-22 14:19   ` Ming Lei
2019-07-23  3:12     ` Baolin Wang
2019-07-23  3:31       ` Ming Lei
2019-07-23  7:15         ` Baolin Wang
2019-07-22 13:09 ` [RFC PATCH 2/7] mmc: core: Add MMC packed request function Baolin Wang
2019-07-22 13:09 ` [RFC PATCH 3/7] mmc: host: sdhci: Introduce ADMA3 transfer mode Baolin Wang
2019-07-22 13:09 ` [RFC PATCH 4/7] mmc: host: sdhci: Factor out the command configuration Baolin Wang
2019-07-22 13:09 ` [RFC PATCH 5/7] mmc: host: sdhci: Remove redundant sg_count member of struct sdhci_host Baolin Wang
2019-07-22 13:09 ` [RFC PATCH 6/7] mmc: host: sdhci: Add MMC packed request support Baolin Wang
2019-07-22 13:09 ` [RFC PATCH 7/7] mmc: host: sdhci-sprd: " Baolin Wang
2019-08-12  5:20 ` [RFC PATCH 0/7] Add MMC packed function Baolin Wang
2019-08-12  8:58   ` Adrian Hunter
2019-08-12  9:44     ` Baolin Wang
2019-08-12 10:50       ` Adrian Hunter
2019-08-12 11:29         ` Baolin Wang
2019-08-16  2:09       ` Baolin Wang
