linux-kernel.vger.kernel.org archive mirror
* [PATCH v8 00/12] use nonblock mmc requests to minimize latency
@ 2011-06-28  8:11 Per Forlin
  2011-06-28  8:11 ` [PATCH v8 01/12] mmc: core: add non-blocking mmc request function Per Forlin
                   ` (15 more replies)
  0 siblings, 16 replies; 27+ messages in thread
From: Per Forlin @ 2011-06-28  8:11 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

How significant is the cache maintenance overhead?
It depends. eMMC devices are much faster now than a few years ago,
while cache maintenance costs more due to multiple cache levels and
speculative cache pre-fetch. In relative terms the cost of handling
the caches has increased and is now a bottleneck when dealing with
fast eMMC together with DMA.

The intention of introducing non-blocking mmc requests is to minimize the
time between one mmc request ending and the next one starting. In the
current implementation the MMC controller is idle while dma_map_sg and
dma_unmap_sg are running. Introducing non-blocking mmc requests makes it
possible to prepare the caches for the next job in parallel with an active
mmc request.

This is done by making issue_rw_rq() non-blocking.
The increase in throughput is proportional to the time it takes to
prepare a request (the major part of the preparation is dma_map_sg and
dma_unmap_sg) and to how fast the memory is. The faster the MMC/SD is,
the more significant the prepare time becomes. Measurements on U5500
and Panda with eMMC and SD show a significant performance gain for large
reads when running in DMA mode. In the PIO case the performance is unchanged.

There are two optional hooks, pre_req() and post_req(), that the host driver
may implement in order to move work to before and after the actual mmc_request
function is called. In the DMA case pre_req() may run dma_map_sg() and prepare
the dma descriptor, and post_req() runs dma_unmap_sg().
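
As a rough illustration (not part of this series), an issuer that wants to
overlap preparation with an active transfer alternates two async request
slots. prepare_next_mrq(), complete_previous() and nr_reqs below are
hypothetical placeholders:

	struct mmc_async_req areq[2], *done;
	int i, err = 0;

	for (i = 0; i < nr_reqs && !err; i++) {
		/* fill in mrq, cmd/data/sg and err_check for this slot */
		prepare_next_mrq(&areq[i & 1]);
		/* start it; returns the previously started, now completed, request */
		done = mmc_start_req(host, &areq[i & 1], &err);
		if (done)
			complete_previous(done);	/* e.g. end the block request */
	}
	if (!err)
		mmc_start_req(host, NULL, &err);	/* wait for the last one */

mmc_start_req() itself calls pre_req() on the new request, waits for any
previously started request, starts the new one and finally calls post_req()
on the completed one.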

Details on measurements from IOZone and mmc_test:
https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req

Changes since v7:
 * rebase on mmc-next, on top of Russell's updated error handling.
 * Clarify description of mmc_start_req()
 * Resolve compile issue without CONFIG_DMA_ENGINE for mmci
 * Add mmc test to measure how performance is affected by sg length
 * Add missing wait_for_busy in mmc_test non-blocking test. This call got lost
   in v4 of this patchset when refactoring mmc_start_req.
 * Add sub-prefix (core block queue) to relevant patches.

Per Forlin (12):
  mmc: core: add non-blocking mmc request function
  omap_hsmmc: add support for pre_req and post_req
  mmci: implement pre_req() and post_req()
  mmc: mmc_test: add debugfs file to list all tests
  mmc: mmc_test: add test for non-blocking transfers
  mmc: mmc_test: test to measure how sg_len affects performance
  mmc: block: add member in mmc queue struct to hold request data
  mmc: block: add a block request prepare function
  mmc: block: move error code in issue_rw_rq to a separate function.
  mmc: queue: add a second mmc queue request member
  mmc: core: add random fault injection
  mmc: block: add handling for two parallel block requests in
    issue_rw_rq

 drivers/mmc/card/block.c      |  505 ++++++++++++++++++++++++-----------------
 drivers/mmc/card/mmc_test.c   |  491 ++++++++++++++++++++++++++++++++++++++--
 drivers/mmc/card/queue.c      |  184 ++++++++++------
 drivers/mmc/card/queue.h      |   33 ++-
 drivers/mmc/core/core.c       |  167 +++++++++++++-
 drivers/mmc/core/debugfs.c    |    5 +
 drivers/mmc/host/mmci.c       |  147 +++++++++++-
 drivers/mmc/host/mmci.h       |    8 +
 drivers/mmc/host/omap_hsmmc.c |   87 +++++++-
 include/linux/mmc/core.h      |    6 +-
 include/linux/mmc/host.h      |   24 ++
 lib/Kconfig.debug             |   11 +
 12 files changed, 1345 insertions(+), 323 deletions(-)

-- 
1.7.4.1



* [PATCH v8 01/12] mmc: core: add non-blocking mmc request function
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
@ 2011-06-28  8:11 ` Per Forlin
  2011-06-28  8:11 ` [PATCH v8 02/12] omap_hsmmc: add support for pre_req and post_req Per Forlin
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-06-28  8:11 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

Previously there has been only one function, mmc_wait_for_req(),
to start and wait for a request. This patch adds
 * mmc_start_req() - starts a request without waiting.
   If there is an ongoing request, wait for completion
   of that request, then start the new one and return.
   Does not wait for the new request to complete.

This patch also adds new function members in struct mmc_host_ops
only called from core.c
 * pre_req - asks the host driver to prepare for the next job
 * post_req - asks the host driver to clean up after a completed job

The intention is to use pre_req() and post_req() to do cache maintenance
while a request is active. pre_req() can be called while a request is active
to minimize the latency before starting the next job. post_req() can be used
after the next job has started to clean up the previous request. This
minimizes the host driver's request-end latency. post_req() is typically used
before ending the block request and handing the buffer over to the block
layer.

Add a host-private member in mmc_data to be used by
pre_req to mark the data. The host driver will then
check this mark to see if the data is prepared or not.
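
For illustration only (the real implementations follow in the omap_hsmmc and
mmci patches), a host driver's hooks typically use host_cookie like this,
with foo_, prepare_dma() and cleanup_dma() as hypothetical placeholders:

	static void foo_pre_req(struct mmc_host *mmc, struct mmc_request *mrq,
				bool is_first_req)
	{
		/* prepare only if not already marked as prepared */
		if (mrq->data && !mrq->data->host_cookie &&
		    !prepare_dma(mmc, mrq->data))	/* e.g. wraps dma_map_sg(), 0 on success */
			mrq->data->host_cookie = 1;
	}

	static void foo_post_req(struct mmc_host *mmc, struct mmc_request *mrq,
				 int err)
	{
		if (mrq->data) {
			cleanup_dma(mmc, mrq->data, err);	/* e.g. wraps dma_unmap_sg() */
			mrq->data->host_cookie = 0;
		}
	}

The request path then checks host_cookie to decide whether the data still
needs to be mapped before starting the transfer.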

Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/core/core.c  |  113 +++++++++++++++++++++++++++++++++++++++++----
 include/linux/mmc/core.h |    6 ++-
 include/linux/mmc/host.h |   21 +++++++++
 3 files changed, 129 insertions(+), 11 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 7843efe..d2d7239 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -198,9 +198,109 @@ mmc_start_request(struct mmc_host *host, struct mmc_request *mrq)
 
 static void mmc_wait_done(struct mmc_request *mrq)
 {
-	complete(mrq->done_data);
+	complete(&mrq->completion);
 }
 
+static void __mmc_start_req(struct mmc_host *host, struct mmc_request *mrq)
+{
+	init_completion(&mrq->completion);
+	mrq->done = mmc_wait_done;
+	mmc_start_request(host, mrq);
+}
+
+static void mmc_wait_for_req_done(struct mmc_host *host,
+				  struct mmc_request *mrq)
+{
+	wait_for_completion(&mrq->completion);
+}
+
+/**
+ *	mmc_pre_req - Prepare for a new request
+ *	@host: MMC host to prepare command
+ *	@mrq: MMC request to prepare for
+ *	@is_first_req: true if there is no previous started request
+ *                     that may run in parallel to this call, otherwise false
+ *
+ *	mmc_pre_req() is called prior to mmc_start_req() to let
+ *	host prepare for the new request. Preparation of a request may be
+ *	performed while another request is running on the host.
+ */
+static void mmc_pre_req(struct mmc_host *host, struct mmc_request *mrq,
+		 bool is_first_req)
+{
+	if (host->ops->pre_req)
+		host->ops->pre_req(host, mrq, is_first_req);
+}
+
+/**
+ *	mmc_post_req - Post process a completed request
+ *	@host: MMC host to post process command
+ *	@mrq: MMC request to post process for
+ *	@err: Error, if non zero, clean up any resources made in pre_req
+ *
+ *	Let the host post process a completed request. Post processing of
+ *	a request may be performed while another request is running.
+ */
+static void mmc_post_req(struct mmc_host *host, struct mmc_request *mrq,
+			 int err)
+{
+	if (host->ops->post_req)
+		host->ops->post_req(host, mrq, err);
+}
+
+/**
+ *	mmc_start_req - start a non-blocking request
+ *	@host: MMC host to start command
+ *	@areq: async request to start
+ *	@error: out parameter returns 0 for success, otherwise non zero
+ *
+ *	Start a new MMC custom command request for a host.
+ *	If there is an ongoing async request, wait for completion
+ *	of that request and start the new one and return.
+ *	Does not wait for the new request to complete.
+ *
+ *      Returns the completed request, NULL in case of none completed.
+ *	Wait for an ongoing request (previously started) to complete and
+ *	return the completed request. If there is no ongoing request, NULL
+ *	is returned without waiting. NULL is not an error condition.
+ */
+struct mmc_async_req *mmc_start_req(struct mmc_host *host,
+				    struct mmc_async_req *areq, int *error)
+{
+	int err = 0;
+	struct mmc_async_req *data = host->areq;
+
+	/* Prepare a new request */
+	if (areq)
+		mmc_pre_req(host, areq->mrq, !host->areq);
+
+	if (host->areq) {
+		mmc_wait_for_req_done(host, host->areq->mrq);
+		err = host->areq->err_check(host->card, host->areq);
+		if (err) {
+			mmc_post_req(host, host->areq->mrq, 0);
+			if (areq)
+				mmc_post_req(host, areq->mrq, -EINVAL);
+
+			host->areq = NULL;
+			goto out;
+		}
+	}
+
+	if (areq)
+		__mmc_start_req(host, areq->mrq);
+
+	if (host->areq)
+		mmc_post_req(host, host->areq->mrq, 0);
+
+	host->areq = areq;
+ out:
+	if (error)
+		*error = err;
+	return data;
+}
+EXPORT_SYMBOL(mmc_start_req);
+
 /**
  *	mmc_wait_for_req - start a request and wait for completion
  *	@host: MMC host to start command
@@ -212,16 +312,9 @@ static void mmc_wait_done(struct mmc_request *mrq)
  */
 void mmc_wait_for_req(struct mmc_host *host, struct mmc_request *mrq)
 {
-	DECLARE_COMPLETION_ONSTACK(complete);
-
-	mrq->done_data = &complete;
-	mrq->done = mmc_wait_done;
-
-	mmc_start_request(host, mrq);
-
-	wait_for_completion(&complete);
+	__mmc_start_req(host, mrq);
+	mmc_wait_for_req_done(host, mrq);
 }
-
 EXPORT_SYMBOL(mmc_wait_for_req);
 
 /**
diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h
index 791f060..57ee19e 100644
--- a/include/linux/mmc/core.h
+++ b/include/linux/mmc/core.h
@@ -117,6 +117,7 @@ struct mmc_data {
 
 	unsigned int		sg_len;		/* size of scatter list */
 	struct scatterlist	*sg;		/* I/O scatter list */
+	s32			host_cookie;	/* host private data */
 };
 
 struct mmc_request {
@@ -125,13 +126,16 @@ struct mmc_request {
 	struct mmc_data		*data;
 	struct mmc_command	*stop;
 
-	void			*done_data;	/* completion data */
+	struct completion	completion;
 	void			(*done)(struct mmc_request *);/* completion function */
 };
 
 struct mmc_host;
 struct mmc_card;
+struct mmc_async_req;
 
+extern struct mmc_async_req *mmc_start_req(struct mmc_host *,
+					   struct mmc_async_req *, int *);
 extern void mmc_wait_for_req(struct mmc_host *, struct mmc_request *);
 extern int mmc_wait_for_cmd(struct mmc_host *, struct mmc_command *, int);
 extern int mmc_app_cmd(struct mmc_host *, struct mmc_card *);
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index ac3fbac..59db6f2 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -106,6 +106,15 @@ struct mmc_host_ops {
 	 */
 	int (*enable)(struct mmc_host *host);
 	int (*disable)(struct mmc_host *host, int lazy);
+	/*
+	 * It is optional for the host to implement pre_req and post_req in
+	 * order to support double buffering of requests (prepare one
+	 * request while another request is active).
+	 */
+	void	(*post_req)(struct mmc_host *host, struct mmc_request *req,
+			    int err);
+	void	(*pre_req)(struct mmc_host *host, struct mmc_request *req,
+			   bool is_first_req);
 	void	(*request)(struct mmc_host *host, struct mmc_request *req);
 	/*
 	 * Avoid calling these three functions too often or in a "fast path",
@@ -144,6 +153,16 @@ struct mmc_host_ops {
 struct mmc_card;
 struct device;
 
+struct mmc_async_req {
+	/* active mmc request */
+	struct mmc_request	*mrq;
+	/*
+	 * Check error status of completed mmc request.
+	 * Returns 0 if success otherwise non zero.
+	 */
+	int (*err_check) (struct mmc_card *, struct mmc_async_req *);
+};
+
 struct mmc_host {
 	struct device		*parent;
 	struct device		class_dev;
@@ -280,6 +299,8 @@ struct mmc_host {
 
 	struct dentry		*debugfs_root;
 
+	struct mmc_async_req	*areq;		/* active async req */
+
 	unsigned long		private[0] ____cacheline_aligned;
 };
 
-- 
1.7.4.1



* [PATCH v8 02/12] omap_hsmmc: add support for pre_req and post_req
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
  2011-06-28  8:11 ` [PATCH v8 01/12] mmc: core: add non-blocking mmc request function Per Forlin
@ 2011-06-28  8:11 ` Per Forlin
  2011-06-28  8:11 ` [PATCH v8 03/12] mmci: implement pre_req() and post_req() Per Forlin
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-06-28  8:11 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

pre_req() runs dma_map_sg(), post_req() runs dma_unmap_sg().
If pre_req() is not called before omap_hsmmc_request(),
dma_map_sg will be issued before starting the transfer.
It is optional to use pre_req(). If pre_req() is issued,
post_req() must be called as well.

Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/host/omap_hsmmc.c |   87 +++++++++++++++++++++++++++++++++++++++--
 1 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/drivers/mmc/host/omap_hsmmc.c b/drivers/mmc/host/omap_hsmmc.c
index cd317af..b29ca90 100644
--- a/drivers/mmc/host/omap_hsmmc.c
+++ b/drivers/mmc/host/omap_hsmmc.c
@@ -141,6 +141,11 @@
 #define OMAP_HSMMC_WRITE(base, reg, val) \
 	__raw_writel((val), (base) + OMAP_HSMMC_##reg)
 
+struct omap_hsmmc_next {
+	unsigned int	dma_len;
+	s32		cookie;
+};
+
 struct omap_hsmmc_host {
 	struct	device		*dev;
 	struct	mmc_host	*mmc;
@@ -184,6 +189,7 @@ struct omap_hsmmc_host {
 	int			reqs_blocked;
 	int			use_reg;
 	int			req_in_progress;
+	struct omap_hsmmc_next	next_data;
 
 	struct	omap_mmc_platform_data	*pdata;
 };
@@ -1343,8 +1349,9 @@ static void omap_hsmmc_dma_cb(int lch, u16 ch_status, void *cb_data)
 		return;
 	}
 
-	dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
-		omap_hsmmc_get_dma_dir(host, data));
+	if (!data->host_cookie)
+		dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
+			     omap_hsmmc_get_dma_dir(host, data));
 
 	req_in_progress = host->req_in_progress;
 	dma_ch = host->dma_ch;
@@ -1362,6 +1369,45 @@ static void omap_hsmmc_dma_cb(int lch, u16 ch_status, void *cb_data)
 	}
 }
 
+static int omap_hsmmc_pre_dma_transfer(struct omap_hsmmc_host *host,
+				       struct mmc_data *data,
+				       struct omap_hsmmc_next *next)
+{
+	int dma_len;
+
+	if (!next && data->host_cookie &&
+	    data->host_cookie != host->next_data.cookie) {
+		printk(KERN_WARNING "[%s] invalid cookie: data->host_cookie %d"
+		       " host->next_data.cookie %d\n",
+		       __func__, data->host_cookie, host->next_data.cookie);
+		data->host_cookie = 0;
+	}
+
+	/* Check if next job is already prepared */
+	if (next ||
+	    (!next && data->host_cookie != host->next_data.cookie)) {
+		dma_len = dma_map_sg(mmc_dev(host->mmc), data->sg,
+				     data->sg_len,
+				     omap_hsmmc_get_dma_dir(host, data));
+
+	} else {
+		dma_len = host->next_data.dma_len;
+		host->next_data.dma_len = 0;
+	}
+
+
+	if (dma_len == 0)
+		return -EINVAL;
+
+	if (next) {
+		next->dma_len = dma_len;
+		data->host_cookie = ++next->cookie < 0 ? 1 : next->cookie;
+	} else
+		host->dma_len = dma_len;
+
+	return 0;
+}
+
 /*
  * Routine to configure and start DMA for the MMC card
  */
@@ -1395,9 +1441,10 @@ static int omap_hsmmc_start_dma_transfer(struct omap_hsmmc_host *host,
 			mmc_hostname(host->mmc), ret);
 		return ret;
 	}
+	ret = omap_hsmmc_pre_dma_transfer(host, data, NULL);
+	if (ret)
+		return ret;
 
-	host->dma_len = dma_map_sg(mmc_dev(host->mmc), data->sg,
-			data->sg_len, omap_hsmmc_get_dma_dir(host, data));
 	host->dma_ch = dma_ch;
 	host->dma_sg_idx = 0;
 
@@ -1477,6 +1524,35 @@ omap_hsmmc_prepare_data(struct omap_hsmmc_host *host, struct mmc_request *req)
 	return 0;
 }
 
+static void omap_hsmmc_post_req(struct mmc_host *mmc, struct mmc_request *mrq,
+				int err)
+{
+	struct omap_hsmmc_host *host = mmc_priv(mmc);
+	struct mmc_data *data = mrq->data;
+
+	if (host->use_dma) {
+		dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
+			     omap_hsmmc_get_dma_dir(host, data));
+		data->host_cookie = 0;
+	}
+}
+
+static void omap_hsmmc_pre_req(struct mmc_host *mmc, struct mmc_request *mrq,
+			       bool is_first_req)
+{
+	struct omap_hsmmc_host *host = mmc_priv(mmc);
+
+	if (mrq->data->host_cookie) {
+		mrq->data->host_cookie = 0;
+		return ;
+	}
+
+	if (host->use_dma)
+		if (omap_hsmmc_pre_dma_transfer(host, mrq->data,
+						&host->next_data))
+			mrq->data->host_cookie = 0;
+}
+
 /*
  * Request function. for read/write operation
  */
@@ -1925,6 +2001,8 @@ static int omap_hsmmc_disable_fclk(struct mmc_host *mmc, int lazy)
 static const struct mmc_host_ops omap_hsmmc_ops = {
 	.enable = omap_hsmmc_enable_fclk,
 	.disable = omap_hsmmc_disable_fclk,
+	.post_req = omap_hsmmc_post_req,
+	.pre_req = omap_hsmmc_pre_req,
 	.request = omap_hsmmc_request,
 	.set_ios = omap_hsmmc_set_ios,
 	.get_cd = omap_hsmmc_get_cd,
@@ -2074,6 +2152,7 @@ static int __init omap_hsmmc_probe(struct platform_device *pdev)
 	host->mapbase	= res->start;
 	host->base	= ioremap(host->mapbase, SZ_4K);
 	host->power_mode = MMC_POWER_OFF;
+	host->next_data.cookie = 1;
 
 	platform_set_drvdata(pdev, host);
 	INIT_WORK(&host->mmc_carddetect_work, omap_hsmmc_detect);
-- 
1.7.4.1



* [PATCH v8 03/12] mmci: implement pre_req() and post_req()
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
  2011-06-28  8:11 ` [PATCH v8 01/12] mmc: core: add non-blocking mmc request function Per Forlin
  2011-06-28  8:11 ` [PATCH v8 02/12] omap_hsmmc: add support for pre_req and post_req Per Forlin
@ 2011-06-28  8:11 ` Per Forlin
  2011-06-28  8:11 ` [PATCH v8 04/12] mmc: mmc_test: add debugfs file to list all tests Per Forlin
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-06-28  8:11 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

pre_req() runs dma_map_sg() and prepares the dma descriptor
for the next mmc data transfer. post_req() runs dma_unmap_sg().
If pre_req() is not called before mmci_request(), mmci_request()
will prepare the cache and dma just like it did before.
It is optional to use pre_req() and post_req() for mmci.

Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/host/mmci.c |  147 ++++++++++++++++++++++++++++++++++++++++++----
 drivers/mmc/host/mmci.h |    8 +++
 2 files changed, 142 insertions(+), 13 deletions(-)

diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
index b4a7e4f..6ed6770 100644
--- a/drivers/mmc/host/mmci.c
+++ b/drivers/mmc/host/mmci.c
@@ -211,6 +211,9 @@ static void __devinit mmci_dma_setup(struct mmci_host *host)
 		return;
 	}
 
+	/* initialize pre request cookie */
+	host->next_data.cookie = 1;
+
 	/* Try to acquire a generic DMA engine slave channel */
 	dma_cap_zero(mask);
 	dma_cap_set(DMA_SLAVE, mask);
@@ -320,7 +323,8 @@ static void mmci_dma_unmap(struct mmci_host *host, struct mmc_data *data)
 		dir = DMA_FROM_DEVICE;
 	}
 
-	dma_unmap_sg(chan->device->dev, data->sg, data->sg_len, dir);
+	if (!data->host_cookie)
+		dma_unmap_sg(chan->device->dev, data->sg, data->sg_len, dir);
 
 	/*
 	 * Use of DMA with scatter-gather is impossible.
@@ -338,7 +342,8 @@ static void mmci_dma_data_error(struct mmci_host *host)
 	dmaengine_terminate_all(host->dma_current);
 }
 
-static int mmci_dma_start_data(struct mmci_host *host, unsigned int datactrl)
+static int mmci_dma_prep_data(struct mmci_host *host, struct mmc_data *data,
+			      struct mmci_host_next *next)
 {
 	struct variant_data *variant = host->variant;
 	struct dma_slave_config conf = {
@@ -349,13 +354,20 @@ static int mmci_dma_start_data(struct mmci_host *host, unsigned int datactrl)
 		.src_maxburst = variant->fifohalfsize >> 2, /* # of words */
 		.dst_maxburst = variant->fifohalfsize >> 2, /* # of words */
 	};
-	struct mmc_data *data = host->data;
 	struct dma_chan *chan;
 	struct dma_device *device;
 	struct dma_async_tx_descriptor *desc;
 	int nr_sg;
 
-	host->dma_current = NULL;
+	/* Check if next job is already prepared */
+	if (data->host_cookie && !next &&
+	    host->dma_current && host->dma_desc_current)
+		return 0;
+
+	if (!next) {
+		host->dma_current = NULL;
+		host->dma_desc_current = NULL;
+	}
 
 	if (data->flags & MMC_DATA_READ) {
 		conf.direction = DMA_FROM_DEVICE;
@@ -370,7 +382,7 @@ static int mmci_dma_start_data(struct mmci_host *host, unsigned int datactrl)
 		return -EINVAL;
 
 	/* If less than or equal to the fifo size, don't bother with DMA */
-	if (host->size <= variant->fifosize)
+	if (data->blksz * data->blocks <= variant->fifosize)
 		return -EINVAL;
 
 	device = chan->device;
@@ -384,14 +396,38 @@ static int mmci_dma_start_data(struct mmci_host *host, unsigned int datactrl)
 	if (!desc)
 		goto unmap_exit;
 
-	/* Okay, go for it. */
-	host->dma_current = chan;
+	if (next) {
+		next->dma_chan = chan;
+		next->dma_desc = desc;
+	} else {
+		host->dma_current = chan;
+		host->dma_desc_current = desc;
+	}
+
+	return 0;
 
+ unmap_exit:
+	if (!next)
+		dmaengine_terminate_all(chan);
+	dma_unmap_sg(device->dev, data->sg, data->sg_len, conf.direction);
+	return -ENOMEM;
+}
+
+static int mmci_dma_start_data(struct mmci_host *host, unsigned int datactrl)
+{
+	int ret;
+	struct mmc_data *data = host->data;
+
+	ret = mmci_dma_prep_data(host, host->data, NULL);
+	if (ret)
+		return ret;
+
+	/* Okay, go for it. */
 	dev_vdbg(mmc_dev(host->mmc),
 		 "Submit MMCI DMA job, sglen %d blksz %04x blks %04x flags %08x\n",
 		 data->sg_len, data->blksz, data->blocks, data->flags);
-	dmaengine_submit(desc);
-	dma_async_issue_pending(chan);
+	dmaengine_submit(host->dma_desc_current);
+	dma_async_issue_pending(host->dma_current);
 
 	datactrl |= MCI_DPSM_DMAENABLE;
 
@@ -406,14 +442,90 @@ static int mmci_dma_start_data(struct mmci_host *host, unsigned int datactrl)
 	writel(readl(host->base + MMCIMASK0) | MCI_DATAENDMASK,
 	       host->base + MMCIMASK0);
 	return 0;
+}
 
-unmap_exit:
-	dmaengine_terminate_all(chan);
-	dma_unmap_sg(device->dev, data->sg, data->sg_len, conf.direction);
-	return -ENOMEM;
+static void mmci_get_next_data(struct mmci_host *host, struct mmc_data *data)
+{
+	struct mmci_host_next *next = &host->next_data;
+
+	if (data->host_cookie && data->host_cookie != next->cookie) {
+		printk(KERN_WARNING "[%s] invalid cookie: data->host_cookie %d"
+		       " host->next_data.cookie %d\n",
+		       __func__, data->host_cookie, host->next_data.cookie);
+		data->host_cookie = 0;
+	}
+
+	if (!data->host_cookie)
+		return;
+
+	host->dma_desc_current = next->dma_desc;
+	host->dma_current = next->dma_chan;
+
+	next->dma_desc = NULL;
+	next->dma_chan = NULL;
 }
+
+static void mmci_pre_request(struct mmc_host *mmc, struct mmc_request *mrq,
+			     bool is_first_req)
+{
+	struct mmci_host *host = mmc_priv(mmc);
+	struct mmc_data *data = mrq->data;
+	struct mmci_host_next *nd = &host->next_data;
+
+	if (!data)
+		return;
+
+	if (data->host_cookie) {
+		data->host_cookie = 0;
+		return;
+	}
+
+	/* if config for dma */
+	if (((data->flags & MMC_DATA_WRITE) && host->dma_tx_channel) ||
+	    ((data->flags & MMC_DATA_READ) && host->dma_rx_channel)) {
+		if (mmci_dma_prep_data(host, data, nd))
+			data->host_cookie = 0;
+		else
+			data->host_cookie = ++nd->cookie < 0 ? 1 : nd->cookie;
+	}
+}
+
+static void mmci_post_request(struct mmc_host *mmc, struct mmc_request *mrq,
+			      int err)
+{
+	struct mmci_host *host = mmc_priv(mmc);
+	struct mmc_data *data = mrq->data;
+	struct dma_chan *chan;
+	enum dma_data_direction dir;
+
+	if (!data)
+		return;
+
+	if (data->flags & MMC_DATA_READ) {
+		dir = DMA_FROM_DEVICE;
+		chan = host->dma_rx_channel;
+	} else {
+		dir = DMA_TO_DEVICE;
+		chan = host->dma_tx_channel;
+	}
+
+
+	/* if config for dma */
+	if (chan) {
+		if (err)
+			dmaengine_terminate_all(chan);
+		if (err || data->host_cookie)
+			dma_unmap_sg(mmc_dev(host->mmc), data->sg,
+				     data->sg_len, dir);
+		mrq->data->host_cookie = 0;
+	}
+}
+
 #else
 /* Blank functions if the DMA engine is not available */
+static void mmci_get_next_data(struct mmci_host *host, struct mmc_data *data)
+{
+}
 static inline void mmci_dma_setup(struct mmci_host *host)
 {
 }
@@ -434,6 +546,10 @@ static inline int mmci_dma_start_data(struct mmci_host *host, unsigned int datac
 {
 	return -ENOSYS;
 }
+
+#define mmci_pre_request NULL
+#define mmci_post_request NULL
+
 #endif
 
 static void mmci_start_data(struct mmci_host *host, struct mmc_data *data)
@@ -852,6 +968,9 @@ static void mmci_request(struct mmc_host *mmc, struct mmc_request *mrq)
 
 	host->mrq = mrq;
 
+	if (mrq->data)
+		mmci_get_next_data(host, mrq->data);
+
 	if (mrq->data && mrq->data->flags & MMC_DATA_READ)
 		mmci_start_data(host, mrq->data);
 
@@ -966,6 +1085,8 @@ static irqreturn_t mmci_cd_irq(int irq, void *dev_id)
 
 static const struct mmc_host_ops mmci_ops = {
 	.request	= mmci_request,
+	.pre_req	= mmci_pre_request,
+	.post_req	= mmci_post_request,
 	.set_ios	= mmci_set_ios,
 	.get_ro		= mmci_get_ro,
 	.get_cd		= mmci_get_cd,
diff --git a/drivers/mmc/host/mmci.h b/drivers/mmc/host/mmci.h
index ec9a7bc6..e21d850 100644
--- a/drivers/mmc/host/mmci.h
+++ b/drivers/mmc/host/mmci.h
@@ -150,6 +150,12 @@ struct clk;
 struct variant_data;
 struct dma_chan;
 
+struct mmci_host_next {
+	struct dma_async_tx_descriptor	*dma_desc;
+	struct dma_chan			*dma_chan;
+	s32				cookie;
+};
+
 struct mmci_host {
 	phys_addr_t		phybase;
 	void __iomem		*base;
@@ -187,6 +193,8 @@ struct mmci_host {
 	struct dma_chan		*dma_current;
 	struct dma_chan		*dma_rx_channel;
 	struct dma_chan		*dma_tx_channel;
+	struct dma_async_tx_descriptor	*dma_desc_current;
+	struct mmci_host_next	next_data;
 
 #define dma_inprogress(host)	((host)->dma_current)
 #else
-- 
1.7.4.1



* [PATCH v8 04/12] mmc: mmc_test: add debugfs file to list all tests
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
                   ` (2 preceding siblings ...)
  2011-06-28  8:11 ` [PATCH v8 03/12] mmci: implement pre_req() and post_req() Per Forlin
@ 2011-06-28  8:11 ` Per Forlin
  2011-06-28  8:11 ` [PATCH v8 05/12] mmc: mmc_test: add test for non-blocking transfers Per Forlin
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-06-28  8:11 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

Add a debugfs file "testlist" to print all available tests

Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/card/mmc_test.c |   39 ++++++++++++++++++++++++++++++++++++++-
 1 files changed, 38 insertions(+), 1 deletions(-)

diff --git a/drivers/mmc/card/mmc_test.c b/drivers/mmc/card/mmc_test.c
index 233cdfa..e8508e9 100644
--- a/drivers/mmc/card/mmc_test.c
+++ b/drivers/mmc/card/mmc_test.c
@@ -2445,6 +2445,32 @@ static const struct file_operations mmc_test_fops_test = {
 	.release	= single_release,
 };
 
+static int mtf_testlist_show(struct seq_file *sf, void *data)
+{
+	int i;
+
+	mutex_lock(&mmc_test_lock);
+
+	for (i = 0; i < ARRAY_SIZE(mmc_test_cases); i++)
+		seq_printf(sf, "%d:\t%s\n", i+1, mmc_test_cases[i].name);
+
+	mutex_unlock(&mmc_test_lock);
+
+	return 0;
+}
+
+static int mtf_testlist_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, mtf_testlist_show, inode->i_private);
+}
+
+static const struct file_operations mmc_test_fops_testlist = {
+	.open		= mtf_testlist_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 static void mmc_test_free_file_test(struct mmc_card *card)
 {
 	struct mmc_test_dbgfs_file *df, *dfs;
@@ -2476,7 +2502,18 @@ static int mmc_test_register_file_test(struct mmc_card *card)
 
 	if (IS_ERR_OR_NULL(file)) {
 		dev_err(&card->dev,
-			"Can't create file. Perhaps debugfs is disabled.\n");
+			"Can't create test. Perhaps debugfs is disabled.\n");
+		ret = -ENODEV;
+		goto err;
+	}
+
+	if (card->debugfs_root)
+		file = debugfs_create_file("testlist", S_IRUGO,
+			card->debugfs_root, card, &mmc_test_fops_testlist);
+
+	if (IS_ERR_OR_NULL(file)) {
+		dev_err(&card->dev,
+			"Can't create testlist. Perhaps debugfs is disabled.\n");
 		ret = -ENODEV;
 		goto err;
 	}
-- 
1.7.4.1



* [PATCH v8 05/12] mmc: mmc_test: add test for non-blocking transfers
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
                   ` (3 preceding siblings ...)
  2011-06-28  8:11 ` [PATCH v8 04/12] mmc: mmc_test: add debugfs file to list all tests Per Forlin
@ 2011-06-28  8:11 ` Per Forlin
  2011-07-01 13:29   ` Per Forlin
  2011-06-28  8:11 ` [PATCH v8 06/12] mmc: mmc_test: test to measure how sg_len affects performance Per Forlin
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 27+ messages in thread
From: Per Forlin @ 2011-06-28  8:11 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

Add four tests for read and write performance at
different transfer sizes, 4k to 4M:
 * Read using blocking mmc request
 * Read using non-blocking mmc request
 * Write using blocking mmc request
 * Write using non-blocking mmc request

The host driver must support pre_req() and post_req()
in order to run the non-blocking test cases.

Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/card/mmc_test.c |  311 +++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 303 insertions(+), 8 deletions(-)

diff --git a/drivers/mmc/card/mmc_test.c b/drivers/mmc/card/mmc_test.c
index e8508e9..466a192 100644
--- a/drivers/mmc/card/mmc_test.c
+++ b/drivers/mmc/card/mmc_test.c
@@ -148,6 +148,26 @@ struct mmc_test_card {
 	struct mmc_test_general_result	*gr;
 };
 
+enum mmc_test_prep_media {
+	MMC_TEST_PREP_NONE = 0,
+	MMC_TEST_PREP_WRITE_FULL = 1 << 0,
+	MMC_TEST_PREP_ERASE = 1 << 1,
+};
+
+struct mmc_test_multiple_rw {
+	unsigned int *bs;
+	unsigned int len;
+	unsigned int size;
+	bool do_write;
+	bool do_nonblock_req;
+	enum mmc_test_prep_media prepare;
+};
+
+struct mmc_test_async_req {
+	struct mmc_async_req areq;
+	struct mmc_test_card *test;
+};
+
 /*******************************************************************/
 /*  General helper functions                                       */
 /*******************************************************************/
@@ -661,7 +681,7 @@ static void mmc_test_prepare_broken_mrq(struct mmc_test_card *test,
  * Checks that a normal transfer didn't have any errors
  */
 static int mmc_test_check_result(struct mmc_test_card *test,
-	struct mmc_request *mrq)
+				 struct mmc_request *mrq)
 {
 	int ret;
 
@@ -685,6 +705,17 @@ static int mmc_test_check_result(struct mmc_test_card *test,
 	return ret;
 }
 
+static int mmc_test_check_result_async(struct mmc_card *card,
+				       struct mmc_async_req *areq)
+{
+	struct mmc_test_async_req *test_async =
+		container_of(areq, struct mmc_test_async_req, areq);
+
+	mmc_test_wait_busy(test_async->test);
+
+	return mmc_test_check_result(test_async->test, areq->mrq);
+}
+
 /*
  * Checks that a "short transfer" behaved as expected
  */
@@ -720,6 +751,89 @@ static int mmc_test_check_broken_result(struct mmc_test_card *test,
 }
 
 /*
+ * Tests nonblock transfer with certain parameters
+ */
+static void mmc_test_nonblock_reset(struct mmc_request *mrq,
+				    struct mmc_command *cmd,
+				    struct mmc_command *stop,
+				    struct mmc_data *data)
+{
+	memset(mrq, 0, sizeof(struct mmc_request));
+	memset(cmd, 0, sizeof(struct mmc_command));
+	memset(data, 0, sizeof(struct mmc_data));
+	memset(stop, 0, sizeof(struct mmc_command));
+
+	mrq->cmd = cmd;
+	mrq->data = data;
+	mrq->stop = stop;
+}
+static int mmc_test_nonblock_transfer(struct mmc_test_card *test,
+				      struct scatterlist *sg, unsigned sg_len,
+				      unsigned dev_addr, unsigned blocks,
+				      unsigned blksz, int write, int count)
+{
+	struct mmc_request mrq1;
+	struct mmc_command cmd1;
+	struct mmc_command stop1;
+	struct mmc_data data1;
+
+	struct mmc_request mrq2;
+	struct mmc_command cmd2;
+	struct mmc_command stop2;
+	struct mmc_data data2;
+
+	struct mmc_test_async_req test_areq[2];
+	struct mmc_async_req *done_areq;
+	struct mmc_async_req *cur_areq = &test_areq[0].areq;
+	struct mmc_async_req *other_areq = &test_areq[1].areq;
+	int i;
+	int ret;
+
+	test_areq[0].test = test;
+	test_areq[1].test = test;
+
+	if (!test->card->host->ops->pre_req ||
+		!test->card->host->ops->post_req)
+		return -RESULT_UNSUP_HOST;
+
+	mmc_test_nonblock_reset(&mrq1, &cmd1, &stop1, &data1);
+	mmc_test_nonblock_reset(&mrq2, &cmd2, &stop2, &data2);
+
+	cur_areq->mrq = &mrq1;
+	cur_areq->err_check = mmc_test_check_result_async;
+	other_areq->mrq = &mrq2;
+	other_areq->err_check = mmc_test_check_result_async;
+
+	for (i = 0; i < count; i++) {
+		mmc_test_prepare_mrq(test, cur_areq->mrq, sg, sg_len, dev_addr,
+				     blocks, blksz, write);
+		done_areq = mmc_start_req(test->card->host, cur_areq, &ret);
+
+		if (ret || (!done_areq && i > 0))
+			goto err;
+
+		if (done_areq) {
+			if (done_areq->mrq == &mrq2)
+				mmc_test_nonblock_reset(&mrq2, &cmd2,
+							&stop2, &data2);
+			else
+				mmc_test_nonblock_reset(&mrq1, &cmd1,
+							&stop1, &data1);
+		}
+		done_areq = cur_areq;
+		cur_areq = other_areq;
+		other_areq = done_areq;
+		dev_addr += blocks;
+	}
+
+	done_areq = mmc_start_req(test->card->host, NULL, &ret);
+
+	return ret;
+err:
+	return ret;
+}
+
+/*
  * Tests a basic transfer with certain parameters
  */
 static int mmc_test_simple_transfer(struct mmc_test_card *test,
@@ -1336,14 +1450,17 @@ static int mmc_test_area_transfer(struct mmc_test_card *test,
 }
 
 /*
- * Map and transfer bytes.
+ * Map and transfer bytes for multiple transfers.
  */
-static int mmc_test_area_io(struct mmc_test_card *test, unsigned long sz,
-			    unsigned int dev_addr, int write, int max_scatter,
-			    int timed)
+static int mmc_test_area_io_seq(struct mmc_test_card *test, unsigned long sz,
+				unsigned int dev_addr, int write,
+				int max_scatter, int timed, int count,
+				bool nonblock)
 {
 	struct timespec ts1, ts2;
-	int ret;
+	int ret = 0;
+	int i;
+	struct mmc_test_area *t = &test->area;
 
 	/*
 	 * In the case of a maximally scattered transfer, the maximum transfer
@@ -1367,8 +1484,15 @@ static int mmc_test_area_io(struct mmc_test_card *test, unsigned long sz,
 
 	if (timed)
 		getnstimeofday(&ts1);
+	if (nonblock)
+		ret = mmc_test_nonblock_transfer(test, t->sg, t->sg_len,
+				 dev_addr, t->blocks, 512, write, count);
+	else
+		for (i = 0; i < count && ret == 0; i++) {
+			ret = mmc_test_area_transfer(test, dev_addr, write);
+			dev_addr += sz >> 9;
+		}
 
-	ret = mmc_test_area_transfer(test, dev_addr, write);
 	if (ret)
 		return ret;
 
@@ -1376,11 +1500,19 @@ static int mmc_test_area_io(struct mmc_test_card *test, unsigned long sz,
 		getnstimeofday(&ts2);
 
 	if (timed)
-		mmc_test_print_rate(test, sz, &ts1, &ts2);
+		mmc_test_print_avg_rate(test, sz, count, &ts1, &ts2);
 
 	return 0;
 }
 
+static int mmc_test_area_io(struct mmc_test_card *test, unsigned long sz,
+			    unsigned int dev_addr, int write, int max_scatter,
+			    int timed)
+{
+	return mmc_test_area_io_seq(test, sz, dev_addr, write, max_scatter,
+				    timed, 1, false);
+}
+
 /*
  * Write the test area entirely.
  */
@@ -1954,6 +2086,142 @@ static int mmc_test_large_seq_write_perf(struct mmc_test_card *test)
 	return mmc_test_large_seq_perf(test, 1);
 }
 
+static int mmc_test_rw_multiple(struct mmc_test_card *test,
+				struct mmc_test_multiple_rw *tdata,
+				unsigned int reqsize, unsigned int size)
+{
+	unsigned int dev_addr;
+	struct mmc_test_area *t = &test->area;
+	int ret = 0;
+
+	/* Set up test area */
+	if (size > mmc_test_capacity(test->card) / 2 * 512)
+		size = mmc_test_capacity(test->card) / 2 * 512;
+	if (reqsize > t->max_tfr)
+		reqsize = t->max_tfr;
+	dev_addr = mmc_test_capacity(test->card) / 4;
+	if ((dev_addr & 0xffff0000))
+		dev_addr &= 0xffff0000; /* Round to 64MiB boundary */
+	else
+		dev_addr &= 0xfffff800; /* Round to 1MiB boundary */
+	if (!dev_addr)
+		goto err;
+
+	/* prepare test area */
+	if (mmc_can_erase(test->card) &&
+	    tdata->prepare & MMC_TEST_PREP_ERASE) {
+		ret = mmc_erase(test->card, dev_addr,
+				size / 512, MMC_SECURE_ERASE_ARG);
+		if (ret)
+			ret = mmc_erase(test->card, dev_addr,
+					size / 512, MMC_ERASE_ARG);
+		if (ret)
+			goto err;
+	}
+
+	/* Run test */
+	ret = mmc_test_area_io_seq(test, reqsize, dev_addr,
+				   tdata->do_write, 0, 1, size / reqsize,
+				   tdata->do_nonblock_req);
+	if (ret)
+		goto err;
+
+	return ret;
+ err:
+	printk(KERN_INFO "[%s] error\n", __func__);
+	return ret;
+}
+
+static int mmc_test_rw_multiple_size(struct mmc_test_card *test,
+				     struct mmc_test_multiple_rw *rw)
+{
+	int ret = 0;
+	int i;
+
+	for (i = 0 ; i < rw->len && ret == 0; i++) {
+		ret = mmc_test_rw_multiple(test, rw, rw->bs[i], rw->size);
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
+/*
+ * Multiple blocking write 4k to 4 MB chunks
+ */
+static int mmc_test_profile_mult_write_blocking_perf(struct mmc_test_card *test)
+{
+	unsigned int bs[] = {1 << 12, 1 << 13, 1 << 14, 1 << 15, 1 << 16,
+			     1 << 17, 1 << 18, 1 << 19, 1 << 20, 1 << 22};
+	struct mmc_test_multiple_rw test_data = {
+		.bs = bs,
+		.size = 128*1024*1024,
+		.len = ARRAY_SIZE(bs),
+		.do_write = true,
+		.do_nonblock_req = false,
+		.prepare = MMC_TEST_PREP_ERASE,
+	};
+
+	return mmc_test_rw_multiple_size(test, &test_data);
+};
+
+/*
+ * Multiple non-blocking write 4k to 4 MB chunks
+ */
+static int mmc_test_profile_mult_write_nonblock_perf(struct mmc_test_card *test)
+{
+	unsigned int bs[] = {1 << 12, 1 << 13, 1 << 14, 1 << 15, 1 << 16,
+			     1 << 17, 1 << 18, 1 << 19, 1 << 20, 1 << 22};
+	struct mmc_test_multiple_rw test_data = {
+		.bs = bs,
+		.size = 128*1024*1024,
+		.len = ARRAY_SIZE(bs),
+		.do_write = true,
+		.do_nonblock_req = true,
+		.prepare = MMC_TEST_PREP_ERASE,
+	};
+
+	return mmc_test_rw_multiple_size(test, &test_data);
+}
+
+/*
+ * Multiple blocking read 4k to 4 MB chunks
+ */
+static int mmc_test_profile_mult_read_blocking_perf(struct mmc_test_card *test)
+{
+	unsigned int bs[] = {1 << 12, 1 << 13, 1 << 14, 1 << 15, 1 << 16,
+			     1 << 17, 1 << 18, 1 << 19, 1 << 20, 1 << 22};
+	struct mmc_test_multiple_rw test_data = {
+		.bs = bs,
+		.size = 128*1024*1024,
+		.len = ARRAY_SIZE(bs),
+		.do_write = false,
+		.do_nonblock_req = false,
+		.prepare = MMC_TEST_PREP_NONE,
+	};
+
+	return mmc_test_rw_multiple_size(test, &test_data);
+}
+
+/*
+ * Multiple non-blocking read 4k to 4 MB chunks
+ */
+static int mmc_test_profile_mult_read_nonblock_perf(struct mmc_test_card *test)
+{
+	unsigned int bs[] = {1 << 12, 1 << 13, 1 << 14, 1 << 15, 1 << 16,
+			     1 << 17, 1 << 18, 1 << 19, 1 << 20, 1 << 22};
+	struct mmc_test_multiple_rw test_data = {
+		.bs = bs,
+		.size = 128*1024*1024,
+		.len = ARRAY_SIZE(bs),
+		.do_write = false,
+		.do_nonblock_req = true,
+		.prepare = MMC_TEST_PREP_NONE,
+	};
+
+	return mmc_test_rw_multiple_size(test, &test_data);
+}
+
 static const struct mmc_test_case mmc_test_cases[] = {
 	{
 		.name = "Basic write (no data verification)",
@@ -2221,6 +2489,33 @@ static const struct mmc_test_case mmc_test_cases[] = {
 		.cleanup = mmc_test_area_cleanup,
 	},
 
+	{
+		.name = "Write performance with blocking req 4k to 4MB",
+		.prepare = mmc_test_area_prepare,
+		.run = mmc_test_profile_mult_write_blocking_perf,
+		.cleanup = mmc_test_area_cleanup,
+	},
+
+	{
+		.name = "Write performance with non-blocking req 4k to 4MB",
+		.prepare = mmc_test_area_prepare,
+		.run = mmc_test_profile_mult_write_nonblock_perf,
+		.cleanup = mmc_test_area_cleanup,
+	},
+
+	{
+		.name = "Read performance with blocking req 4k to 4MB",
+		.prepare = mmc_test_area_prepare,
+		.run = mmc_test_profile_mult_read_blocking_perf,
+		.cleanup = mmc_test_area_cleanup,
+	},
+
+	{
+		.name = "Read performance with non-blocking req 4k to 4MB",
+		.prepare = mmc_test_area_prepare,
+		.run = mmc_test_profile_mult_read_nonblock_perf,
+		.cleanup = mmc_test_area_cleanup,
+	},
 };
 
 static DEFINE_MUTEX(mmc_test_lock);
-- 
1.7.4.1



* [PATCH v8 06/12] mmc: mmc_test: test to measure how sg_len affects performance
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
                   ` (4 preceding siblings ...)
  2011-06-28  8:11 ` [PATCH v8 05/12] mmc: mmc_test: add test for non-blocking transfers Per Forlin
@ 2011-06-28  8:11 ` Per Forlin
  2011-07-01 13:33   ` Per Forlin
  2011-06-28  8:11 ` [PATCH v8 07/12] mmc: block: add member in mmc queue struct to hold request data Per Forlin
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 27+ messages in thread
From: Per Forlin @ 2011-06-28  8:11 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

Add a test that measures how the mmc bandwidth depends on the number of sg
elements in the sg list. The transfer size is fixed and the sg length goes
from a few elements up to 512. The purpose is to measure the overhead caused
by multiple sg elements.

Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/card/mmc_test.c |  151 +++++++++++++++++++++++++++++++++++++++----
 1 files changed, 139 insertions(+), 12 deletions(-)

diff --git a/drivers/mmc/card/mmc_test.c b/drivers/mmc/card/mmc_test.c
index 466a192..453d770 100644
--- a/drivers/mmc/card/mmc_test.c
+++ b/drivers/mmc/card/mmc_test.c
@@ -155,6 +155,7 @@ enum mmc_test_prep_media {
 };
 
 struct mmc_test_multiple_rw {
+	unsigned int *sg_len;
 	unsigned int *bs;
 	unsigned int len;
 	unsigned int size;
@@ -387,21 +388,26 @@ out_free:
  * Map memory into a scatterlist.  Optionally allow the same memory to be
  * mapped more than once.
  */
-static int mmc_test_map_sg(struct mmc_test_mem *mem, unsigned long sz,
+static int mmc_test_map_sg(struct mmc_test_mem *mem, unsigned long size,
 			   struct scatterlist *sglist, int repeat,
 			   unsigned int max_segs, unsigned int max_seg_sz,
-			   unsigned int *sg_len)
+			   unsigned int *sg_len, int min_sg_len)
 {
 	struct scatterlist *sg = NULL;
 	unsigned int i;
+	unsigned long sz = size;
 
 	sg_init_table(sglist, max_segs);
+	if (min_sg_len > max_segs)
+		min_sg_len = max_segs;
 
 	*sg_len = 0;
 	do {
 		for (i = 0; i < mem->cnt; i++) {
 			unsigned long len = PAGE_SIZE << mem->arr[i].order;
 
+			if (min_sg_len && (size / min_sg_len < len))
+				len = size / min_sg_len;
 			if (len > sz)
 				len = sz;
 			if (len > max_seg_sz)
@@ -574,11 +580,12 @@ static void mmc_test_print_avg_rate(struct mmc_test_card *test, uint64_t bytes,
 
 	printk(KERN_INFO "%s: Transfer of %u x %u sectors (%u x %u%s KiB) took "
 			 "%lu.%09lu seconds (%u kB/s, %u KiB/s, "
-			 "%u.%02u IOPS)\n",
+			 "%u.%02u IOPS, sg_len %d)\n",
 			 mmc_hostname(test->card->host), count, sectors, count,
 			 sectors >> 1, (sectors & 1 ? ".5" : ""),
 			 (unsigned long)ts.tv_sec, (unsigned long)ts.tv_nsec,
-			 rate / 1000, rate / 1024, iops / 100, iops % 100);
+			 rate / 1000, rate / 1024, iops / 100, iops % 100,
+			 test->area.sg_len);
 
 	mmc_test_save_transfer_result(test, count, sectors, ts, rate, iops);
 }
@@ -1416,7 +1423,7 @@ static int mmc_test_no_highmem(struct mmc_test_card *test)
  * Map sz bytes so that it can be transferred.
  */
 static int mmc_test_area_map(struct mmc_test_card *test, unsigned long sz,
-			     int max_scatter)
+			     int max_scatter, int min_sg_len)
 {
 	struct mmc_test_area *t = &test->area;
 	int err;
@@ -1429,7 +1436,7 @@ static int mmc_test_area_map(struct mmc_test_card *test, unsigned long sz,
 				       &t->sg_len);
 	} else {
 		err = mmc_test_map_sg(t->mem, sz, t->sg, 1, t->max_segs,
-				      t->max_seg_sz, &t->sg_len);
+				      t->max_seg_sz, &t->sg_len, min_sg_len);
 	}
 	if (err)
 		printk(KERN_INFO "%s: Failed to map sg list\n",
@@ -1455,7 +1462,7 @@ static int mmc_test_area_transfer(struct mmc_test_card *test,
 static int mmc_test_area_io_seq(struct mmc_test_card *test, unsigned long sz,
 				unsigned int dev_addr, int write,
 				int max_scatter, int timed, int count,
-				bool nonblock)
+				bool nonblock, int min_sg_len)
 {
 	struct timespec ts1, ts2;
 	int ret = 0;
@@ -1478,7 +1485,7 @@ static int mmc_test_area_io_seq(struct mmc_test_card *test, unsigned long sz,
 			sz = max_tfr;
 	}
 
-	ret = mmc_test_area_map(test, sz, max_scatter);
+	ret = mmc_test_area_map(test, sz, max_scatter, min_sg_len);
 	if (ret)
 		return ret;
 
@@ -1510,7 +1517,7 @@ static int mmc_test_area_io(struct mmc_test_card *test, unsigned long sz,
 			    int timed)
 {
 	return mmc_test_area_io_seq(test, sz, dev_addr, write, max_scatter,
-				    timed, 1, false);
+				    timed, 1, false, 0);
 }
 
 /*
@@ -2088,7 +2095,8 @@ static int mmc_test_large_seq_write_perf(struct mmc_test_card *test)
 
 static int mmc_test_rw_multiple(struct mmc_test_card *test,
 				struct mmc_test_multiple_rw *tdata,
-				unsigned int reqsize, unsigned int size)
+				unsigned int reqsize, unsigned int size,
+				int min_sg_len)
 {
 	unsigned int dev_addr;
 	struct mmc_test_area *t = &test->area;
@@ -2122,7 +2130,7 @@ static int mmc_test_rw_multiple(struct mmc_test_card *test,
 	/* Run test */
 	ret = mmc_test_area_io_seq(test, reqsize, dev_addr,
 				   tdata->do_write, 0, 1, size / reqsize,
-				   tdata->do_nonblock_req);
+				   tdata->do_nonblock_req, min_sg_len);
 	if (ret)
 		goto err;
 
@@ -2139,7 +2147,22 @@ static int mmc_test_rw_multiple_size(struct mmc_test_card *test,
 	int i;
 
 	for (i = 0 ; i < rw->len && ret == 0; i++) {
-		ret = mmc_test_rw_multiple(test, rw, rw->bs[i], rw->size);
+		ret = mmc_test_rw_multiple(test, rw, rw->bs[i], rw->size, 0);
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
+static int mmc_test_rw_multiple_sg_len(struct mmc_test_card *test,
+				       struct mmc_test_multiple_rw *rw)
+{
+	int ret = 0;
+	int i;
+
+	for (i = 0 ; i < rw->len && ret == 0; i++) {
+		ret = mmc_test_rw_multiple(test, rw, 512*1024, rw->size,
+					   rw->sg_len[i]);
 		if (ret)
 			break;
 	}
@@ -2222,6 +2245,82 @@ static int mmc_test_profile_mult_read_nonblock_perf(struct mmc_test_card *test)
 	return mmc_test_rw_multiple_size(test, &test_data);
 }
 
+/*
+ * Multiple blocking write 1 to 512 sg elements
+ */
+static int mmc_test_profile_sglen_wr_blocking_perf(struct mmc_test_card *test)
+{
+	unsigned int sg_len[] = {1, 1 << 3, 1 << 4, 1 << 5, 1 << 6,
+				 1 << 7, 1 << 8, 1 << 9};
+	struct mmc_test_multiple_rw test_data = {
+		.sg_len = sg_len,
+		.size = 128*1024*1024,
+		.len = ARRAY_SIZE(sg_len),
+		.do_write = true,
+		.do_nonblock_req = false,
+		.prepare = MMC_TEST_PREP_ERASE,
+	};
+
+	return mmc_test_rw_multiple_sg_len(test, &test_data);
+};
+
+/*
+ * Multiple non-blocking write 1 to 512 sg elements
+ */
+static int mmc_test_profile_sglen_wr_nonblock_perf(struct mmc_test_card *test)
+{
+	unsigned int sg_len[] = {1, 1 << 3, 1 << 4, 1 << 5, 1 << 6,
+				 1 << 7, 1 << 8, 1 << 9};
+	struct mmc_test_multiple_rw test_data = {
+		.sg_len = sg_len,
+		.size = 128*1024*1024,
+		.len = ARRAY_SIZE(sg_len),
+		.do_write = true,
+		.do_nonblock_req = true,
+		.prepare = MMC_TEST_PREP_ERASE,
+	};
+
+	return mmc_test_rw_multiple_sg_len(test, &test_data);
+}
+
+/*
+ * Multiple blocking read 1 to 512 sg elements
+ */
+static int mmc_test_profile_sglen_r_blocking_perf(struct mmc_test_card *test)
+{
+	unsigned int sg_len[] = {1, 1 << 3, 1 << 4, 1 << 5, 1 << 6,
+				 1 << 7, 1 << 8, 1 << 9};
+	struct mmc_test_multiple_rw test_data = {
+		.sg_len = sg_len,
+		.size = 128*1024*1024,
+		.len = ARRAY_SIZE(sg_len),
+		.do_write = false,
+		.do_nonblock_req = false,
+		.prepare = MMC_TEST_PREP_NONE,
+	};
+
+	return mmc_test_rw_multiple_sg_len(test, &test_data);
+}
+
+/*
+ * Multiple non-blocking read 1 to 512 sg elements
+ */
+static int mmc_test_profile_sglen_r_nonblock_perf(struct mmc_test_card *test)
+{
+	unsigned int sg_len[] = {1, 1 << 3, 1 << 4, 1 << 5, 1 << 6,
+				 1 << 7, 1 << 8, 1 << 9};
+	struct mmc_test_multiple_rw test_data = {
+		.sg_len = sg_len,
+		.size = 128*1024*1024,
+		.len = ARRAY_SIZE(sg_len),
+		.do_write = false,
+		.do_nonblock_req = true,
+		.prepare = MMC_TEST_PREP_NONE,
+	};
+
+	return mmc_test_rw_multiple_sg_len(test, &test_data);
+}
+
 static const struct mmc_test_case mmc_test_cases[] = {
 	{
 		.name = "Basic write (no data verification)",
@@ -2516,6 +2615,34 @@ static const struct mmc_test_case mmc_test_cases[] = {
 		.run = mmc_test_profile_mult_read_nonblock_perf,
 		.cleanup = mmc_test_area_cleanup,
 	},
+
+	{
+		.name = "Write performance blocking req 1 to 512 sg elems",
+		.prepare = mmc_test_area_prepare,
+		.run = mmc_test_profile_sglen_wr_blocking_perf,
+		.cleanup = mmc_test_area_cleanup,
+	},
+
+	{
+		.name = "Write performance non-blocking req 1 to 512 sg elems",
+		.prepare = mmc_test_area_prepare,
+		.run = mmc_test_profile_sglen_wr_nonblock_perf,
+		.cleanup = mmc_test_area_cleanup,
+	},
+
+	{
+		.name = "Read performance blocking req 1 to 512 sg elems",
+		.prepare = mmc_test_area_prepare,
+		.run = mmc_test_profile_sglen_r_blocking_perf,
+		.cleanup = mmc_test_area_cleanup,
+	},
+
+	{
+		.name = "Read performance non-blocking req 1 to 512 sg elems",
+		.prepare = mmc_test_area_prepare,
+		.run = mmc_test_profile_sglen_r_nonblock_perf,
+		.cleanup = mmc_test_area_cleanup,
+	},
 };
 
 static DEFINE_MUTEX(mmc_test_lock);
-- 
1.7.4.1



* [PATCH v8 07/12] mmc: block: add member in mmc queue struct to hold request data
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
                   ` (5 preceding siblings ...)
  2011-06-28  8:11 ` [PATCH v8 06/12] mmc: mmc_test: test to measure how sg_len affects performance Per Forlin
@ 2011-06-28  8:11 ` Per Forlin
  2011-06-28  8:11 ` [PATCH v8 08/12] mmc: block: add a block request prepare function Per Forlin
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-06-28  8:11 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

The way the request data is organized in the mmc queue struct, it only
allows processing of one request at a time. This patch adds a new struct
to hold mmc queue request data such as the sg list, request, blk request
and bounce buffers, and updates all functions depending on the mmc queue
struct. This lays the groundwork for using multiple active requests on
one mmc queue.
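
In abbreviated form (an illustration rather than the authoritative
definition, which lives in the queue.h hunk of this patch), the new
per-request container groups what used to sit directly in struct mmc_queue:

	struct mmc_queue_req {
		struct request		*req;
		struct mmc_blk_request	brq;
		struct scatterlist	*sg;
		char			*bounce_buf;
		struct scatterlist	*bounce_sg;
		unsigned int		bounce_sg_len;
	};

struct mmc_queue then carries such a slot plus an mqrq_cur pointer to the
one currently being processed, which is what allows a second slot to be
added later in the series.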

Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/card/block.c |  109 ++++++++++++++++++---------------------
 drivers/mmc/card/queue.c |  129 ++++++++++++++++++++++++----------------------
 drivers/mmc/card/queue.h |   31 ++++++++---
 3 files changed, 141 insertions(+), 128 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index bee2106..88bcc4e 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -427,14 +427,6 @@ static const struct block_device_operations mmc_bdops = {
 #endif
 };
 
-struct mmc_blk_request {
-	struct mmc_request	mrq;
-	struct mmc_command	sbc;
-	struct mmc_command	cmd;
-	struct mmc_command	stop;
-	struct mmc_data		data;
-};
-
 static inline int mmc_blk_part_switch(struct mmc_card *card,
 				      struct mmc_blk_data *md)
 {
@@ -824,7 +816,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 {
 	struct mmc_blk_data *md = mq->data;
 	struct mmc_card *card = md->queue.card;
-	struct mmc_blk_request brq;
+	struct mmc_blk_request *brq = &mq->mqrq_cur->brq;
 	int ret = 1, disable_multi = 0, retry = 0;
 
 	/*
@@ -839,60 +831,60 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 	do {
 		u32 readcmd, writecmd;
 
-		memset(&brq, 0, sizeof(struct mmc_blk_request));
-		brq.mrq.cmd = &brq.cmd;
-		brq.mrq.data = &brq.data;
+		memset(brq, 0, sizeof(struct mmc_blk_request));
+		brq->mrq.cmd = &brq->cmd;
+		brq->mrq.data = &brq->data;
 
-		brq.cmd.arg = blk_rq_pos(req);
+		brq->cmd.arg = blk_rq_pos(req);
 		if (!mmc_card_blockaddr(card))
-			brq.cmd.arg <<= 9;
-		brq.cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC;
-		brq.data.blksz = 512;
-		brq.stop.opcode = MMC_STOP_TRANSMISSION;
-		brq.stop.arg = 0;
-		brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
-		brq.data.blocks = blk_rq_sectors(req);
+			brq->cmd.arg <<= 9;
+		brq->cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC;
+		brq->data.blksz = 512;
+		brq->stop.opcode = MMC_STOP_TRANSMISSION;
+		brq->stop.arg = 0;
+		brq->stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
+		brq->data.blocks = blk_rq_sectors(req);
 
 		/*
 		 * The block layer doesn't support all sector count
 		 * restrictions, so we need to be prepared for too big
 		 * requests.
 		 */
-		if (brq.data.blocks > card->host->max_blk_count)
-			brq.data.blocks = card->host->max_blk_count;
+		if (brq->data.blocks > card->host->max_blk_count)
+			brq->data.blocks = card->host->max_blk_count;
 
 		/*
 		 * After a read error, we redo the request one sector at a time
 		 * in order to accurately determine which sectors can be read
 		 * successfully.
 		 */
-		if (disable_multi && brq.data.blocks > 1)
-			brq.data.blocks = 1;
+		if (disable_multi && brq->data.blocks > 1)
+			brq->data.blocks = 1;
 
-		if (brq.data.blocks > 1 || do_rel_wr) {
+		if (brq->data.blocks > 1 || do_rel_wr) {
 			/* SPI multiblock writes terminate using a special
 			 * token, not a STOP_TRANSMISSION request.
 			 */
 			if (!mmc_host_is_spi(card->host) ||
 			    rq_data_dir(req) == READ)
-				brq.mrq.stop = &brq.stop;
+				brq->mrq.stop = &brq->stop;
 			readcmd = MMC_READ_MULTIPLE_BLOCK;
 			writecmd = MMC_WRITE_MULTIPLE_BLOCK;
 		} else {
-			brq.mrq.stop = NULL;
+			brq->mrq.stop = NULL;
 			readcmd = MMC_READ_SINGLE_BLOCK;
 			writecmd = MMC_WRITE_BLOCK;
 		}
 		if (rq_data_dir(req) == READ) {
-			brq.cmd.opcode = readcmd;
-			brq.data.flags |= MMC_DATA_READ;
+			brq->cmd.opcode = readcmd;
+			brq->data.flags |= MMC_DATA_READ;
 		} else {
-			brq.cmd.opcode = writecmd;
-			brq.data.flags |= MMC_DATA_WRITE;
+			brq->cmd.opcode = writecmd;
+			brq->data.flags |= MMC_DATA_WRITE;
 		}
 
 		if (do_rel_wr)
-			mmc_apply_rel_rw(&brq, card, req);
+			mmc_apply_rel_rw(brq, card, req);
 
 		/*
 		 * Pre-defined multi-block transfers are preferable to
@@ -914,29 +906,29 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 		 */
 
 		if ((md->flags & MMC_BLK_CMD23) &&
-		    mmc_op_multi(brq.cmd.opcode) &&
+		    mmc_op_multi(brq->cmd.opcode) &&
 		    (do_rel_wr || !(card->quirks & MMC_QUIRK_BLK_NO_CMD23))) {
-			brq.sbc.opcode = MMC_SET_BLOCK_COUNT;
-			brq.sbc.arg = brq.data.blocks |
+			brq->sbc.opcode = MMC_SET_BLOCK_COUNT;
+			brq->sbc.arg = brq->data.blocks |
 				(do_rel_wr ? (1 << 31) : 0);
-			brq.sbc.flags = MMC_RSP_R1 | MMC_CMD_AC;
-			brq.mrq.sbc = &brq.sbc;
+			brq->sbc.flags = MMC_RSP_R1 | MMC_CMD_AC;
+			brq->mrq.sbc = &brq->sbc;
 		}
 
-		mmc_set_data_timeout(&brq.data, card);
+		mmc_set_data_timeout(&brq->data, card);
 
-		brq.data.sg = mq->sg;
-		brq.data.sg_len = mmc_queue_map_sg(mq);
+		brq->data.sg = mq->mqrq_cur->sg;
+		brq->data.sg_len = mmc_queue_map_sg(mq, mq->mqrq_cur);
 
 		/*
 		 * Adjust the sg list so it is the same size as the
 		 * request.
 		 */
-		if (brq.data.blocks != blk_rq_sectors(req)) {
-			int i, data_size = brq.data.blocks << 9;
+		if (brq->data.blocks != blk_rq_sectors(req)) {
+			int i, data_size = brq->data.blocks << 9;
 			struct scatterlist *sg;
 
-			for_each_sg(brq.data.sg, sg, brq.data.sg_len, i) {
+			for_each_sg(brq->data.sg, sg, brq->data.sg_len, i) {
 				data_size -= sg->length;
 				if (data_size <= 0) {
 					sg->length += data_size;
@@ -944,14 +936,14 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 					break;
 				}
 			}
-			brq.data.sg_len = i;
+			brq->data.sg_len = i;
 		}
 
-		mmc_queue_bounce_pre(mq);
+		mmc_queue_bounce_pre(mq->mqrq_cur);
 
-		mmc_wait_for_req(card->host, &brq.mrq);
+		mmc_wait_for_req(card->host, &brq->mrq);
 
-		mmc_queue_bounce_post(mq);
+		mmc_queue_bounce_post(mq->mqrq_cur);
 
 		/*
 		 * sbc.error indicates a problem with the set block count
@@ -963,8 +955,8 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 		 * stop.error indicates a problem with the stop command.  Data
 		 * may have been transferred, or may still be transferring.
 		 */
-		if (brq.sbc.error || brq.cmd.error || brq.stop.error) {
-			switch (mmc_blk_cmd_recovery(card, req, &brq)) {
+		if (brq->sbc.error || brq->cmd.error || brq->stop.error) {
+			switch (mmc_blk_cmd_recovery(card, req, brq)) {
 			case ERR_RETRY:
 				if (retry++ < 5)
 					continue;
@@ -980,9 +972,9 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 		 * initial command - such as address errors.  No data
 		 * has been transferred.
 		 */
-		if (brq.cmd.resp[0] & CMD_ERRORS) {
+		if (brq->cmd.resp[0] & CMD_ERRORS) {
 			pr_err("%s: r/w command failed, status = %#x\n",
-				req->rq_disk->disk_name, brq.cmd.resp[0]);
+				req->rq_disk->disk_name, brq->cmd.resp[0]);
 			goto cmd_abort;
 		}
 
@@ -1009,15 +1001,15 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 				 (R1_CURRENT_STATE(status) == R1_STATE_PRG));
 		}
 
-		if (brq.data.error) {
+		if (brq->data.error) {
 			pr_err("%s: error %d transferring data, sector %u, nr %u, cmd response %#x, card status %#x\n",
-				req->rq_disk->disk_name, brq.data.error,
+				req->rq_disk->disk_name, brq->data.error,
 				(unsigned)blk_rq_pos(req),
 				(unsigned)blk_rq_sectors(req),
-				brq.cmd.resp[0], brq.stop.resp[0]);
+				brq->cmd.resp[0], brq->stop.resp[0]);
 
 			if (rq_data_dir(req) == READ) {
-				if (brq.data.blocks > 1) {
+				if (brq->data.blocks > 1) {
 					/* Redo read one sector at a time */
 					pr_warning("%s: retrying using single block read\n",
 						req->rq_disk->disk_name);
@@ -1031,7 +1023,8 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 				 * read a single sector.
 				 */
 				spin_lock_irq(&md->lock);
-				ret = __blk_end_request(req, -EIO, brq.data.blksz);
+				ret = __blk_end_request(req, -EIO,
+							brq->data.blksz);
 				spin_unlock_irq(&md->lock);
 				continue;
 			} else {
@@ -1043,7 +1036,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 		 * A block was successfully transferred.
 		 */
 		spin_lock_irq(&md->lock);
-		ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
+		ret = __blk_end_request(req, 0, brq->data.bytes_xfered);
 		spin_unlock_irq(&md->lock);
 	} while (ret);
 
@@ -1069,7 +1062,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 		}
 	} else {
 		spin_lock_irq(&md->lock);
-		ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
+		ret = __blk_end_request(req, 0, brq->data.bytes_xfered);
 		spin_unlock_irq(&md->lock);
 	}
 
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index 6413afa..4222e1a 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -56,7 +56,7 @@ static int mmc_queue_thread(void *d)
 		spin_lock_irq(q->queue_lock);
 		set_current_state(TASK_INTERRUPTIBLE);
 		req = blk_fetch_request(q);
-		mq->req = req;
+		mq->mqrq_cur->req = req;
 		spin_unlock_irq(q->queue_lock);
 
 		if (!req) {
@@ -97,10 +97,25 @@ static void mmc_request(struct request_queue *q)
 		return;
 	}
 
-	if (!mq->req)
+	if (!mq->mqrq_cur->req)
 		wake_up_process(mq->thread);
 }
 
+struct scatterlist *mmc_alloc_sg(int sg_len, int *err)
+{
+	struct scatterlist *sg;
+
+	sg = kmalloc(sizeof(struct scatterlist)*sg_len, GFP_KERNEL);
+	if (!sg)
+		*err = -ENOMEM;
+	else {
+		*err = 0;
+		sg_init_table(sg, sg_len);
+	}
+
+	return sg;
+}
+
 /**
  * mmc_init_queue - initialise a queue structure.
  * @mq: mmc queue
@@ -116,6 +131,7 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 	struct mmc_host *host = card->host;
 	u64 limit = BLK_BOUNCE_HIGH;
 	int ret;
+	struct mmc_queue_req *mqrq_cur = &mq->mqrq[0];
 
 	if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
 		limit = *mmc_dev(host)->dma_mask;
@@ -125,8 +141,9 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 	if (!mq->queue)
 		return -ENOMEM;
 
+	memset(&mq->mqrq_cur, 0, sizeof(mq->mqrq_cur));
+	mq->mqrq_cur = mqrq_cur;
 	mq->queue->queuedata = mq;
-	mq->req = NULL;
 
 	blk_queue_prep_rq(mq->queue, mmc_prep_request);
 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, mq->queue);
@@ -155,53 +172,44 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 			bouncesz = host->max_blk_count * 512;
 
 		if (bouncesz > 512) {
-			mq->bounce_buf = kmalloc(bouncesz, GFP_KERNEL);
-			if (!mq->bounce_buf) {
+			mqrq_cur->bounce_buf = kmalloc(bouncesz, GFP_KERNEL);
+			if (!mqrq_cur->bounce_buf) {
 				printk(KERN_WARNING "%s: unable to "
-					"allocate bounce buffer\n",
+					"allocate bounce cur buffer\n",
 					mmc_card_name(card));
 			}
 		}
 
-		if (mq->bounce_buf) {
+		if (mqrq_cur->bounce_buf) {
 			blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_ANY);
 			blk_queue_max_hw_sectors(mq->queue, bouncesz / 512);
 			blk_queue_max_segments(mq->queue, bouncesz / 512);
 			blk_queue_max_segment_size(mq->queue, bouncesz);
 
-			mq->sg = kmalloc(sizeof(struct scatterlist),
-				GFP_KERNEL);
-			if (!mq->sg) {
-				ret = -ENOMEM;
+			mqrq_cur->sg = mmc_alloc_sg(1, &ret);
+			if (ret)
 				goto cleanup_queue;
-			}
-			sg_init_table(mq->sg, 1);
 
-			mq->bounce_sg = kmalloc(sizeof(struct scatterlist) *
-				bouncesz / 512, GFP_KERNEL);
-			if (!mq->bounce_sg) {
-				ret = -ENOMEM;
+			mqrq_cur->bounce_sg =
+				mmc_alloc_sg(bouncesz / 512, &ret);
+			if (ret)
 				goto cleanup_queue;
-			}
-			sg_init_table(mq->bounce_sg, bouncesz / 512);
+
 		}
 	}
 #endif
 
-	if (!mq->bounce_buf) {
+	if (!mqrq_cur->bounce_buf) {
 		blk_queue_bounce_limit(mq->queue, limit);
 		blk_queue_max_hw_sectors(mq->queue,
 			min(host->max_blk_count, host->max_req_size / 512));
 		blk_queue_max_segments(mq->queue, host->max_segs);
 		blk_queue_max_segment_size(mq->queue, host->max_seg_size);
 
-		mq->sg = kmalloc(sizeof(struct scatterlist) *
-			host->max_segs, GFP_KERNEL);
-		if (!mq->sg) {
-			ret = -ENOMEM;
+		mqrq_cur->sg = mmc_alloc_sg(host->max_segs, &ret);
+		if (ret)
 			goto cleanup_queue;
-		}
-		sg_init_table(mq->sg, host->max_segs);
+
 	}
 
 	sema_init(&mq->thread_sem, 1);
@@ -216,16 +224,15 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 
 	return 0;
  free_bounce_sg:
- 	if (mq->bounce_sg)
- 		kfree(mq->bounce_sg);
- 	mq->bounce_sg = NULL;
+	kfree(mqrq_cur->bounce_sg);
+	mqrq_cur->bounce_sg = NULL;
+
  cleanup_queue:
- 	if (mq->sg)
-		kfree(mq->sg);
-	mq->sg = NULL;
-	if (mq->bounce_buf)
-		kfree(mq->bounce_buf);
-	mq->bounce_buf = NULL;
+	kfree(mqrq_cur->sg);
+	mqrq_cur->sg = NULL;
+	kfree(mqrq_cur->bounce_buf);
+	mqrq_cur->bounce_buf = NULL;
+
 	blk_cleanup_queue(mq->queue);
 	return ret;
 }
@@ -234,6 +241,7 @@ void mmc_cleanup_queue(struct mmc_queue *mq)
 {
 	struct request_queue *q = mq->queue;
 	unsigned long flags;
+	struct mmc_queue_req *mqrq_cur = mq->mqrq_cur;
 
 	/* Make sure the queue isn't suspended, as that will deadlock */
 	mmc_queue_resume(mq);
@@ -247,16 +255,14 @@ void mmc_cleanup_queue(struct mmc_queue *mq)
 	blk_start_queue(q);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
- 	if (mq->bounce_sg)
- 		kfree(mq->bounce_sg);
- 	mq->bounce_sg = NULL;
+	kfree(mqrq_cur->bounce_sg);
+	mqrq_cur->bounce_sg = NULL;
 
-	kfree(mq->sg);
-	mq->sg = NULL;
+	kfree(mqrq_cur->sg);
+	mqrq_cur->sg = NULL;
 
-	if (mq->bounce_buf)
-		kfree(mq->bounce_buf);
-	mq->bounce_buf = NULL;
+	kfree(mqrq_cur->bounce_buf);
+	mqrq_cur->bounce_buf = NULL;
 
 	mq->card = NULL;
 }
@@ -309,27 +315,27 @@ void mmc_queue_resume(struct mmc_queue *mq)
 /*
  * Prepare the sg list(s) to be handed of to the host driver
  */
-unsigned int mmc_queue_map_sg(struct mmc_queue *mq)
+unsigned int mmc_queue_map_sg(struct mmc_queue *mq, struct mmc_queue_req *mqrq)
 {
 	unsigned int sg_len;
 	size_t buflen;
 	struct scatterlist *sg;
 	int i;
 
-	if (!mq->bounce_buf)
-		return blk_rq_map_sg(mq->queue, mq->req, mq->sg);
+	if (!mqrq->bounce_buf)
+		return blk_rq_map_sg(mq->queue, mqrq->req, mqrq->sg);
 
-	BUG_ON(!mq->bounce_sg);
+	BUG_ON(!mqrq->bounce_sg);
 
-	sg_len = blk_rq_map_sg(mq->queue, mq->req, mq->bounce_sg);
+	sg_len = blk_rq_map_sg(mq->queue, mqrq->req, mqrq->bounce_sg);
 
-	mq->bounce_sg_len = sg_len;
+	mqrq->bounce_sg_len = sg_len;
 
 	buflen = 0;
-	for_each_sg(mq->bounce_sg, sg, sg_len, i)
+	for_each_sg(mqrq->bounce_sg, sg, sg_len, i)
 		buflen += sg->length;
 
-	sg_init_one(mq->sg, mq->bounce_buf, buflen);
+	sg_init_one(mqrq->sg, mqrq->bounce_buf, buflen);
 
 	return 1;
 }
@@ -338,31 +344,30 @@ unsigned int mmc_queue_map_sg(struct mmc_queue *mq)
  * If writing, bounce the data to the buffer before the request
  * is sent to the host driver
  */
-void mmc_queue_bounce_pre(struct mmc_queue *mq)
+void mmc_queue_bounce_pre(struct mmc_queue_req *mqrq)
 {
-	if (!mq->bounce_buf)
+	if (!mqrq->bounce_buf)
 		return;
 
-	if (rq_data_dir(mq->req) != WRITE)
+	if (rq_data_dir(mqrq->req) != WRITE)
 		return;
 
-	sg_copy_to_buffer(mq->bounce_sg, mq->bounce_sg_len,
-		mq->bounce_buf, mq->sg[0].length);
+	sg_copy_to_buffer(mqrq->bounce_sg, mqrq->bounce_sg_len,
+		mqrq->bounce_buf, mqrq->sg[0].length);
 }
 
 /*
  * If reading, bounce the data from the buffer after the request
  * has been handled by the host driver
  */
-void mmc_queue_bounce_post(struct mmc_queue *mq)
+void mmc_queue_bounce_post(struct mmc_queue_req *mqrq)
 {
-	if (!mq->bounce_buf)
+	if (!mqrq->bounce_buf)
 		return;
 
-	if (rq_data_dir(mq->req) != READ)
+	if (rq_data_dir(mqrq->req) != READ)
 		return;
 
-	sg_copy_from_buffer(mq->bounce_sg, mq->bounce_sg_len,
-		mq->bounce_buf, mq->sg[0].length);
+	sg_copy_from_buffer(mqrq->bounce_sg, mqrq->bounce_sg_len,
+		mqrq->bounce_buf, mqrq->sg[0].length);
 }
-
diff --git a/drivers/mmc/card/queue.h b/drivers/mmc/card/queue.h
index 6223ef8..c1a69ac 100644
--- a/drivers/mmc/card/queue.h
+++ b/drivers/mmc/card/queue.h
@@ -4,19 +4,33 @@
 struct request;
 struct task_struct;
 
+struct mmc_blk_request {
+	struct mmc_request	mrq;
+	struct mmc_command	sbc;
+	struct mmc_command	cmd;
+	struct mmc_command	stop;
+	struct mmc_data		data;
+};
+
+struct mmc_queue_req {
+	struct request		*req;
+	struct mmc_blk_request	brq;
+	struct scatterlist	*sg;
+	char			*bounce_buf;
+	struct scatterlist	*bounce_sg;
+	unsigned int		bounce_sg_len;
+};
+
 struct mmc_queue {
 	struct mmc_card		*card;
 	struct task_struct	*thread;
 	struct semaphore	thread_sem;
 	unsigned int		flags;
-	struct request		*req;
 	int			(*issue_fn)(struct mmc_queue *, struct request *);
 	void			*data;
 	struct request_queue	*queue;
-	struct scatterlist	*sg;
-	char			*bounce_buf;
-	struct scatterlist	*bounce_sg;
-	unsigned int		bounce_sg_len;
+	struct mmc_queue_req	mqrq[1];
+	struct mmc_queue_req	*mqrq_cur;
 };
 
 extern int mmc_init_queue(struct mmc_queue *, struct mmc_card *, spinlock_t *,
@@ -25,8 +39,9 @@ extern void mmc_cleanup_queue(struct mmc_queue *);
 extern void mmc_queue_suspend(struct mmc_queue *);
 extern void mmc_queue_resume(struct mmc_queue *);
 
-extern unsigned int mmc_queue_map_sg(struct mmc_queue *);
-extern void mmc_queue_bounce_pre(struct mmc_queue *);
-extern void mmc_queue_bounce_post(struct mmc_queue *);
+extern unsigned int mmc_queue_map_sg(struct mmc_queue *,
+				     struct mmc_queue_req *);
+extern void mmc_queue_bounce_pre(struct mmc_queue_req *);
+extern void mmc_queue_bounce_post(struct mmc_queue_req *);
 
 #endif
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v8 08/12] mmc: block: add a block request prepare function
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
                   ` (6 preceding siblings ...)
  2011-06-28  8:11 ` [PATCH v8 07/12] mmc: block: add member in mmc queue struct to hold request data Per Forlin
@ 2011-06-28  8:11 ` Per Forlin
  2011-06-28  8:11 ` [PATCH v8 09/12] mmc: block: move error code in issue_rw_rq to a separate function Per Forlin
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-06-28  8:11 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

Break out code from mmc_blk_issue_rw_rq to create a
block request prepare function. This doesn't change
any functionality. This helps when handling more
than one active block request.
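
In code terms the split looks roughly like this (a condensed sketch of
the block.c hunk below, with the error handling left out):

	static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
	{
		struct mmc_blk_data *md = mq->data;
		struct mmc_card *card = md->queue.card;
		struct mmc_blk_request *brq = &mq->mqrq_cur->brq;
		int ret = 1, disable_multi = 0, retry = 0;

		do {
			/* fill in brq, map the sg list, bounce if needed */
			mmc_blk_rw_rq_prep(mq->mqrq_cur, card, disable_multi, mq);
			mmc_wait_for_req(card->host, &brq->mrq);
			mmc_queue_bounce_post(mq->mqrq_cur);
			/* unchanged error handling and __blk_end_request() */
		} while (ret);
		...
	}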

Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/card/block.c |  218 ++++++++++++++++++++++++----------------------
 1 files changed, 114 insertions(+), 104 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 88bcc4e..a0a76f4 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -812,12 +812,15 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq,
 	 R1_CC_ERROR |		/* Card controller error */		\
 	 R1_ERROR)		/* General/unknown error */
 
-static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
+			       struct mmc_card *card,
+			       int disable_multi,
+			       struct mmc_queue *mq)
 {
+	u32 readcmd, writecmd;
+	struct mmc_blk_request *brq = &mqrq->brq;
+	struct request *req = mqrq->req;
 	struct mmc_blk_data *md = mq->data;
-	struct mmc_card *card = md->queue.card;
-	struct mmc_blk_request *brq = &mq->mqrq_cur->brq;
-	int ret = 1, disable_multi = 0, retry = 0;
 
 	/*
 	 * Reliable writes are used to implement Forced Unit Access and
@@ -828,119 +831,126 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 		(rq_data_dir(req) == WRITE) &&
 		(md->flags & MMC_BLK_REL_WR);
 
-	do {
-		u32 readcmd, writecmd;
-
-		memset(brq, 0, sizeof(struct mmc_blk_request));
-		brq->mrq.cmd = &brq->cmd;
-		brq->mrq.data = &brq->data;
-
-		brq->cmd.arg = blk_rq_pos(req);
-		if (!mmc_card_blockaddr(card))
-			brq->cmd.arg <<= 9;
-		brq->cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC;
-		brq->data.blksz = 512;
-		brq->stop.opcode = MMC_STOP_TRANSMISSION;
-		brq->stop.arg = 0;
-		brq->stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
-		brq->data.blocks = blk_rq_sectors(req);
-
-		/*
-		 * The block layer doesn't support all sector count
-		 * restrictions, so we need to be prepared for too big
-		 * requests.
-		 */
-		if (brq->data.blocks > card->host->max_blk_count)
-			brq->data.blocks = card->host->max_blk_count;
+	memset(brq, 0, sizeof(struct mmc_blk_request));
+	brq->mrq.cmd = &brq->cmd;
+	brq->mrq.data = &brq->data;
 
-		/*
-		 * After a read error, we redo the request one sector at a time
-		 * in order to accurately determine which sectors can be read
-		 * successfully.
-		 */
-		if (disable_multi && brq->data.blocks > 1)
-			brq->data.blocks = 1;
+	brq->cmd.arg = blk_rq_pos(req);
+	if (!mmc_card_blockaddr(card))
+		brq->cmd.arg <<= 9;
+	brq->cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC;
+	brq->data.blksz = 512;
+	brq->stop.opcode = MMC_STOP_TRANSMISSION;
+	brq->stop.arg = 0;
+	brq->stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
+	brq->data.blocks = blk_rq_sectors(req);
 
-		if (brq->data.blocks > 1 || do_rel_wr) {
-			/* SPI multiblock writes terminate using a special
-			 * token, not a STOP_TRANSMISSION request.
-			 */
-			if (!mmc_host_is_spi(card->host) ||
-			    rq_data_dir(req) == READ)
-				brq->mrq.stop = &brq->stop;
-			readcmd = MMC_READ_MULTIPLE_BLOCK;
-			writecmd = MMC_WRITE_MULTIPLE_BLOCK;
-		} else {
-			brq->mrq.stop = NULL;
-			readcmd = MMC_READ_SINGLE_BLOCK;
-			writecmd = MMC_WRITE_BLOCK;
-		}
-		if (rq_data_dir(req) == READ) {
-			brq->cmd.opcode = readcmd;
-			brq->data.flags |= MMC_DATA_READ;
-		} else {
-			brq->cmd.opcode = writecmd;
-			brq->data.flags |= MMC_DATA_WRITE;
-		}
+	/*
+	 * The block layer doesn't support all sector count
+	 * restrictions, so we need to be prepared for too big
+	 * requests.
+	 */
+	if (brq->data.blocks > card->host->max_blk_count)
+		brq->data.blocks = card->host->max_blk_count;
 
-		if (do_rel_wr)
-			mmc_apply_rel_rw(brq, card, req);
+	/*
+	 * After a read error, we redo the request one sector at a time
+	 * in order to accurately determine which sectors can be read
+	 * successfully.
+	 */
+	if (disable_multi && brq->data.blocks > 1)
+		brq->data.blocks = 1;
 
-		/*
-		 * Pre-defined multi-block transfers are preferable to
-		 * open ended-ones (and necessary for reliable writes).
-		 * However, it is not sufficient to just send CMD23,
-		 * and avoid the final CMD12, as on an error condition
-		 * CMD12 (stop) needs to be sent anyway. This, coupled
-		 * with Auto-CMD23 enhancements provided by some
-		 * hosts, means that the complexity of dealing
-		 * with this is best left to the host. If CMD23 is
-		 * supported by card and host, we'll fill sbc in and let
-		 * the host deal with handling it correctly. This means
-		 * that for hosts that don't expose MMC_CAP_CMD23, no
-		 * change of behavior will be observed.
-		 *
-		 * N.B: Some MMC cards experience perf degradation.
-		 * We'll avoid using CMD23-bounded multiblock writes for
-		 * these, while retaining features like reliable writes.
+	if (brq->data.blocks > 1 || do_rel_wr) {
+		/* SPI multiblock writes terminate using a special
+		 * token, not a STOP_TRANSMISSION request.
 		 */
+		if (!mmc_host_is_spi(card->host) ||
+		    rq_data_dir(req) == READ)
+			brq->mrq.stop = &brq->stop;
+		readcmd = MMC_READ_MULTIPLE_BLOCK;
+		writecmd = MMC_WRITE_MULTIPLE_BLOCK;
+	} else {
+		brq->mrq.stop = NULL;
+		readcmd = MMC_READ_SINGLE_BLOCK;
+		writecmd = MMC_WRITE_BLOCK;
+	}
+	if (rq_data_dir(req) == READ) {
+		brq->cmd.opcode = readcmd;
+		brq->data.flags |= MMC_DATA_READ;
+	} else {
+		brq->cmd.opcode = writecmd;
+		brq->data.flags |= MMC_DATA_WRITE;
+	}
 
-		if ((md->flags & MMC_BLK_CMD23) &&
-		    mmc_op_multi(brq->cmd.opcode) &&
-		    (do_rel_wr || !(card->quirks & MMC_QUIRK_BLK_NO_CMD23))) {
-			brq->sbc.opcode = MMC_SET_BLOCK_COUNT;
-			brq->sbc.arg = brq->data.blocks |
-				(do_rel_wr ? (1 << 31) : 0);
-			brq->sbc.flags = MMC_RSP_R1 | MMC_CMD_AC;
-			brq->mrq.sbc = &brq->sbc;
-		}
+	if (do_rel_wr)
+		mmc_apply_rel_rw(brq, card, req);
 
-		mmc_set_data_timeout(&brq->data, card);
+	/*
+	 * Pre-defined multi-block transfers are preferable to
+	 * open ended-ones (and necessary for reliable writes).
+	 * However, it is not sufficient to just send CMD23,
+	 * and avoid the final CMD12, as on an error condition
+	 * CMD12 (stop) needs to be sent anyway. This, coupled
+	 * with Auto-CMD23 enhancements provided by some
+	 * hosts, means that the complexity of dealing
+	 * with this is best left to the host. If CMD23 is
+	 * supported by card and host, we'll fill sbc in and let
+	 * the host deal with handling it correctly. This means
+	 * that for hosts that don't expose MMC_CAP_CMD23, no
+	 * change of behavior will be observed.
+	 *
+	 * N.B: Some MMC cards experience perf degradation.
+	 * We'll avoid using CMD23-bounded multiblock writes for
+	 * these, while retaining features like reliable writes.
+	 */
 
-		brq->data.sg = mq->mqrq_cur->sg;
-		brq->data.sg_len = mmc_queue_map_sg(mq, mq->mqrq_cur);
+	if ((md->flags & MMC_BLK_CMD23) &&
+	    mmc_op_multi(brq->cmd.opcode) &&
+	    (do_rel_wr || !(card->quirks & MMC_QUIRK_BLK_NO_CMD23))) {
+		brq->sbc.opcode = MMC_SET_BLOCK_COUNT;
+		brq->sbc.arg = brq->data.blocks |
+			(do_rel_wr ? (1 << 31) : 0);
+		brq->sbc.flags = MMC_RSP_R1 | MMC_CMD_AC;
+		brq->mrq.sbc = &brq->sbc;
+	}
 
-		/*
-		 * Adjust the sg list so it is the same size as the
-		 * request.
-		 */
-		if (brq->data.blocks != blk_rq_sectors(req)) {
-			int i, data_size = brq->data.blocks << 9;
-			struct scatterlist *sg;
-
-			for_each_sg(brq->data.sg, sg, brq->data.sg_len, i) {
-				data_size -= sg->length;
-				if (data_size <= 0) {
-					sg->length += data_size;
-					i++;
-					break;
-				}
+	mmc_set_data_timeout(&brq->data, card);
+
+	brq->data.sg = mqrq->sg;
+	brq->data.sg_len = mmc_queue_map_sg(mq, mqrq);
+
+	/*
+	 * Adjust the sg list so it is the same size as the
+	 * request.
+	 */
+	if (brq->data.blocks != blk_rq_sectors(req)) {
+		int i, data_size = brq->data.blocks << 9;
+		struct scatterlist *sg;
+
+		for_each_sg(brq->data.sg, sg, brq->data.sg_len, i) {
+			data_size -= sg->length;
+			if (data_size <= 0) {
+				sg->length += data_size;
+				i++;
+				break;
 			}
-			brq->data.sg_len = i;
 		}
+		brq->data.sg_len = i;
+	}
 
-		mmc_queue_bounce_pre(mq->mqrq_cur);
+	mmc_queue_bounce_pre(mqrq);
+}
 
+static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
+{
+	struct mmc_blk_data *md = mq->data;
+	struct mmc_card *card = md->queue.card;
+	struct mmc_blk_request *brq = &mq->mqrq_cur->brq;
+	int ret = 1, disable_multi = 0, retry = 0;
+
+	do {
+		mmc_blk_rw_rq_prep(mq->mqrq_cur, card, disable_multi, mq);
 		mmc_wait_for_req(card->host, &brq->mrq);
 
 		mmc_queue_bounce_post(mq->mqrq_cur);
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v8 09/12] mmc: block: move error code in issue_rw_rq to a separate function.
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
                   ` (7 preceding siblings ...)
  2011-06-28  8:11 ` [PATCH v8 08/12] mmc: block: add a block request prepare function Per Forlin
@ 2011-06-28  8:11 ` Per Forlin
  2011-06-28  8:11 ` [PATCH v8 10/12] mmc: queue: add a second mmc queue request member Per Forlin
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-06-28  8:11 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

Break out code without functional changes. This simplifies the code and
makes way for handling two parallel requests.
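
The error checking moves into mmc_blk_err_check(), which reports one of
the new mmc_blk_status values, and the issue loop then only dispatches
on that status. Condensed from the block.c hunk below (locking omitted):

	status = mmc_blk_err_check(brq, req, card, md);
	switch (status) {
	case MMC_BLK_SUCCESS:
	case MMC_BLK_PARTIAL:
		ret = __blk_end_request(req, 0, brq->data.bytes_xfered);
		break;
	case MMC_BLK_CMD_ERR:
		goto cmd_err;
	case MMC_BLK_RETRY_SINGLE:
		disable_multi = 1;	/* redo the read one sector at a time */
		break;
	case MMC_BLK_RETRY:
		if (retry++ < 5)
			break;
		/* else fall through to abort */
	case MMC_BLK_ABORT:
		goto cmd_abort;
	case MMC_BLK_DATA_ERR:
		ret = __blk_end_request(req, -EIO, brq->data.blksz);
		break;
	}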

Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/card/block.c |  220 +++++++++++++++++++++++++++-------------------
 1 files changed, 131 insertions(+), 89 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index a0a76f4..7ed2c68 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -106,6 +106,16 @@ struct mmc_blk_data {
 
 static DEFINE_MUTEX(open_lock);
 
+enum mmc_blk_status {
+	MMC_BLK_SUCCESS = 0,
+	MMC_BLK_PARTIAL,
+	MMC_BLK_RETRY,
+	MMC_BLK_RETRY_SINGLE,
+	MMC_BLK_DATA_ERR,
+	MMC_BLK_CMD_ERR,
+	MMC_BLK_ABORT,
+};
+
 module_param(perdev_minors, int, 0444);
 MODULE_PARM_DESC(perdev_minors, "Minors numbers to allocate per device");
 
@@ -812,6 +822,95 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq,
 	 R1_CC_ERROR |		/* Card controller error */		\
 	 R1_ERROR)		/* General/unknown error */
 
+int mmc_blk_err_check(struct mmc_blk_request *brq,
+		      struct request *req,
+		      struct mmc_card *card,
+		      struct mmc_blk_data *md)
+{
+	int ret = MMC_BLK_SUCCESS;
+
+	/*
+	 * sbc.error indicates a problem with the set block count
+	 * command.  No data will have been transferred.
+	 *
+	 * cmd.error indicates a problem with the r/w command.  No
+	 * data will have been transferred.
+	 *
+	 * stop.error indicates a problem with the stop command.  Data
+	 * may have been transferred, or may still be transferring.
+	 */
+	if (brq->sbc.error || brq->cmd.error || brq->stop.error) {
+		switch (mmc_blk_cmd_recovery(card, req, brq)) {
+		case ERR_RETRY:
+			return MMC_BLK_RETRY;
+		case ERR_ABORT:
+			return MMC_BLK_ABORT;
+		case ERR_CONTINUE:
+			break;
+		}
+	}
+
+	/*
+	 * Check for errors relating to the execution of the
+	 * initial command - such as address errors.  No data
+	 * has been transferred.
+	 */
+	if (brq->cmd.resp[0] & CMD_ERRORS) {
+		pr_err("%s: r/w command failed, status = %#x\n",
+		       req->rq_disk->disk_name, brq->cmd.resp[0]);
+		return MMC_BLK_ABORT;
+	}
+
+	/*
+	 * Everything else is either success, or a data error of some
+	 * kind.  If it was a write, we may have transitioned to
+	 * program mode, which we have to wait for it to complete.
+	 */
+	if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) {
+		u32 status;
+		do {
+			int err = get_card_status(card, &status, 5);
+			if (err) {
+				printk(KERN_ERR "%s: error %d requesting status\n",
+				       req->rq_disk->disk_name, err);
+				return MMC_BLK_CMD_ERR;
+			}
+			/*
+			 * Some cards mishandle the status bits,
+			 * so make sure to check both the busy
+			 * indication and the card state.
+			 */
+		} while (!(status & R1_READY_FOR_DATA) ||
+			 (R1_CURRENT_STATE(status) == R1_STATE_PRG));
+	}
+
+	if (brq->data.error) {
+		pr_err("%s: error %d transferring data, sector %u, nr %u, cmd response %#x, card status %#x\n",
+		       req->rq_disk->disk_name, brq->data.error,
+		       (unsigned)blk_rq_pos(req),
+		       (unsigned)blk_rq_sectors(req),
+		       brq->cmd.resp[0], brq->stop.resp[0]);
+
+		if (rq_data_dir(req) == READ) {
+			if (brq->data.blocks > 1) {
+				/* Redo read one sector at a time */
+				pr_warning("%s: retrying using single block read\n",
+					   req->rq_disk->disk_name);
+				return MMC_BLK_RETRY_SINGLE;
+			}
+			return MMC_BLK_DATA_ERR;
+		} else {
+			return MMC_BLK_CMD_ERR;
+		}
+	}
+
+	if (ret == MMC_BLK_SUCCESS &&
+	    blk_rq_bytes(req) != brq->data.bytes_xfered)
+		ret = MMC_BLK_PARTIAL;
+
+	return ret;
+}
+
 static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 			       struct mmc_card *card,
 			       int disable_multi,
@@ -948,6 +1047,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 	struct mmc_card *card = md->queue.card;
 	struct mmc_blk_request *brq = &mq->mqrq_cur->brq;
 	int ret = 1, disable_multi = 0, retry = 0;
+	enum mmc_blk_status status;
 
 	do {
 		mmc_blk_rw_rq_prep(mq->mqrq_cur, card, disable_multi, mq);
@@ -955,99 +1055,41 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 
 		mmc_queue_bounce_post(mq->mqrq_cur);
 
-		/*
-		 * sbc.error indicates a problem with the set block count
-		 * command.  No data will have been transferred.
-		 *
-		 * cmd.error indicates a problem with the r/w command.  No
-		 * data will have been transferred.
-		 *
-		 * stop.error indicates a problem with the stop command.  Data
-		 * may have been transferred, or may still be transferring.
-		 */
-		if (brq->sbc.error || brq->cmd.error || brq->stop.error) {
-			switch (mmc_blk_cmd_recovery(card, req, brq)) {
-			case ERR_RETRY:
-				if (retry++ < 5)
-					continue;
-			case ERR_ABORT:
-				goto cmd_abort;
-			case ERR_CONTINUE:
+		status = mmc_blk_err_check(brq, req, card, md);
+		switch (status) {
+		case MMC_BLK_SUCCESS:
+		case MMC_BLK_PARTIAL:
+			/*
+			 * A block was successfully transferred.
+			 */
+			spin_lock_irq(&md->lock);
+			ret = __blk_end_request(req, 0,
+						brq->data.bytes_xfered);
+			spin_unlock_irq(&md->lock);
+			break;
+		case MMC_BLK_CMD_ERR:
+			goto cmd_err;
+		case MMC_BLK_RETRY_SINGLE:
+			disable_multi = 1;
+			break;
+		case MMC_BLK_RETRY:
+			if (retry++ < 5)
 				break;
-			}
-		}
-
-		/*
-		 * Check for errors relating to the execution of the
-		 * initial command - such as address errors.  No data
-		 * has been transferred.
-		 */
-		if (brq->cmd.resp[0] & CMD_ERRORS) {
-			pr_err("%s: r/w command failed, status = %#x\n",
-				req->rq_disk->disk_name, brq->cmd.resp[0]);
+		case MMC_BLK_ABORT:
 			goto cmd_abort;
+		case MMC_BLK_DATA_ERR:
+			/*
+			 * After an error, we redo I/O one sector at a
+			 * time, so we only reach here after trying to
+			 * read a single sector.
+			 */
+			spin_lock_irq(&md->lock);
+			ret = __blk_end_request(req, -EIO,
+						brq->data.blksz);
+			spin_unlock_irq(&md->lock);
+			break;
 		}
 
-		/*
-		 * Everything else is either success, or a data error of some
-		 * kind.  If it was a write, we may have transitioned to
-		 * program mode, which we have to wait for it to complete.
-		 */
-		if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) {
-			u32 status;
-			do {
-				int err = get_card_status(card, &status, 5);
-				if (err) {
-					printk(KERN_ERR "%s: error %d requesting status\n",
-					       req->rq_disk->disk_name, err);
-					goto cmd_err;
-				}
-				/*
-				 * Some cards mishandle the status bits,
-				 * so make sure to check both the busy
-				 * indication and the card state.
-				 */
-			} while (!(status & R1_READY_FOR_DATA) ||
-				 (R1_CURRENT_STATE(status) == R1_STATE_PRG));
-		}
-
-		if (brq->data.error) {
-			pr_err("%s: error %d transferring data, sector %u, nr %u, cmd response %#x, card status %#x\n",
-				req->rq_disk->disk_name, brq->data.error,
-				(unsigned)blk_rq_pos(req),
-				(unsigned)blk_rq_sectors(req),
-				brq->cmd.resp[0], brq->stop.resp[0]);
-
-			if (rq_data_dir(req) == READ) {
-				if (brq->data.blocks > 1) {
-					/* Redo read one sector at a time */
-					pr_warning("%s: retrying using single block read\n",
-						req->rq_disk->disk_name);
-					disable_multi = 1;
-					continue;
-				}
-
-				/*
-				 * After an error, we redo I/O one sector at a
-				 * time, so we only reach here after trying to
-				 * read a single sector.
-				 */
-				spin_lock_irq(&md->lock);
-				ret = __blk_end_request(req, -EIO,
-							brq->data.blksz);
-				spin_unlock_irq(&md->lock);
-				continue;
-			} else {
-				goto cmd_err;
-			}
-		}
-
-		/*
-		 * A block was successfully transferred.
-		 */
-		spin_lock_irq(&md->lock);
-		ret = __blk_end_request(req, 0, brq->data.bytes_xfered);
-		spin_unlock_irq(&md->lock);
 	} while (ret);
 
 	return 1;
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v8 10/12] mmc: queue: add a second mmc queue request member
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
                   ` (8 preceding siblings ...)
  2011-06-28  8:11 ` [PATCH v8 09/12] mmc: block: move error code in issue_rw_rq to a separate function Per Forlin
@ 2011-06-28  8:11 ` Per Forlin
  2011-06-28  8:11 ` [PATCH v8 11/12] mmc: core: add random fault injection Per Forlin
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-06-28  8:11 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

Add an additional mmc queue request instance to make way for
two active block requests. One request may be active while the
other request is being prepared.
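
The queue ends up carrying two mmc_queue_req slots with two pointers
into them, roughly (see the queue.h hunk below):

	struct mmc_queue {
		...
		struct mmc_queue_req	mqrq[2];
		struct mmc_queue_req	*mqrq_cur;	/* request being prepared */
		struct mmc_queue_req	*mqrq_prev;	/* previously issued request */
	};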

Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/card/queue.c |   44 ++++++++++++++++++++++++++++++++++++++++++--
 drivers/mmc/card/queue.h |    3 ++-
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index 4222e1a..d69d954 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -132,6 +132,7 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 	u64 limit = BLK_BOUNCE_HIGH;
 	int ret;
 	struct mmc_queue_req *mqrq_cur = &mq->mqrq[0];
+	struct mmc_queue_req *mqrq_prev = &mq->mqrq[1];
 
 	if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
 		limit = *mmc_dev(host)->dma_mask;
@@ -142,7 +143,9 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 		return -ENOMEM;
 
 	memset(&mq->mqrq_cur, 0, sizeof(mq->mqrq_cur));
+	memset(&mq->mqrq_prev, 0, sizeof(mq->mqrq_prev));
 	mq->mqrq_cur = mqrq_cur;
+	mq->mqrq_prev = mqrq_prev;
 	mq->queue->queuedata = mq;
 
 	blk_queue_prep_rq(mq->queue, mmc_prep_request);
@@ -178,9 +181,17 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 					"allocate bounce cur buffer\n",
 					mmc_card_name(card));
 			}
+			mqrq_prev->bounce_buf = kmalloc(bouncesz, GFP_KERNEL);
+			if (!mqrq_prev->bounce_buf) {
+				printk(KERN_WARNING "%s: unable to "
+					"allocate bounce prev buffer\n",
+					mmc_card_name(card));
+				kfree(mqrq_cur->bounce_buf);
+				mqrq_cur->bounce_buf = NULL;
+			}
 		}
 
-		if (mqrq_cur->bounce_buf) {
+		if (mqrq_cur->bounce_buf && mqrq_prev->bounce_buf) {
 			blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_ANY);
 			blk_queue_max_hw_sectors(mq->queue, bouncesz / 512);
 			blk_queue_max_segments(mq->queue, bouncesz / 512);
@@ -195,11 +206,19 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 			if (ret)
 				goto cleanup_queue;
 
+			mqrq_prev->sg = mmc_alloc_sg(1, &ret);
+			if (ret)
+				goto cleanup_queue;
+
+			mqrq_prev->bounce_sg =
+				mmc_alloc_sg(bouncesz / 512, &ret);
+			if (ret)
+				goto cleanup_queue;
 		}
 	}
 #endif
 
-	if (!mqrq_cur->bounce_buf) {
+	if (!mqrq_cur->bounce_buf && !mqrq_prev->bounce_buf) {
 		blk_queue_bounce_limit(mq->queue, limit);
 		blk_queue_max_hw_sectors(mq->queue,
 			min(host->max_blk_count, host->max_req_size / 512));
@@ -210,6 +229,10 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 		if (ret)
 			goto cleanup_queue;
 
+
+		mqrq_prev->sg = mmc_alloc_sg(host->max_segs, &ret);
+		if (ret)
+			goto cleanup_queue;
 	}
 
 	sema_init(&mq->thread_sem, 1);
@@ -226,6 +249,8 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
  free_bounce_sg:
 	kfree(mqrq_cur->bounce_sg);
 	mqrq_cur->bounce_sg = NULL;
+	kfree(mqrq_prev->bounce_sg);
+	mqrq_prev->bounce_sg = NULL;
 
  cleanup_queue:
 	kfree(mqrq_cur->sg);
@@ -233,6 +258,11 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
 	kfree(mqrq_cur->bounce_buf);
 	mqrq_cur->bounce_buf = NULL;
 
+	kfree(mqrq_prev->sg);
+	mqrq_prev->sg = NULL;
+	kfree(mqrq_prev->bounce_buf);
+	mqrq_prev->bounce_buf = NULL;
+
 	blk_cleanup_queue(mq->queue);
 	return ret;
 }
@@ -242,6 +272,7 @@ void mmc_cleanup_queue(struct mmc_queue *mq)
 	struct request_queue *q = mq->queue;
 	unsigned long flags;
 	struct mmc_queue_req *mqrq_cur = mq->mqrq_cur;
+	struct mmc_queue_req *mqrq_prev = mq->mqrq_prev;
 
 	/* Make sure the queue isn't suspended, as that will deadlock */
 	mmc_queue_resume(mq);
@@ -264,6 +295,15 @@ void mmc_cleanup_queue(struct mmc_queue *mq)
 	kfree(mqrq_cur->bounce_buf);
 	mqrq_cur->bounce_buf = NULL;
 
+	kfree(mqrq_prev->bounce_sg);
+	mqrq_prev->bounce_sg = NULL;
+
+	kfree(mqrq_prev->sg);
+	mqrq_prev->sg = NULL;
+
+	kfree(mqrq_prev->bounce_buf);
+	mqrq_prev->bounce_buf = NULL;
+
 	mq->card = NULL;
 }
 EXPORT_SYMBOL(mmc_cleanup_queue);
diff --git a/drivers/mmc/card/queue.h b/drivers/mmc/card/queue.h
index c1a69ac..1a637d2 100644
--- a/drivers/mmc/card/queue.h
+++ b/drivers/mmc/card/queue.h
@@ -29,8 +29,9 @@ struct mmc_queue {
 	int			(*issue_fn)(struct mmc_queue *, struct request *);
 	void			*data;
 	struct request_queue	*queue;
-	struct mmc_queue_req	mqrq[1];
+	struct mmc_queue_req	mqrq[2];
 	struct mmc_queue_req	*mqrq_cur;
+	struct mmc_queue_req	*mqrq_prev;
 };
 
 extern int mmc_init_queue(struct mmc_queue *, struct mmc_card *, spinlock_t *,
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v8 11/12] mmc: core: add random fault injection
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
                   ` (9 preceding siblings ...)
  2011-06-28  8:11 ` [PATCH v8 10/12] mmc: queue: add a second mmc queue request member Per Forlin
@ 2011-06-28  8:11 ` Per Forlin
  2011-06-28  8:11 ` [PATCH v8 12/12] mmc: block: add handling for two parallel block requests in issue_rw_rq Per Forlin
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-06-28  8:11 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

This simple fault injection proved to be very useful to
test the error handling in the block.c rw_rq(). It may
still be useful to test if the host driver handles
pre_req() and post_req() correctly in case of errors.
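
The hook sits in the request completion path, so it works without any
host driver changes; when it triggers it rewrites the data result before
the block layer sees it. A condensed view of the core.c hunks below:

	void mmc_request_done(struct mmc_host *host, struct mmc_request *mrq)
	{
		...
		mmc_should_fail_request(host, mrq);
		...
	}

	static void mmc_should_fail_request(struct mmc_host *host,
					    struct mmc_request *mrq)
	{
		...
		/* pick one of -ETIMEDOUT, -EILSEQ, -EIO and truncate the
		 * reported transfer length to a random number of sectors */
		data->error = data_errors[random32() % ARRAY_SIZE(data_errors)];
		data->bytes_xfered = (random32() % (data->bytes_xfered >> 9)) << 9;
	}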

Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/core/core.c    |   54 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/mmc/core/debugfs.c |    5 ++++
 include/linux/mmc/host.h   |    3 ++
 lib/Kconfig.debug          |   11 +++++++++
 4 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index d2d7239..62a8cc7 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -23,6 +23,8 @@
 #include <linux/log2.h>
 #include <linux/regulator/consumer.h>
 #include <linux/pm_runtime.h>
+#include <linux/fault-inject.h>
+#include <linux/random.h>
 
 #include <linux/mmc/card.h>
 #include <linux/mmc/host.h>
@@ -82,6 +84,56 @@ static void mmc_flush_scheduled_work(void)
 	flush_workqueue(workqueue);
 }
 
+#ifdef CONFIG_FAIL_MMC_REQUEST
+
+static DECLARE_FAULT_ATTR(fail_mmc_request);
+
+static int __init setup_fail_mmc_request(char *str)
+{
+	return setup_fault_attr(&fail_mmc_request, str);
+}
+__setup("fail_mmc_request=", setup_fail_mmc_request);
+
+static void mmc_should_fail_request(struct mmc_host *host,
+				    struct mmc_request *mrq)
+{
+	struct mmc_command *cmd = mrq->cmd;
+	struct mmc_data *data = mrq->data;
+	static const int data_errors[] = {
+		-ETIMEDOUT,
+		-EILSEQ,
+		-EIO,
+	};
+
+	if (!data)
+		return;
+
+	if (cmd->error || data->error || !host->make_it_fail ||
+	    !should_fail(&fail_mmc_request, data->blksz * data->blocks))
+		return;
+
+	data->error = data_errors[random32() % ARRAY_SIZE(data_errors)];
+	data->bytes_xfered = (random32() % (data->bytes_xfered >> 9)) << 9;
+}
+
+static int __init fail_mmc_request_debugfs(void)
+{
+	return init_fault_attr_dentries(&fail_mmc_request,
+					"fail_mmc_request");
+}
+
+late_initcall(fail_mmc_request_debugfs);
+
+#else /* CONFIG_FAIL_MMC_REQUEST */
+
+static void mmc_should_fail_request(struct mmc_host *host,
+				    struct mmc_request *mrq)
+{
+}
+
+#endif /* CONFIG_FAIL_MMC_REQUEST */
+
+
 /**
  *	mmc_request_done - finish processing an MMC request
  *	@host: MMC host which completed request
@@ -108,6 +160,8 @@ void mmc_request_done(struct mmc_host *host, struct mmc_request *mrq)
 		cmd->error = 0;
 		host->ops->request(host, mrq);
 	} else {
+		mmc_should_fail_request(host, mrq);
+
 		led_trigger_event(host->led, LED_OFF);
 
 		pr_debug("%s: req done (CMD%u): %d: %08x %08x %08x %08x\n",
diff --git a/drivers/mmc/core/debugfs.c b/drivers/mmc/core/debugfs.c
index 998797e..588e76f 100644
--- a/drivers/mmc/core/debugfs.c
+++ b/drivers/mmc/core/debugfs.c
@@ -188,6 +188,11 @@ void mmc_add_host_debugfs(struct mmc_host *host)
 				root, &host->clk_delay))
 		goto err_node;
 #endif
+#ifdef CONFIG_FAIL_MMC_REQUEST
+	if (!debugfs_create_u8("make-it-fail", S_IRUSR | S_IWUSR,
+			       root, &host->make_it_fail))
+		goto err_node;
+#endif
 	return;
 
 err_node:
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 59db6f2..0d0a48f 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -301,6 +301,9 @@ struct mmc_host {
 
 	struct mmc_async_req	*areq;		/* active async req */
 
+#ifdef CONFIG_FAIL_MMC_REQUEST
+	u8			make_it_fail;
+#endif
 	unsigned long		private[0] ____cacheline_aligned;
 };
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c768bcd..330fc70 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1057,6 +1057,17 @@ config FAIL_IO_TIMEOUT
 	  Only works with drivers that use the generic timeout handling,
 	  for others it wont do anything.
 
+config FAIL_MMC_REQUEST
+	bool "Fault-injection capability for MMC IO"
+	select DEBUG_FS
+	depends on FAULT_INJECTION
+	help
+	  Provide fault-injection capability for MMC IO.
+	  This will make the mmc core return data errors. This is
+	  useful for testing the error handling in the mmc block device
+	  and how the mmc host driver handle retries from
+	  the block device.
+
 config FAULT_INJECTION_DEBUG_FS
 	bool "Debugfs entries for fault-injection capabilities"
 	depends on FAULT_INJECTION && SYSFS && DEBUG_FS
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v8 12/12] mmc: block: add handling for two parallel block requests in issue_rw_rq
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
                   ` (10 preceding siblings ...)
  2011-06-28  8:11 ` [PATCH v8 11/12] mmc: core: add random fault injection Per Forlin
@ 2011-06-28  8:11 ` Per Forlin
  2011-06-28  9:39   ` Per Forlin
  2011-06-28  9:54 ` [PATCH v8 00/12] use nonblock mmc requests to minimize latency Kyungmin Park
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 27+ messages in thread
From: Per Forlin @ 2011-06-28  8:11 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

Change mmc_blk_issue_rw_rq() to become asynchronous.
The execution flow looks like this:
The mmc-queue calls issue_rw_rq(), which sends the request
to the host and returns back to the mmc-queue. The mmc-queue calls
issue_rw_rq() again with a new request. This new request is prepared,
in issue_rw_rq(), then it waits for the active request to complete before
pushing it to the host. When the mmc-queue is empty it will call
issue_rw_rq() with req=NULL to finish off the active request
without starting a new request.
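
In code terms the issue loop becomes roughly the following (condensed
from the block.c hunk below, error handling left out):

	do {
		if (rqc) {
			mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
			areq = &mq->mqrq_cur->mmc_active;
		} else
			areq = NULL;
		/* start the new request (if any), wait for the previous one */
		areq = mmc_start_req(card->host, areq, (int *) &status);
		if (!areq)
			return 0;

		mq_rq = container_of(areq, struct mmc_queue_req, mmc_active);
		brq = &mq_rq->brq;
		req = mq_rq->req;
		mmc_queue_bounce_post(mq_rq);
		/* complete the finished request according to status */
	} while (ret);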

Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/mmc/card/block.c |   80 +++++++++++++++++++++++++++++++++++++--------
 drivers/mmc/card/queue.c |   17 +++++++---
 drivers/mmc/card/queue.h |    1 +
 3 files changed, 78 insertions(+), 20 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 7ed2c68..825741e 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -822,12 +822,14 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq,
 	 R1_CC_ERROR |		/* Card controller error */		\
 	 R1_ERROR)		/* General/unknown error */
 
-int mmc_blk_err_check(struct mmc_blk_request *brq,
-		      struct request *req,
-		      struct mmc_card *card,
-		      struct mmc_blk_data *md)
+static int mmc_blk_err_check(struct mmc_card *card,
+			     struct mmc_async_req *areq)
 {
-	int ret = MMC_BLK_SUCCESS;
+	enum mmc_blk_status ret = MMC_BLK_SUCCESS;
+	struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
+						    mmc_active);
+	struct mmc_blk_request *brq = &mq_mrq->brq;
+	struct request *req = mq_mrq->req;
 
 	/*
 	 * sbc.error indicates a problem with the set block count
@@ -1038,24 +1040,41 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 		brq->data.sg_len = i;
 	}
 
+	mqrq->mmc_active.mrq = &brq->mrq;
+	mqrq->mmc_active.err_check = mmc_blk_err_check;
+
 	mmc_queue_bounce_pre(mqrq);
 }
 
-static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
+static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 {
 	struct mmc_blk_data *md = mq->data;
 	struct mmc_card *card = md->queue.card;
 	struct mmc_blk_request *brq = &mq->mqrq_cur->brq;
 	int ret = 1, disable_multi = 0, retry = 0;
 	enum mmc_blk_status status;
+	struct mmc_queue_req *mq_rq;
+	struct request *req;
+	struct mmc_async_req *areq;
 
-	do {
-		mmc_blk_rw_rq_prep(mq->mqrq_cur, card, disable_multi, mq);
-		mmc_wait_for_req(card->host, &brq->mrq);
+	if (!rqc && !mq->mqrq_prev->req)
+		return 0;
 
-		mmc_queue_bounce_post(mq->mqrq_cur);
+	do {
+		if (rqc) {
+			mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
+			areq = &mq->mqrq_cur->mmc_active;
+		} else
+			areq = NULL;
+		areq = mmc_start_req(card->host, areq, (int *) &status);
+		if (!areq)
+			return 0;
+
+		mq_rq = container_of(areq, struct mmc_queue_req, mmc_active);
+		brq = &mq_rq->brq;
+		req = mq_rq->req;
+		mmc_queue_bounce_post(mq_rq);
 
-		status = mmc_blk_err_check(brq, req, card, md);
 		switch (status) {
 		case MMC_BLK_SUCCESS:
 		case MMC_BLK_PARTIAL:
@@ -1066,6 +1085,13 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 			ret = __blk_end_request(req, 0,
 						brq->data.bytes_xfered);
 			spin_unlock_irq(&md->lock);
+			if (status == MMC_BLK_SUCCESS && ret) {
+				/* If this happen it is a bug */
+				printk(KERN_ERR "%s BUG rq_tot %d d_xfer %d\n",
+				       __func__, blk_rq_bytes(req),
+				       brq->data.bytes_xfered);
+				goto cmd_abort;
+			}
 			break;
 		case MMC_BLK_CMD_ERR:
 			goto cmd_err;
@@ -1087,9 +1113,19 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 			ret = __blk_end_request(req, -EIO,
 						brq->data.blksz);
 			spin_unlock_irq(&md->lock);
+			if (!ret)
+				goto start_new_req;
 			break;
 		}
 
+		if (ret) {
+			/*
+			 * In case of a none complete request
+			 * prepare it again and resend.
+			 */
+			mmc_blk_rw_rq_prep(mq_rq, card, disable_multi, mq);
+			mmc_start_req(card->host, &mq_rq->mmc_active, NULL);
+		}
 	} while (ret);
 
 	return 1;
@@ -1124,6 +1160,12 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 		ret = __blk_end_request(req, -EIO, blk_rq_cur_bytes(req));
 	spin_unlock_irq(&md->lock);
 
+ start_new_req:
+	if (rqc) {
+		mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
+		mmc_start_req(card->host, &mq->mqrq_cur->mmc_active, NULL);
+	}
+
 	return 0;
 }
 
@@ -1133,26 +1175,34 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
 	struct mmc_blk_data *md = mq->data;
 	struct mmc_card *card = md->queue.card;
 
-	mmc_claim_host(card->host);
+	if (req && !mq->mqrq_prev->req)
+		/* claim host only for the first request */
+		mmc_claim_host(card->host);
+
 	ret = mmc_blk_part_switch(card, md);
 	if (ret) {
 		ret = 0;
 		goto out;
 	}
 
-	if (req->cmd_flags & REQ_DISCARD) {
+	if (req && req->cmd_flags & REQ_DISCARD) {
+		/* complete ongoing async transfer before issuing discard */
+		if (card->host->areq)
+			mmc_blk_issue_rw_rq(mq, NULL);
 		if (req->cmd_flags & REQ_SECURE)
 			ret = mmc_blk_issue_secdiscard_rq(mq, req);
 		else
 			ret = mmc_blk_issue_discard_rq(mq, req);
-	} else if (req->cmd_flags & REQ_FLUSH) {
+	} else if (req && req->cmd_flags & REQ_FLUSH) {
 		ret = mmc_blk_issue_flush(mq, req);
 	} else {
 		ret = mmc_blk_issue_rw_rq(mq, req);
 	}
 
 out:
-	mmc_release_host(card->host);
+	if (!req)
+		/* release host only when there are no more requests */
+		mmc_release_host(card->host);
 	return ret;
 }
 
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index d69d954..8c51a54 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -52,6 +52,7 @@ static int mmc_queue_thread(void *d)
 	down(&mq->thread_sem);
 	do {
 		struct request *req = NULL;
+		struct mmc_queue_req *tmp;
 
 		spin_lock_irq(q->queue_lock);
 		set_current_state(TASK_INTERRUPTIBLE);
@@ -59,7 +60,10 @@ static int mmc_queue_thread(void *d)
 		mq->mqrq_cur->req = req;
 		spin_unlock_irq(q->queue_lock);
 
-		if (!req) {
+		if (req || mq->mqrq_prev->req) {
+			set_current_state(TASK_RUNNING);
+			mq->issue_fn(mq, req);
+		} else {
 			if (kthread_should_stop()) {
 				set_current_state(TASK_RUNNING);
 				break;
@@ -67,11 +71,14 @@ static int mmc_queue_thread(void *d)
 			up(&mq->thread_sem);
 			schedule();
 			down(&mq->thread_sem);
-			continue;
 		}
-		set_current_state(TASK_RUNNING);
 
-		mq->issue_fn(mq, req);
+		/* Current request becomes previous request and vice versa. */
+		mq->mqrq_prev->brq.mrq.data = NULL;
+		mq->mqrq_prev->req = NULL;
+		tmp = mq->mqrq_prev;
+		mq->mqrq_prev = mq->mqrq_cur;
+		mq->mqrq_cur = tmp;
 	} while (1);
 	up(&mq->thread_sem);
 
@@ -97,7 +104,7 @@ static void mmc_request(struct request_queue *q)
 		return;
 	}
 
-	if (!mq->mqrq_cur->req)
+	if (!mq->mqrq_cur->req && !mq->mqrq_prev->req)
 		wake_up_process(mq->thread);
 }
 
diff --git a/drivers/mmc/card/queue.h b/drivers/mmc/card/queue.h
index 1a637d2..d2a1eb4 100644
--- a/drivers/mmc/card/queue.h
+++ b/drivers/mmc/card/queue.h
@@ -19,6 +19,7 @@ struct mmc_queue_req {
 	char			*bounce_buf;
 	struct scatterlist	*bounce_sg;
 	unsigned int		bounce_sg_len;
+	struct mmc_async_req	mmc_active;
 };
 
 struct mmc_queue {
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 12/12] mmc: block: add handling for two parallel block requests in issue_rw_rq
  2011-06-28  8:11 ` [PATCH v8 12/12] mmc: block: add handling for two parallel block requests in issue_rw_rq Per Forlin
@ 2011-06-28  9:39   ` Per Forlin
  0 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-06-28  9:39 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

On 28 June 2011 10:11, Per Forlin <per.forlin@linaro.org> wrote:
> Change mmc_blk_issue_rw_rq() to become asynchronous.
> The execution flow looks like this:
> The mmc-queue calls issue_rw_rq(), which sends the request
> to the host and returns back to the mmc-queue. The mmc-queue calls
> issue_rw_rq() again with a new request. This new request is prepared,
> in issue_rw_rq(), then it waits for the active request to complete before
> pushing it to the host. When the mmc-queue is empty it will call
> issue_rw_rq() with req=NULL to finish off the active request
> without starting a new request.
>
> Signed-off-by: Per Forlin <per.forlin@linaro.org>
> ---
>  drivers/mmc/card/block.c |   80 +++++++++++++++++++++++++++++++++++++--------
>  drivers/mmc/card/queue.c |   17 +++++++---
>  drivers/mmc/card/queue.h |    1 +
>  3 files changed, 78 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
> index 7ed2c68..825741e 100644
> --- a/drivers/mmc/card/block.c
> +++ b/drivers/mmc/card/block.c
...
> @@ -1066,6 +1085,13 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
>                        ret = __blk_end_request(req, 0,
>                                                brq->data.bytes_xfered);
>                        spin_unlock_irq(&md->lock);
> +                       if (status == MMC_BLK_SUCCESS && ret) {
> +                               /* If this happen it is a bug */
> +                               printk(KERN_ERR "%s BUG rq_tot %d d_xfer %d\n",
> +                                      __func__, blk_rq_bytes(req),
> +                                      brq->data.bytes_xfered);
+ rqc = NULL
If there is a new request (rqc != NULL) it will already be started
when reaching this point.
If rqc is still set it will be started again at start_new_req.

I wonder if this paranoia check is necessary. If "status ==
MMC_BLK_SUCCESS", all bytes have been transferred and no error is
returned from the mmc layer.
__blk_end_request() would always return 0 in this case; please comment
if you disagree.
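
For reference, a minimal sketch of the reasoning, assuming the usual
__blk_end_request() semantics (non-zero only while bytes remain to be
completed):

	/*
	 * MMC_BLK_SUCCESS is only returned when
	 *	brq->data.bytes_xfered == blk_rq_bytes(req)
	 * so this call completes the whole request and returns 0,
	 * i.e. the paranoia check can never fire.
	 */
	ret = __blk_end_request(req, 0, brq->data.bytes_xfered);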

...
> + start_new_req:
> +       if (rqc) {
> +               mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
> +               mmc_start_req(card->host, &mq->mqrq_cur->mmc_active, NULL);
> +       }
> +
>        return 0;
>  }

/Per

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 00/12] use nonblock mmc requests to minimize latency
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
                   ` (11 preceding siblings ...)
  2011-06-28  8:11 ` [PATCH v8 12/12] mmc: block: add handling for two parallel block requests in issue_rw_rq Per Forlin
@ 2011-06-28  9:54 ` Kyungmin Park
  2011-06-30 12:36 ` Poddar, Sourav
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Kyungmin Park @ 2011-06-28  9:54 UTC (permalink / raw)
  To: Per Forlin
  Cc: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij,
	Chris Ball

These patches are tested on the Samsung Exynos4 platform.

Acked-by: Kyungmin Park <kyungmin.park@samsung.com>

On Tue, Jun 28, 2011 at 5:11 PM, Per Forlin <per.forlin@linaro.org> wrote:
> How significant is the cache maintenance over head?
> It depends, the eMMC are much faster now
> compared to a few years ago and cache maintenance cost more due to
> multiple cache levels and speculative cache pre-fetch. In relation the
> cost for handling the caches have increased and is now a bottle neck
> dealing with fast eMMC together with DMA.
>
> The intention for introducing non-blocking mmc requests is to minimize the
> time between a mmc request ends and another mmc request starts. In the
> current implementation the MMC controller is idle when dma_map_sg and
> dma_unmap_sg is processing. Introducing non-blocking mmc request makes it
> possible to prepare the caches for next job in parallel to an active
> mmc request.
>
> This is done by making the issue_rw_rq() non-blocking.
> The increase in throughput is proportional to the time it takes to
> prepare (major part of preparations is dma_map_sg and dma_unmap_sg)
> a request and how fast the memory is. The faster the MMC/SD is
> the more significant the prepare request time becomes. Measurements on U5500
> and Panda on eMMC and SD shows significant performance gain for large
> reads when running DMA mode. In the PIO case the performance is unchanged.
>
> There are two optional hooks pre_req() and post_req() that the host driver
> may implement in order to move work to before and after the actual mmc_request
> function is called. In the DMA case pre_req() may do dma_map_sg() and prepare
> the dma descriptor and post_req runs the dma_unmap_sg.
>
> Details on measurements from IOZone and mmc_test:
> https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req
>
> Changes since v7:
>  * rebase on mmc-next, on top of Russell's updated error handling.
>  * Clarify description of mmc_start_req()
>  * Resolve compile without CONFIG_DMA_ENIGNE issue for mmci
>  * Add mmc test to measure how performance is affected by sg length
>  * Add missing wait_for_busy in mmc_test non-blocking test. This call got lost
>   in v4 of this patchset when refactoring mmc_start_req.
>  * Add sub-prefix (core block queue) to relevant patches.
>
> Per Forlin (12):
>  mmc: core: add non-blocking mmc request function
>  omap_hsmmc: add support for pre_req and post_req
>  mmci: implement pre_req() and post_req()
>  mmc: mmc_test: add debugfs file to list all tests
>  mmc: mmc_test: add test for non-blocking transfers
>  mmc: mmc_test: test to measure how sg_len affect performance
>  mmc: block: add member in mmc queue struct to hold request data
>  mmc: block: add a block request prepare function
>  mmc: block: move error code in issue_rw_rq to a separate function.
>  mmc: queue: add a second mmc queue request member
>  mmc: core: add random fault injection
>  mmc: block: add handling for two parallel block requests in
>    issue_rw_rq
>
>  drivers/mmc/card/block.c      |  505 ++++++++++++++++++++++++-----------------
>  drivers/mmc/card/mmc_test.c   |  491 ++++++++++++++++++++++++++++++++++++++--
>  drivers/mmc/card/queue.c      |  184 ++++++++++------
>  drivers/mmc/card/queue.h      |   33 ++-
>  drivers/mmc/core/core.c       |  167 +++++++++++++-
>  drivers/mmc/core/debugfs.c    |    5 +
>  drivers/mmc/host/mmci.c       |  147 +++++++++++-
>  drivers/mmc/host/mmci.h       |    8 +
>  drivers/mmc/host/omap_hsmmc.c |   87 +++++++-
>  include/linux/mmc/core.h      |    6 +-
>  include/linux/mmc/host.h      |   24 ++
>  lib/Kconfig.debug             |   11 +
>  12 files changed, 1345 insertions(+), 323 deletions(-)
>
> --
> 1.7.4.1
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 00/12] use nonblock mmc requests to minimize latency
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
                   ` (12 preceding siblings ...)
  2011-06-28  9:54 ` [PATCH v8 00/12] use nonblock mmc requests to minimize latency Kyungmin Park
@ 2011-06-30 12:36 ` Poddar, Sourav
  2011-06-30 13:11   ` S, Venkatraman
  2011-06-30 13:12 ` Arnd Bergmann
  2011-07-01 14:39 ` Linus Walleij
  15 siblings, 1 reply; 27+ messages in thread
From: Poddar, Sourav @ 2011-06-30 12:36 UTC (permalink / raw)
  To: Per Forlin
  Cc: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij,
	Chris Ball

On Tue, Jun 28, 2011 at 1:41 PM, Per Forlin <per.forlin@linaro.org> wrote:
> How significant is the cache maintenance over head?
> It depends, the eMMC are much faster now
> compared to a few years ago and cache maintenance cost more due to
> multiple cache levels and speculative cache pre-fetch. In relation the
> cost for handling the caches have increased and is now a bottle neck
> dealing with fast eMMC together with DMA.
>
> The intention for introducing non-blocking mmc requests is to minimize the
> time between a mmc request ends and another mmc request starts. In the
> current implementation the MMC controller is idle when dma_map_sg and
> dma_unmap_sg is processing. Introducing non-blocking mmc request makes it
> possible to prepare the caches for next job in parallel to an active
> mmc request.
>
> This is done by making the issue_rw_rq() non-blocking.
> The increase in throughput is proportional to the time it takes to
> prepare (major part of preparations is dma_map_sg and dma_unmap_sg)
> a request and how fast the memory is. The faster the MMC/SD is
> the more significant the prepare request time becomes. Measurements on U5500
> and Panda on eMMC and SD shows significant performance gain for large
> reads when running DMA mode. In the PIO case the performance is unchanged.
>
> There are two optional hooks pre_req() and post_req() that the host driver
> may implement in order to move work to before and after the actual mmc_request
> function is called. In the DMA case pre_req() may do dma_map_sg() and prepare
> the dma descriptor and post_req runs the dma_unmap_sg.
>
> Details on measurements from IOZone and mmc_test:
> https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req
>
> Changes since v7:
>  * rebase on mmc-next, on top of Russell's updated error handling.
>  * Clarify description of mmc_start_req()
>  * Resolve compile without CONFIG_DMA_ENIGNE issue for mmci
>  * Add mmc test to measure how performance is affected by sg length
>  * Add missing wait_for_busy in mmc_test non-blocking test. This call got lost
>   in v4 of this patchset when refactoring mmc_start_req.
>  * Add sub-prefix (core block queue) to relevant patches.
>
> Per Forlin (12):
>  mmc: core: add non-blocking mmc request function
>  omap_hsmmc: add support for pre_req and post_req
>  mmci: implement pre_req() and post_req()
>  mmc: mmc_test: add debugfs file to list all tests
>  mmc: mmc_test: add test for non-blocking transfers
>  mmc: mmc_test: test to measure how sg_len affect performance
>  mmc: block: add member in mmc queue struct to hold request data
>  mmc: block: add a block request prepare function
>  mmc: block: move error code in issue_rw_rq to a separate function.
>  mmc: queue: add a second mmc queue request member
>  mmc: core: add random fault injection
>  mmc: block: add handling for two parallel block requests in
>    issue_rw_rq
>
>  drivers/mmc/card/block.c      |  505 ++++++++++++++++++++++++-----------------
>  drivers/mmc/card/mmc_test.c   |  491 ++++++++++++++++++++++++++++++++++++++--
>  drivers/mmc/card/queue.c      |  184 ++++++++++------
>  drivers/mmc/card/queue.h      |   33 ++-
>  drivers/mmc/core/core.c       |  167 +++++++++++++-
>  drivers/mmc/core/debugfs.c    |    5 +
>  drivers/mmc/host/mmci.c       |  147 +++++++++++-
>  drivers/mmc/host/mmci.h       |    8 +
>  drivers/mmc/host/omap_hsmmc.c |   87 +++++++-
>  include/linux/mmc/core.h      |    6 +-
>  include/linux/mmc/host.h      |   24 ++
>  lib/Kconfig.debug             |   11 +
>  12 files changed, 1345 insertions(+), 323 deletions(-)



Boot tested on the OMAP4430 Blaze board.

Tested-by: Sourav Poddar <sourav.poddar@ti.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 00/12] use nonblock mmc requests to minimize latency
  2011-06-30 12:36 ` Poddar, Sourav
@ 2011-06-30 13:11   ` S, Venkatraman
  0 siblings, 0 replies; 27+ messages in thread
From: S, Venkatraman @ 2011-06-30 13:11 UTC (permalink / raw)
  To: Poddar, Sourav
  Cc: Per Forlin, linaro-dev, Nicolas Pitre, linux-arm-kernel,
	linux-kernel, linux-mmc, Nickolay Nickolaev, Linus Walleij,
	Chris Ball

On Thu, Jun 30, 2011 at 6:06 PM, Poddar, Sourav <sourav.poddar@ti.com> wrote:
> On Tue, Jun 28, 2011 at 1:41 PM, Per Forlin <per.forlin@linaro.org> wrote:
>> How significant is the cache maintenance over head?
>> It depends, the eMMC are much faster now
>> compared to a few years ago and cache maintenance cost more due to
>> multiple cache levels and speculative cache pre-fetch. In relation the
>> cost for handling the caches have increased and is now a bottle neck
>> dealing with fast eMMC together with DMA.
>>
>> The intention for introducing non-blocking mmc requests is to minimize the
>> time between a mmc request ends and another mmc request starts. In the
>> current implementation the MMC controller is idle when dma_map_sg and
>> dma_unmap_sg is processing. Introducing non-blocking mmc request makes it
>> possible to prepare the caches for next job in parallel to an active
>> mmc request.
>>
>> This is done by making the issue_rw_rq() non-blocking.
>> The increase in throughput is proportional to the time it takes to
>> prepare (major part of preparations is dma_map_sg and dma_unmap_sg)
>> a request and how fast the memory is. The faster the MMC/SD is
>> the more significant the prepare request time becomes. Measurements on U5500
>> and Panda on eMMC and SD shows significant performance gain for large
>> reads when running DMA mode. In the PIO case the performance is unchanged.
>>
>> There are two optional hooks pre_req() and post_req() that the host driver
>> may implement in order to move work to before and after the actual mmc_request
>> function is called. In the DMA case pre_req() may do dma_map_sg() and prepare
>> the dma descriptor and post_req runs the dma_unmap_sg.
>>
>> Details on measurements from IOZone and mmc_test:
>> https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req
>>
>> Changes since v7:
>>  * rebase on mmc-next, on top of Russell's updated error handling.
>>  * Clarify description of mmc_start_req()
>>  * Resolve compile without CONFIG_DMA_ENIGNE issue for mmci
>>  * Add mmc test to measure how performance is affected by sg length
>>  * Add missing wait_for_busy in mmc_test non-blocking test. This call got lost
>>   in v4 of this patchset when refactoring mmc_start_req.
>>  * Add sub-prefix (core block queue) to relevant patches.
>>
>> Per Forlin (12):
>>  mmc: core: add non-blocking mmc request function
>>  omap_hsmmc: add support for pre_req and post_req
>>  mmci: implement pre_req() and post_req()
>>  mmc: mmc_test: add debugfs file to list all tests
>>  mmc: mmc_test: add test for non-blocking transfers
>>  mmc: mmc_test: test to measure how sg_len affect performance
>>  mmc: block: add member in mmc queue struct to hold request data
>>  mmc: block: add a block request prepare function
>>  mmc: block: move error code in issue_rw_rq to a separate function.
>>  mmc: queue: add a second mmc queue request member
>>  mmc: core: add random fault injection
>>  mmc: block: add handling for two parallel block requests in
>>    issue_rw_rq
>>
>>  drivers/mmc/card/block.c      |  505 ++++++++++++++++++++++++-----------------
>>  drivers/mmc/card/mmc_test.c   |  491 ++++++++++++++++++++++++++++++++++++++--
>>  drivers/mmc/card/queue.c      |  184 ++++++++++------
>>  drivers/mmc/card/queue.h      |   33 ++-
>>  drivers/mmc/core/core.c       |  167 +++++++++++++-
>>  drivers/mmc/core/debugfs.c    |    5 +
>>  drivers/mmc/host/mmci.c       |  147 +++++++++++-
>>  drivers/mmc/host/mmci.h       |    8 +
>>  drivers/mmc/host/omap_hsmmc.c |   87 +++++++-
>>  include/linux/mmc/core.h      |    6 +-
>>  include/linux/mmc/host.h      |   24 ++
>>  lib/Kconfig.debug             |   11 +
>>  12 files changed, 1345 insertions(+), 323 deletions(-)
>
>
>
> Boot tested on the OMAP4430 Blaze board.
>
> Tested-by: Sourav Poddar <sourav.poddar@ti.com>
>
Reviewed for OMAP, along with Sourav's tests.
Reviewed-by: Venkatraman S <svenkatr@ti.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 00/12] use nonblock mmc requests to minimize latency
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
                   ` (13 preceding siblings ...)
  2011-06-30 12:36 ` Poddar, Sourav
@ 2011-06-30 13:12 ` Arnd Bergmann
  2011-06-30 13:30   ` Russell King - ARM Linux
  2011-07-01 14:39 ` Linus Walleij
  15 siblings, 1 reply; 27+ messages in thread
From: Arnd Bergmann @ 2011-06-30 13:12 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Per Forlin, linaro-dev, Nicolas Pitre, linux-kernel, linux-mmc,
	Nickolay Nickolaev, Venkatraman S, Linus Walleij, Chris Ball

On Tuesday 28 June 2011, Per Forlin wrote:
> How significant is the cache maintenance over head?
> It depends, the eMMC are much faster now
> compared to a few years ago and cache maintenance cost more due to
> multiple cache levels and speculative cache pre-fetch. In relation the
> cost for handling the caches have increased and is now a bottle neck
> dealing with fast eMMC together with DMA.
> 
> The intention for introducing non-blocking mmc requests is to minimize the
> time between a mmc request ends and another mmc request starts. In the
> current implementation the MMC controller is idle when dma_map_sg and
> dma_unmap_sg is processing. Introducing non-blocking mmc request makes it
> possible to prepare the caches for next job in parallel to an active
> mmc request.
> 
> This is done by making the issue_rw_rq() non-blocking.
> The increase in throughput is proportional to the time it takes to
> prepare (major part of preparations is dma_map_sg and dma_unmap_sg)
> a request and how fast the memory is. The faster the MMC/SD is
> the more significant the prepare request time becomes. Measurements on U5500
> and Panda on eMMC and SD shows significant performance gain for large
> reads when running DMA mode. In the PIO case the performance is unchanged.
> 
> There are two optional hooks pre_req() and post_req() that the host driver
> may implement in order to move work to before and after the actual mmc_request
> function is called. In the DMA case pre_req() may do dma_map_sg() and prepare
> the dma descriptor and post_req runs the dma_unmap_sg.

I think this looks good enough to merge into the linux-mmc tree, the code is
clean and the benefits are clear.

Acked-by: Arnd Bergmann <arnd@arndb.de>

One logical follow-up as both a cleanup and performance optimization would be
to get rid of the mmc_queue_thread completely. When mmc_blk_issue_rq() is
non-blocking always, you can call it directly from the mmc_request()
function, instead of waking up another thread to do it for you.
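
A minimal sketch of that idea (hypothetical, not part of this patch set; it
reuses the existing mq->issue_fn hook and ignores queue locking and the
error-handling concerns raised in the follow-up below):

	static void mmc_request(struct request_queue *q)
	{
		struct mmc_queue *mq = q->queuedata;
		struct request *req;

		/* issue each fetched request directly; issue_fn would have
		 * to be guaranteed never to block for this to be safe */
		while ((req = blk_fetch_request(q)) != NULL)
			mq->issue_fn(mq, req);
	}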

	Arnd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 00/12] use nonblock mmc requests to minimize latency
  2011-06-30 13:12 ` Arnd Bergmann
@ 2011-06-30 13:30   ` Russell King - ARM Linux
  2011-07-01 16:44     ` Arnd Bergmann
  0 siblings, 1 reply; 27+ messages in thread
From: Russell King - ARM Linux @ 2011-06-30 13:30 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Nicolas Pitre, linaro-dev, Linus Walleij,
	linux-mmc, linux-kernel, Chris Ball, Per Forlin,
	Nickolay Nickolaev

On Thu, Jun 30, 2011 at 03:12:46PM +0200, Arnd Bergmann wrote:
> I think this looks good enough to merge into the linux-mmc tree, the code is
> clean and the benefits are clear.
> 
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> 
> One logical follow-up as both a cleanup and performance optimization would be
> to get rid of the mmc_queue_thread completely. When mmc_blk_issue_rq() is
> non-blocking always, you can call it directly from the mmc_request()
> function, instead of waking up another thread to do it for you.

It isn't anywhere near that simple - because you need to wait for the
request to complete, then analyze the results, and if there has been
an error, send more commands and wait for their responses.

Doing all that in an asynchronous fashion will just create a mess of
small functions with hard-to-understand code.  It's far better
to do all of it in a clear, procedural way in a thread.

We've been here before - with PCMCIA's card insertion code, where you
have to go through a sequence of events (insert, power up, reset, etc).
The PCMCIA code used to have a collection of small functions to do
each step, one chained after the other in a state machine fashion.
The result was horrid.  That's exactly what you'll end up with here.

Threads have their place, and this is one of them.
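
For reference, the procedural pattern defended here is roughly what the
existing mmc_queue_thread() already does; a simplified sketch (setup and
error handling omitted) looks like:

	do {
		struct request *req;

		spin_lock_irq(q->queue_lock);
		set_current_state(TASK_INTERRUPTIBLE);
		req = blk_fetch_request(q);
		spin_unlock_irq(q->queue_lock);

		if (req) {
			set_current_state(TASK_RUNNING);
			/* may sleep, analyze errors, resend commands ... */
			mq->issue_fn(mq, req);
		} else {
			if (kthread_should_stop()) {
				set_current_state(TASK_RUNNING);
				break;
			}
			schedule();
		}
	} while (1);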

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 05/12] mmc: mmc_test: add test for non-blocking transfers
  2011-06-28  8:11 ` [PATCH v8 05/12] mmc: mmc_test: add test for non-blocking transfers Per Forlin
@ 2011-07-01 13:29   ` Per Forlin
  0 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-07-01 13:29 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

On 28 June 2011 10:11, Per Forlin <per.forlin@linaro.org> wrote:
> Add four tests for read and write performance across
> different transfer sizes, 4k to 4M.
>  * Read using blocking mmc request
>  * Read using non-blocking mmc request
>  * Write using blocking mmc request
>  * Write using non-blocking mmc request
>
> The host driver must support pre_req() and post_req()
> in order to run the non-blocking test cases.
>
> Signed-off-by: Per Forlin <per.forlin@linaro.org>
> ---
>  drivers/mmc/card/mmc_test.c |  311 +++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 303 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/mmc/card/mmc_test.c b/drivers/mmc/card/mmc_test.c
> +static int mmc_test_nonblock_transfer(struct mmc_test_card *test,
> +                                     struct scatterlist *sg, unsigned sg_len,
> +                                     unsigned dev_addr, unsigned blocks,
> +                                     unsigned blksz, int write, int count)
> +{
> +       struct mmc_request mrq1;
> +       struct mmc_command cmd1;
> +       struct mmc_command stop1;
> +       struct mmc_data data1;
> +
> +       struct mmc_request mrq2;
> +       struct mmc_command cmd2;
> +       struct mmc_command stop2;
> +       struct mmc_data data2;
> +
> +       struct mmc_test_async_req test_areq[2];
> +       struct mmc_async_req *done_areq;
> +       struct mmc_async_req *cur_areq = &test_areq[0].areq;
> +       struct mmc_async_req *other_areq = &test_areq[1].areq;
> +       int i;
> +       int ret;
> +
> +       test_areq[0].test = test;
> +       test_areq[1].test = test;
> +
> +       if (!test->card->host->ops->pre_req ||
> +               !test->card->host->ops->post_req)
> +               return -RESULT_UNSUP_HOST;
Remove this error check. It is fine to run this test without these
hooks, but there will be no performance gain compared to blocking
requests.
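
One possible shape for the relaxed check (a sketch only; it assumes the
core's mmc_pre_req()/mmc_post_req() helpers skip missing hooks, so the
non-blocking test then simply performs like the blocking one):

	if (!test->card->host->ops->pre_req ||
	    !test->card->host->ops->post_req)
		printk(KERN_INFO "%s: no pre_req/post_req hooks, expect "
		       "blocking-level performance\n",
		       mmc_hostname(test->card->host));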

...

> +/*
> + * Multiple blocking write 4k to 4 MB chunks
> + */
> +static int mmc_test_profile_mult_write_blocking_perf(struct mmc_test_card *test)
> +{
> +       unsigned int bs[] = {1 << 12, 1 << 13, 1 << 14, 1 << 15, 1 << 16,
> +                            1 << 17, 1 << 18, 1 << 19, 1 << 20, 1 << 22};
> +       struct mmc_test_multiple_rw test_data = {
> +               .bs = bs,
> +               .size = 128*1024*1024,
I got this comment from Linus W.:
use TEST_AREA_MAX_SIZE instead of the hard-coded 128*1024*1024, and update all
relevant functions.
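
A sketch of that change (TEST_AREA_MAX_SIZE is the existing 128 MiB define
in mmc_test.c; the remaining members stay as in the patch):

	struct mmc_test_multiple_rw test_data = {
		.bs = bs,
		.size = TEST_AREA_MAX_SIZE,	/* instead of 128*1024*1024 */
		/* other members unchanged */
	};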

/Per

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 06/12] mmc: mmc_test: test to measure how sg_len affect performance
  2011-06-28  8:11 ` [PATCH v8 06/12] mmc: mmc_test: test to measure how sg_len affect performance Per Forlin
@ 2011-07-01 13:33   ` Per Forlin
  0 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-07-01 13:33 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Linus Walleij
  Cc: Chris Ball, Per Forlin

On 28 June 2011 10:11, Per Forlin <per.forlin@linaro.org> wrote:
> Add a test that measures how the mmc bandwidth depends on the number of sg elements
> in the sg list. The transfer size is fixed and the sg length goes from a few up
> to 512. The purpose is to measure the overhead caused by multiple sg elements.
>
> Signed-off-by: Per Forlin <per.forlin@linaro.org>
> ---
>  drivers/mmc/card/mmc_test.c |  151 +++++++++++++++++++++++++++++++++++++++----
>  1 files changed, 139 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/mmc/card/mmc_test.c b/drivers/mmc/card/mmc_test.c
> -static int mmc_test_map_sg(struct mmc_test_mem *mem, unsigned long sz,
> +static int mmc_test_map_sg(struct mmc_test_mem *mem, unsigned long size,
>                           struct scatterlist *sglist, int repeat,
>                           unsigned int max_segs, unsigned int max_seg_sz,
> -                          unsigned int *sg_len)
> +                          unsigned int *sg_len, int min_sg_len)
>  {
>        struct scatterlist *sg = NULL;
>        unsigned int i;
> +       unsigned long sz = size;
>
>        sg_init_table(sglist, max_segs);
> +       if (min_sg_len > max_segs)
> +               min_sg_len = max_segs;
>
>        *sg_len = 0;
>        do {
>                for (i = 0; i < mem->cnt; i++) {
>                        unsigned long len = PAGE_SIZE << mem->arr[i].order;
>
> +                       if (min_sg_len && (size / min_sg_len < len))
> +                               len = size / min_sg_len;
Make len aligned to 512:
len = ALIGN(size / min_sg_len, 512);

I ran into this issue running on ARM RealView with a max test size of 128k
instead of 128M.
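
With that suggestion applied, the hunk would read roughly (sketch):

	if (min_sg_len && (size / min_sg_len < len))
		len = ALIGN(size / min_sg_len, 512);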

/Per

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 00/12] use nonblock mmc requests to minimize latency
  2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
                   ` (14 preceding siblings ...)
  2011-06-30 13:12 ` Arnd Bergmann
@ 2011-07-01 14:39 ` Linus Walleij
  15 siblings, 0 replies; 27+ messages in thread
From: Linus Walleij @ 2011-07-01 14:39 UTC (permalink / raw)
  To: Per Forlin
  Cc: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, Nickolay Nickolaev, Venkatraman S, Chris Ball

On Tue, Jun 28, 2011 at 10:11 AM, Per Forlin <per.forlin@linaro.org> wrote:

> This is done by making the issue_rw_rq() non-blocking.
> The increase in throughput is proportional to the time it takes to
> prepare (major part of preparations is dma_map_sg and dma_unmap_sg)
> a request and how fast the memory is. The faster the MMC/SD is
> the more significant the prepare request time becomes. Measurements on U5500
> and Panda on eMMC and SD shows significant performance gain for large
> reads when running DMA mode. In the PIO case the performance is unchanged.

I compiled the patch set on top of the latest mmc-next, had Per come over
to my desk and fix some test cases, then ran the new stress tests
on U300, mounted a block device, and performed reads & writes.

I found a bug in the COH901318 DMA driver on the way, and the tests
now run cleanly. (The patch will go to DMAengine maintainer Vinod.)

Test results below: the conclusion is that not much performance is
gained on U300 with MMCI/PL180 because we have no
L2 cache, but we still get a small improvement of 1/2 to 1 s per test
case.

The code looks good too.

Tested/Acked-by: Linus Walleij <linus.walleij@linaro.org>

[  331.601747] mmc0: Starting tests of card mmc0:e624...
[  331.606902] mmc0: Test case 37. Write performance with blocking req
4k to 4MB...
[  378.117553] mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB)
took 46.502646972 seconds (2886 kB/s, 2818 KiB/s, 704.64 IOPS, sg_len
1)
[  413.659600] mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB)
took 35.529431000 seconds (3777 kB/s, 3689 KiB/s, 461.13 IOPS, sg_len
1)
[  443.270662] mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB)
took 29.598359002 seconds (4534 kB/s, 4428 KiB/s, 276.77 IOPS, sg_len
1)
[  469.837460] mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB)
took 26.554253999 seconds (5054 kB/s, 4936 KiB/s, 154.25 IOPS, sg_len
1)
[  497.702775] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.852746003 seconds (4818 kB/s, 4705 KiB/s, 73.52 IOPS, sg_len
1)
[  525.100160] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.384628001 seconds (4901 kB/s, 4786 KiB/s, 74.78 IOPS, sg_len
1)
[  552.955832] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.842956000 seconds (4820 kB/s, 4707 KiB/s, 73.55 IOPS, sg_len
1)
[  580.339398] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.370849000 seconds (4903 kB/s, 4788 KiB/s, 74.82 IOPS, sg_len
1)
[  607.985578] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.633430000 seconds (4857 kB/s, 4743 KiB/s, 74.11 IOPS, sg_len
1)
[  635.512579] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.514265002 seconds (4878 kB/s, 4763 KiB/s, 74.43 IOPS, sg_len
1)
[  635.525193] mmc0: Result: OK
[  635.528368] mmc0: Tests completed.
[  635.533104] mmc0: Starting tests of card mmc0:e624...
[  635.538244] mmc0: Test case 38. Write performance with non-blocking
req 4k to 4MB...
[  681.296218] mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB)
took 45.749655000 seconds (2933 kB/s, 2864 KiB/s, 716.24 IOPS, sg_len
1)
[  716.089227] mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB)
took 34.780447000 seconds (3858 kB/s, 3768 KiB/s, 471.06 IOPS, sg_len
1)
[  744.828042] mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB)
took 28.726150001 seconds (4672 kB/s, 4562 KiB/s, 285.17 IOPS, sg_len
1)
[  771.174677] mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB)
took 26.334063000 seconds (5096 kB/s, 4977 KiB/s, 155.53 IOPS, sg_len
1)
[  798.191207] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.003975000 seconds (4970 kB/s, 4853 KiB/s, 75.84 IOPS, sg_len
1)
[  825.588017] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.384043001 seconds (4901 kB/s, 4786 KiB/s, 74.78 IOPS, sg_len
1)
[  852.277635] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 26.676835000 seconds (5031 kB/s, 4913 KiB/s, 76.77 IOPS, sg_len
1)
[  879.488620] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.198205999 seconds (4934 kB/s, 4819 KiB/s, 75.29 IOPS, sg_len
1)
[  906.495492] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 26.994123001 seconds (4972 kB/s, 4855 KiB/s, 75.86 IOPS, sg_len
1)
[  933.427449] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 26.919235001 seconds (4985 kB/s, 4869 KiB/s, 76.07 IOPS, sg_len
1)
[  933.440075] mmc0: Result: OK
[  933.443247] mmc0: Tests completed.
[  933.447856] mmc0: Starting tests of card mmc0:e624...
[  933.453191] mmc0: Test case 39. Read performance with blocking req
4k to 4MB...
[  967.234708] mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB)
took 33.773703000 seconds (3974 kB/s, 3880 KiB/s, 970.22 IOPS, sg_len
1)
[  991.857781] mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB)
took 24.610504001 seconds (5453 kB/s, 5325 KiB/s, 665.73 IOPS, sg_len
1)
[ 1011.802479] mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB)
took 19.932036000 seconds (6733 kB/s, 6575 KiB/s, 410.99 IOPS, sg_len
1)
[ 1029.388711] mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB)
took 17.573680001 seconds (7637 kB/s, 7458 KiB/s, 233.07 IOPS, sg_len
1)
[ 1045.644443] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 16.243148000 seconds (8262 kB/s, 8069 KiB/s, 126.08 IOPS, sg_len
1)
[ 1061.899985] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 16.242733002 seconds (8263 kB/s, 8069 KiB/s, 126.08 IOPS, sg_len
1)
[ 1078.146701] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 16.233908999 seconds (8267 kB/s, 8073 KiB/s, 126.15 IOPS, sg_len
1)
[ 1094.402387] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 16.242875002 seconds (8263 kB/s, 8069 KiB/s, 126.08 IOPS, sg_len
1)
[ 1110.649158] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 16.233967999 seconds (8267 kB/s, 8073 KiB/s, 126.15 IOPS, sg_len
1)
[ 1126.905416] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 16.243438001 seconds (8262 kB/s, 8069 KiB/s, 126.08 IOPS, sg_len
1)
[ 1126.918129] mmc0: Result: OK
[ 1126.921358] mmc0: Tests completed.
[ 1126.925955] mmc0: Starting tests of card mmc0:e624...
[ 1126.931289] mmc0: Test case 40. Read performance with non-blocking
req 4k to 4MB...
[ 1159.685208] mmc0: Transfer of 32768 x 8 sectors (32768 x 4 KiB)
took 32.745868000 seconds (4098 kB/s, 4002 KiB/s, 1000.67 IOPS, sg_len
1)
[ 1183.516766] mmc0: Transfer of 16384 x 16 sectors (16384 x 8 KiB)
took 23.818903999 seconds (5634 kB/s, 5502 KiB/s, 687.85 IOPS, sg_len
1)
[ 1202.827382] mmc0: Transfer of 8192 x 32 sectors (8192 x 16 KiB)
took 19.297962001 seconds (6955 kB/s, 6792 KiB/s, 424.50 IOPS, sg_len
1)
[ 1219.886157] mmc0: Transfer of 4096 x 64 sectors (4096 x 32 KiB)
took 17.046200000 seconds (7873 kB/s, 7689 KiB/s, 240.28 IOPS, sg_len
1)
[ 1235.638313] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 15.739587001 seconds (8527 kB/s, 8327 KiB/s, 130.11 IOPS, sg_len
1)
[ 1251.391234] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 15.740097000 seconds (8526 kB/s, 8327 KiB/s, 130.11 IOPS, sg_len
1)
[ 1267.143799] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 15.739750001 seconds (8527 kB/s, 8327 KiB/s, 130.11 IOPS, sg_len
1)
[ 1282.896571] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 15.739964000 seconds (8527 kB/s, 8327 KiB/s, 130.11 IOPS, sg_len
1)
[ 1298.649986] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 15.740602001 seconds (8526 kB/s, 8326 KiB/s, 130.10 IOPS, sg_len
1)
[ 1314.394199] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 15.731410000 seconds (8531 kB/s, 8331 KiB/s, 130.18 IOPS, sg_len
1)
[ 1314.406920] mmc0: Result: OK
[ 1314.410167] mmc0: Tests completed.
[ 1314.414783] mmc0: Starting tests of card mmc0:e624...
[ 1314.420123] mmc0: Test case 41. Write performance blocking req 1 to
512 sg elems...
[ 1342.241715] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.813536000 seconds (4825 kB/s, 4712 KiB/s, 73.63 IOPS, sg_len
1)
[ 1369.319673] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.065227999 seconds (4958 kB/s, 4842 KiB/s, 75.66 IOPS, sg_len
8)
[ 1396.773703] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.441285001 seconds (4891 kB/s, 4776 KiB/s, 74.63 IOPS, sg_len
16)
[ 1423.675432] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 26.888910001 seconds (4991 kB/s, 4874 KiB/s, 76.16 IOPS, sg_len
16)
[ 1451.239203] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.550955999 seconds (4871 kB/s, 4757 KiB/s, 74.33 IOPS, sg_len
16)
[ 1478.262309] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.010269002 seconds (4969 kB/s, 4852 KiB/s, 75.82 IOPS, sg_len
16)
[ 1505.491671] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.216503001 seconds (4931 kB/s, 4815 KiB/s, 75.24 IOPS, sg_len
16)
[ 1532.747882] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.243356999 seconds (4926 kB/s, 4811 KiB/s, 75.17 IOPS, sg_len
16)
[ 1532.760607] mmc0: Result: OK
[ 1532.763779] mmc0: Tests completed.
[ 1532.768387] mmc0: Starting tests of card mmc0:e624...
[ 1532.773722] mmc0: Test case 42. Write performance non-blocking req
1 to 512 sg elems...
[ 1559.686860] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 26.904625000 seconds (4988 kB/s, 4871 KiB/s, 76.12 IOPS, sg_len
1)
[ 1586.632702] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 26.933068001 seconds (4983 kB/s, 4866 KiB/s, 76.04 IOPS, sg_len
8)
[ 1613.014844] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 26.369411000 seconds (5089 kB/s, 4970 KiB/s, 77.66 IOPS, sg_len
16)
[ 1640.120694] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 27.092996001 seconds (4953 kB/s, 4837 KiB/s, 75.59 IOPS, sg_len
16)
[ 1666.593943] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 26.460398000 seconds (5072 kB/s, 4953 KiB/s, 77.39 IOPS, sg_len
16)
[ 1693.477690] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 26.870933000 seconds (4994 kB/s, 4877 KiB/s, 76.21 IOPS, sg_len
16)
[ 1719.918133] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 26.427604001 seconds (5078 kB/s, 4959 KiB/s, 77.49 IOPS, sg_len
16)
[ 1746.761038] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 26.830035002 seconds (5002 kB/s, 4885 KiB/s, 76.33 IOPS, sg_len
16)
[ 1746.773743] mmc0: Result: OK
[ 1746.776905] mmc0: Tests completed.
[ 1746.781603] mmc0: Starting tests of card mmc0:e624...
[ 1746.786742] mmc0: Test case 43. Read performance blocking req 1 to
512 sg elems...
[ 1763.028662] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 16.233791001 seconds (8267 kB/s, 8073 KiB/s, 126.15 IOPS, sg_len
1)
[ 1779.313875] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 16.272372001 seconds (8248 kB/s, 8054 KiB/s, 125.85 IOPS, sg_len
8)
[ 1795.625488] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 16.298793000 seconds (8234 kB/s, 8041 KiB/s, 125.65 IOPS, sg_len
16)
[ 1811.937588] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 16.299186000 seconds (8234 kB/s, 8041 KiB/s, 125.65 IOPS, sg_len
16)
[ 1828.249349] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 16.298847000 seconds (8234 kB/s, 8041 KiB/s, 125.65 IOPS, sg_len
16)
[ 1844.561499] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 16.299234002 seconds (8234 kB/s, 8041 KiB/s, 125.65 IOPS, sg_len
16)
[ 1860.864668] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 16.290268999 seconds (8239 kB/s, 8045 KiB/s, 125.71 IOPS, sg_len
16)
[ 1877.177045] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 16.299461001 seconds (8234 kB/s, 8041 KiB/s, 125.64 IOPS, sg_len
16)
[ 1877.189848] mmc0: Result: OK
[ 1877.193031] mmc0: Tests completed.
[ 1877.197628] mmc0: Starting tests of card mmc0:e624...
[ 1877.202958] mmc0: Test case 44. Read performance non-blocking req 1
to 512 sg elems...
[ 1892.939499] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 15.728134000 seconds (8533 kB/s, 8333 KiB/s, 130.21 IOPS, sg_len
1)
[ 1908.693056] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 15.740710002 seconds (8526 kB/s, 8326 KiB/s, 130.10 IOPS, sg_len
8)
[ 1924.437735] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 15.731847999 seconds (8531 kB/s, 8331 KiB/s, 130.18 IOPS, sg_len
16)
[ 1940.190363] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 15.739700003 seconds (8527 kB/s, 8327 KiB/s, 130.11 IOPS, sg_len
16)
[ 1955.935298] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 15.732027999 seconds (8531 kB/s, 8331 KiB/s, 130.18 IOPS, sg_len
16)
[ 1971.688298] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 15.740083001 seconds (8526 kB/s, 8327 KiB/s, 130.11 IOPS, sg_len
16)
[ 1987.441782] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 15.740559000 seconds (8526 kB/s, 8326 KiB/s, 130.10 IOPS, sg_len
16)
[ 2003.195375] mmc0: Transfer of 2048 x 127 sectors (2048 x 63.5 KiB)
took 15.740680001 seconds (8526 kB/s, 8326 KiB/s, 130.10 IOPS, sg_len
16)
[ 2003.208170] mmc0: Result: OK
[ 2003.211401] mmc0: Tests completed.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 00/12] use nonblock mmc requests to minimize latency
  2011-06-30 13:30   ` Russell King - ARM Linux
@ 2011-07-01 16:44     ` Arnd Bergmann
  2011-07-02 12:29       ` Russell King - ARM Linux
  0 siblings, 1 reply; 27+ messages in thread
From: Arnd Bergmann @ 2011-07-01 16:44 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-arm-kernel, Nicolas Pitre, linaro-dev, Linus Walleij,
	linux-mmc, linux-kernel, Chris Ball, Per Forlin,
	Nickolay Nickolaev

On Thursday 30 June 2011, Russell King - ARM Linux wrote:
> We've been here before - with PCMCIA's card insertion code, where you
> have to go through a sequence of events (insert, power up, reset, etc).
> The PCMCIA code used to have a collection of small functions to do
> each step, one chained after the other in a state machine fashion.
> The result was horrid.  That's exactly what you'll end up with here.
> 
> Threads have their place, and this is one of them.

Ok, fair enough. The performance enhancement is certainly here already
with getting the cache management operations out of the hot path,
and for the fully asynchronous case it's not getting better by trying
to be smarter.

At least for ARM, the overhead of the DMA mapping operations will
dwarf the overhead of the extra context switches for the foreseeable
future, so we don't need to bother.

Things might be different for coherent low-end CPU cores like Atom
when mmc devices become much faster and block access becomes CPU
bound.

	Arnd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 00/12] use nonblock mmc requests to minimize latency
  2011-07-01 16:44     ` Arnd Bergmann
@ 2011-07-02 12:29       ` Russell King - ARM Linux
  2011-07-02 19:37         ` Arnd Bergmann
  0 siblings, 1 reply; 27+ messages in thread
From: Russell King - ARM Linux @ 2011-07-02 12:29 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Nicolas Pitre, linaro-dev, Linus Walleij,
	linux-mmc, linux-kernel, Chris Ball, Per Forlin,
	Nickolay Nickolaev

On Fri, Jul 01, 2011 at 06:44:43PM +0200, Arnd Bergmann wrote:
> On Thursday 30 June 2011, Russell King - ARM Linux wrote:
> > We've been here before - with PCMCIA's card insertion code, where you
> > have to go through a sequence of events (insert, power up, reset, etc).
> > The PCMCIA code used to have a collection of small functions to do
> > each step, one chained after the other in a state machine fashion.
> > The result was horrid.  That's exactly what you'll end up with here.
> > 
> > Threads have their place, and this is one of them.
> 
> Ok, fair enough. The performance enhancement is certainly here already
> with getting the cache management operations out of the hot path,
> and for the fully asynchronous case it's not getting better by trying
> to be smarter.
> 
> At least for ARM, the overhead of the DMA mapping operations will
> dwarf the overhead of the extra context switches for the foreseeable
> future, so we don't need to bother.
> 
> Things might be different for coherent low-end CPU cores like Atom
> when mmc devices become much faster and block access becomes CPU
> bound.

One other thing to be considered here is whether this idea should be
limited to just MMC or whether it should be extended further, to
move the DMA mapping stuff out of the hot path for other block devices
too.

There are ARM systems with SATA which do 28MB/s - which could be
improved by this technique.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 00/12] use nonblock mmc requests to minimize latency
  2011-07-02 12:29       ` Russell King - ARM Linux
@ 2011-07-02 19:37         ` Arnd Bergmann
  2011-07-03 20:53           ` Per Forlin
  0 siblings, 1 reply; 27+ messages in thread
From: Arnd Bergmann @ 2011-07-02 19:37 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-arm-kernel, Nicolas Pitre, linaro-dev, Linus Walleij,
	linux-mmc, linux-kernel, Chris Ball, Per Forlin,
	Nickolay Nickolaev

On Saturday 02 July 2011 14:29:38 Russell King - ARM Linux wrote:
> One other thing to be considered here is whether this idea should be
> limited to just MMC or whether it should be extended further, to
> move the DMA mapping stuff out of the hot path for other block devices
> too.
> 
> There are ARM systems with SATA which do 28MB/s - which could be
> improved by this technique.

Excellent point. We had discussed SATA items in the context of Linaro
work before, and the conclusion was always that we wouldn't need to
do any work for it in the common code. I'll make sure that we bring
it up at the meeting next month in Cambridge so we can officially
assign someone to do this.

	Arnd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 00/12] use nonblock mmc requests to minimize latency
  2011-07-02 19:37         ` Arnd Bergmann
@ 2011-07-03 20:53           ` Per Forlin
  2011-07-04  1:07             ` Nicolas Pitre
  0 siblings, 1 reply; 27+ messages in thread
From: Per Forlin @ 2011-07-03 20:53 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Russell King - ARM Linux, linux-arm-kernel, Nicolas Pitre,
	linaro-dev, Linus Walleij, linux-mmc, linux-kernel, Chris Ball,
	Nickolay Nickolaev

On 2 July 2011 21:37, Arnd Bergmann <arnd@arndb.de> wrote:
> On Saturday 02 July 2011 14:29:38 Russell King - ARM Linux wrote:
>> One other thing to be considered here is whether this idea should be
>> limited to just MMC or whether it should be extended further, to
>> move the DMA mapping stuff out of the hot path for other block devices
>> too.
>>
>> There are ARM systems with SATA which do 28MB/s - which could be
>> improved by this technique.
>
> Excellent point. We had discussed SATA items in the context of Linaro
> work before, and the conclusion was always that we wouldn't need to
> do any work for it in the common code. I'll make sure that we bring
> it up at the meeting next month in Cambridge so we can officially
> assign someone to do this.
>
Arnd, do you know if any of the Linaro boards have SATA?

Regards,
Per

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v8 00/12] use nonblock mmc requests to minimize latency
  2011-07-03 20:53           ` Per Forlin
@ 2011-07-04  1:07             ` Nicolas Pitre
  0 siblings, 0 replies; 27+ messages in thread
From: Nicolas Pitre @ 2011-07-04  1:07 UTC (permalink / raw)
  To: Per Forlin
  Cc: Arnd Bergmann, Russell King - ARM Linux, linux-arm-kernel,
	linaro-dev, Linus Walleij, linux-mmc, lkml, Chris Ball,
	Nickolay Nickolaev

On Sun, 3 Jul 2011, Per Forlin wrote:

> On 2 July 2011 21:37, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Saturday 02 July 2011 14:29:38 Russell King - ARM Linux wrote:
> >> One other thing to be considered here is whether this idea should be
> >> limited to just MMC or whether it should be extended further, to
> >> move the DMA mapping stuff out of the hot path for other block devices
> >> too.
> >>
> >> There are ARM systems with SATA which do 28MB/s - which could be
> >> improved by this technique.
> >
> > Excellent point. We had discussed SATA items in the context of Linaro
> > work before, and the conclusion was always that we wouldn't need to
> > do any work for it in the common code. I'll make sure that we bring
> > it up at the meeting next month in Cambridge so we can officially
> > assign someone to do this.
> >
> Arnd, do you know if any of the Linaro boards have SATA?

The latest i.MX53 boards have SATA.


Nicolas

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2011-07-04  1:07 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-28  8:11 [PATCH v8 00/12] use nonblock mmc requests to minimize latency Per Forlin
2011-06-28  8:11 ` [PATCH v8 01/12] mmc: core: add non-blocking mmc request function Per Forlin
2011-06-28  8:11 ` [PATCH v8 02/12] omap_hsmmc: add support for pre_req and post_req Per Forlin
2011-06-28  8:11 ` [PATCH v8 03/12] mmci: implement pre_req() and post_req() Per Forlin
2011-06-28  8:11 ` [PATCH v8 04/12] mmc: mmc_test: add debugfs file to list all tests Per Forlin
2011-06-28  8:11 ` [PATCH v8 05/12] mmc: mmc_test: add test for non-blocking transfers Per Forlin
2011-07-01 13:29   ` Per Forlin
2011-06-28  8:11 ` [PATCH v8 06/12] mmc: mmc_test: test to measure how sg_len affect performance Per Forlin
2011-07-01 13:33   ` Per Forlin
2011-06-28  8:11 ` [PATCH v8 07/12] mmc: block: add member in mmc queue struct to hold request data Per Forlin
2011-06-28  8:11 ` [PATCH v8 08/12] mmc: block: add a block request prepare function Per Forlin
2011-06-28  8:11 ` [PATCH v8 09/12] mmc: block: move error code in issue_rw_rq to a separate function Per Forlin
2011-06-28  8:11 ` [PATCH v8 10/12] mmc: queue: add a second mmc queue request member Per Forlin
2011-06-28  8:11 ` [PATCH v8 11/12] mmc: core: add random fault injection Per Forlin
2011-06-28  8:11 ` [PATCH v8 12/12] mmc: block: add handling for two parallel block requests in issue_rw_rq Per Forlin
2011-06-28  9:39   ` Per Forlin
2011-06-28  9:54 ` [PATCH v8 00/12] use nonblock mmc requests to minimize latency Kyungmin Park
2011-06-30 12:36 ` Poddar, Sourav
2011-06-30 13:11   ` S, Venkatraman
2011-06-30 13:12 ` Arnd Bergmann
2011-06-30 13:30   ` Russell King - ARM Linux
2011-07-01 16:44     ` Arnd Bergmann
2011-07-02 12:29       ` Russell King - ARM Linux
2011-07-02 19:37         ` Arnd Bergmann
2011-07-03 20:53           ` Per Forlin
2011-07-04  1:07             ` Nicolas Pitre
2011-07-01 14:39 ` Linus Walleij
