* [PATCH v6] mmc: documentation of mmc non-blocking request usage and design.
@ 2011-07-10 19:21 ` Per Forlin
  0 siblings, 0 replies; 9+ messages in thread
From: Per Forlin @ 2011-07-10 19:21 UTC (permalink / raw)
  To: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, linux-doc, Venkatraman S, Linus Walleij,
	Kyungmin Park, Arnd Bergmann, Sourav Poddar, Chris Ball,
	J Freyensee
  Cc: Randy Dunlap, Per Forlin

Documentation about the background and design of mmc non-blocking requests.
Host driver guidelines to minimize request preparation overhead.

Signed-off-by: Per Forlin <per.forlin@linaro.org>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
---
ChangeLog:
 v2: - Minor updates after proofreading comments from Chris
 v3: - Minor updates after more comments from Chris
 v4: - Minor updates after comments from Randy
 v5: - Fixed one more comment and Acked-by from Randy
 v6: - Write out full function names and use () for all functions,
       feedback from James.

 Documentation/mmc/00-INDEX          |    2 +
 Documentation/mmc/mmc-async-req.txt |   88 +++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/mmc/mmc-async-req.txt

diff --git a/Documentation/mmc/00-INDEX b/Documentation/mmc/00-INDEX
index 93dd7a7..a9ba672 100644
--- a/Documentation/mmc/00-INDEX
+++ b/Documentation/mmc/00-INDEX
@@ -4,3 +4,5 @@ mmc-dev-attrs.txt
         - info on SD and MMC device attributes
 mmc-dev-parts.txt
         - info on SD and MMC device partitions
+mmc-async-req.txt
+        - info on mmc asynchronous requests
diff --git a/Documentation/mmc/mmc-async-req.txt b/Documentation/mmc/mmc-async-req.txt
new file mode 100644
index 0000000..aac5634
--- /dev/null
+++ b/Documentation/mmc/mmc-async-req.txt
@@ -0,0 +1,88 @@
+Rationale
+=========
+
+How significant is the cache maintenance overhead?
+It depends. Fast eMMC and multiple cache levels with speculative cache
+pre-fetch make the cache overhead relatively significant. If the DMA
+preparations for the next request are done in parallel with the current
+transfer, the DMA preparation overhead does not affect the MMC performance.
+The intention of non-blocking (asynchronous) MMC requests is to minimize the
+time between when one MMC request ends and the next MMC request begins.
+With mmc_wait_for_req(), the MMC controller is idle while dma_map_sg() and
+dma_unmap_sg() run. Using non-blocking MMC requests makes it possible to
+prepare the caches for the next job in parallel with an active MMC request.
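+
+As a minimal sketch of the difference (request setup omitted; the variable
+names are illustrative, and mmc_start_req() is the new interface described
+below):
+
+    /* Blocking: map, transfer and unmap run strictly back to back. */
+    mmc_wait_for_req(host, &mrq);
+
+    /*
+     * Non-blocking: mmc_start_req() returns once the request has been
+     * started, so the caller can dma_map_sg() and prepare the next
+     * request while this one is transferring.
+     */
+    prev_areq = mmc_start_req(host, &next_areq, &err);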
+
+MMC block driver
+================
+
+The function mmc_blk_issue_rw_rq() in the MMC block driver is made
+non-blocking. The increase in throughput is proportional to how long it
+takes to prepare a request (the major part of the preparation is
+dma_map_sg() and dma_unmap_sg()) and to how fast the memory is. The faster
+the MMC/SD is, the more significant the request preparation time becomes.
+Roughly, the expected performance gain is 5% for large writes and 10% for
+large reads on an L2-cache platform. In power-save mode, when clocks run at
+a lower frequency, the DMA preparation may cost even more. As long as these
+slower preparations run in parallel with the transfer, performance won't be
+affected.
+
+Details on measurements from IOZone and mmc_test
+================================================
+
+https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req
+
+MMC core API extension
+======================
+
+There is one new public function, mmc_start_req().
+It starts a new MMC command request for a host. The function isn't
+truly non-blocking. If there is an ongoing async request, it waits
+for that request to complete, starts the new one, and then returns; it
+doesn't wait for the new request to complete. If there is no ongoing
+request, it starts the new request and returns immediately.
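+
+A minimal usage sketch, assuming the caller has filled in a
+struct mmc_async_req (next_areq) for the next transfer; finish_prev() is an
+illustrative placeholder for completing the returned request:
+
+    struct mmc_async_req *prev;
+    int err = 0;
+
+    /*
+     * Hand the next prepared request to the core. The call blocks only
+     * until the previous async request (if any) has completed, and it
+     * returns that completed request, or NULL if there was none.
+     */
+    prev = mmc_start_req(host, &next_areq, &err);
+    if (prev)
+        finish_prev(prev);
+
+    /* To drain the pipeline, pass NULL and just wait for completion. */
+    mmc_start_req(host, NULL, &err);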
+
+MMC host extensions
+===================
+
+There are two new optional members in mmc_host_ops -- pre_req() and
+post_req() -- that the host driver may implement in order to move work
+to before and after the actual mmc_host_ops.request() function is called.
+In the DMA case pre_req() may do dma_map_sg() and prepare the DMA
+descriptor, and post_req() runs dma_unmap_sg().
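+
+A minimal sketch of how a host driver might implement these hooks (my_host,
+my_pre_req(), my_post_req() and my_request() are illustrative names; a real
+driver tracks more state and handles errors):
+
+    struct my_host {
+        struct device *dev;     /* device used for DMA mapping */
+    };
+
+    static void my_pre_req(struct mmc_host *mmc, struct mmc_request *mrq,
+                           bool is_first_req)
+    {
+        struct my_host *host = mmc_priv(mmc);
+        struct mmc_data *data = mrq->data;
+
+        if (!data)
+            return;
+        /* is_first_req handling is discussed in the next section. */
+        /* Cache maintenance done here, outside mmc_host_ops.request(). */
+        data->host_cookie = dma_map_sg(host->dev, data->sg, data->sg_len,
+                                       (data->flags & MMC_DATA_WRITE) ?
+                                       DMA_TO_DEVICE : DMA_FROM_DEVICE);
+    }
+
+    static void my_post_req(struct mmc_host *mmc, struct mmc_request *mrq,
+                            int err)
+    {
+        struct my_host *host = mmc_priv(mmc);
+        struct mmc_data *data = mrq->data;
+
+        if (data && data->host_cookie) {
+            dma_unmap_sg(host->dev, data->sg, data->sg_len,
+                         (data->flags & MMC_DATA_WRITE) ?
+                         DMA_TO_DEVICE : DMA_FROM_DEVICE);
+            data->host_cookie = 0;
+        }
+    }
+
+    static const struct mmc_host_ops my_ops = {
+        .request  = my_request,         /* existing request handler */
+        .pre_req  = my_pre_req,
+        .post_req = my_post_req,
+    };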
+
+Optimize for the first request
+==============================
+
+The first request in a series of requests can't be prepared in parallel
+with the previous transfer, since there is no previous request.
+The argument is_first_req in pre_req() indicates that there is no previous
+request. The host driver may optimize for this scenario to minimize
+the performance loss. One way to do this is to split the current request
+into two chunks, prepare the first chunk and start the request, and then
+prepare and issue the second chunk while the first chunk is transferring.
+
+Pseudocode to handle the is_first_req scenario with minimal prepare
+overhead:
+
+if (is_first_req && req->size > threshold) {
+    /* start MMC transfer for the complete transfer size */
+    mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);
+
+    /*
+     * Begin to prepare DMA while cmd is being processed by MMC.
+     * The first chunk of the request should take the same time
+     * to prepare as the "MMC process command time".
+     * If the prepare time exceeds the MMC cmd time
+     * the transfer is delayed; guesstimate max 4k as first chunk size.
+     */
+    prepare_1st_chunk_for_dma(req);
+    /* flush pending desc to the DMAC (dmaengine.h) */
+    dma_issue_pending(req->dma_desc);
+
+    prepare_2nd_chunk_for_dma(req);
+    /*
+     * The second issue_pending should be called before the MMC runs out
+     * of the first chunk. If the MMC runs out of the first data chunk
+     * before this call, the transfer is delayed.
+     */
+    dma_issue_pending(req->dma_desc);
+}
-- 
1.7.4.1


* Re: [PATCH v6] mmc: documentation of mmc non-blocking request usage and design.
  2011-07-10 19:21 ` Per Forlin
@ 2011-07-12  0:22   ` J Freyensee
  -1 siblings, 0 replies; 9+ messages in thread
From: J Freyensee @ 2011-07-12  0:22 UTC (permalink / raw)
  To: Per Forlin
  Cc: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, linux-doc, Venkatraman S, Linus Walleij,
	Kyungmin Park, Arnd Bergmann, Sourav Poddar, Chris Ball,
	Randy Dunlap

On 07/10/2011 12:21 PM, Per Forlin wrote:
> +MMC host extensions
> +===================
> +
> +There are two optional members in the
> +mmc_host_ops -- pre_req() and post_req() -- that the host
> +driver may implement in order to move work to before and after the actual
> +mmc_host_ops.request() function is called. In the DMA case pre_req() may do
> +dma_map_sg() and prepare the DMA descriptor, and post_req() runs
> +the dma_unmap_sg().
> +

One question: Is the 'Optimize for the first request' below an example 
of how to use the 'MMC host extensions' above?  So just using 
'mmc_start_req()' in an MMC client driver would not be beneficial if the 
MMC host was not also using the MMC host extensions, right?

Thanks,
Jay

> +Optimize for the first request
> +==============================
> +
> +The first request in a series of requests can't be prepared in parallel with
> +the previous transfer, since there is no previous request.
> +The argument is_first_req in pre_req() indicates that there is no previous
> +request. The host driver may optimize for this scenario to minimize
> +the performance loss. A way to optimize for this is to split the current
> +request in two chunks, prepare the first chunk and start the request,
> +and finally prepare the second chunk and start the transfer.
> +
> +Pseudocode to handle is_first_req scenario with minimal prepare overhead:
> +
> +if (is_first_req && req->size > threshold)
> +   /* start MMC transfer for the complete transfer size */
> +   mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);
> +
> +   /*
> +    * Begin to prepare DMA while cmd is being processed by MMC.
> +    * The first chunk of the request should take the same time
> +    * to prepare as the "MMC process command time".
> +    * If prepare time exceeds MMC cmd time
> +    * the transfer is delayed, guesstimate max 4k as first chunk size.
> +    */
> +    prepare_1st_chunk_for_dma(req);
> +    /* flush pending desc to the DMAC (dmaengine.h) */
> +    dma_issue_pending(req->dma_desc);
> +
> +    prepare_2nd_chunk_for_dma(req);
> +    /*
> +     * The second issue_pending should be called before MMC runs out
> +     * of the first chunk. If the MMC runs out of the first data chunk
> +     * before this call, the transfer is delayed.
> +     */
> +    dma_issue_pending(req->dma_desc);


* Re: [PATCH v6] mmc: documentation of mmc non-blocking request usage and design.
  2011-07-12  0:22   ` J Freyensee
@ 2011-07-12 11:24     ` Per Forlin
  -1 siblings, 0 replies; 9+ messages in thread
From: Per Forlin @ 2011-07-12 11:24 UTC (permalink / raw)
  To: J Freyensee
  Cc: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, linux-doc, Venkatraman S, Linus Walleij,
	Kyungmin Park, Arnd Bergmann, Sourav Poddar, Chris Ball,
	Randy Dunlap

On 12 July 2011 02:22, J Freyensee <james_p_freyensee@linux.intel.com> wrote:
> On 07/10/2011 12:21 PM, Per Forlin wrote:
>> +MMC host extensions
>> +===================
>> +
>> +There are two optional members in the
>> +mmc_host_ops -- pre_req() and post_req() -- that the host
>> +driver may implement in order to move work to before and after the actual
>> +mmc_host_ops.request() function is called. In the DMA case pre_req() may
>> do
>> +dma_map_sg() and prepare the DMA descriptor, and post_req() runs
>> +the dma_unmap_sg().
>> +
>
> One question: Is the 'Optimize for the first request' below an example of
> how to use the 'MMC host extensions' above?
Yes.

>  So just using 'mmc_start_req()'
> in an MMC client driver would not be beneficial if the MMC host was not also
> using the MMC host extensions, right?
>
Yes. There may be some exceptions where there is a gain without any host
driver change. For instance, when using dma_bounce, the bounce_copy
would be done in parallel with an active transfer.
Future work on eMMC HPI support (Venkatraman is working on this) can
use the new extensions to interrupt an ongoing write to make way for a
higher-priority read.

> Thanks,
> Jay

You're welcome,
Per


* Re: [PATCH v6] mmc: documentation of mmc non-blocking request usage and design.
  2011-07-10 19:21 ` Per Forlin
@ 2011-07-13 14:33   ` Chris Ball
  -1 siblings, 0 replies; 9+ messages in thread
From: Chris Ball @ 2011-07-13 14:33 UTC (permalink / raw)
  To: Per Forlin
  Cc: linaro-dev, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	linux-mmc, linux-doc, Randy Dunlap

Hi Per,

On Sun, Jul 10 2011, Per Forlin wrote:
> Documentation about the background and the design of mmc non-blocking.
> Host driver guidelines to minimize request preparation overhead.
>
> Signed-off-by: Per Forlin <per.forlin@linaro.org>
> Acked-by: Randy Dunlap <rdunlap@xenotime.net>

Pushed v6 to mmc-next for 3.1, thanks.

- Chris.
-- 
Chris Ball   <cjb@laptop.org>   <http://printf.net/>
One Laptop Per Child
