* [PATCH 0/5] mmc: add double buffering for mmc block requests
From: Per Forlin @ 2011-01-12 18:13 UTC (permalink / raw)
To: linux-mmc, linux-arm-kernel, linux-kernel, dev; +Cc: Chris Ball, Per Forlin
Add support to prepare one MMC request while another is active on
the host. This is done by making the issue_rw_rq() asynchronous.
The increase in throughput is proportional to the time it takes to
prepare a request and how fast the memory is. The faster the MMC/SD is
the more significant the prepare request time becomes. Measurements on U5500
and U8500 with eMMC show a significant performance gain for DMA on MMC for large
reads. In the PIO case there is some performance gain for large reads too.
There seems to be little or no performance gain for writes; I don't have a good
explanation for this yet.
There are two optional hooks, pre_req() and post_req(), that the host driver
may implement in order to improve double buffering. In the DMA case pre_req()
may run dma_map_sg() and prepare the DMA descriptor, and post_req() runs
dma_unmap_sg().
The mmci host driver implementation of double buffering is neither intended
nor ready for mainline yet; it is only an example of how to implement
pre_req() and post_req(), since the basic DMA support for MMCI is not
complete yet. The mmci patches are sent as a separate patch series,
"[FYI 0/4] arm: mmci: example implementation of double buffering".
Issues/Questions for issue_rw_rq() in block.c:
* Is it safe to claim the host for the first MMC request and wait to release
it until the MMC queue is empty again? Or must the host be claimed and
released for every request?
* Is it possible to predict the result of __blk_end_request()?
If a completed MMC request has no errors and
blk_rq_bytes(req) == data.bytes_xfered, is it guaranteed that
__blk_end_request() will return 0?
Here follows the IOZone results for u8500 v1.1 on eMMC.
The numbers for DMA are a bit too good here because the CPU speed is
decreased compared to u8500 v2, which makes the cache handling even more
significant.
Command line used: ./iozone -az -i0 -i1 -i2 -s 50m -I -f /iozone.tmp -e -R -+u
Relative diff: VANILLA-MMC-PIO -> 2BUF-MMC-PIO
cpu load is abs diff
random random
KB reclen write rewrite read reread read write
51200 4 +0% +0% +0% +0% +0% +0%
cpu: +0.1 -0.1 -0.5 -0.3 -0.1 -0.0
51200 8 +0% +0% +6% +6% +8% +0%
cpu: +0.1 -0.1 -0.3 -0.4 -0.8 +0.0
51200 16 +0% -2% +0% +0% -3% +0%
cpu: +0.0 -0.2 +0.0 +0.0 -0.2 +0.0
51200 32 +0% +1% +0% +0% +0% +0%
cpu: +0.1 +0.0 -0.3 +0.0 +0.0 +0.0
51200 64 +0% +0% +0% +0% +0% +0%
cpu: +0.1 +0.0 +0.0 +0.0 +0.0 +0.0
51200 128 +0% +1% +1% +1% +1% +0%
cpu: +0.0 +0.2 +0.1 -0.3 +0.4 +0.0
51200 256 +0% +0% +1% +1% +1% +0%
cpu: +0.0 -0.0 +0.1 +0.1 +0.1 +0.0
51200 512 +0% +1% +2% +2% +2% +0%
cpu: +0.1 +0.0 +0.2 +0.2 +0.2 +0.1
51200 1024 +0% +2% +2% +2% +3% +0%
cpu: +0.2 +0.1 +0.2 +0.5 -0.8 +0.0
51200 2048 +0% +2% +3% +3% +3% +0%
cpu: +0.0 -0.2 +0.4 +0.8 -0.5 +0.2
51200 4096 +0% +1% +3% +3% +3% +1%
cpu: +0.2 +0.1 +0.9 +0.9 +0.5 +0.1
51200 8192 +1% +0% +3% +3% +3% +1%
cpu: +0.2 +0.2 +1.3 +1.3 +1.0 +0.0
51200 16384 +0% +1% +3% +3% +3% +1%
cpu: +0.2 +0.1 +1.0 +1.3 +1.0 +0.5
Relative diff: VANILLA-MMC-DMA -> 2BUF-MMC-MMCI-DMA
cpu load is abs diff
random random
KB reclen write rewrite read reread read write
51200 4 +0% -3% +6% +5% +5% +0%
cpu: +0.0 -0.2 -0.6 -0.1 +0.3 +0.0
51200 8 +0% +0% +7% +7% +7% +0%
cpu: +0.0 +0.1 +0.8 +0.6 +0.9 +0.0
51200 16 +0% +0% +7% +7% +8% +0%
cpu: +0.0 -0.0 +0.7 +0.7 +0.8 +0.0
51200 32 +0% +0% +8% +8% +9% +0%
cpu: +0.0 +0.1 +0.7 +0.7 +0.3 +0.0
51200 64 +0% +1% +9% +9% +9% +0%
cpu: +0.0 +0.0 +0.8 +0.7 +0.8 +0.0
51200 128 +1% +0% +13% +13% +14% +0%
cpu: +0.2 +0.0 +1.0 +1.0 +1.1 +0.0
51200 256 +1% +2% +8% +8% +11% +0%
cpu: +0.0 +0.3 +0.0 +0.7 +1.5 +0.0
51200 512 +1% +2% +16% +16% +17% +0%
cpu: +0.2 +0.2 +2.2 +2.1 +2.2 +0.1
51200 1024 +1% +2% +20% +20% +20% +1%
cpu: +0.2 +0.1 +2.6 +1.9 +2.6 +0.0
51200 2048 +0% +2% +22% +22% +21% +0%
cpu: +0.0 +0.3 +2.3 +2.9 +2.1 -0.0
51200 4096 +1% +2% +23% +23% +23% +1%
cpu: +0.2 +0.1 +2.0 +3.2 +3.1 +0.0
51200 8192 +1% +5% +24% +24% +24% +1%
cpu: +1.4 -0.0 +4.2 +3.0 +2.8 +0.1
51200 16384 +1% +3% +24% +24% +24% +2%
cpu: +0.0 +0.3 +3.4 +3.8 +3.7 +0.1
Here follows the IOZone results for u5500 on eMMC.
These DMA numbers are closer to what is expected.
Command line used: ./iozone -az -i0 -i1 -i2 -s 50m -I -f /iozone.tmp -e -R -+u
Relative diff: VANILLA-MMC-DMA -> 2BUF-MMC-MMCI-DMA
cpu load is abs diff
random random
KB reclen write rewrite read reread read write
51200 128 +1% +1% +10% +9% +10% +0%
cpu: +0.1 +0.0 +1.3 +0.1 +0.8 +0.1
51200 256 +2% +2% +7% +7% +9% +0%
cpu: +0.1 +0.4 +0.5 +0.6 +0.7 +0.0
51200 512 +2% +2% +12% +12% +12% +1%
cpu: +0.4 +0.6 +1.8 +2.4 +2.4 +0.2
51200 1024 +2% +3% +14% +14% +14% +0%
cpu: +0.3 +0.1 +2.1 +1.4 +1.4 +0.2
51200 2048 +3% +3% +16% +16% +16% +1%
cpu: +0.2 +0.2 +2.5 +1.8 +2.4 -0.2
51200 4096 +3% +3% +17% +17% +18% +3%
cpu: +0.1 -0.1 +2.7 +2.0 +2.7 -0.1
51200 8192 +3% +3% +18% +18% +18% +3%
cpu: -0.1 +0.2 +3.0 +2.3 +2.2 +0.2
51200 16384 +3% +3% +18% +18% +18% +4%
cpu: +0.2 +0.2 +2.8 +3.5 +2.4 -0.0
Per Forlin (5):
mmc: add member in mmc queue struct to hold request data
mmc: Add a block request prepare function
mmc: Add a second mmc queue request member
mmc: Store the mmc block request struct in mmc queue
mmc: Add double buffering for mmc block requests
drivers/mmc/card/block.c | 337 ++++++++++++++++++++++++++++++----------------
drivers/mmc/card/queue.c | 171 +++++++++++++++---------
drivers/mmc/card/queue.h | 31 +++-
drivers/mmc/core/core.c | 77 +++++++++--
include/linux/mmc/core.h | 7 +-
include/linux/mmc/host.h | 8 +
6 files changed, 432 insertions(+), 199 deletions(-)
* [PATCH 1/5] mmc: add member in mmc queue struct to hold request data
From: Per Forlin @ 2011-01-12 18:13 UTC (permalink / raw)
To: linux-mmc, linux-arm-kernel, linux-kernel, dev; +Cc: Chris Ball, Per Forlin
The way the request data is organized in the mmc queue struct, only
one request can be processed at a time.
This patch adds a new struct to hold mmc queue request data such as the
sg list, request, and bounce buffers, and updates the functions that
depend on the mmc queue struct. This lays the groundwork for using
multiple active requests on one mmc queue.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
drivers/mmc/card/block.c | 8 ++--
drivers/mmc/card/queue.c | 129 ++++++++++++++++++++++++----------------------
drivers/mmc/card/queue.h | 22 +++++---
3 files changed, 85 insertions(+), 74 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 217f820..be51bde 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -398,8 +398,8 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
mmc_set_data_timeout(&brq.data, card);
- brq.data.sg = mq->sg;
- brq.data.sg_len = mmc_queue_map_sg(mq);
+ brq.data.sg = mq->mqrq_cur->sg;
+ brq.data.sg_len = mmc_queue_map_sg(mq, mq->mqrq_cur);
/*
* Adjust the sg list so it is the same size as the
@@ -420,11 +420,11 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
brq.data.sg_len = i;
}
- mmc_queue_bounce_pre(mq);
+ mmc_queue_bounce_pre(mq->mqrq_cur);
mmc_wait_for_req(card->host, &brq.mrq);
- mmc_queue_bounce_post(mq);
+ mmc_queue_bounce_post(mq->mqrq_cur);
/*
* Check for errors here, but don't jump to cmd_err
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index 4e42d03..8a8d88b 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -57,7 +57,7 @@ static int mmc_queue_thread(void *d)
set_current_state(TASK_INTERRUPTIBLE);
if (!blk_queue_plugged(q))
req = blk_fetch_request(q);
- mq->req = req;
+ mq->mqrq_cur->req = req;
spin_unlock_irq(q->queue_lock);
if (!req) {
@@ -98,10 +98,25 @@ static void mmc_request(struct request_queue *q)
return;
}
- if (!mq->req)
+ if (!mq->mqrq_cur->req)
wake_up_process(mq->thread);
}
+struct scatterlist *mmc_alloc_sg(int sg_len, int *err)
+{
+ struct scatterlist *sg;
+
+ sg = kmalloc(sizeof(struct scatterlist)*sg_len, GFP_KERNEL);
+ if (!sg)
+ *err = -ENOMEM;
+ else {
+ *err = 0;
+ sg_init_table(sg, sg_len);
+ }
+
+ return sg;
+}
+
/**
* mmc_init_queue - initialise a queue structure.
* @mq: mmc queue
@@ -115,6 +130,7 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
struct mmc_host *host = card->host;
u64 limit = BLK_BOUNCE_HIGH;
int ret;
+ struct mmc_queue_req *mqrq_cur = &mq->mqrq[0];
if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
limit = *mmc_dev(host)->dma_mask;
@@ -124,8 +140,9 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
if (!mq->queue)
return -ENOMEM;
+ memset(&mq->mqrq_cur, 0, sizeof(mq->mqrq_cur));
+ mq->mqrq_cur = mqrq_cur;
mq->queue->queuedata = mq;
- mq->req = NULL;
blk_queue_prep_rq(mq->queue, mmc_prep_request);
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, mq->queue);
@@ -159,53 +176,44 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
bouncesz = host->max_blk_count * 512;
if (bouncesz > 512) {
- mq->bounce_buf = kmalloc(bouncesz, GFP_KERNEL);
- if (!mq->bounce_buf) {
+ mqrq_cur->bounce_buf = kmalloc(bouncesz, GFP_KERNEL);
+ if (!mqrq_cur->bounce_buf) {
printk(KERN_WARNING "%s: unable to "
- "allocate bounce buffer\n",
+ "allocate bounce cur buffer\n",
mmc_card_name(card));
}
}
- if (mq->bounce_buf) {
+ if (mqrq_cur->bounce_buf) {
blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_ANY);
blk_queue_max_hw_sectors(mq->queue, bouncesz / 512);
blk_queue_max_segments(mq->queue, bouncesz / 512);
blk_queue_max_segment_size(mq->queue, bouncesz);
- mq->sg = kmalloc(sizeof(struct scatterlist),
- GFP_KERNEL);
- if (!mq->sg) {
- ret = -ENOMEM;
+ mqrq_cur->sg = mmc_alloc_sg(1, &ret);
+ if (ret)
goto cleanup_queue;
- }
- sg_init_table(mq->sg, 1);
- mq->bounce_sg = kmalloc(sizeof(struct scatterlist) *
- bouncesz / 512, GFP_KERNEL);
- if (!mq->bounce_sg) {
- ret = -ENOMEM;
+ mqrq_cur->bounce_sg =
+ mmc_alloc_sg(bouncesz / 512, &ret);
+ if (ret)
goto cleanup_queue;
- }
- sg_init_table(mq->bounce_sg, bouncesz / 512);
+
}
}
#endif
- if (!mq->bounce_buf) {
+ if (!mqrq_cur->bounce_buf) {
blk_queue_bounce_limit(mq->queue, limit);
blk_queue_max_hw_sectors(mq->queue,
min(host->max_blk_count, host->max_req_size / 512));
blk_queue_max_segments(mq->queue, host->max_segs);
blk_queue_max_segment_size(mq->queue, host->max_seg_size);
- mq->sg = kmalloc(sizeof(struct scatterlist) *
- host->max_segs, GFP_KERNEL);
- if (!mq->sg) {
- ret = -ENOMEM;
+ mqrq_cur->sg = mmc_alloc_sg(host->max_segs, &ret);
+ if (ret)
goto cleanup_queue;
- }
- sg_init_table(mq->sg, host->max_segs);
+
}
sema_init(&mq->thread_sem, 1);
@@ -220,16 +228,15 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
return 0;
free_bounce_sg:
- if (mq->bounce_sg)
- kfree(mq->bounce_sg);
- mq->bounce_sg = NULL;
+ kfree(mqrq_cur->bounce_sg);
+ mqrq_cur->bounce_sg = NULL;
+
cleanup_queue:
- if (mq->sg)
- kfree(mq->sg);
- mq->sg = NULL;
- if (mq->bounce_buf)
- kfree(mq->bounce_buf);
- mq->bounce_buf = NULL;
+ kfree(mqrq_cur->sg);
+ mqrq_cur->sg = NULL;
+ kfree(mqrq_cur->bounce_buf);
+ mqrq_cur->bounce_buf = NULL;
+
blk_cleanup_queue(mq->queue);
return ret;
}
@@ -238,6 +245,7 @@ void mmc_cleanup_queue(struct mmc_queue *mq)
{
struct request_queue *q = mq->queue;
unsigned long flags;
+ struct mmc_queue_req *mqrq_cur = mq->mqrq_cur;
/* Make sure the queue isn't suspended, as that will deadlock */
mmc_queue_resume(mq);
@@ -251,16 +259,14 @@ void mmc_cleanup_queue(struct mmc_queue *mq)
blk_start_queue(q);
spin_unlock_irqrestore(q->queue_lock, flags);
- if (mq->bounce_sg)
- kfree(mq->bounce_sg);
- mq->bounce_sg = NULL;
+ kfree(mqrq_cur->bounce_sg);
+ mqrq_cur->bounce_sg = NULL;
- kfree(mq->sg);
- mq->sg = NULL;
+ kfree(mqrq_cur->sg);
+ mqrq_cur->sg = NULL;
- if (mq->bounce_buf)
- kfree(mq->bounce_buf);
- mq->bounce_buf = NULL;
+ kfree(mqrq_cur->bounce_buf);
+ mqrq_cur->bounce_buf = NULL;
mq->card = NULL;
}
@@ -313,27 +319,27 @@ void mmc_queue_resume(struct mmc_queue *mq)
/*
* Prepare the sg list(s) to be handed of to the host driver
*/
-unsigned int mmc_queue_map_sg(struct mmc_queue *mq)
+unsigned int mmc_queue_map_sg(struct mmc_queue *mq, struct mmc_queue_req *mqrq)
{
unsigned int sg_len;
size_t buflen;
struct scatterlist *sg;
int i;
- if (!mq->bounce_buf)
- return blk_rq_map_sg(mq->queue, mq->req, mq->sg);
+ if (!mqrq->bounce_buf)
+ return blk_rq_map_sg(mq->queue, mqrq->req, mqrq->sg);
- BUG_ON(!mq->bounce_sg);
+ BUG_ON(!mqrq->bounce_sg);
- sg_len = blk_rq_map_sg(mq->queue, mq->req, mq->bounce_sg);
+ sg_len = blk_rq_map_sg(mq->queue, mqrq->req, mqrq->bounce_sg);
- mq->bounce_sg_len = sg_len;
+ mqrq->bounce_sg_len = sg_len;
buflen = 0;
- for_each_sg(mq->bounce_sg, sg, sg_len, i)
+ for_each_sg(mqrq->bounce_sg, sg, sg_len, i)
buflen += sg->length;
- sg_init_one(mq->sg, mq->bounce_buf, buflen);
+ sg_init_one(mqrq->sg, mqrq->bounce_buf, buflen);
return 1;
}
@@ -342,19 +348,19 @@ unsigned int mmc_queue_map_sg(struct mmc_queue *mq)
* If writing, bounce the data to the buffer before the request
* is sent to the host driver
*/
-void mmc_queue_bounce_pre(struct mmc_queue *mq)
+void mmc_queue_bounce_pre(struct mmc_queue_req *mqrq)
{
unsigned long flags;
- if (!mq->bounce_buf)
+ if (!mqrq->bounce_buf)
return;
- if (rq_data_dir(mq->req) != WRITE)
+ if (rq_data_dir(mqrq->req) != WRITE)
return;
local_irq_save(flags);
- sg_copy_to_buffer(mq->bounce_sg, mq->bounce_sg_len,
- mq->bounce_buf, mq->sg[0].length);
+ sg_copy_to_buffer(mqrq->bounce_sg, mqrq->bounce_sg_len,
+ mqrq->bounce_buf, mqrq->sg[0].length);
local_irq_restore(flags);
}
@@ -362,19 +368,18 @@ void mmc_queue_bounce_pre(struct mmc_queue *mq)
* If reading, bounce the data from the buffer after the request
* has been handled by the host driver
*/
-void mmc_queue_bounce_post(struct mmc_queue *mq)
+void mmc_queue_bounce_post(struct mmc_queue_req *mqrq)
{
unsigned long flags;
- if (!mq->bounce_buf)
+ if (!mqrq->bounce_buf)
return;
- if (rq_data_dir(mq->req) != READ)
+ if (rq_data_dir(mqrq->req) != READ)
return;
local_irq_save(flags);
- sg_copy_from_buffer(mq->bounce_sg, mq->bounce_sg_len,
- mq->bounce_buf, mq->sg[0].length);
+ sg_copy_from_buffer(mqrq->bounce_sg, mqrq->bounce_sg_len,
+ mqrq->bounce_buf, mqrq->sg[0].length);
local_irq_restore(flags);
}
-
diff --git a/drivers/mmc/card/queue.h b/drivers/mmc/card/queue.h
index 64e66e0..96c440d 100644
--- a/drivers/mmc/card/queue.h
+++ b/drivers/mmc/card/queue.h
@@ -4,19 +4,24 @@
struct request;
struct task_struct;
+struct mmc_queue_req {
+ struct request *req;
+ struct scatterlist *sg;
+ char *bounce_buf;
+ struct scatterlist *bounce_sg;
+ unsigned int bounce_sg_len;
+};
+
struct mmc_queue {
struct mmc_card *card;
struct task_struct *thread;
struct semaphore thread_sem;
unsigned int flags;
- struct request *req;
int (*issue_fn)(struct mmc_queue *, struct request *);
void *data;
struct request_queue *queue;
- struct scatterlist *sg;
- char *bounce_buf;
- struct scatterlist *bounce_sg;
- unsigned int bounce_sg_len;
+ struct mmc_queue_req mqrq[1];
+ struct mmc_queue_req *mqrq_cur;
};
extern int mmc_init_queue(struct mmc_queue *, struct mmc_card *, spinlock_t *);
@@ -24,8 +29,9 @@ extern void mmc_cleanup_queue(struct mmc_queue *);
extern void mmc_queue_suspend(struct mmc_queue *);
extern void mmc_queue_resume(struct mmc_queue *);
-extern unsigned int mmc_queue_map_sg(struct mmc_queue *);
-extern void mmc_queue_bounce_pre(struct mmc_queue *);
-extern void mmc_queue_bounce_post(struct mmc_queue *);
+extern unsigned int mmc_queue_map_sg(struct mmc_queue *,
+ struct mmc_queue_req *);
+extern void mmc_queue_bounce_pre(struct mmc_queue_req *);
+extern void mmc_queue_bounce_post(struct mmc_queue_req *);
#endif
--
1.7.1
* [PATCH 2/5] mmc: Add a block request prepare function
2011-01-12 18:13 ` Per Forlin
@ 2011-01-12 18:14 ` Per Forlin
0 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-01-12 18:14 UTC (permalink / raw)
To: linux-mmc, linux-arm-kernel, linux-kernel, dev; +Cc: Chris Ball, Per Forlin
Break out code from mmc_blk_issue_rw_rq to create a
block request prepare function. This doesn't change
any functionality.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
drivers/mmc/card/block.c | 173 +++++++++++++++++++++++++---------------------
1 files changed, 94 insertions(+), 79 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index be51bde..3f98b15 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -331,97 +331,112 @@ out:
return err ? 0 : 1;
}
-static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_issue_rw_rq_prep(struct mmc_blk_request *brq,
+ struct mmc_queue_req *mqrq,
+ struct request *req,
+ struct mmc_card *card,
+ int disable_multi,
+ struct mmc_queue *mq)
{
- struct mmc_blk_data *md = mq->data;
- struct mmc_card *card = md->queue.card;
- struct mmc_blk_request brq;
- int ret = 1, disable_multi = 0;
+ u32 readcmd, writecmd;
- mmc_claim_host(card->host);
- do {
- struct mmc_command cmd;
- u32 readcmd, writecmd, status = 0;
-
- memset(&brq, 0, sizeof(struct mmc_blk_request));
- brq.mrq.cmd = &brq.cmd;
- brq.mrq.data = &brq.data;
-
- brq.cmd.arg = blk_rq_pos(req);
- if (!mmc_card_blockaddr(card))
- brq.cmd.arg <<= 9;
- brq.cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC;
- brq.data.blksz = 512;
- brq.stop.opcode = MMC_STOP_TRANSMISSION;
- brq.stop.arg = 0;
- brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
- brq.data.blocks = blk_rq_sectors(req);
+ memset(brq, 0, sizeof(struct mmc_blk_request));
- /*
- * The block layer doesn't support all sector count
- * restrictions, so we need to be prepared for too big
- * requests.
- */
- if (brq.data.blocks > card->host->max_blk_count)
- brq.data.blocks = card->host->max_blk_count;
+ brq->mrq.cmd = &brq->cmd;
+ brq->mrq.data = &brq->data;
- /*
- * After a read error, we redo the request one sector at a time
- * in order to accurately determine which sectors can be read
- * successfully.
- */
- if (disable_multi && brq.data.blocks > 1)
- brq.data.blocks = 1;
-
- if (brq.data.blocks > 1) {
- /* SPI multiblock writes terminate using a special
- * token, not a STOP_TRANSMISSION request.
- */
- if (!mmc_host_is_spi(card->host)
- || rq_data_dir(req) == READ)
- brq.mrq.stop = &brq.stop;
- readcmd = MMC_READ_MULTIPLE_BLOCK;
- writecmd = MMC_WRITE_MULTIPLE_BLOCK;
- } else {
- brq.mrq.stop = NULL;
- readcmd = MMC_READ_SINGLE_BLOCK;
- writecmd = MMC_WRITE_BLOCK;
- }
- if (rq_data_dir(req) == READ) {
- brq.cmd.opcode = readcmd;
- brq.data.flags |= MMC_DATA_READ;
- } else {
- brq.cmd.opcode = writecmd;
- brq.data.flags |= MMC_DATA_WRITE;
- }
+ brq->cmd.arg = blk_rq_pos(req);
+ if (!mmc_card_blockaddr(card))
+ brq->cmd.arg <<= 9;
+ brq->cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC;
+ brq->data.blksz = 512;
+ brq->stop.opcode = MMC_STOP_TRANSMISSION;
+ brq->stop.arg = 0;
+ brq->stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
+ brq->data.blocks = blk_rq_sectors(req);
- mmc_set_data_timeout(&brq.data, card);
+ /*
+ * The block layer doesn't support all sector count
+ * restrictions, so we need to be prepared for too big
+ * requests.
+ */
+ if (brq->data.blocks > card->host->max_blk_count)
+ brq->data.blocks = card->host->max_blk_count;
- brq.data.sg = mq->mqrq_cur->sg;
- brq.data.sg_len = mmc_queue_map_sg(mq, mq->mqrq_cur);
+ /*
+ * After a read error, we redo the request one sector at a time
+ * in order to accurately determine which sectors can be read
+ * successfully.
+ */
+ if (disable_multi && brq->data.blocks > 1)
+ brq->data.blocks = 1;
- /*
- * Adjust the sg list so it is the same size as the
- * request.
+
+ if (brq->data.blocks > 1) {
+ /* SPI multiblock writes terminate using a special
+ * token, not a STOP_TRANSMISSION request.
*/
- if (brq.data.blocks != blk_rq_sectors(req)) {
- int i, data_size = brq.data.blocks << 9;
- struct scatterlist *sg;
-
- for_each_sg(brq.data.sg, sg, brq.data.sg_len, i) {
- data_size -= sg->length;
- if (data_size <= 0) {
- sg->length += data_size;
- i++;
- break;
- }
+ if (!mmc_host_is_spi(card->host)
+ || rq_data_dir(req) == READ)
+ brq->mrq.stop = &brq->stop;
+ readcmd = MMC_READ_MULTIPLE_BLOCK;
+ writecmd = MMC_WRITE_MULTIPLE_BLOCK;
+ } else {
+ brq->mrq.stop = NULL;
+ readcmd = MMC_READ_SINGLE_BLOCK;
+ writecmd = MMC_WRITE_BLOCK;
+ }
+ if (rq_data_dir(req) == READ) {
+ brq->cmd.opcode = readcmd;
+ brq->data.flags |= MMC_DATA_READ;
+ } else {
+ brq->cmd.opcode = writecmd;
+ brq->data.flags |= MMC_DATA_WRITE;
+ }
+
+ mmc_set_data_timeout(&brq->data, card);
+
+ brq->data.sg = mqrq->sg;
+ brq->data.sg_len = mmc_queue_map_sg(mq, mqrq);
+
+ /*
+ * Adjust the sg list so it is the same size as the
+ * request.
+ */
+ if (brq->data.blocks != blk_rq_sectors(req)) {
+ int i, data_size = brq->data.blocks << 9;
+ struct scatterlist *sg;
+
+ for_each_sg(brq->data.sg, sg, brq->data.sg_len, i) {
+ data_size -= sg->length;
+ if (data_size <= 0) {
+ sg->length += data_size;
+ i++;
+ break;
}
- brq.data.sg_len = i;
+ brq->data.sg_len = i;
}
+ }
+
+ mmc_queue_bounce_pre(mqrq);
+}
- mmc_queue_bounce_pre(mq->mqrq_cur);
+static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
+{
+ struct mmc_blk_data *md = mq->data;
+ struct mmc_card *card = md->queue.card;
+ struct mmc_blk_request brq;
+ int ret = 1, disable_multi = 0;
+
+ mmc_claim_host(card->host);
+
+ do {
+ struct mmc_command cmd;
+ u32 status = 0;
+ mmc_blk_issue_rw_rq_prep(&brq, mq->mqrq_cur, req, card,
+ disable_multi, mq);
mmc_wait_for_req(card->host, &brq.mrq);
mmc_queue_bounce_post(mq->mqrq_cur);
--
1.7.1
* [PATCH 3/5] mmc: Add a second mmc queue request member
2011-01-12 18:13 ` Per Forlin
@ 2011-01-12 18:14 ` Per Forlin
0 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-01-12 18:14 UTC (permalink / raw)
To: linux-mmc, linux-arm-kernel, linux-kernel, dev; +Cc: Chris Ball, Per Forlin
Add an additional mmc queue request instance to make way for
double buffering. One request may be active while the
other request is being prepared.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
drivers/mmc/card/queue.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
drivers/mmc/card/queue.h | 4 +++-
2 files changed, 45 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index 8a8d88b..30d4707 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -131,6 +131,7 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
u64 limit = BLK_BOUNCE_HIGH;
int ret;
struct mmc_queue_req *mqrq_cur = &mq->mqrq[0];
+ struct mmc_queue_req *mqrq_prev = &mq->mqrq[1];
if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
limit = *mmc_dev(host)->dma_mask;
@@ -141,7 +142,9 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
return -ENOMEM;
memset(&mq->mqrq_cur, 0, sizeof(mq->mqrq_cur));
+ memset(&mq->mqrq_prev, 0, sizeof(mq->mqrq_prev));
mq->mqrq_cur = mqrq_cur;
+ mq->mqrq_prev = mqrq_prev;
mq->queue->queuedata = mq;
blk_queue_prep_rq(mq->queue, mmc_prep_request);
@@ -182,9 +185,17 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
"allocate bounce cur buffer\n",
mmc_card_name(card));
}
+ mqrq_prev->bounce_buf = kmalloc(bouncesz, GFP_KERNEL);
+ if (!mqrq_prev->bounce_buf) {
+ printk(KERN_WARNING "%s: unable to "
+ "allocate bounce prev buffer\n",
+ mmc_card_name(card));
+ kfree(mqrq_cur->bounce_buf);
+ mqrq_cur->bounce_buf = NULL;
+ }
}
- if (mqrq_cur->bounce_buf) {
+ if (mqrq_cur->bounce_buf && mqrq_prev->bounce_buf) {
blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_ANY);
blk_queue_max_hw_sectors(mq->queue, bouncesz / 512);
blk_queue_max_segments(mq->queue, bouncesz / 512);
@@ -199,11 +210,19 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
if (ret)
goto cleanup_queue;
+ mqrq_prev->sg = mmc_alloc_sg(1, &ret);
+ if (ret)
+ goto cleanup_queue;
+
+ mqrq_prev->bounce_sg =
+ mmc_alloc_sg(bouncesz / 512, &ret);
+ if (ret)
+ goto cleanup_queue;
}
}
#endif
- if (!mqrq_cur->bounce_buf) {
+ if (!mqrq_cur->bounce_buf && !mqrq_prev->bounce_buf) {
blk_queue_bounce_limit(mq->queue, limit);
blk_queue_max_hw_sectors(mq->queue,
min(host->max_blk_count, host->max_req_size / 512));
@@ -214,6 +233,10 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
if (ret)
goto cleanup_queue;
+
+ mqrq_prev->sg = mmc_alloc_sg(host->max_segs, &ret);
+ if (ret)
+ goto cleanup_queue;
}
sema_init(&mq->thread_sem, 1);
@@ -230,6 +253,8 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
free_bounce_sg:
kfree(mqrq_cur->bounce_sg);
mqrq_cur->bounce_sg = NULL;
+ kfree(mqrq_prev->bounce_sg);
+ mqrq_prev->bounce_sg = NULL;
cleanup_queue:
kfree(mqrq_cur->sg);
@@ -237,6 +262,11 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
kfree(mqrq_cur->bounce_buf);
mqrq_cur->bounce_buf = NULL;
+ kfree(mqrq_prev->sg);
+ mqrq_prev->sg = NULL;
+ kfree(mqrq_prev->bounce_buf);
+ mqrq_prev->bounce_buf = NULL;
+
blk_cleanup_queue(mq->queue);
return ret;
}
@@ -246,6 +276,7 @@ void mmc_cleanup_queue(struct mmc_queue *mq)
struct request_queue *q = mq->queue;
unsigned long flags;
struct mmc_queue_req *mqrq_cur = mq->mqrq_cur;
+ struct mmc_queue_req *mqrq_prev = mq->mqrq_prev;
/* Make sure the queue isn't suspended, as that will deadlock */
mmc_queue_resume(mq);
@@ -268,6 +299,15 @@ void mmc_cleanup_queue(struct mmc_queue *mq)
kfree(mqrq_cur->bounce_buf);
mqrq_cur->bounce_buf = NULL;
+ kfree(mqrq_prev->bounce_sg);
+ mqrq_prev->bounce_sg = NULL;
+
+ kfree(mqrq_prev->sg);
+ mqrq_prev->sg = NULL;
+
+ kfree(mqrq_prev->bounce_buf);
+ mqrq_prev->bounce_buf = NULL;
+
mq->card = NULL;
}
EXPORT_SYMBOL(mmc_cleanup_queue);
diff --git a/drivers/mmc/card/queue.h b/drivers/mmc/card/queue.h
index 96c440d..f65eb88 100644
--- a/drivers/mmc/card/queue.h
+++ b/drivers/mmc/card/queue.h
@@ -20,8 +20,10 @@ struct mmc_queue {
int (*issue_fn)(struct mmc_queue *, struct request *);
void *data;
struct request_queue *queue;
- struct mmc_queue_req mqrq[1];
+
+ struct mmc_queue_req mqrq[2];
struct mmc_queue_req *mqrq_cur;
+ struct mmc_queue_req *mqrq_prev;
};
extern int mmc_init_queue(struct mmc_queue *, struct mmc_card *, spinlock_t *);
--
1.7.1
* [PATCH 4/5] mmc: Store the mmc block request struct in mmc queue
2011-01-12 18:13 ` Per Forlin
@ 2011-01-12 18:14 ` Per Forlin
0 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-01-12 18:14 UTC (permalink / raw)
To: linux-mmc, linux-arm-kernel, linux-kernel, dev; +Cc: Chris Ball, Per Forlin
Move the mmc block request to the mmc queue struct in
order to make way for processing two brqs simultaneously.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
drivers/mmc/card/block.c | 68 +++++++++++++++++++++-------------------------
drivers/mmc/card/queue.h | 9 +++++-
2 files changed, 39 insertions(+), 38 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 3f98b15..028b2b8 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -165,13 +165,6 @@ static const struct block_device_operations mmc_bdops = {
.owner = THIS_MODULE,
};
-struct mmc_blk_request {
- struct mmc_request mrq;
- struct mmc_command cmd;
- struct mmc_command stop;
- struct mmc_data data;
-};
-
static u32 mmc_sd_num_wr_blocks(struct mmc_card *card)
{
int err;
@@ -422,11 +415,11 @@ static void mmc_blk_issue_rw_rq_prep(struct mmc_blk_request *brq,
mmc_queue_bounce_pre(mqrq);
}
-static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
+static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
{
struct mmc_blk_data *md = mq->data;
struct mmc_card *card = md->queue.card;
- struct mmc_blk_request brq;
+ struct mmc_blk_request *brqc = &mq->mqrq_cur->brq;
int ret = 1, disable_multi = 0;
mmc_claim_host(card->host);
@@ -435,9 +428,9 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
struct mmc_command cmd;
u32 status = 0;
- mmc_blk_issue_rw_rq_prep(&brq, mq->mqrq_cur, req, card,
+ mmc_blk_issue_rw_rq_prep(brqc, mq->mqrq_cur, rqc, card,
disable_multi, mq);
- mmc_wait_for_req(card->host, &brq.mrq);
+ mmc_wait_for_req(card->host, &brqc->mrq);
mmc_queue_bounce_post(mq->mqrq_cur);
@@ -446,43 +439,43 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
* until later as we need to wait for the card to leave
* programming mode even when things go wrong.
*/
- if (brq.cmd.error || brq.data.error || brq.stop.error) {
- if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
+ if (brqc->cmd.error || brqc->data.error || brqc->stop.error) {
+ if (brqc->data.blocks > 1 && rq_data_dir(rqc) == READ) {
/* Redo read one sector at a time */
printk(KERN_WARNING "%s: retrying using single "
- "block read\n", req->rq_disk->disk_name);
+ "block read\n", rqc->rq_disk->disk_name);
disable_multi = 1;
continue;
}
- status = get_card_status(card, req);
+ status = get_card_status(card, rqc);
}
- if (brq.cmd.error) {
+ if (brqc->cmd.error) {
printk(KERN_ERR "%s: error %d sending read/write "
"command, response %#x, card status %#x\n",
- req->rq_disk->disk_name, brq.cmd.error,
- brq.cmd.resp[0], status);
+ rqc->rq_disk->disk_name, brqc->cmd.error,
+ brqc->cmd.resp[0], status);
}
- if (brq.data.error) {
- if (brq.data.error == -ETIMEDOUT && brq.mrq.stop)
+ if (brqc->data.error) {
+ if (brqc->data.error == -ETIMEDOUT && brqc->mrq.stop)
/* 'Stop' response contains card status */
- status = brq.mrq.stop->resp[0];
+ status = brqc->mrq.stop->resp[0];
printk(KERN_ERR "%s: error %d transferring data,"
" sector %u, nr %u, card status %#x\n",
- req->rq_disk->disk_name, brq.data.error,
- (unsigned)blk_rq_pos(req),
- (unsigned)blk_rq_sectors(req), status);
+ rqc->rq_disk->disk_name, brqc->data.error,
+ (unsigned)blk_rq_pos(rqc),
+ (unsigned)blk_rq_sectors(rqc), status);
}
- if (brq.stop.error) {
+ if (brqc->stop.error) {
printk(KERN_ERR "%s: error %d sending stop command, "
"response %#x, card status %#x\n",
- req->rq_disk->disk_name, brq.stop.error,
- brq.stop.resp[0], status);
+ rqc->rq_disk->disk_name, brqc->stop.error,
+ brqc->stop.resp[0], status);
}
- if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) {
+ if (!mmc_host_is_spi(card->host) && rq_data_dir(rqc) != READ) {
do {
int err;
@@ -492,7 +485,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
err = mmc_wait_for_cmd(card->host, &cmd, 5);
if (err) {
printk(KERN_ERR "%s: error %d requesting status\n",
- req->rq_disk->disk_name, err);
+ rqc->rq_disk->disk_name, err);
goto cmd_err;
}
/*
@@ -506,21 +499,22 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
#if 0
if (cmd.resp[0] & ~0x00000900)
printk(KERN_ERR "%s: status = %08x\n",
- req->rq_disk->disk_name, cmd.resp[0]);
+ rqc->rq_disk->disk_name, cmd.resp[0]);
if (mmc_decode_status(cmd.resp))
goto cmd_err;
#endif
}
- if (brq.cmd.error || brq.stop.error || brq.data.error) {
- if (rq_data_dir(req) == READ) {
+ if (brqc->cmd.error || brqc->stop.error || brqc->data.error) {
+ if (rq_data_dir(rqc) == READ) {
/*
* After an error, we redo I/O one sector at a
* time, so we only reach here after trying to
* read a single sector.
*/
spin_lock_irq(&md->lock);
- ret = __blk_end_request(req, -EIO, brq.data.blksz);
+ ret = __blk_end_request(rqc, -EIO,
+ brqc->data.blksz);
spin_unlock_irq(&md->lock);
continue;
}
@@ -531,7 +525,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
* A block was successfully transferred.
*/
spin_lock_irq(&md->lock);
- ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
+ ret = __blk_end_request(rqc, 0, brqc->data.bytes_xfered);
spin_unlock_irq(&md->lock);
} while (ret);
@@ -554,12 +548,12 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
blocks = mmc_sd_num_wr_blocks(card);
if (blocks != (u32)-1) {
spin_lock_irq(&md->lock);
- ret = __blk_end_request(req, 0, blocks << 9);
+ ret = __blk_end_request(rqc, 0, blocks << 9);
spin_unlock_irq(&md->lock);
}
} else {
spin_lock_irq(&md->lock);
- ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
+ ret = __blk_end_request(rqc, 0, brqc->data.bytes_xfered);
spin_unlock_irq(&md->lock);
}
@@ -567,7 +561,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
spin_lock_irq(&md->lock);
while (ret)
- ret = __blk_end_request(req, -EIO, blk_rq_cur_bytes(req));
+ ret = __blk_end_request(rqc, -EIO, blk_rq_cur_bytes(rqc));
spin_unlock_irq(&md->lock);
return 0;
diff --git a/drivers/mmc/card/queue.h b/drivers/mmc/card/queue.h
index f65eb88..bf3dee9 100644
--- a/drivers/mmc/card/queue.h
+++ b/drivers/mmc/card/queue.h
@@ -4,12 +4,20 @@
struct request;
struct task_struct;
+struct mmc_blk_request {
+ struct mmc_request mrq;
+ struct mmc_command cmd;
+ struct mmc_command stop;
+ struct mmc_data data;
+};
+
struct mmc_queue_req {
struct request *req;
struct scatterlist *sg;
char *bounce_buf;
struct scatterlist *bounce_sg;
unsigned int bounce_sg_len;
+ struct mmc_blk_request brq;
};
struct mmc_queue {
@@ -20,7 +28,6 @@ struct mmc_queue {
int (*issue_fn)(struct mmc_queue *, struct request *);
void *data;
struct request_queue *queue;
-
struct mmc_queue_req mqrq[2];
struct mmc_queue_req *mqrq_cur;
struct mmc_queue_req *mqrq_prev;
--
1.7.1
+ ret = __blk_end_request(rqc, 0, brqc->data.bytes_xfered);
spin_unlock_irq(&md->lock);
}
@@ -567,7 +561,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
spin_lock_irq(&md->lock);
while (ret)
- ret = __blk_end_request(req, -EIO, blk_rq_cur_bytes(req));
+ ret = __blk_end_request(rqc, -EIO, blk_rq_cur_bytes(rqc));
spin_unlock_irq(&md->lock);
return 0;
diff --git a/drivers/mmc/card/queue.h b/drivers/mmc/card/queue.h
index f65eb88..bf3dee9 100644
--- a/drivers/mmc/card/queue.h
+++ b/drivers/mmc/card/queue.h
@@ -4,12 +4,20 @@
struct request;
struct task_struct;
+struct mmc_blk_request {
+ struct mmc_request mrq;
+ struct mmc_command cmd;
+ struct mmc_command stop;
+ struct mmc_data data;
+};
+
struct mmc_queue_req {
struct request *req;
struct scatterlist *sg;
char *bounce_buf;
struct scatterlist *bounce_sg;
unsigned int bounce_sg_len;
+ struct mmc_blk_request brq;
};
struct mmc_queue {
@@ -20,7 +28,6 @@ struct mmc_queue {
int (*issue_fn)(struct mmc_queue *, struct request *);
void *data;
struct request_queue *queue;
-
struct mmc_queue_req mqrq[2];
struct mmc_queue_req *mqrq_cur;
struct mmc_queue_req *mqrq_prev;
--
1.7.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 5/5] mmc: Add double buffering for mmc block requests
2011-01-12 18:13 ` Per Forlin
@ 2011-01-12 18:14 ` Per Forlin
-1 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-01-12 18:14 UTC (permalink / raw)
To: linux-mmc, linux-arm-kernel, linux-kernel, dev; +Cc: Chris Ball, Per Forlin
Change mmc_blk_issue_rw_rq() to become asynchronous.
The execution flow looks like this:
The mmc-queue calls issue_rw_rq(), which sends the request
to the host and returns to the mmc-queue. The mmc-queue calls
issue_rw_rq() again with a new request. This new request is prepared
in issue_rw_rq(), then it waits for the active request to complete before
pushing it to the host. When the mmc-queue is empty it will call
issue_rw_rq() with req=NULL to finish off the active request
without starting a new request.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
drivers/mmc/card/block.c | 170 +++++++++++++++++++++++++++++++++++----------
drivers/mmc/card/queue.c | 2 +-
drivers/mmc/core/core.c | 77 ++++++++++++++++++---
include/linux/mmc/core.h | 7 ++-
include/linux/mmc/host.h | 8 ++
5 files changed, 214 insertions(+), 50 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 028b2b8..11e6e97 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -420,62 +420,98 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
struct mmc_blk_data *md = mq->data;
struct mmc_card *card = md->queue.card;
struct mmc_blk_request *brqc = &mq->mqrq_cur->brq;
- int ret = 1, disable_multi = 0;
+ struct mmc_blk_request *brqp = &mq->mqrq_prev->brq;
+ struct mmc_queue_req *mqrqp = mq->mqrq_prev;
+ struct request *rqp = mqrqp->req;
+ int ret = 0;
+ int disable_multi = 0;
+ bool complete_transfer = true;
+
+ if (!rqc && !rqp) {
+ brqc->mrq.data = NULL;
+ brqp->mrq.data = NULL;
+ return 0;
+ }
- mmc_claim_host(card->host);
+ /*
+ * TODO: Find out if it is OK to only claim host for the first request.
+ * For the first request the previous request is NULL
+ */
+ if (!rqp && rqc)
+ mmc_claim_host(card->host);
+
+ if (rqc) {
+ /* Prepare a new request */
+ mmc_blk_issue_rw_rq_prep(brqc, mq->mqrq_cur,
+ rqc, card, 0, mq);
+ mmc_pre_req(card->host, &brqc->mrq, !rqp);
+ }
do {
struct mmc_command cmd;
u32 status = 0;
- mmc_blk_issue_rw_rq_prep(brqc, mq->mqrq_cur, rqc, card,
- disable_multi, mq);
- mmc_wait_for_req(card->host, &brqc->mrq);
-
- mmc_queue_bounce_post(mq->mqrq_cur);
+ /* In case of error redo prepare and resend */
+ if (ret) {
+ mmc_blk_issue_rw_rq_prep(brqp, mqrqp, rqp, card,
+ disable_multi, mq);
+ mmc_pre_req(card->host, &brqc->mrq, !rqp);
+ mmc_start_req(card->host, &brqp->mrq);
+ }
+ /*
+ * If there is an ongoing request, indicated by rqp, wait for
+ * it to finish before starting a new one.
+ */
+ if (rqp) {
+ mmc_wait_for_req_done(&brqp->mrq);
+ } else {
+ /* start a new asynchronous request */
+ mmc_start_req(card->host, &brqc->mrq);
+ goto out;
+ }
/*
* Check for errors here, but don't jump to cmd_err
* until later as we need to wait for the card to leave
* programming mode even when things go wrong.
*/
- if (brqc->cmd.error || brqc->data.error || brqc->stop.error) {
- if (brqc->data.blocks > 1 && rq_data_dir(rqc) == READ) {
+ if (brqp->cmd.error || brqp->data.error || brqp->stop.error) {
+ if (brqp->data.blocks > 1 && rq_data_dir(rqp) == READ) {
/* Redo read one sector at a time */
printk(KERN_WARNING "%s: retrying using single "
- "block read\n", rqc->rq_disk->disk_name);
+ "block read\n", rqp->rq_disk->disk_name);
disable_multi = 1;
continue;
}
- status = get_card_status(card, rqc);
+ status = get_card_status(card, rqp);
}
- if (brqc->cmd.error) {
+ if (brqp->cmd.error) {
printk(KERN_ERR "%s: error %d sending read/write "
"command, response %#x, card status %#x\n",
- rqc->rq_disk->disk_name, brqc->cmd.error,
- brqc->cmd.resp[0], status);
+ rqp->rq_disk->disk_name, brqp->cmd.error,
+ brqp->cmd.resp[0], status);
}
- if (brqc->data.error) {
- if (brqc->data.error == -ETIMEDOUT && brqc->mrq.stop)
+ if (brqp->data.error) {
+ if (brqp->data.error == -ETIMEDOUT && brqp->mrq.stop)
/* 'Stop' response contains card status */
- status = brqc->mrq.stop->resp[0];
+ status = brqp->mrq.stop->resp[0];
printk(KERN_ERR "%s: error %d transferring data,"
" sector %u, nr %u, card status %#x\n",
- rqc->rq_disk->disk_name, brqc->data.error,
- (unsigned)blk_rq_pos(rqc),
- (unsigned)blk_rq_sectors(rqc), status);
+ rqp->rq_disk->disk_name, brqp->data.error,
+ (unsigned)blk_rq_pos(rqp),
+ (unsigned)blk_rq_sectors(rqp), status);
}
- if (brqc->stop.error) {
+ if (brqp->stop.error) {
printk(KERN_ERR "%s: error %d sending stop command, "
"response %#x, card status %#x\n",
- rqc->rq_disk->disk_name, brqc->stop.error,
- brqc->stop.resp[0], status);
+ rqp->rq_disk->disk_name, brqp->stop.error,
+ brqp->stop.resp[0], status);
}
- if (!mmc_host_is_spi(card->host) && rq_data_dir(rqc) != READ) {
+ if (!mmc_host_is_spi(card->host) && rq_data_dir(rqp) != READ) {
do {
int err;
@@ -485,7 +521,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
err = mmc_wait_for_cmd(card->host, &cmd, 5);
if (err) {
printk(KERN_ERR "%s: error %d requesting status\n",
- rqc->rq_disk->disk_name, err);
+ rqp->rq_disk->disk_name, err);
goto cmd_err;
}
/*
@@ -499,22 +535,22 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
#if 0
if (cmd.resp[0] & ~0x00000900)
printk(KERN_ERR "%s: status = %08x\n",
- rqc->rq_disk->disk_name, cmd.resp[0]);
+ rqp->rq_disk->disk_name, cmd.resp[0]);
if (mmc_decode_status(cmd.resp))
goto cmd_err;
#endif
}
- if (brqc->cmd.error || brqc->stop.error || brqc->data.error) {
- if (rq_data_dir(rqc) == READ) {
+ if (brqp->cmd.error || brqp->stop.error || brqp->data.error) {
+ if (rq_data_dir(rqp) == READ) {
/*
* After an error, we redo I/O one sector at a
* time, so we only reach here after trying to
* read a single sector.
*/
spin_lock_irq(&md->lock);
- ret = __blk_end_request(rqc, -EIO,
- brqc->data.blksz);
+ ret = __blk_end_request(rqp, -EIO,
+ brqp->data.blksz);
spin_unlock_irq(&md->lock);
continue;
}
@@ -524,14 +560,72 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
/*
* A block was successfully transferred.
*/
+ /*
+ * TODO: Find out if it is safe to only check if
+ * blk_rq_bytes(req) == data.bytes_xfered to make sure
+ * the entire request is completed. If equal, defer
+ * __blk_end_request until after the new request is started.
+ */
+ if (blk_rq_bytes(rqp) != brqp->data.bytes_xfered ||
+ !complete_transfer) {
+ complete_transfer = false;
+ mmc_post_req(card->host, &brqp->mrq);
+ mmc_queue_bounce_post(mqrqp);
+
+ spin_lock_irq(&md->lock);
+ ret = __blk_end_request(rqp, 0,
+ brqp->data.bytes_xfered);
+ spin_unlock_irq(&md->lock);
+ }
+ } while (ret);
+
+ /* Previous request is completed, start the new request if any */
+ if (rqc)
+ mmc_start_req(card->host, &brqc->mrq);
+
+ /* Post process the previous request while the new request is active */
+ if (complete_transfer) {
+ mmc_post_req(card->host, &brqp->mrq);
+ mmc_queue_bounce_post(mqrqp);
+
spin_lock_irq(&md->lock);
- ret = __blk_end_request(rqc, 0, brqc->data.bytes_xfered);
+ ret = __blk_end_request(rqp, 0, brqp->data.bytes_xfered);
spin_unlock_irq(&md->lock);
- } while (ret);
- mmc_release_host(card->host);
+ /*
+ * TODO: Make sure "ret" can never be true and remove the
+ * if-statement and the code inside it.
+ */
+ if (ret) {
+ /* This should never happen */
+ printk(KERN_ERR "[%s] BUG: rq_bytes %d xfered %d\n",
+ __func__, blk_rq_bytes(rqp),
+ brqp->data.bytes_xfered);
+ BUG();
+ }
+ }
+ /* 1 indicates one request has been completed */
+ ret = 1;
+ out:
+ /*
+ * TODO: Find out if it is OK to only release host after the
+ * last request. For the last request the current request
+ * is NULL, which means no requests are pending.
+ */
+ if (!rqc)
+ mmc_release_host(card->host);
+
+ do {
+ /* Current request becomes previous request and vice versa. */
+ struct mmc_queue_req *tmp;
+ mq->mqrq_prev->brq.mrq.data = NULL;
+ mq->mqrq_prev->req = NULL;
+ tmp = mq->mqrq_prev;
+ mq->mqrq_prev = mq->mqrq_cur;
+ mq->mqrq_cur = tmp;
+ } while (0);
- return 1;
+ return ret;
cmd_err:
/*
@@ -548,12 +642,12 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
blocks = mmc_sd_num_wr_blocks(card);
if (blocks != (u32)-1) {
spin_lock_irq(&md->lock);
- ret = __blk_end_request(rqc, 0, blocks << 9);
+ ret = __blk_end_request(rqp, 0, blocks << 9);
spin_unlock_irq(&md->lock);
}
} else {
spin_lock_irq(&md->lock);
- ret = __blk_end_request(rqc, 0, brqc->data.bytes_xfered);
+ ret = __blk_end_request(rqp, 0, brqp->data.bytes_xfered);
spin_unlock_irq(&md->lock);
}
@@ -561,7 +655,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
spin_lock_irq(&md->lock);
while (ret)
- ret = __blk_end_request(rqc, -EIO, blk_rq_cur_bytes(rqc));
+ ret = __blk_end_request(rqp, -EIO, blk_rq_cur_bytes(rqp));
spin_unlock_irq(&md->lock);
return 0;
@@ -569,7 +663,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
{
- if (req->cmd_flags & REQ_DISCARD) {
+ if (req && req->cmd_flags & REQ_DISCARD) {
if (req->cmd_flags & REQ_SECURE)
return mmc_blk_issue_secdiscard_rq(mq, req);
else
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index 30d4707..30f8ae9 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -60,6 +60,7 @@ static int mmc_queue_thread(void *d)
mq->mqrq_cur->req = req;
spin_unlock_irq(q->queue_lock);
+ mq->issue_fn(mq, req);
if (!req) {
if (kthread_should_stop()) {
set_current_state(TASK_RUNNING);
@@ -72,7 +73,6 @@ static int mmc_queue_thread(void *d)
}
set_current_state(TASK_RUNNING);
- mq->issue_fn(mq, req);
} while (1);
up(&mq->thread_sem);
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index a3a780f..63b4684 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -195,30 +195,87 @@ mmc_start_request(struct mmc_host *host, struct mmc_request *mrq)
static void mmc_wait_done(struct mmc_request *mrq)
{
- complete(mrq->done_data);
+ complete(&mrq->completion);
}
/**
- * mmc_wait_for_req - start a request and wait for completion
+ * mmc_pre_req - Prepare for a new request
+ * @host: MMC host to prepare command
+ * @mrq: MMC request to prepare for
+ * @host_is_idle: true if the host is not processing a request,
+ * false if a request may be active on the host.
+ *
+ * mmc_pre_req() is called prior to mmc_start_req() to let the
+ * host prepare for the new request. Preparation of a request may be
+ * performed while another request is running on the host.
+ */
+void mmc_pre_req(struct mmc_host *host, struct mmc_request *mrq,
+ bool host_is_idle)
+{
+ if (host->ops->pre_req)
+ host->ops->pre_req(host, mrq, host_is_idle);
+}
+EXPORT_SYMBOL(mmc_pre_req);
+
+/**
+ * mmc_post_req - Post process a completed request
+ * @host: MMC host to post process command
+ * @mrq: MMC request to post process for
+ *
+ * Let the host post process a completed request. Post processing of
+ * a request may be performed while another request is running.
+ */
+void mmc_post_req(struct mmc_host *host, struct mmc_request *mrq)
+{
+ if (host->ops->post_req)
+ host->ops->post_req(host, mrq);
+}
+EXPORT_SYMBOL(mmc_post_req);
+
+/**
+ * mmc_start_req - start a request
* @host: MMC host to start command
* @mrq: MMC request to start
*
- * Start a new MMC custom command request for a host, and wait
- * for the command to complete. Does not attempt to parse the
- * response.
+ * Start a new MMC custom command request for a host.
+ * Does not wait for the command to complete.
*/
-void mmc_wait_for_req(struct mmc_host *host, struct mmc_request *mrq)
+void mmc_start_req(struct mmc_host *host, struct mmc_request *mrq)
{
- DECLARE_COMPLETION_ONSTACK(complete);
-
- mrq->done_data = &complete;
+ init_completion(&mrq->completion);
mrq->done = mmc_wait_done;
mmc_start_request(host, mrq);
+}
+EXPORT_SYMBOL(mmc_start_req);
- wait_for_completion(&complete);
+/**
+ * mmc_wait_for_req_done - wait for completion of request
+ * @mrq: MMC request to wait for
+ *
+ * Wait for the command to complete. Does not attempt to parse the
+ * response.
+ */
+void mmc_wait_for_req_done(struct mmc_request *mrq)
+{
+ wait_for_completion(&mrq->completion);
}
+EXPORT_SYMBOL(mmc_wait_for_req_done);
+/**
+ * mmc_wait_for_req - start a request and wait for completion
+ * @host: MMC host to start command
+ * @mrq: MMC request to start
+ *
+ * Start a new MMC custom command request for a host, and wait
+ * for the command to complete. Does not attempt to parse the
+ * response.
+ */
+void mmc_wait_for_req(struct mmc_host *host, struct mmc_request *mrq)
+{
+ mmc_start_req(host, mrq);
+ mmc_wait_for_req_done(mrq);
+}
EXPORT_SYMBOL(mmc_wait_for_req);
/**
diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h
index 64e013f..da504f7 100644
--- a/include/linux/mmc/core.h
+++ b/include/linux/mmc/core.h
@@ -124,13 +124,18 @@ struct mmc_request {
struct mmc_data *data;
struct mmc_command *stop;
- void *done_data; /* completion data */
+ struct completion completion;
void (*done)(struct mmc_request *);/* completion function */
};
struct mmc_host;
struct mmc_card;
+extern void mmc_pre_req(struct mmc_host *host, struct mmc_request *mrq,
+ bool host_is_idle);
+extern void mmc_post_req(struct mmc_host *host, struct mmc_request *mrq);
+extern void mmc_start_req(struct mmc_host *host, struct mmc_request *mrq);
+extern void mmc_wait_for_req_done(struct mmc_request *mrq);
extern void mmc_wait_for_req(struct mmc_host *, struct mmc_request *);
extern int mmc_wait_for_cmd(struct mmc_host *, struct mmc_command *, int);
extern int mmc_wait_for_app_cmd(struct mmc_host *, struct mmc_card *,
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 30f6fad..b85463b 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -88,6 +88,14 @@ struct mmc_host_ops {
*/
int (*enable)(struct mmc_host *host);
int (*disable)(struct mmc_host *host, int lazy);
+ /*
+ * It is optional for the host to implement pre_req and post_req in
+ * order to support double buffering of requests (prepare one
+ * request while another request is active).
+ */
+ void (*post_req)(struct mmc_host *host, struct mmc_request *req);
+ void (*pre_req)(struct mmc_host *host, struct mmc_request *req,
+ bool host_is_idle);
void (*request)(struct mmc_host *host, struct mmc_request *req);
/*
* Avoid calling these three functions too often or in a "fast path",
--
1.7.1
* Re: [PATCH 0/5] mmc: add double buffering for mmc block requests
2011-01-12 18:13 ` Per Forlin
@ 2011-01-12 18:24 ` Per Forlin
-1 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-01-12 18:24 UTC (permalink / raw)
To: linux-mmc, linux-arm-kernel, linux-kernel, linaro-dev
Cc: Chris Ball, Per Forlin
I mistyped the linaro email in this patch series.
Sorry for the mess
/Per
On 12 January 2011 19:13, Per Forlin <per.forlin@linaro.org> wrote:
> Add support to prepare one MMC request while another is active on
> the host. This is done by making the issue_rw_rq() asynchronous.
> The increase in throughput is proportional to the time it takes to
> prepare a request and how fast the memory is. The faster the MMC/SD is
> the more significant the prepare request time becomes. Measurements on U5500
> and U8500 on eMMC show a significant performance gain for DMA on MMC for large
> reads. In the PIO case there is some gain in performance for large reads too.
> There seems to be little or no performance gain for writes; I don't have a good
> explanation for this yet.
>
> There are two optional hooks pre_req() and post_req() that the host driver
> may implement in order to improve double buffering. In the DMA case, pre_req()
> may do dma_map_sg() and prepare the DMA descriptor, and post_req() runs
> dma_unmap_sg().
>
> The mmci host driver implementation for double buffering is not intended
> nor ready for mainline yet. It is only an example of how to implement
> pre_req() and post_req(). The reason for this is that the basic DMA support
> for MMCI is not complete yet. The mmci patches are sent in a separate patch
> series "[FYI 0/4] arm: mmci: example implementation of double buffering".
>
> Issues/Questions for issue_rw_rq() in block.c:
> * Is it safe to claim the host for the first MMC request and wait to release
> it until the MMC queue is empty again? Or must the host be claimed and
> released for every request?
> * Is it possible to predict the result from __blk_end_request().
> If there are no errors for a completed MMC request and the
> blk_rq_bytes(req) == data.bytes_xfered, will it be guaranteed that
> __blk_end_request will return 0?
>
> Here follows the IOZone results for u8500 v1.1 on eMMC.
> The numbers for DMA are a bit too good here due to the fact that the
> CPU speed is decreased compared to u8500 v2. This makes the cache handling
> even more significant.
>
> Command line used: ./iozone -az -i0 -i1 -i2 -s 50m -I -f /iozone.tmp -e -R -+u
>
> Relative diff: VANILLA-MMC-PIO -> 2BUF-MMC-PIO
> cpu load is abs diff
> random random
> KB reclen write rewrite read reread read write
> 51200 4 +0% +0% +0% +0% +0% +0%
> cpu: +0.1 -0.1 -0.5 -0.3 -0.1 -0.0
>
> 51200 8 +0% +0% +6% +6% +8% +0%
> cpu: +0.1 -0.1 -0.3 -0.4 -0.8 +0.0
>
> 51200 16 +0% -2% +0% +0% -3% +0%
> cpu: +0.0 -0.2 +0.0 +0.0 -0.2 +0.0
>
> 51200 32 +0% +1% +0% +0% +0% +0%
> cpu: +0.1 +0.0 -0.3 +0.0 +0.0 +0.0
>
> 51200 64 +0% +0% +0% +0% +0% +0%
> cpu: +0.1 +0.0 +0.0 +0.0 +0.0 +0.0
>
> 51200 128 +0% +1% +1% +1% +1% +0%
> cpu: +0.0 +0.2 +0.1 -0.3 +0.4 +0.0
>
> 51200 256 +0% +0% +1% +1% +1% +0%
> cpu: +0.0 -0.0 +0.1 +0.1 +0.1 +0.0
>
> 51200 512 +0% +1% +2% +2% +2% +0%
> cpu: +0.1 +0.0 +0.2 +0.2 +0.2 +0.1
>
> 51200 1024 +0% +2% +2% +2% +3% +0%
> cpu: +0.2 +0.1 +0.2 +0.5 -0.8 +0.0
>
> 51200 2048 +0% +2% +3% +3% +3% +0%
> cpu: +0.0 -0.2 +0.4 +0.8 -0.5 +0.2
>
> 51200 4096 +0% +1% +3% +3% +3% +1%
> cpu: +0.2 +0.1 +0.9 +0.9 +0.5 +0.1
>
> 51200 8192 +1% +0% +3% +3% +3% +1%
> cpu: +0.2 +0.2 +1.3 +1.3 +1.0 +0.0
>
> 51200 16384 +0% +1% +3% +3% +3% +1%
> cpu: +0.2 +0.1 +1.0 +1.3 +1.0 +0.5
>
> Relative diff: VANILLA-MMC-DMA -> 2BUF-MMC-MMCI-DMA
> cpu load is abs diff
> random random
> KB reclen write rewrite read reread read write
> 51200 4 +0% -3% +6% +5% +5% +0%
> cpu: +0.0 -0.2 -0.6 -0.1 +0.3 +0.0
>
> 51200 8 +0% +0% +7% +7% +7% +0%
> cpu: +0.0 +0.1 +0.8 +0.6 +0.9 +0.0
>
> 51200 16 +0% +0% +7% +7% +8% +0%
> cpu: +0.0 -0.0 +0.7 +0.7 +0.8 +0.0
>
> 51200 32 +0% +0% +8% +8% +9% +0%
> cpu: +0.0 +0.1 +0.7 +0.7 +0.3 +0.0
>
> 51200 64 +0% +1% +9% +9% +9% +0%
> cpu: +0.0 +0.0 +0.8 +0.7 +0.8 +0.0
>
> 51200 128 +1% +0% +13% +13% +14% +0%
> cpu: +0.2 +0.0 +1.0 +1.0 +1.1 +0.0
>
> 51200 256 +1% +2% +8% +8% +11% +0%
> cpu: +0.0 +0.3 +0.0 +0.7 +1.5 +0.0
>
> 51200 512 +1% +2% +16% +16% +17% +0%
> cpu: +0.2 +0.2 +2.2 +2.1 +2.2 +0.1
>
> 51200 1024 +1% +2% +20% +20% +20% +1%
> cpu: +0.2 +0.1 +2.6 +1.9 +2.6 +0.0
>
> 51200 2048 +0% +2% +22% +22% +21% +0%
> cpu: +0.0 +0.3 +2.3 +2.9 +2.1 -0.0
>
> 51200 4096 +1% +2% +23% +23% +23% +1%
> cpu: +0.2 +0.1 +2.0 +3.2 +3.1 +0.0
>
> 51200 8192 +1% +5% +24% +24% +24% +1%
> cpu: +1.4 -0.0 +4.2 +3.0 +2.8 +0.1
>
> 51200 16384 +1% +3% +24% +24% +24% +2%
> cpu: +0.0 +0.3 +3.4 +3.8 +3.7 +0.1
>
> Here follows the IOZone results for u5500 on eMMC.
> These DMA numbers are more in line with expectations.
>
> Command line used: ./iozone -az -i0 -i1 -i2 -s 50m -I -f /iozone.tmp -e -R -+u
>
> Relative diff: VANILLA-MMC-DMA -> 2BUF-MMC-MMCI-DMA
> cpu load is abs diff
> random random
> KB reclen write rewrite read reread read write
> 51200 128 +1% +1% +10% +9% +10% +0%
> cpu: +0.1 +0.0 +1.3 +0.1 +0.8 +0.1
>
> 51200 256 +2% +2% +7% +7% +9% +0%
> cpu: +0.1 +0.4 +0.5 +0.6 +0.7 +0.0
>
> 51200 512 +2% +2% +12% +12% +12% +1%
> cpu: +0.4 +0.6 +1.8 +2.4 +2.4 +0.2
>
> 51200 1024 +2% +3% +14% +14% +14% +0%
> cpu: +0.3 +0.1 +2.1 +1.4 +1.4 +0.2
>
> 51200 2048 +3% +3% +16% +16% +16% +1%
> cpu: +0.2 +0.2 +2.5 +1.8 +2.4 -0.2
>
> 51200 4096 +3% +3% +17% +17% +18% +3%
> cpu: +0.1 -0.1 +2.7 +2.0 +2.7 -0.1
>
> 51200 8192 +3% +3% +18% +18% +18% +3%
> cpu: -0.1 +0.2 +3.0 +2.3 +2.2 +0.2
>
> 51200 16384 +3% +3% +18% +18% +18% +4%
> cpu: +0.2 +0.2 +2.8 +3.5 +2.4 -0.0
>
> Per Forlin (5):
> mmc: add member in mmc queue struct to hold request data
> mmc: Add a block request prepare function
> mmc: Add a second mmc queue request member
> mmc: Store the mmc block request struct in mmc queue
> mmc: Add double buffering for mmc block requests
>
> drivers/mmc/card/block.c | 337 ++++++++++++++++++++++++++++++----------------
> drivers/mmc/card/queue.c | 171 +++++++++++++++---------
> drivers/mmc/card/queue.h | 31 +++-
> drivers/mmc/core/core.c | 77 +++++++++--
> include/linux/mmc/core.h | 7 +-
> include/linux/mmc/host.h | 8 +
> 6 files changed, 432 insertions(+), 199 deletions(-)
>
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] mmc: add double buffering for mmc block requests
2011-01-12 18:13 ` Per Forlin
@ 2011-01-18 2:35 ` Jaehoon Chung
-1 siblings, 0 replies; 27+ messages in thread
From: Jaehoon Chung @ 2011-01-18 2:35 UTC (permalink / raw)
To: Per Forlin
Cc: linux-mmc, linux-arm-kernel, linux-kernel, dev, Chris Ball,
Kyungmin Park
Hi Per,
This is an interesting approach, so we want to test your double
buffering in our environment (Samsung SoC).
Did you test with SDHCI?
If you tested with SDHCI, I would like to know how much it increases the
performance.
Thanks,
Jaehoon Chung
Per Forlin wrote:
> Add support to prepare one MMC request while another is active on
> the host. This is done by making the issue_rw_rq() asynchronous.
> The increase in throughput is proportional to the time it takes to
> prepare a request and how fast the memory is. The faster the MMC/SD is
> the more significant the prepare request time becomes. Measurements on U5500
> and U8500 on eMMC show a significant performance gain for DMA on MMC for large
> reads. In the PIO case there is some gain in performance for large reads too.
> There seems to be little or no performance gain for writes; I don't have a good
> explanation for this yet.
>
> There are two optional hooks pre_req() and post_req() that the host driver
> may implement in order to improve double buffering. In the DMA case pre_req()
> may do dma_map_sg() and prepare the dma descriptor and post_req runs the
> dma_unmap_sg.
>
> The mmci host driver implementation for double buffering is not intended
> nor ready for mainline yet. It is only an example of how to implement
> pre_req() and post_req(). The reason for this is that the basic DMA support
> for MMCI is not complete yet. The mmci patches are sent in a separate patch
> series "[FYI 0/4] arm: mmci: example implementation of double buffering".
>
> Issues/Questions for issue_rw_rq() in block.c:
> * Is it safe to claim the host for the first MMC request and wait to release
> it until the MMC queue is empty again? Or must the host be claimed and
> released for every request?
> * Is it possible to predict the result from __blk_end_request().
> If there are no errors for a completed MMC request and the
> blk_rq_bytes(req) == data.bytes_xfered, will it be guaranteed that
> __blk_end_request will return 0?
>
> Here follows the IOZone results for u8500 v1.1 on eMMC.
> The numbers for DMA are a bit too good here due to the fact that the
> CPU speed is decreased compared to u8500 v2. This makes the cache handling
> even more significant.
>
> Command line used: ./iozone -az -i0 -i1 -i2 -s 50m -I -f /iozone.tmp -e -R -+u
>
> Relative diff: VANILLA-MMC-PIO -> 2BUF-MMC-PIO
> cpu load is abs diff
> random random
> KB reclen write rewrite read reread read write
> 51200 4 +0% +0% +0% +0% +0% +0%
> cpu: +0.1 -0.1 -0.5 -0.3 -0.1 -0.0
>
> 51200 8 +0% +0% +6% +6% +8% +0%
> cpu: +0.1 -0.1 -0.3 -0.4 -0.8 +0.0
>
> 51200 16 +0% -2% +0% +0% -3% +0%
> cpu: +0.0 -0.2 +0.0 +0.0 -0.2 +0.0
>
> 51200 32 +0% +1% +0% +0% +0% +0%
> cpu: +0.1 +0.0 -0.3 +0.0 +0.0 +0.0
>
> 51200 64 +0% +0% +0% +0% +0% +0%
> cpu: +0.1 +0.0 +0.0 +0.0 +0.0 +0.0
>
> 51200 128 +0% +1% +1% +1% +1% +0%
> cpu: +0.0 +0.2 +0.1 -0.3 +0.4 +0.0
>
> 51200 256 +0% +0% +1% +1% +1% +0%
> cpu: +0.0 -0.0 +0.1 +0.1 +0.1 +0.0
>
> 51200 512 +0% +1% +2% +2% +2% +0%
> cpu: +0.1 +0.0 +0.2 +0.2 +0.2 +0.1
>
> 51200 1024 +0% +2% +2% +2% +3% +0%
> cpu: +0.2 +0.1 +0.2 +0.5 -0.8 +0.0
>
> 51200 2048 +0% +2% +3% +3% +3% +0%
> cpu: +0.0 -0.2 +0.4 +0.8 -0.5 +0.2
>
> 51200 4096 +0% +1% +3% +3% +3% +1%
> cpu: +0.2 +0.1 +0.9 +0.9 +0.5 +0.1
>
> 51200 8192 +1% +0% +3% +3% +3% +1%
> cpu: +0.2 +0.2 +1.3 +1.3 +1.0 +0.0
>
> 51200 16384 +0% +1% +3% +3% +3% +1%
> cpu: +0.2 +0.1 +1.0 +1.3 +1.0 +0.5
>
> Relative diff: VANILLA-MMC-DMA -> 2BUF-MMC-MMCI-DMA
> cpu load is abs diff
> random random
> KB reclen write rewrite read reread read write
> 51200 4 +0% -3% +6% +5% +5% +0%
> cpu: +0.0 -0.2 -0.6 -0.1 +0.3 +0.0
>
> 51200 8 +0% +0% +7% +7% +7% +0%
> cpu: +0.0 +0.1 +0.8 +0.6 +0.9 +0.0
>
> 51200 16 +0% +0% +7% +7% +8% +0%
> cpu: +0.0 -0.0 +0.7 +0.7 +0.8 +0.0
>
> 51200 32 +0% +0% +8% +8% +9% +0%
> cpu: +0.0 +0.1 +0.7 +0.7 +0.3 +0.0
>
> 51200 64 +0% +1% +9% +9% +9% +0%
> cpu: +0.0 +0.0 +0.8 +0.7 +0.8 +0.0
>
> 51200 128 +1% +0% +13% +13% +14% +0%
> cpu: +0.2 +0.0 +1.0 +1.0 +1.1 +0.0
>
> 51200 256 +1% +2% +8% +8% +11% +0%
> cpu: +0.0 +0.3 +0.0 +0.7 +1.5 +0.0
>
> 51200 512 +1% +2% +16% +16% +17% +0%
> cpu: +0.2 +0.2 +2.2 +2.1 +2.2 +0.1
>
> 51200 1024 +1% +2% +20% +20% +20% +1%
> cpu: +0.2 +0.1 +2.6 +1.9 +2.6 +0.0
>
> 51200 2048 +0% +2% +22% +22% +21% +0%
> cpu: +0.0 +0.3 +2.3 +2.9 +2.1 -0.0
>
> 51200 4096 +1% +2% +23% +23% +23% +1%
> cpu: +0.2 +0.1 +2.0 +3.2 +3.1 +0.0
>
> 51200 8192 +1% +5% +24% +24% +24% +1%
> cpu: +1.4 -0.0 +4.2 +3.0 +2.8 +0.1
>
> 51200 16384 +1% +3% +24% +24% +24% +2%
> cpu: +0.0 +0.3 +3.4 +3.8 +3.7 +0.1
>
> Here follows the IOZone results for u5500 on eMMC.
> These numbers for DMA are more as expected.
>
> Command line used: ./iozone -az -i0 -i1 -i2 -s 50m -I -f /iozone.tmp -e -R -+u
>
> Relative diff: VANILLA-MMC-DMA -> 2BUF-MMC-MMCI-DMA
> cpu load is abs diff
> random random
> KB reclen write rewrite read reread read write
> 51200 128 +1% +1% +10% +9% +10% +0%
> cpu: +0.1 +0.0 +1.3 +0.1 +0.8 +0.1
>
> 51200 256 +2% +2% +7% +7% +9% +0%
> cpu: +0.1 +0.4 +0.5 +0.6 +0.7 +0.0
>
> 51200 512 +2% +2% +12% +12% +12% +1%
> cpu: +0.4 +0.6 +1.8 +2.4 +2.4 +0.2
>
> 51200 1024 +2% +3% +14% +14% +14% +0%
> cpu: +0.3 +0.1 +2.1 +1.4 +1.4 +0.2
>
> 51200 2048 +3% +3% +16% +16% +16% +1%
> cpu: +0.2 +0.2 +2.5 +1.8 +2.4 -0.2
>
> 51200 4096 +3% +3% +17% +17% +18% +3%
> cpu: +0.1 -0.1 +2.7 +2.0 +2.7 -0.1
>
> 51200 8192 +3% +3% +18% +18% +18% +3%
> cpu: -0.1 +0.2 +3.0 +2.3 +2.2 +0.2
>
> 51200 16384 +3% +3% +18% +18% +18% +4%
> cpu: +0.2 +0.2 +2.8 +3.5 +2.4 -0.0
>
> Per Forlin (5):
> mmc: add member in mmc queue struct to hold request data
> mmc: Add a block request prepare function
> mmc: Add a second mmc queue request member
> mmc: Store the mmc block request struct in mmc queue
> mmc: Add double buffering for mmc block requests
>
> drivers/mmc/card/block.c | 337 ++++++++++++++++++++++++++++++----------------
> drivers/mmc/card/queue.c | 171 +++++++++++++++---------
> drivers/mmc/card/queue.h | 31 +++-
> drivers/mmc/core/core.c | 77 +++++++++--
> include/linux/mmc/core.h | 7 +-
> include/linux/mmc/host.h | 8 +
> 6 files changed, 432 insertions(+), 199 deletions(-)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] mmc: add double buffering for mmc block requests
2011-01-18 2:35 ` Jaehoon Chung
@ 2011-01-18 8:12 ` Per Forlin
-1 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-01-18 8:12 UTC (permalink / raw)
To: Jaehoon Chung
Cc: linux-mmc, linux-arm-kernel, linux-kernel, linaro-dev,
Chris Ball, Kyungmin Park
On 18 January 2011 03:35, Jaehoon Chung <jh80.chung@samsung.com> wrote:
> Hi Per..
>
> it is interesting approach..so
> we want to test your double buffering in our environment(Samsung SoC).
>
> Did you test with SDHCI?
So far I have only tested on mmci on the u5500 and u8500 boards. I will
happily test on a different board if I can get hold of one.
> If you tested with SDHCI, i want to know how much increase the performance.
>
> Thanks,
> Jaehoon Chung
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] mmc: add double buffering for mmc block requests
@ 2011-01-28 8:28 ` Per Forlin
0 siblings, 0 replies; 27+ messages in thread
From: Per Forlin @ 2011-01-28 8:28 UTC (permalink / raw)
To: Jaehoon Chung
Cc: linux-mmc, linux-arm-kernel, linux-kernel, linaro-dev,
Chris Ball, Kyungmin Park
Hi Jaehoon,
Have you had a chance to test the patches on the Samsung SoC? I can
sketch an implementation for sdhci if that helps. Unfortunately I don't
have any hardware to test the SDHCI driver, unless there is support in
QEMU.
I really think I need to test this on more hardware in order to get
more results for the patchset and hopefully more attention as well.
BR
Per
On 18 January 2011 09:12, Per Forlin <per.forlin@linaro.org> wrote:
> On 18 January 2011 03:35, Jaehoon Chung <jh80.chung@samsung.com> wrote:
>> Hi Per..
>>
>> it is interesting approach..so
>> we want to test your double buffering in our environment(Samsung SoC).
>>
>> Did you test with SDHCI?
> So far I have only tested on mmci for board u5500 and u8500. I happily
> test on a different board if I can get hold of one.
>
>> If you tested with SDHCI, i want to know how much increase the performance.
>>
>> Thanks,
>> Jaehoon Chung
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] mmc: add double buffering for mmc block requests
2011-01-28 8:28 ` Per Forlin
@ 2011-01-30 8:23 ` Jaehoon Chung
-1 siblings, 0 replies; 27+ messages in thread
From: Jaehoon Chung @ 2011-01-30 8:23 UTC (permalink / raw)
To: Per Forlin
Cc: Jaehoon Chung, linux-mmc, linux-arm-kernel, linux-kernel,
linaro-dev, Chris Ball, Kyungmin Park
Hi Per.
If you sketch an implementation for sdhci, I can test the patches on the Samsung SoC.
Your help would be very welcome :). I want to know how much the performance improves
after applying your patch.
Regards,
Jaehoon Chung
Per Forlin wrote:
> Hi Jaehoon,
>
> Have you had the chance to test the patches on the Samsung Soc. I can
> sketch a implementation for sdhci if that helps? Unfortunately I don't
> have any hardware to test the SDHCI driver unless there is support in
> QEMU?
> I really think I need to test this on more hardware in order to get
> more results for the patchset and hopefully more attention as well.
>
> BR
> Per
>
> On 18 January 2011 09:12, Per Forlin <per.forlin@linaro.org> wrote:
>> On 18 January 2011 03:35, Jaehoon Chung <jh80.chung@samsung.com> wrote:
>>> Hi Per..
>>>
>>> it is interesting approach..so
>>> we want to test your double buffering in our environment(Samsung SoC).
>>>
>>> Did you test with SDHCI?
>> So far I have only tested on mmci for board u5500 and u8500. I happily
>> test on a different board if I can get hold of one.
>>
>>> If you tested with SDHCI, i want to know how much increase the performance.
>>>
>>> Thanks,
>>> Jaehoon Chung
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] mmc: add double buffering for mmc block requests
2011-01-12 18:13 ` Per Forlin
@ 2011-02-05 17:02 ` Russell King - ARM Linux
-1 siblings, 0 replies; 27+ messages in thread
From: Russell King - ARM Linux @ 2011-02-05 17:02 UTC (permalink / raw)
To: Per Forlin, Catalin Marinas
Cc: linux-mmc, linux-arm-kernel, linux-kernel, dev, Chris Ball
On Wed, Jan 12, 2011 at 07:13:58PM +0100, Per Forlin wrote:
> Add support to prepare one MMC request while another is active on
> the host. This is done by making the issue_rw_rq() asynchronous.
> The increase in throughput is proportional to the time it takes to
> prepare a request and how fast the memory is. The faster the MMC/SD is
> the more significant the prepare request time becomes. Measurements on U5500
> and U8500 on eMMC shows significant performance gain for DMA on MMC for large
> reads. In the PIO case there is some gain in performance for large reads too.
> There seems to be no or small performance gain for write, don't have a good
> explanation for this yet.
It might be worth seeing what effect the following patch has. This
moves the dsb out of the cache operations into a separate function,
so we only do one dsb per DMA mapping/unmapping operation. That's
particularly significant for the scattergather code.
I don't remember the reason why this was dropped as a candidate for
merging - could that be because the dsb needs to come before the outer
cache maintenance? Adding Catalin for comment on that.
arch/arm/include/asm/cacheflush.h | 4 ++++
arch/arm/include/asm/dma-mapping.h | 8 ++++++++
arch/arm/mm/cache-fa.S | 13 +++++++------
arch/arm/mm/cache-v3.S | 3 +++
arch/arm/mm/cache-v4.S | 3 +++
arch/arm/mm/cache-v4wb.S | 9 +++++++--
arch/arm/mm/cache-v4wt.S | 3 +++
arch/arm/mm/cache-v6.S | 13 +++++++------
arch/arm/mm/cache-v7.S | 9 ++++++---
arch/arm/mm/dma-mapping.c | 12 ++++++++++++
arch/arm/mm/proc-arm1020e.S | 10 +++++++---
arch/arm/mm/proc-arm1022.S | 10 +++++++---
arch/arm/mm/proc-arm1026.S | 10 +++++++---
arch/arm/mm/proc-arm920.S | 10 +++++++---
arch/arm/mm/proc-arm922.S | 10 +++++++---
arch/arm/mm/proc-arm925.S | 10 +++++++---
arch/arm/mm/proc-arm926.S | 10 +++++++---
arch/arm/mm/proc-arm940.S | 10 +++++++---
arch/arm/mm/proc-arm946.S | 10 +++++++---
arch/arm/mm/proc-feroceon.S | 13 ++++++++-----
arch/arm/mm/proc-mohawk.S | 10 +++++++---
arch/arm/mm/proc-xsc3.S | 10 +++++++---
arch/arm/mm/proc-xscale.S | 10 +++++++---
23 files changed, 152 insertions(+), 58 deletions(-)
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index e290885..5928e78 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -223,6 +223,7 @@ struct cpu_cache_fns {
void (*dma_map_area)(const void *, size_t, int);
void (*dma_unmap_area)(const void *, size_t, int);
+ void (*dma_barrier)(void);
void (*dma_flush_range)(const void *, const void *);
};
@@ -250,6 +251,7 @@ extern struct cpu_cache_fns cpu_cache;
*/
#define dmac_map_area cpu_cache.dma_map_area
#define dmac_unmap_area cpu_cache.dma_unmap_area
+#define dmac_barrier cpu_cache.dma_barrier
#define dmac_flush_range cpu_cache.dma_flush_range
#else
@@ -278,10 +280,12 @@ extern void __cpuc_flush_dcache_area(void *, size_t);
*/
#define dmac_map_area __glue(_CACHE,_dma_map_area)
#define dmac_unmap_area __glue(_CACHE,_dma_unmap_area)
+#define dmac_barrier __glue(_CACHE,_dma_barrier)
#define dmac_flush_range __glue(_CACHE,_dma_flush_range)
extern void dmac_map_area(const void *, size_t, int);
extern void dmac_unmap_area(const void *, size_t, int);
+extern void dmac_barrier(void);
extern void dmac_flush_range(const void *, const void *);
#endif
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 256ee1c..1371db7 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -115,6 +115,8 @@ static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
___dma_page_dev_to_cpu(page, off, size, dir);
}
+extern void __dma_barrier(enum dma_data_direction);
+
/*
* Return whether the given device DMA address mask can be supported
* properly. For example, if your device can only drive the low 24-bits
@@ -378,6 +380,7 @@ static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
BUG_ON(!valid_dma_direction(dir));
addr = __dma_map_single(dev, cpu_addr, size, dir);
+ __dma_barrier(dir);
debug_dma_map_page(dev, virt_to_page(cpu_addr),
(unsigned long)cpu_addr & ~PAGE_MASK, size,
dir, addr, true);
@@ -407,6 +410,7 @@ static inline dma_addr_t dma_map_page(struct device *dev, struct page *page,
BUG_ON(!valid_dma_direction(dir));
addr = __dma_map_page(dev, page, offset, size, dir);
+ __dma_barrier(dir);
debug_dma_map_page(dev, page, offset, size, dir, addr, false);
return addr;
@@ -431,6 +435,7 @@ static inline void dma_unmap_single(struct device *dev, dma_addr_t handle,
{
debug_dma_unmap_page(dev, handle, size, dir, true);
__dma_unmap_single(dev, handle, size, dir);
+ __dma_barrier(dir);
}
/**
@@ -452,6 +457,7 @@ static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
{
debug_dma_unmap_page(dev, handle, size, dir, false);
__dma_unmap_page(dev, handle, size, dir);
+ __dma_barrier(dir);
}
/**
@@ -484,6 +490,7 @@ static inline void dma_sync_single_range_for_cpu(struct device *dev,
return;
__dma_single_dev_to_cpu(dma_to_virt(dev, handle) + offset, size, dir);
+ __dma_barrier(dir);
}
static inline void dma_sync_single_range_for_device(struct device *dev,
@@ -498,6 +505,7 @@ static inline void dma_sync_single_range_for_device(struct device *dev,
return;
__dma_single_cpu_to_dev(dma_to_virt(dev, handle) + offset, size, dir);
+ __dma_barrier(dir);
}
static inline void dma_sync_single_for_cpu(struct device *dev,
diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S
index 7148e53..cdcfae2 100644
--- a/arch/arm/mm/cache-fa.S
+++ b/arch/arm/mm/cache-fa.S
@@ -179,8 +179,6 @@ fa_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mov r0, #0
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -197,8 +195,6 @@ fa_dma_clean_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mov r0, #0
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -212,8 +208,6 @@ ENTRY(fa_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mov r0, #0
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -240,6 +234,12 @@ ENTRY(fa_dma_unmap_area)
mov pc, lr
ENDPROC(fa_dma_unmap_area)
+ENTRY(fa_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
+ mov pc, lr
+ENDPROC(fa_dma_barrier)
+
__INITDATA
.type fa_cache_fns, #object
@@ -253,5 +253,6 @@ ENTRY(fa_cache_fns)
.long fa_flush_kern_dcache_area
.long fa_dma_map_area
.long fa_dma_unmap_area
+ .long fa_dma_barrier
.long fa_dma_flush_range
.size fa_cache_fns, . - fa_cache_fns
diff --git a/arch/arm/mm/cache-v3.S b/arch/arm/mm/cache-v3.S
index c2ff3c5..df34458 100644
--- a/arch/arm/mm/cache-v3.S
+++ b/arch/arm/mm/cache-v3.S
@@ -123,9 +123,11 @@ ENTRY(v3_dma_unmap_area)
* - dir - DMA direction
*/
ENTRY(v3_dma_map_area)
+ENTRY(v3_dma_barrier)
mov pc, lr
ENDPROC(v3_dma_unmap_area)
ENDPROC(v3_dma_map_area)
+ENDPROC(v3_dma_barrier)
__INITDATA
@@ -140,5 +142,6 @@ ENTRY(v3_cache_fns)
.long v3_flush_kern_dcache_area
.long v3_dma_map_area
.long v3_dma_unmap_area
+ .long v3_dma_barrier
.long v3_dma_flush_range
.size v3_cache_fns, . - v3_cache_fns
diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S
index 4810f7e..20260b1 100644
--- a/arch/arm/mm/cache-v4.S
+++ b/arch/arm/mm/cache-v4.S
@@ -135,9 +135,11 @@ ENTRY(v4_dma_unmap_area)
* - dir - DMA direction
*/
ENTRY(v4_dma_map_area)
+ENTRY(v4_dma_barrier)
mov pc, lr
ENDPROC(v4_dma_unmap_area)
ENDPROC(v4_dma_map_area)
+ENDPROC(v4_dma_barrier)
__INITDATA
@@ -152,5 +154,6 @@ ENTRY(v4_cache_fns)
.long v4_flush_kern_dcache_area
.long v4_dma_map_area
.long v4_dma_unmap_area
+ .long v4_dma_barrier
.long v4_dma_flush_range
.size v4_cache_fns, . - v4_cache_fns
diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S
index df8368a..9c9c875 100644
--- a/arch/arm/mm/cache-v4wb.S
+++ b/arch/arm/mm/cache-v4wb.S
@@ -194,7 +194,6 @@ v4wb_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -211,7 +210,6 @@ v4wb_dma_clean_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -251,6 +249,12 @@ ENTRY(v4wb_dma_unmap_area)
mov pc, lr
ENDPROC(v4wb_dma_unmap_area)
+ENTRY(v4wb_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
+ mov pc, lr
+ENDPROC(v4wb_dma_barrier)
+
__INITDATA
.type v4wb_cache_fns, #object
@@ -264,5 +268,6 @@ ENTRY(v4wb_cache_fns)
.long v4wb_flush_kern_dcache_area
.long v4wb_dma_map_area
.long v4wb_dma_unmap_area
+ .long v4wb_dma_barrier
.long v4wb_dma_flush_range
.size v4wb_cache_fns, . - v4wb_cache_fns
diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S
index 45c7031..223eea4 100644
--- a/arch/arm/mm/cache-v4wt.S
+++ b/arch/arm/mm/cache-v4wt.S
@@ -191,9 +191,11 @@ ENTRY(v4wt_dma_unmap_area)
* - dir - DMA direction
*/
ENTRY(v4wt_dma_map_area)
+ENTRY(v4wt_dma_barrier)
mov pc, lr
ENDPROC(v4wt_dma_unmap_area)
ENDPROC(v4wt_dma_map_area)
+ENDPROC(v4wt_dma_barrier)
__INITDATA
@@ -208,5 +210,6 @@ ENTRY(v4wt_cache_fns)
.long v4wt_flush_kern_dcache_area
.long v4wt_dma_map_area
.long v4wt_dma_unmap_area
+ .long v4wt_dma_barrier
.long v4wt_dma_flush_range
.size v4wt_cache_fns, . - v4wt_cache_fns
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index 9d89c67..b294854 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -238,8 +238,6 @@ v6_dma_inv_range:
strlo r2, [r0] @ write for ownership
#endif
blo 1b
- mov r0, #0
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -261,8 +259,6 @@ v6_dma_clean_range:
add r0, r0, #D_CACHE_LINE_SIZE
cmp r0, r1
blo 1b
- mov r0, #0
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -289,8 +285,6 @@ ENTRY(v6_dma_flush_range)
strlob r2, [r0] @ write for ownership
#endif
blo 1b
- mov r0, #0
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -327,6 +321,12 @@ ENTRY(v6_dma_unmap_area)
mov pc, lr
ENDPROC(v6_dma_unmap_area)
+ENTRY(v6_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
+ mov pc, lr
+ENDPROC(v6_dma_barrier)
+
__INITDATA
.type v6_cache_fns, #object
@@ -340,5 +340,6 @@ ENTRY(v6_cache_fns)
.long v6_flush_kern_dcache_area
.long v6_dma_map_area
.long v6_dma_unmap_area
+ .long v6_dma_barrier
.long v6_dma_flush_range
.size v6_cache_fns, . - v6_cache_fns
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index bcd64f2..d89d55a 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -255,7 +255,6 @@ v7_dma_inv_range:
add r0, r0, r2
cmp r0, r1
blo 1b
- dsb
mov pc, lr
ENDPROC(v7_dma_inv_range)
@@ -273,7 +272,6 @@ v7_dma_clean_range:
add r0, r0, r2
cmp r0, r1
blo 1b
- dsb
mov pc, lr
ENDPROC(v7_dma_clean_range)
@@ -291,7 +289,6 @@ ENTRY(v7_dma_flush_range)
add r0, r0, r2
cmp r0, r1
blo 1b
- dsb
mov pc, lr
ENDPROC(v7_dma_flush_range)
@@ -321,6 +318,11 @@ ENTRY(v7_dma_unmap_area)
mov pc, lr
ENDPROC(v7_dma_unmap_area)
+ENTRY(v7_dma_barrier)
+ dsb
+ mov pc, lr
+ENDPROC(v7_dma_barrier)
+
__INITDATA
.type v7_cache_fns, #object
@@ -334,5 +336,6 @@ ENTRY(v7_cache_fns)
.long v7_flush_kern_dcache_area
.long v7_dma_map_area
.long v7_dma_unmap_area
+ .long v7_dma_barrier
.long v7_dma_flush_range
.size v7_cache_fns, . - v7_cache_fns
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 64daef2..d807f38 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -97,6 +97,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
memset(ptr, 0, size);
dmac_flush_range(ptr, ptr + size);
outer_flush_range(__pa(ptr), __pa(ptr) + size);
+ dmac_barrier();
return page;
}
@@ -542,6 +543,12 @@ void ___dma_page_dev_to_cpu(struct page *page, unsigned long off,
}
EXPORT_SYMBOL(___dma_page_dev_to_cpu);
+void __dma_barrier(enum dma_data_direction dir)
+{
+ dmac_barrier();
+}
+EXPORT_SYMBOL(__dma_barrier);
+
/**
* dma_map_sg - map a set of SG buffers for streaming mode DMA
* @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
@@ -572,6 +579,7 @@ int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
if (dma_mapping_error(dev, s->dma_address))
goto bad_mapping;
}
+ __dma_barrier(dir);
debug_dma_map_sg(dev, sg, nents, nents, dir);
return nents;
@@ -602,6 +610,8 @@ void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
for_each_sg(sg, s, nents, i)
__dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir);
+
+ __dma_barrier(dir);
}
EXPORT_SYMBOL(dma_unmap_sg);
@@ -627,6 +637,7 @@ void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
s->length, dir);
}
+ __dma_barrier(dir);
debug_dma_sync_sg_for_cpu(dev, sg, nents, dir);
}
EXPORT_SYMBOL(dma_sync_sg_for_cpu);
@@ -653,6 +664,7 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
s->length, dir);
}
+ __dma_barrier(dir);
debug_dma_sync_sg_for_device(dev, sg, nents, dir);
}
EXPORT_SYMBOL(dma_sync_sg_for_device);
diff --git a/arch/arm/mm/proc-arm1020e.S b/arch/arm/mm/proc-arm1020e.S
index d278298..fea33c9 100644
--- a/arch/arm/mm/proc-arm1020e.S
+++ b/arch/arm/mm/proc-arm1020e.S
@@ -281,7 +281,6 @@ arm1020e_dma_inv_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -303,7 +302,6 @@ arm1020e_dma_clean_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -323,7 +321,6 @@ ENTRY(arm1020e_dma_flush_range)
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -350,6 +347,12 @@ ENTRY(arm1020e_dma_unmap_area)
mov pc, lr
ENDPROC(arm1020e_dma_unmap_area)
+ENTRY(arm1020e_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm1020e_dma_barrier)
+
ENTRY(arm1020e_cache_fns)
.long arm1020e_flush_icache_all
.long arm1020e_flush_kern_cache_all
@@ -360,6 +363,7 @@ ENTRY(arm1020e_cache_fns)
.long arm1020e_flush_kern_dcache_area
.long arm1020e_dma_map_area
.long arm1020e_dma_unmap_area
+ .long arm1020e_dma_barrier
.long arm1020e_dma_flush_range
.align 5
diff --git a/arch/arm/mm/proc-arm1022.S b/arch/arm/mm/proc-arm1022.S
index ce13e4a..ba1a7df 100644
--- a/arch/arm/mm/proc-arm1022.S
+++ b/arch/arm/mm/proc-arm1022.S
@@ -270,7 +270,6 @@ arm1022_dma_inv_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -292,7 +291,6 @@ arm1022_dma_clean_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -312,7 +310,6 @@ ENTRY(arm1022_dma_flush_range)
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -339,6 +336,12 @@ ENTRY(arm1022_dma_unmap_area)
mov pc, lr
ENDPROC(arm1022_dma_unmap_area)
+ENTRY(arm1022_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm1022_dma_barrier)
+
ENTRY(arm1022_cache_fns)
.long arm1022_flush_icache_all
.long arm1022_flush_kern_cache_all
@@ -349,6 +352,7 @@ ENTRY(arm1022_cache_fns)
.long arm1022_flush_kern_dcache_area
.long arm1022_dma_map_area
.long arm1022_dma_unmap_area
+ .long arm1022_dma_barrier
.long arm1022_dma_flush_range
.align 5
diff --git a/arch/arm/mm/proc-arm1026.S b/arch/arm/mm/proc-arm1026.S
index 636672a..de648f1 100644
--- a/arch/arm/mm/proc-arm1026.S
+++ b/arch/arm/mm/proc-arm1026.S
@@ -264,7 +264,6 @@ arm1026_dma_inv_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -286,7 +285,6 @@ arm1026_dma_clean_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -306,7 +304,6 @@ ENTRY(arm1026_dma_flush_range)
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -333,6 +330,12 @@ ENTRY(arm1026_dma_unmap_area)
mov pc, lr
ENDPROC(arm1026_dma_unmap_area)
+ENTRY(arm1026_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm1026_dma_barrier)
+
ENTRY(arm1026_cache_fns)
.long arm1026_flush_icache_all
.long arm1026_flush_kern_cache_all
@@ -343,6 +346,7 @@ ENTRY(arm1026_cache_fns)
.long arm1026_flush_kern_dcache_area
.long arm1026_dma_map_area
.long arm1026_dma_unmap_area
+ .long arm1026_dma_barrier
.long arm1026_dma_flush_range
.align 5
diff --git a/arch/arm/mm/proc-arm920.S b/arch/arm/mm/proc-arm920.S
index 8be8199..ec74093 100644
--- a/arch/arm/mm/proc-arm920.S
+++ b/arch/arm/mm/proc-arm920.S
@@ -252,7 +252,6 @@ arm920_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -271,7 +270,6 @@ arm920_dma_clean_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -288,7 +286,6 @@ ENTRY(arm920_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -315,6 +312,12 @@ ENTRY(arm920_dma_unmap_area)
mov pc, lr
ENDPROC(arm920_dma_unmap_area)
+ENTRY(arm920_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm920_dma_barrier)
+
ENTRY(arm920_cache_fns)
.long arm920_flush_icache_all
.long arm920_flush_kern_cache_all
@@ -325,6 +328,7 @@ ENTRY(arm920_cache_fns)
.long arm920_flush_kern_dcache_area
.long arm920_dma_map_area
.long arm920_dma_unmap_area
+ .long arm920_dma_barrier
.long arm920_dma_flush_range
#endif
diff --git a/arch/arm/mm/proc-arm922.S b/arch/arm/mm/proc-arm922.S
index c0ff8e4..474d4c6 100644
--- a/arch/arm/mm/proc-arm922.S
+++ b/arch/arm/mm/proc-arm922.S
@@ -254,7 +254,6 @@ arm922_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -273,7 +272,6 @@ arm922_dma_clean_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -290,7 +288,6 @@ ENTRY(arm922_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -317,6 +314,12 @@ ENTRY(arm922_dma_unmap_area)
mov pc, lr
ENDPROC(arm922_dma_unmap_area)
+ENTRY(arm922_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm922_dma_barrier)
+
ENTRY(arm922_cache_fns)
.long arm922_flush_icache_all
.long arm922_flush_kern_cache_all
@@ -327,6 +330,7 @@ ENTRY(arm922_cache_fns)
.long arm922_flush_kern_dcache_area
.long arm922_dma_map_area
.long arm922_dma_unmap_area
+ .long arm922_dma_barrier
.long arm922_dma_flush_range
#endif
diff --git a/arch/arm/mm/proc-arm925.S b/arch/arm/mm/proc-arm925.S
index 3c6cffe..0336ae3 100644
--- a/arch/arm/mm/proc-arm925.S
+++ b/arch/arm/mm/proc-arm925.S
@@ -302,7 +302,6 @@ arm925_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -323,7 +322,6 @@ arm925_dma_clean_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -345,7 +343,6 @@ ENTRY(arm925_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -372,6 +369,12 @@ ENTRY(arm925_dma_unmap_area)
mov pc, lr
ENDPROC(arm925_dma_unmap_area)
+ENTRY(arm925_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm925_dma_barrier)
+
ENTRY(arm925_cache_fns)
.long arm925_flush_icache_all
.long arm925_flush_kern_cache_all
@@ -382,6 +385,7 @@ ENTRY(arm925_cache_fns)
.long arm925_flush_kern_dcache_area
.long arm925_dma_map_area
.long arm925_dma_unmap_area
+ .long arm925_dma_barrier
.long arm925_dma_flush_range
ENTRY(cpu_arm925_dcache_clean_area)
diff --git a/arch/arm/mm/proc-arm926.S b/arch/arm/mm/proc-arm926.S
index 75b707c..473bbe6 100644
--- a/arch/arm/mm/proc-arm926.S
+++ b/arch/arm/mm/proc-arm926.S
@@ -265,7 +265,6 @@ arm926_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -286,7 +285,6 @@ arm926_dma_clean_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -308,7 +306,6 @@ ENTRY(arm926_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -335,6 +332,12 @@ ENTRY(arm926_dma_unmap_area)
mov pc, lr
ENDPROC(arm926_dma_unmap_area)
+ENTRY(arm926_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm926_dma_barrier)
+
ENTRY(arm926_cache_fns)
.long arm926_flush_icache_all
.long arm926_flush_kern_cache_all
@@ -345,6 +348,7 @@ ENTRY(arm926_cache_fns)
.long arm926_flush_kern_dcache_area
.long arm926_dma_map_area
.long arm926_dma_unmap_area
+ .long arm926_dma_barrier
.long arm926_dma_flush_range
ENTRY(cpu_arm926_dcache_clean_area)
diff --git a/arch/arm/mm/proc-arm940.S b/arch/arm/mm/proc-arm940.S
index 1af1657..c44c963 100644
--- a/arch/arm/mm/proc-arm940.S
+++ b/arch/arm/mm/proc-arm940.S
@@ -187,7 +187,6 @@ arm940_dma_inv_range:
bcs 2b @ entries 63 to 0
subs r1, r1, #1 << 4
bcs 1b @ segments 7 to 0
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -211,7 +210,6 @@ ENTRY(cpu_arm940_dcache_clean_area)
subs r1, r1, #1 << 4
bcs 1b @ segments 7 to 0
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -237,7 +235,6 @@ ENTRY(arm940_dma_flush_range)
bcs 2b @ entries 63 to 0
subs r1, r1, #1 << 4
bcs 1b @ segments 7 to 0
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -264,6 +261,12 @@ ENTRY(arm940_dma_unmap_area)
mov pc, lr
ENDPROC(arm940_dma_unmap_area)
+ENTRY(arm940_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm940_dma_barrier)
+
ENTRY(arm940_cache_fns)
.long arm940_flush_icache_all
.long arm940_flush_kern_cache_all
@@ -274,6 +277,7 @@ ENTRY(arm940_cache_fns)
.long arm940_flush_kern_dcache_area
.long arm940_dma_map_area
.long arm940_dma_unmap_area
+ .long arm940_dma_barrier
.long arm940_dma_flush_range
__CPUINIT
diff --git a/arch/arm/mm/proc-arm946.S b/arch/arm/mm/proc-arm946.S
index 1664b6a..11e9ad7 100644
--- a/arch/arm/mm/proc-arm946.S
+++ b/arch/arm/mm/proc-arm946.S
@@ -234,7 +234,6 @@ arm946_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -255,7 +254,6 @@ arm946_dma_clean_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -279,7 +277,6 @@ ENTRY(arm946_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -306,6 +303,12 @@ ENTRY(arm946_dma_unmap_area)
mov pc, lr
ENDPROC(arm946_dma_unmap_area)
+ENTRY(arm946_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm946_dma_barrier)
+
ENTRY(arm946_cache_fns)
.long arm946_flush_icache_all
.long arm946_flush_kern_cache_all
@@ -316,6 +319,7 @@ ENTRY(arm946_cache_fns)
.long arm946_flush_kern_dcache_area
.long arm946_dma_map_area
.long arm946_dma_unmap_area
+ .long arm946_dma_barrier
.long arm946_dma_flush_range
diff --git a/arch/arm/mm/proc-feroceon.S b/arch/arm/mm/proc-feroceon.S
index 53e6323..50a309e 100644
--- a/arch/arm/mm/proc-feroceon.S
+++ b/arch/arm/mm/proc-feroceon.S
@@ -290,7 +290,6 @@ feroceon_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
.align 5
@@ -326,7 +325,6 @@ feroceon_dma_clean_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
.align 5
@@ -339,7 +337,6 @@ feroceon_range_dma_clean_range:
mcr p15, 5, r0, c15, c13, 0 @ D clean range start
mcr p15, 5, r1, c15, c13, 1 @ D clean range top
msr cpsr_c, r2 @ restore interrupts
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -357,7 +354,6 @@ ENTRY(feroceon_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
.align 5
@@ -370,7 +366,6 @@ ENTRY(feroceon_range_dma_flush_range)
mcr p15, 5, r0, c15, c15, 0 @ D clean/inv range start
mcr p15, 5, r1, c15, c15, 1 @ D clean/inv range top
msr cpsr_c, r2 @ restore interrupts
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -411,6 +406,12 @@ ENTRY(feroceon_dma_unmap_area)
mov pc, lr
* [PATCH 0/5] mmc: add double buffering for mmc block requests
@ 2011-02-05 17:02 ` Russell King - ARM Linux
0 siblings, 0 replies; 27+ messages in thread
From: Russell King - ARM Linux @ 2011-02-05 17:02 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Jan 12, 2011 at 07:13:58PM +0100, Per Forlin wrote:
> Add support to prepare one MMC request while another is active on
> the host. This is done by making the issue_rw_rq() asynchronous.
> The increase in throughput is proportional to the time it takes to
> prepare a request and how fast the memory is. The faster the MMC/SD is
> the more significant the prepare request time becomes. Measurements on U5500
> and U8500 on eMMC shows significant performance gain for DMA on MMC for large
> reads. In the PIO case there is some gain in performance for large reads too.
> There seems to be no or small performance gain for write, don't have a good
> explanation for this yet.
It might be worth seeing what effect the following patch has. This
moves the dsb out of the cache operations into a separate function,
so we only do one dsb per DMA mapping/unmapping operation. That's
particularly significant for the scatter-gather code.
I don't remember the reason why this was dropped as a candidate for
merging - could that be because the dsb needs to be before the outer
cache maintenance? Adding Catalin for comment on that.
arch/arm/include/asm/cacheflush.h | 4 ++++
arch/arm/include/asm/dma-mapping.h | 8 ++++++++
arch/arm/mm/cache-fa.S | 13 +++++++------
arch/arm/mm/cache-v3.S | 3 +++
arch/arm/mm/cache-v4.S | 3 +++
arch/arm/mm/cache-v4wb.S | 9 +++++++--
arch/arm/mm/cache-v4wt.S | 3 +++
arch/arm/mm/cache-v6.S | 13 +++++++------
arch/arm/mm/cache-v7.S | 9 ++++++---
arch/arm/mm/dma-mapping.c | 12 ++++++++++++
arch/arm/mm/proc-arm1020e.S | 10 +++++++---
arch/arm/mm/proc-arm1022.S | 10 +++++++---
arch/arm/mm/proc-arm1026.S | 10 +++++++---
arch/arm/mm/proc-arm920.S | 10 +++++++---
arch/arm/mm/proc-arm922.S | 10 +++++++---
arch/arm/mm/proc-arm925.S | 10 +++++++---
arch/arm/mm/proc-arm926.S | 10 +++++++---
arch/arm/mm/proc-arm940.S | 10 +++++++---
arch/arm/mm/proc-arm946.S | 10 +++++++---
arch/arm/mm/proc-feroceon.S | 13 ++++++++-----
arch/arm/mm/proc-mohawk.S | 10 +++++++---
arch/arm/mm/proc-xsc3.S | 10 +++++++---
arch/arm/mm/proc-xscale.S | 10 +++++++---
23 files changed, 152 insertions(+), 58 deletions(-)
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index e290885..5928e78 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -223,6 +223,7 @@ struct cpu_cache_fns {
void (*dma_map_area)(const void *, size_t, int);
void (*dma_unmap_area)(const void *, size_t, int);
+ void (*dma_barrier)(void);
void (*dma_flush_range)(const void *, const void *);
};
@@ -250,6 +251,7 @@ extern struct cpu_cache_fns cpu_cache;
*/
#define dmac_map_area cpu_cache.dma_map_area
#define dmac_unmap_area cpu_cache.dma_unmap_area
+#define dmac_barrier cpu_cache.dma_barrier
#define dmac_flush_range cpu_cache.dma_flush_range
#else
@@ -278,10 +280,12 @@ extern void __cpuc_flush_dcache_area(void *, size_t);
*/
#define dmac_map_area __glue(_CACHE,_dma_map_area)
#define dmac_unmap_area __glue(_CACHE,_dma_unmap_area)
+#define dmac_barrier __glue(_CACHE,_dma_barrier)
#define dmac_flush_range __glue(_CACHE,_dma_flush_range)
extern void dmac_map_area(const void *, size_t, int);
extern void dmac_unmap_area(const void *, size_t, int);
+extern void dmac_barrier(void);
extern void dmac_flush_range(const void *, const void *);
#endif
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 256ee1c..1371db7 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -115,6 +115,8 @@ static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
___dma_page_dev_to_cpu(page, off, size, dir);
}
+extern void __dma_barrier(enum dma_data_direction);
+
/*
* Return whether the given device DMA address mask can be supported
* properly. For example, if your device can only drive the low 24-bits
@@ -378,6 +380,7 @@ static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
BUG_ON(!valid_dma_direction(dir));
addr = __dma_map_single(dev, cpu_addr, size, dir);
+ __dma_barrier(dir);
debug_dma_map_page(dev, virt_to_page(cpu_addr),
(unsigned long)cpu_addr & ~PAGE_MASK, size,
dir, addr, true);
@@ -407,6 +410,7 @@ static inline dma_addr_t dma_map_page(struct device *dev, struct page *page,
BUG_ON(!valid_dma_direction(dir));
addr = __dma_map_page(dev, page, offset, size, dir);
+ __dma_barrier(dir);
debug_dma_map_page(dev, page, offset, size, dir, addr, false);
return addr;
@@ -431,6 +435,7 @@ static inline void dma_unmap_single(struct device *dev, dma_addr_t handle,
{
debug_dma_unmap_page(dev, handle, size, dir, true);
__dma_unmap_single(dev, handle, size, dir);
+ __dma_barrier(dir);
}
/**
@@ -452,6 +457,7 @@ static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
{
debug_dma_unmap_page(dev, handle, size, dir, false);
__dma_unmap_page(dev, handle, size, dir);
+ __dma_barrier(dir);
}
/**
@@ -484,6 +490,7 @@ static inline void dma_sync_single_range_for_cpu(struct device *dev,
return;
__dma_single_dev_to_cpu(dma_to_virt(dev, handle) + offset, size, dir);
+ __dma_barrier(dir);
}
static inline void dma_sync_single_range_for_device(struct device *dev,
@@ -498,6 +505,7 @@ static inline void dma_sync_single_range_for_device(struct device *dev,
return;
__dma_single_cpu_to_dev(dma_to_virt(dev, handle) + offset, size, dir);
+ __dma_barrier(dir);
}
static inline void dma_sync_single_for_cpu(struct device *dev,
diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S
index 7148e53..cdcfae2 100644
--- a/arch/arm/mm/cache-fa.S
+++ b/arch/arm/mm/cache-fa.S
@@ -179,8 +179,6 @@ fa_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mov r0, #0
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -197,8 +195,6 @@ fa_dma_clean_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mov r0, #0
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -212,8 +208,6 @@ ENTRY(fa_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mov r0, #0
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -240,6 +234,12 @@ ENTRY(fa_dma_unmap_area)
mov pc, lr
ENDPROC(fa_dma_unmap_area)
+ENTRY(fa_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
+ mov pc, lr
+ENDPROC(fa_dma_barrier)
+
__INITDATA
.type fa_cache_fns, #object
@@ -253,5 +253,6 @@ ENTRY(fa_cache_fns)
.long fa_flush_kern_dcache_area
.long fa_dma_map_area
.long fa_dma_unmap_area
+ .long fa_dma_barrier
.long fa_dma_flush_range
.size fa_cache_fns, . - fa_cache_fns
diff --git a/arch/arm/mm/cache-v3.S b/arch/arm/mm/cache-v3.S
index c2ff3c5..df34458 100644
--- a/arch/arm/mm/cache-v3.S
+++ b/arch/arm/mm/cache-v3.S
@@ -123,9 +123,11 @@ ENTRY(v3_dma_unmap_area)
* - dir - DMA direction
*/
ENTRY(v3_dma_map_area)
+ENTRY(v3_dma_barrier)
mov pc, lr
ENDPROC(v3_dma_unmap_area)
ENDPROC(v3_dma_map_area)
+ENDPROC(v3_dma_barrier)
__INITDATA
@@ -140,5 +142,6 @@ ENTRY(v3_cache_fns)
.long v3_flush_kern_dcache_area
.long v3_dma_map_area
.long v3_dma_unmap_area
+ .long v3_dma_barrier
.long v3_dma_flush_range
.size v3_cache_fns, . - v3_cache_fns
diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S
index 4810f7e..20260b1 100644
--- a/arch/arm/mm/cache-v4.S
+++ b/arch/arm/mm/cache-v4.S
@@ -135,9 +135,11 @@ ENTRY(v4_dma_unmap_area)
* - dir - DMA direction
*/
ENTRY(v4_dma_map_area)
+ENTRY(v4_dma_barrier)
mov pc, lr
ENDPROC(v4_dma_unmap_area)
ENDPROC(v4_dma_map_area)
+ENDPROC(v4_dma_barrier)
__INITDATA
@@ -152,5 +154,6 @@ ENTRY(v4_cache_fns)
.long v4_flush_kern_dcache_area
.long v4_dma_map_area
.long v4_dma_unmap_area
+ .long v4_dma_barrier
.long v4_dma_flush_range
.size v4_cache_fns, . - v4_cache_fns
diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S
index df8368a..9c9c875 100644
--- a/arch/arm/mm/cache-v4wb.S
+++ b/arch/arm/mm/cache-v4wb.S
@@ -194,7 +194,6 @@ v4wb_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -211,7 +210,6 @@ v4wb_dma_clean_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -251,6 +249,12 @@ ENTRY(v4wb_dma_unmap_area)
mov pc, lr
ENDPROC(v4wb_dma_unmap_area)
+ENTRY(v4wb_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
+ mov pc, lr
+ENDPROC(v4wb_dma_barrier)
+
__INITDATA
.type v4wb_cache_fns, #object
@@ -264,5 +268,6 @@ ENTRY(v4wb_cache_fns)
.long v4wb_flush_kern_dcache_area
.long v4wb_dma_map_area
.long v4wb_dma_unmap_area
+ .long v4wb_dma_barrier
.long v4wb_dma_flush_range
.size v4wb_cache_fns, . - v4wb_cache_fns
diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S
index 45c7031..223eea4 100644
--- a/arch/arm/mm/cache-v4wt.S
+++ b/arch/arm/mm/cache-v4wt.S
@@ -191,9 +191,11 @@ ENTRY(v4wt_dma_unmap_area)
* - dir - DMA direction
*/
ENTRY(v4wt_dma_map_area)
+ENTRY(v4wt_dma_barrier)
mov pc, lr
ENDPROC(v4wt_dma_unmap_area)
ENDPROC(v4wt_dma_map_area)
+ENDPROC(v4wt_dma_barrier)
__INITDATA
@@ -208,5 +210,6 @@ ENTRY(v4wt_cache_fns)
.long v4wt_flush_kern_dcache_area
.long v4wt_dma_map_area
.long v4wt_dma_unmap_area
+ .long v4wt_dma_barrier
.long v4wt_dma_flush_range
.size v4wt_cache_fns, . - v4wt_cache_fns
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index 9d89c67..b294854 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -238,8 +238,6 @@ v6_dma_inv_range:
strlo r2, [r0] @ write for ownership
#endif
blo 1b
- mov r0, #0
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -261,8 +259,6 @@ v6_dma_clean_range:
add r0, r0, #D_CACHE_LINE_SIZE
cmp r0, r1
blo 1b
- mov r0, #0
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -289,8 +285,6 @@ ENTRY(v6_dma_flush_range)
strlob r2, [r0] @ write for ownership
#endif
blo 1b
- mov r0, #0
- mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
mov pc, lr
/*
@@ -327,6 +321,12 @@ ENTRY(v6_dma_unmap_area)
mov pc, lr
ENDPROC(v6_dma_unmap_area)
+ENTRY(v6_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
+ mov pc, lr
+ENDPROC(v6_dma_barrier)
+
__INITDATA
.type v6_cache_fns, #object
@@ -340,5 +340,6 @@ ENTRY(v6_cache_fns)
.long v6_flush_kern_dcache_area
.long v6_dma_map_area
.long v6_dma_unmap_area
+ .long v6_dma_barrier
.long v6_dma_flush_range
.size v6_cache_fns, . - v6_cache_fns
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index bcd64f2..d89d55a 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -255,7 +255,6 @@ v7_dma_inv_range:
add r0, r0, r2
cmp r0, r1
blo 1b
- dsb
mov pc, lr
ENDPROC(v7_dma_inv_range)
@@ -273,7 +272,6 @@ v7_dma_clean_range:
add r0, r0, r2
cmp r0, r1
blo 1b
- dsb
mov pc, lr
ENDPROC(v7_dma_clean_range)
@@ -291,7 +289,6 @@ ENTRY(v7_dma_flush_range)
add r0, r0, r2
cmp r0, r1
blo 1b
- dsb
mov pc, lr
ENDPROC(v7_dma_flush_range)
@@ -321,6 +318,11 @@ ENTRY(v7_dma_unmap_area)
mov pc, lr
ENDPROC(v7_dma_unmap_area)
+ENTRY(v7_dma_barrier)
+ dsb
+ mov pc, lr
+ENDPROC(v7_dma_barrier)
+
__INITDATA
.type v7_cache_fns, #object
@@ -334,5 +336,6 @@ ENTRY(v7_cache_fns)
.long v7_flush_kern_dcache_area
.long v7_dma_map_area
.long v7_dma_unmap_area
+ .long v7_dma_barrier
.long v7_dma_flush_range
.size v7_cache_fns, . - v7_cache_fns
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 64daef2..d807f38 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -97,6 +97,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
memset(ptr, 0, size);
dmac_flush_range(ptr, ptr + size);
outer_flush_range(__pa(ptr), __pa(ptr) + size);
+ dmac_barrier();
return page;
}
@@ -542,6 +543,12 @@ void ___dma_page_dev_to_cpu(struct page *page, unsigned long off,
}
EXPORT_SYMBOL(___dma_page_dev_to_cpu);
+void __dma_barrier(enum dma_data_direction dir)
+{
+ dmac_barrier();
+}
+EXPORT_SYMBOL(__dma_barrier);
+
/**
* dma_map_sg - map a set of SG buffers for streaming mode DMA
* @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
@@ -572,6 +579,7 @@ int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
if (dma_mapping_error(dev, s->dma_address))
goto bad_mapping;
}
+ __dma_barrier(dir);
debug_dma_map_sg(dev, sg, nents, nents, dir);
return nents;
@@ -602,6 +610,8 @@ void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
for_each_sg(sg, s, nents, i)
__dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir);
+
+ __dma_barrier(dir);
}
EXPORT_SYMBOL(dma_unmap_sg);
@@ -627,6 +637,7 @@ void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
s->length, dir);
}
+ __dma_barrier(dir);
debug_dma_sync_sg_for_cpu(dev, sg, nents, dir);
}
EXPORT_SYMBOL(dma_sync_sg_for_cpu);
@@ -653,6 +664,7 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
s->length, dir);
}
+ __dma_barrier(dir);
debug_dma_sync_sg_for_device(dev, sg, nents, dir);
}
EXPORT_SYMBOL(dma_sync_sg_for_device);
diff --git a/arch/arm/mm/proc-arm1020e.S b/arch/arm/mm/proc-arm1020e.S
index d278298..fea33c9 100644
--- a/arch/arm/mm/proc-arm1020e.S
+++ b/arch/arm/mm/proc-arm1020e.S
@@ -281,7 +281,6 @@ arm1020e_dma_inv_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -303,7 +302,6 @@ arm1020e_dma_clean_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -323,7 +321,6 @@ ENTRY(arm1020e_dma_flush_range)
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -350,6 +347,12 @@ ENTRY(arm1020e_dma_unmap_area)
mov pc, lr
ENDPROC(arm1020e_dma_unmap_area)
+ENTRY(arm1020e_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm1020e_dma_barrier)
+
ENTRY(arm1020e_cache_fns)
.long arm1020e_flush_icache_all
.long arm1020e_flush_kern_cache_all
@@ -360,6 +363,7 @@ ENTRY(arm1020e_cache_fns)
.long arm1020e_flush_kern_dcache_area
.long arm1020e_dma_map_area
.long arm1020e_dma_unmap_area
+ .long arm1020e_dma_barrier
.long arm1020e_dma_flush_range
.align 5
diff --git a/arch/arm/mm/proc-arm1022.S b/arch/arm/mm/proc-arm1022.S
index ce13e4a..ba1a7df 100644
--- a/arch/arm/mm/proc-arm1022.S
+++ b/arch/arm/mm/proc-arm1022.S
@@ -270,7 +270,6 @@ arm1022_dma_inv_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -292,7 +291,6 @@ arm1022_dma_clean_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -312,7 +310,6 @@ ENTRY(arm1022_dma_flush_range)
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -339,6 +336,12 @@ ENTRY(arm1022_dma_unmap_area)
mov pc, lr
ENDPROC(arm1022_dma_unmap_area)
+ENTRY(arm1022_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm1022_dma_barrier)
+
ENTRY(arm1022_cache_fns)
.long arm1022_flush_icache_all
.long arm1022_flush_kern_cache_all
@@ -349,6 +352,7 @@ ENTRY(arm1022_cache_fns)
.long arm1022_flush_kern_dcache_area
.long arm1022_dma_map_area
.long arm1022_dma_unmap_area
+ .long arm1022_dma_barrier
.long arm1022_dma_flush_range
.align 5
diff --git a/arch/arm/mm/proc-arm1026.S b/arch/arm/mm/proc-arm1026.S
index 636672a..de648f1 100644
--- a/arch/arm/mm/proc-arm1026.S
+++ b/arch/arm/mm/proc-arm1026.S
@@ -264,7 +264,6 @@ arm1026_dma_inv_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -286,7 +285,6 @@ arm1026_dma_clean_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -306,7 +304,6 @@ ENTRY(arm1026_dma_flush_range)
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -333,6 +330,12 @@ ENTRY(arm1026_dma_unmap_area)
mov pc, lr
ENDPROC(arm1026_dma_unmap_area)
+ENTRY(arm1026_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm1026_dma_barrier)
+
ENTRY(arm1026_cache_fns)
.long arm1026_flush_icache_all
.long arm1026_flush_kern_cache_all
@@ -343,6 +346,7 @@ ENTRY(arm1026_cache_fns)
.long arm1026_flush_kern_dcache_area
.long arm1026_dma_map_area
.long arm1026_dma_unmap_area
+ .long arm1026_dma_barrier
.long arm1026_dma_flush_range
.align 5
diff --git a/arch/arm/mm/proc-arm920.S b/arch/arm/mm/proc-arm920.S
index 8be8199..ec74093 100644
--- a/arch/arm/mm/proc-arm920.S
+++ b/arch/arm/mm/proc-arm920.S
@@ -252,7 +252,6 @@ arm920_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -271,7 +270,6 @@ arm920_dma_clean_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -288,7 +286,6 @@ ENTRY(arm920_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -315,6 +312,12 @@ ENTRY(arm920_dma_unmap_area)
mov pc, lr
ENDPROC(arm920_dma_unmap_area)
+ENTRY(arm920_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm920_dma_barrier)
+
ENTRY(arm920_cache_fns)
.long arm920_flush_icache_all
.long arm920_flush_kern_cache_all
@@ -325,6 +328,7 @@ ENTRY(arm920_cache_fns)
.long arm920_flush_kern_dcache_area
.long arm920_dma_map_area
.long arm920_dma_unmap_area
+ .long arm920_dma_barrier
.long arm920_dma_flush_range
#endif
diff --git a/arch/arm/mm/proc-arm922.S b/arch/arm/mm/proc-arm922.S
index c0ff8e4..474d4c6 100644
--- a/arch/arm/mm/proc-arm922.S
+++ b/arch/arm/mm/proc-arm922.S
@@ -254,7 +254,6 @@ arm922_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -273,7 +272,6 @@ arm922_dma_clean_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -290,7 +288,6 @@ ENTRY(arm922_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -317,6 +314,12 @@ ENTRY(arm922_dma_unmap_area)
mov pc, lr
ENDPROC(arm922_dma_unmap_area)
+ENTRY(arm922_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm922_dma_barrier)
+
ENTRY(arm922_cache_fns)
.long arm922_flush_icache_all
.long arm922_flush_kern_cache_all
@@ -327,6 +330,7 @@ ENTRY(arm922_cache_fns)
.long arm922_flush_kern_dcache_area
.long arm922_dma_map_area
.long arm922_dma_unmap_area
+ .long arm922_dma_barrier
.long arm922_dma_flush_range
#endif
diff --git a/arch/arm/mm/proc-arm925.S b/arch/arm/mm/proc-arm925.S
index 3c6cffe..0336ae3 100644
--- a/arch/arm/mm/proc-arm925.S
+++ b/arch/arm/mm/proc-arm925.S
@@ -302,7 +302,6 @@ arm925_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -323,7 +322,6 @@ arm925_dma_clean_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -345,7 +343,6 @@ ENTRY(arm925_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -372,6 +369,12 @@ ENTRY(arm925_dma_unmap_area)
mov pc, lr
ENDPROC(arm925_dma_unmap_area)
+ENTRY(arm925_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm925_dma_barrier)
+
ENTRY(arm925_cache_fns)
.long arm925_flush_icache_all
.long arm925_flush_kern_cache_all
@@ -382,6 +385,7 @@ ENTRY(arm925_cache_fns)
.long arm925_flush_kern_dcache_area
.long arm925_dma_map_area
.long arm925_dma_unmap_area
+ .long arm925_dma_barrier
.long arm925_dma_flush_range
ENTRY(cpu_arm925_dcache_clean_area)
diff --git a/arch/arm/mm/proc-arm926.S b/arch/arm/mm/proc-arm926.S
index 75b707c..473bbe6 100644
--- a/arch/arm/mm/proc-arm926.S
+++ b/arch/arm/mm/proc-arm926.S
@@ -265,7 +265,6 @@ arm926_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -286,7 +285,6 @@ arm926_dma_clean_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -308,7 +306,6 @@ ENTRY(arm926_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -335,6 +332,12 @@ ENTRY(arm926_dma_unmap_area)
mov pc, lr
ENDPROC(arm926_dma_unmap_area)
+ENTRY(arm926_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm926_dma_barrier)
+
ENTRY(arm926_cache_fns)
.long arm926_flush_icache_all
.long arm926_flush_kern_cache_all
@@ -345,6 +348,7 @@ ENTRY(arm926_cache_fns)
.long arm926_flush_kern_dcache_area
.long arm926_dma_map_area
.long arm926_dma_unmap_area
+ .long arm926_dma_barrier
.long arm926_dma_flush_range
ENTRY(cpu_arm926_dcache_clean_area)
diff --git a/arch/arm/mm/proc-arm940.S b/arch/arm/mm/proc-arm940.S
index 1af1657..c44c963 100644
--- a/arch/arm/mm/proc-arm940.S
+++ b/arch/arm/mm/proc-arm940.S
@@ -187,7 +187,6 @@ arm940_dma_inv_range:
bcs 2b @ entries 63 to 0
subs r1, r1, #1 << 4
bcs 1b @ segments 7 to 0
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -211,7 +210,6 @@ ENTRY(cpu_arm940_dcache_clean_area)
subs r1, r1, #1 << 4
bcs 1b @ segments 7 to 0
#endif
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -237,7 +235,6 @@ ENTRY(arm940_dma_flush_range)
bcs 2b @ entries 63 to 0
subs r1, r1, #1 << 4
bcs 1b @ segments 7 to 0
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -264,6 +261,12 @@ ENTRY(arm940_dma_unmap_area)
mov pc, lr
ENDPROC(arm940_dma_unmap_area)
+ENTRY(arm940_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm940_dma_barrier)
+
ENTRY(arm940_cache_fns)
.long arm940_flush_icache_all
.long arm940_flush_kern_cache_all
@@ -274,6 +277,7 @@ ENTRY(arm940_cache_fns)
.long arm940_flush_kern_dcache_area
.long arm940_dma_map_area
.long arm940_dma_unmap_area
+ .long arm940_dma_barrier
.long arm940_dma_flush_range
__CPUINIT
diff --git a/arch/arm/mm/proc-arm946.S b/arch/arm/mm/proc-arm946.S
index 1664b6a..11e9ad7 100644
--- a/arch/arm/mm/proc-arm946.S
+++ b/arch/arm/mm/proc-arm946.S
@@ -234,7 +234,6 @@ arm946_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -255,7 +254,6 @@ arm946_dma_clean_range:
cmp r0, r1
blo 1b
#endif
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -279,7 +277,6 @@ ENTRY(arm946_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -306,6 +303,12 @@ ENTRY(arm946_dma_unmap_area)
mov pc, lr
ENDPROC(arm946_dma_unmap_area)
+ENTRY(arm946_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(arm946_dma_barrier)
+
ENTRY(arm946_cache_fns)
.long arm946_flush_icache_all
.long arm946_flush_kern_cache_all
@@ -316,6 +319,7 @@ ENTRY(arm946_cache_fns)
.long arm946_flush_kern_dcache_area
.long arm946_dma_map_area
.long arm946_dma_unmap_area
+ .long arm946_dma_barrier
.long arm946_dma_flush_range
diff --git a/arch/arm/mm/proc-feroceon.S b/arch/arm/mm/proc-feroceon.S
index 53e6323..50a309e 100644
--- a/arch/arm/mm/proc-feroceon.S
+++ b/arch/arm/mm/proc-feroceon.S
@@ -290,7 +290,6 @@ feroceon_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
.align 5
@@ -326,7 +325,6 @@ feroceon_dma_clean_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
.align 5
@@ -339,7 +337,6 @@ feroceon_range_dma_clean_range:
mcr p15, 5, r0, c15, c13, 0 @ D clean range start
mcr p15, 5, r1, c15, c13, 1 @ D clean range top
msr cpsr_c, r2 @ restore interrupts
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -357,7 +354,6 @@ ENTRY(feroceon_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
.align 5
@@ -370,7 +366,6 @@ ENTRY(feroceon_range_dma_flush_range)
mcr p15, 5, r0, c15, c15, 0 @ D clean/inv range start
mcr p15, 5, r1, c15, c15, 1 @ D clean/inv range top
msr cpsr_c, r2 @ restore interrupts
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -411,6 +406,12 @@ ENTRY(feroceon_dma_unmap_area)
mov pc, lr
ENDPROC(feroceon_dma_unmap_area)
+ENTRY(feroceon_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(feroceon_dma_barrier)
+
ENTRY(feroceon_cache_fns)
.long feroceon_flush_icache_all
.long feroceon_flush_kern_cache_all
@@ -421,6 +422,7 @@ ENTRY(feroceon_cache_fns)
.long feroceon_flush_kern_dcache_area
.long feroceon_dma_map_area
.long feroceon_dma_unmap_area
+ .long feroceon_dma_barrier
.long feroceon_dma_flush_range
ENTRY(feroceon_range_cache_fns)
@@ -433,6 +435,7 @@ ENTRY(feroceon_range_cache_fns)
.long feroceon_range_flush_kern_dcache_area
.long feroceon_range_dma_map_area
.long feroceon_dma_unmap_area
+ .long feroceon_dma_barrier
.long feroceon_range_dma_flush_range
.align 5
diff --git a/arch/arm/mm/proc-mohawk.S b/arch/arm/mm/proc-mohawk.S
index caa3115..09e8883 100644
--- a/arch/arm/mm/proc-mohawk.S
+++ b/arch/arm/mm/proc-mohawk.S
@@ -224,7 +224,6 @@ mohawk_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -243,7 +242,6 @@ mohawk_dma_clean_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -261,7 +259,6 @@ ENTRY(mohawk_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
@@ -288,6 +285,12 @@ ENTRY(mohawk_dma_unmap_area)
mov pc, lr
ENDPROC(mohawk_dma_unmap_area)
+ENTRY(mohawk_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov pc, lr
+ENDPROC(mohawk_dma_barrier)
+
ENTRY(mohawk_cache_fns)
.long mohawk_flush_kern_cache_all
.long mohawk_flush_user_cache_all
@@ -297,6 +300,7 @@ ENTRY(mohawk_cache_fns)
.long mohawk_flush_kern_dcache_area
.long mohawk_dma_map_area
.long mohawk_dma_unmap_area
+ .long mohawk_dma_barrier
.long mohawk_dma_flush_range
ENTRY(cpu_mohawk_dcache_clean_area)
diff --git a/arch/arm/mm/proc-xsc3.S b/arch/arm/mm/proc-xsc3.S
index 046b3d8..d033ed4 100644
--- a/arch/arm/mm/proc-xsc3.S
+++ b/arch/arm/mm/proc-xsc3.S
@@ -274,7 +274,6 @@ xsc3_dma_inv_range:
add r0, r0, #CACHELINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ data write barrier
mov pc, lr
/*
@@ -291,7 +290,6 @@ xsc3_dma_clean_range:
add r0, r0, #CACHELINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ data write barrier
mov pc, lr
/*
@@ -308,7 +306,6 @@ ENTRY(xsc3_dma_flush_range)
add r0, r0, #CACHELINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ data write barrier
mov pc, lr
/*
@@ -335,6 +332,12 @@ ENTRY(xsc3_dma_unmap_area)
mov pc, lr
ENDPROC(xsc3_dma_unmap_area)
+ENTRY(xsc3_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ data write barrier
+ mov pc, lr
+ENDPROC(xsc3_dma_barrier)
+
ENTRY(xsc3_cache_fns)
.long xsc3_flush_icache_all
.long xsc3_flush_kern_cache_all
@@ -345,6 +348,7 @@ ENTRY(xsc3_cache_fns)
.long xsc3_flush_kern_dcache_area
.long xsc3_dma_map_area
.long xsc3_dma_unmap_area
+ .long xsc3_dma_barrier
.long xsc3_dma_flush_range
ENTRY(cpu_xsc3_dcache_clean_area)
diff --git a/arch/arm/mm/proc-xscale.S b/arch/arm/mm/proc-xscale.S
index 63037e2..e390ae6 100644
--- a/arch/arm/mm/proc-xscale.S
+++ b/arch/arm/mm/proc-xscale.S
@@ -332,7 +332,6 @@ xscale_dma_inv_range:
add r0, r0, #CACHELINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ Drain Write (& Fill) Buffer
mov pc, lr
/*
@@ -349,7 +348,6 @@ xscale_dma_clean_range:
add r0, r0, #CACHELINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ Drain Write (& Fill) Buffer
mov pc, lr
/*
@@ -367,7 +365,6 @@ ENTRY(xscale_dma_flush_range)
add r0, r0, #CACHELINESIZE
cmp r0, r1
blo 1b
- mcr p15, 0, r0, c7, c10, 4 @ Drain Write (& Fill) Buffer
mov pc, lr
/*
@@ -407,6 +404,12 @@ ENTRY(xscale_dma_unmap_area)
mov pc, lr
ENDPROC(xscale_dma_unmap_area)
+ENTRY(xscale_dma_barrier)
+ mov r0, #0
+ mcr p15, 0, r0, c7, c10, 4 @ Drain Write (& Fill) Buffer
+ mov pc, lr
+ENDPROC(xscale_dma_barrier)
+
ENTRY(xscale_cache_fns)
.long xscale_flush_icache_all
.long xscale_flush_kern_cache_all
@@ -417,6 +420,7 @@ ENTRY(xscale_cache_fns)
.long xscale_flush_kern_dcache_area
.long xscale_dma_map_area
.long xscale_dma_unmap_area
+ .long xscale_dma_barrier
.long xscale_dma_flush_range
/*
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] mmc: add double buffering for mmc block requests
2011-02-05 17:02 ` Russell King - ARM Linux
@ 2011-02-05 20:36 ` Russell King - ARM Linux
-1 siblings, 0 replies; 27+ messages in thread
From: Russell King - ARM Linux @ 2011-02-05 20:36 UTC (permalink / raw)
To: Per Forlin, Catalin Marinas
Cc: Chris Ball, linux-mmc, linux-kernel, linux-arm-kernel, dev
On Sat, Feb 05, 2011 at 05:02:55PM +0000, Russell King - ARM Linux wrote:
> On Wed, Jan 12, 2011 at 07:13:58PM +0100, Per Forlin wrote:
> > Add support to prepare one MMC request while another is active on
> > the host. This is done by making the issue_rw_rq() asynchronous.
> > The increase in throughput is proportional to the time it takes to
> > prepare a request and how fast the memory is. The faster the MMC/SD is
> > the more significant the prepare request time becomes. Measurements on U5500
> > and U8500 on eMMC shows significant performance gain for DMA on MMC for large
> > reads. In the PIO case there is some gain in performance for large reads too.
> > There seems to be no or small performance gain for write, don't have a good
> > explanation for this yet.
>
> It might be worth seeing what effect the following patch has. This
> moves the dsb out of the cache operations into a separate function,
> so we only do one dsb per DMA mapping/unmapping operation. That's
> particularly significant for the scattergather code.
>
> I don't remember the reason why this was dropped as a candidate for
> merging - could that be because the dsb needs to be before the outer
> cache maintenance? Adding Catalin for comment on that.
FWIW, trying this with MMC on OMAP4, I see no measurable difference in
performance or CPU usage.
end of thread, other threads:[~2011-02-05 20:36 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-12 18:13 [PATCH 0/5] mmc: add double buffering for mmc block requests Per Forlin
2011-01-12 18:13 ` [PATCH 1/5] mmc: add member in mmc queue struct to hold request data Per Forlin
2011-01-12 18:14 ` [PATCH 2/5] mmc: Add a block request prepare function Per Forlin
2011-01-12 18:14 ` [PATCH 3/5] mmc: Add a second mmc queue request member Per Forlin
2011-01-12 18:14 ` [PATCH 4/5] mmc: Store the mmc block request struct in mmc queue Per Forlin
2011-01-12 18:14 ` [PATCH 5/5] mmc: Add double buffering for mmc block requests Per Forlin
2011-01-12 18:24 ` [PATCH 0/5] mmc: add " Per Forlin
2011-01-18 2:35 ` Jaehoon Chung
2011-01-18 8:12 ` Per Forlin
2011-01-28 8:28 ` Per Forlin
2011-01-30 8:23 ` Jaehoon Chung
2011-02-05 17:02 ` Russell King - ARM Linux
2011-02-05 20:36 ` Russell King - ARM Linux