From: Linus Walleij <linus.walleij@linaro.org>
To: linux-mmc@vger.kernel.org, Ulf Hansson <ulf.hansson@linaro.org>,
Adrian Hunter <adrian.hunter@intel.com>,
Paolo Valente <paolo.valente@linaro.org>
Cc: Chunyan Zhang <zhang.chunyan@linaro.org>,
Baolin Wang <baolin.wang@linaro.org>,
linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
Christoph Hellwig <hch@lst.de>, Arnd Bergmann <arnd@arndb.de>,
Linus Walleij <linus.walleij@linaro.org>
Subject: [PATCH 00/16] multiqueue for MMC/SD third try
Date: Thu, 9 Feb 2017 16:33:47 +0100
Message-ID: <20170209153403.9730-1-linus.walleij@linaro.org>
The following is the latest attempt at rewriting the MMC/SD
stack to cope with multiqueueing.
If you just want to grab a branch and test the patches with
your hardware, I put a git branch with this series here:
https://git.kernel.org/cgit/linux/kernel/git/linusw/linux-stericsson.git/log/?h=mmc-mq-next-2017-02-09
It's based on Ulf's v4.10-rc3-based tree, so quick reminder:
git checkout -b test v4.10-rc3
git pull git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson.git mmc-mq-next-2017-02-09
Should get you a testable "test" branch.
These patches are clearly v4.12 material. They get increasingly
controversial, and need more review, the further into the series
you go. The last patch for multiqueue is marked RFC for a
reason.
Every time I do this it seems to be an extensive rewrite of the
whole world. Anyways this is based on the other ~16 patches that
were already merged for the upcoming v4.11.
The rationale for this approach was Arnd's suggestion to try to
switch the MMC/SD stack around so as to complete requests as
quickly as possible from the device driver so that new requests
can be issued. We are doing this now: the polling loop that was
pulling NULL out of the request queue and driving the pipeline
with a loop is gone.
We are not issuing new requests from interrupt context: I still
have to post a work for it. I don't know if issuing from interrupt
context is possible at all. The retune and background operations
need to be checked after every command and, as far as I know, that
has to happen in blocking context.
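The hand-off described above (complete the request in interrupt context, then post a work that does the blocking checks and issues the next request) can be illustrated with a small user-space toy model in Python. This is only a sketch under stated assumptions, not code from the series: `run_pipeline`, `irq_complete`, `issue` and `worker` are hypothetical names, a `queue.Queue` stands in for a kernel workqueue, and the "hardware" is assumed to finish every request instantly.

```python
import queue
import threading


def run_pipeline(requests):
    """Toy model: completion only posts work; a worker issues the next request."""
    log = []                      # ordered trace of what happened
    work = queue.Queue()          # stands in for a kernel workqueue (assumption)
    pending = list(requests)      # requests not yet issued

    def irq_complete(req):
        # "Interrupt context": must not block, so only record and hand off.
        log.append(("complete", req))
        work.put(req)

    def issue(req):
        log.append(("issue", req))
        irq_complete(req)         # assumption: the hardware finishes at once

    def worker():
        while True:
            done = work.get()
            if done is None:      # sentinel: nothing left to do
                return
            # Blocking context: the place for retune/background-ops checks.
            log.append(("checks", done))
            if pending:
                issue(pending.pop(0))
            else:
                work.put(None)

    if not pending:
        return log
    issue(pending.pop(0))         # kick off the first request
    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return log


if __name__ == "__main__":
    for event in run_pipeline(["req0", "req1"]):
        print(event)
```

The point of the model is the ordering: every "issue" after the first one happens in the worker, after the checks for the previous request, never in the completion path itself.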
We have parallelism in pre/post hooks also with multiqueue.
All asynchronous optimization that was there for the old block layer
is now also there for multiqueue. There is even an interesting new
optimization: with this change, bounce buffers are bounced
asynchronously.
We still use the trick to set the queue depth to 2 to get two
parallel requests pushed down to the host.
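Why a queue depth of 2 helps can be seen in a toy timing model, sketched here in Python. It is not derived from the patches: `total_time` is a hypothetical helper, the cost figures are made-up units, and it assumes the pre- and post-processing of different requests never contend with each other, so only the host transfer is serialized.

```python
def total_time(n, pre, xfer, post, depth):
    """Time to finish n requests when at most `depth` are in flight."""
    done = []          # completion time of each request
    xfer_end = 0.0     # time at which the host becomes free
    for i in range(n):
        # A slot frees up once request i - depth has fully completed.
        start = done[i - depth] if i >= depth else 0.0
        # The host handles one transfer at a time.
        xfer_start = max(start + pre, xfer_end)
        xfer_end = xfer_start + xfer
        done.append(xfer_end + post)
    return done[-1] if done else 0.0


if __name__ == "__main__":
    # Hypothetical costs: 1 unit pre, 4 units transfer, 1 unit post.
    print("depth 1:", total_time(4, 1, 4, 1, depth=1))
    print("depth 2:", total_time(4, 1, 4, 1, depth=2))
```

With these assumed costs, depth 1 pays pre + transfer + post for every request, while depth 2 hides the pre- and post-processing behind the previous or next transfer, so the host stays busy back to back.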
Adrian: I know I did quite extensive violence to your queue
handling, reusing it in a way that is probably totally counter to
your command queueing patch series. I'm sorry. I guess you can
see where it is going if you follow the series. I also killed the
host context right off, after reducing the synchronization needs
to zero. I hope you will be interested in the result though!
Does this perform? The numbers follow. I will discuss my
conclusions after the figures. All the tests are done on a cold
booted Ux500 system.
Before this patch series, based on my earlier cleanups
and refactorings on Ulf's next branch ending with
"mmc: core: start to break apart mmc_start_areq()":
time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 45.126404 seconds, 22.7MB/s
real 0m 45.13s
user 0m 0.02s
sys 0m 7.60s
mount /dev/mmcblk0p1 /mnt/
cd /mnt/
time find . > /dev/null
real 0m 3.61s
user 0m 0.30s
sys 0m 1.56s
Command line used: iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test
Output is in kBytes/sec
                                                       random   random
      kB  reclen    write  rewrite     read   reread     read    write
   20480       4     2046     2114     5981     6008     5971       40
   20480       8     4825     4622     9104     9118     9070       81
   20480      16     5767     5929    12250    12253    12209      166
   20480      32     6242     6303    14920    14917    14879      337
   20480      64     6598     5907    16758    16760    16739      695
   20480     128     6807     6837    17863    17869    17788     1387
   20480     256     6922     6925    18497    18490    18482     3076
   20480     512     7273     7313    18636    18407    18829     7344
   20480    1024     7339     7332    17695    18785    18472     7441
   20480    2048     7419     7471    19166    18812    18797     7474
   20480    4096     7598     7714    21006    20975    21180     7708
   20480    8192     7632     7830    22328    22315    22201     7828
   20480   16384     7412     7903    23070    23046    22849     7913
With "mmc: core: move the asynchronous post-processing"
time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 52.166992 seconds, 19.6MB/s
real 0m 52.17s
user 0m 0.01s
sys 0m 6.96s
mount /dev/mmcblk0p1 /mnt/
cd /mnt/
time find . > /dev/null
real 0m 3.88s
user 0m 0.35s
sys 0m 1.60s
Command line used: iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test
Output is in kBytes/sec
                                                       random   random
      kB  reclen    write  rewrite     read   reread     read    write
   20480       4     2072     2200     6030     6066     6005       40
   20480       8     4847     5106     9174     9178     9123       81
   20480      16     5791     5934    12301    12299    12260      166
   20480      32     6252     6311    14906    14943    14919      337
   20480      64     6607     6699    16776    16787    16756      690
   20480     128     6836     6880    17868    17880    17873     1419
   20480     256     6967     6955    18442    17112    18490     3072
   20480     512     7320     7359    18818    18738    18477     7310
   20480    1024     7350     7426    18297    18551    18357     7429
   20480    2048     7439     7476    18035    19111    17670     7486
   20480    4096     7655     7728    19688    19557    19758     7738
   20480    8192     7640     7848    20675    20718    20787     7823
   20480   16384     7489     7934    21225    21186    21555     7943
With "mmc: queue: issue requests in massive parallel"
time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 49.308167 seconds, 20.8MB/s
real 0m 49.31s
user 0m 0.00s
sys 0m 7.11s
mount /dev/mmcblk0p1 /mnt/
cd /mnt/
time find . > /dev/null
real 0m 3.70s
user 0m 0.19s
sys 0m 1.73s
Command line used: iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test
Output is in kBytes/sec
                                                       random   random
      kB  reclen    write  rewrite     read   reread     read    write
   20480       4     1709     1761     5963     5321     5909       40
   20480       8     4736     5059     9089     9092     9055       81
   20480      16     5772     5928    12217    12229    12184      165
   20480      32     6237     6279    14898    14899    14875      336
   20480      64     6599     6663    16759    16760    16741      683
   20480     128     6804     6790    17869    17869    17864     1393
   20480     256     6863     6883    18485    18488    18501     3105
   20480     512     7223     7249    18807    18810    18812     7259
   20480    1024     7311     7321    18684    18467    18201     7328
   20480    2048     7405     7457    18560    18044    18343     7451
   20480    4096     7596     7684    20742    21154    21153     7711
   20480    8192     7593     7802    21743    21721    22090     7804
   20480   16384     7399     7873    21539    22670    22828     7876
With "RFC: mmc: switch MMC/SD to use blk-mq multiqueueing v3"
time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 46.240479 seconds, 22.1MB/s
real 0m 46.25s
user 0m 0.03s
sys 0m 6.42s
mount /dev/mmcblk0p1 /mnt/
cd /mnt/
time find . > /dev/null
real 0m 4.13s
user 0m 0.40s
sys 0m 1.64s
Command line used: iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test
Output is in kBytes/sec
                                                       random   random
      kB  reclen    write  rewrite     read   reread     read    write
   20480       4     1786     1806     6055     6061     5360       40
   20480       8     4849     5088     9167     9175     9120       81
   20480      16     5807     5975    12273    12256    12240      166
   20480      32     6275     6317    14929    14931    14905      338
   20480      64     6629     6708    16755    16783    16758      688
   20480     128     6856     6884    17890    17804    17873     1420
   20480     256     6927     6946    18104    17826    18389     3038
   20480     512     7296     7280    18720    18752    18819     7284
   20480    1024     7286     7415    18583    18598    18516     7403
   20480    2048     7435     7470    18378    18268    18682     7471
   20480    4096     7670     7786    21364    21275    20761     7766
   20480    8192     7637     7868    22193    21994    22100     7850
   20480   16384     7416     7921    23050    23051    22726     7955
The iozone results seem a bit inconsistent, and all values seem to
be noisy and not say much. I don't really know why; maybe the test
is simply not relevant. The iozone tests don't seem to be
significantly affected by any of the patches, so let's focus on the
dd and find tests.
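One way to check how noisy the iozone figures are is to compare a single column between two runs. The Python sketch below does that for the sequential "read" column, using the values from the first table (before the series) and the last table (with the blk-mq patch) in this mail. The helper names `percent_change` and `read_deltas` are made up for illustration, and a single column from single runs is of course not a statistical analysis.

```python
# Sequential "read" column (kB/s) copied from the iozone tables in this mail.
RECLEN = [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384]
READ_BEFORE = [5981, 9104, 12250, 14920, 16758, 17863, 18497,
               18636, 17695, 19166, 21006, 22328, 23070]
READ_MQ = [6055, 9167, 12273, 14929, 16755, 17890, 18104,
           18720, 18583, 18378, 21364, 22193, 23050]


def percent_change(before, after):
    """Relative change in percent; positive means higher throughput."""
    return (after - before) / before * 100.0


def read_deltas():
    """Per-record-size change of the read column, baseline versus blk-mq."""
    return [percent_change(b, a) for b, a in zip(READ_BEFORE, READ_MQ)]


if __name__ == "__main__":
    for reclen, delta in zip(RECLEN, read_deltas()):
        print(f"reclen {reclen:>5} kB: {delta:+.1f}%")
```

The per-row differences go in both directions with no obvious trend across record sizes, which fits the reading that this column mostly shows run-to-run noise.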
You can see there are three steps:
- I do some necessary refactoring and need to move postprocessing
  to after the requests have been completed. As you can see, this
  clearly introduces a performance regression in the dd test with
  the patch:
  "mmc: core: move the asynchronous post-processing"
  It seems the random seek with find isn't much affected.
- I continue the refactoring and get to the point of issuing
  requests immediately after every successful transfer, and the
  dd performance is restored with the patch
  "mmc: queue: issue requests in massive parallel"
- Then I add multiqueue on top of the cake. So before the change
  we have the nice performance we want, so we can study the effect
  of just introducing multiqueueing in the last patch
  "RFC: mmc: switch MMC/SD to use blk-mq multiqueueing v3"
What immediately jumps out at you is that linear reads/writes
perform just as nicely or actually better with MQ than with the
old block layer.
What is amazing is that just a little randomness, such as the
find . > /dev/null test, immediately seems to regress visibly with
MQ (4.13 s versus 3.70 s real time).
My best guess is that it is caused by the absence of the block
scheduler.
I do not know if my conclusions are right or anything, please
scrutinize.
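To make the dd and find figures easier to scrutinize, the relative changes can be computed directly from the timings reported above. The Python sketch below uses only numbers from this mail; the helper names (`percent_change`, `versus_baseline`, `versus_previous`) are hypothetical, and each figure comes from a single run, so small differences should be read with care.

```python
# Figures copied from the dd and find runs reported in this mail.
STEPS = [
    # (label, dd seconds, find "real" seconds)
    ("baseline", 45.126404, 3.61),
    ("move the asynchronous post-processing", 52.166992, 3.88),
    ("issue requests in massive parallel", 49.308167, 3.70),
    ("blk-mq multiqueueing v3", 46.240479, 4.13),
]


def percent_change(before, after):
    """Relative change in percent; positive means slower (more seconds)."""
    return (after - before) / before * 100.0


def versus_baseline():
    """Change of every later step relative to the first measurement."""
    _, dd0, find0 = STEPS[0]
    return [(label, percent_change(dd0, dd), percent_change(find0, find))
            for label, dd, find in STEPS[1:]]


def versus_previous():
    """Change of every step relative to the step right before it."""
    return [(cur[0],
             percent_change(prev[1], cur[1]),
             percent_change(prev[2], cur[2]))
            for prev, cur in zip(STEPS, STEPS[1:])]


if __name__ == "__main__":
    for label, dd, find in versus_baseline():
        print(f"vs baseline  {label}: dd {dd:+.1f}%, find {find:+.1f}%")
    for label, dd, find in versus_previous():
        print(f"vs previous  {label}: dd {dd:+.1f}%, find {find:+.1f}%")
```

Comparing against the previous step isolates the effect of a single patch, while comparing against the baseline shows the cumulative effect of the series up to that point.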
Linus Walleij (16):
mmc: core: move some code in mmc_start_areq()
mmc: core: refactor asynchronous request finalization
mmc: core: refactor mmc_request_done()
mmc: core: move the asynchronous post-processing
mmc: core: add a kthread for completing requests
mmc: core: replace waitqueue with worker
mmc: core: do away with is_done_rcv
mmc: core: do away with is_new_req
mmc: core: kill off the context info
mmc: queue: simplify queue logic
mmc: block: shuffle retry and error handling
mmc: queue: stop flushing the pipeline with NULL
mmc: queue: issue struct mmc_queue_req items
mmc: queue: get/put struct mmc_queue_req
mmc: queue: issue requests in massive parallel
RFC: mmc: switch MMC/SD to use blk-mq multiqueueing v3
drivers/mmc/core/block.c | 426 +++++++++++++++++++++++------------------------
drivers/mmc/core/block.h | 10 +-
drivers/mmc/core/bus.c | 1 -
drivers/mmc/core/core.c | 228 ++++++++++++-------------
drivers/mmc/core/core.h | 2 -
drivers/mmc/core/host.c | 2 +-
drivers/mmc/core/queue.c | 337 ++++++++++++++-----------------------
drivers/mmc/core/queue.h | 21 ++-
include/linux/mmc/core.h | 9 +-
include/linux/mmc/host.h | 24 +--
10 files changed, 481 insertions(+), 579 deletions(-)
--
2.9.3