* [RESEND] [PATCH] readahead:add blk_run_backing_dev
From: Hisashi Hifumi @ 2009-05-29 5:35 UTC
To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel
Hi Andrew.
I added blk_run_backing_dev to page_cache_async_readahead
so readahead I/O is unplugged to improve throughput,
especially in RAID environments.
The normal case is: if page N becomes uptodate at time T(N), then
T(N) <= T(N+1) holds. With RAID (and NFS to some degree), there
is no strict ordering; the data arrival time depends on the
runtime status of the individual disks, which breaks that formula. So
in do_generic_file_read(), just after submitting the async readahead IO
request, the current page may well be uptodate, so the page won't be locked,
and the block device won't be implicitly unplugged:
	if (PageReadahead(page))
		page_cache_async_readahead()
	if (!PageUptodate(page))
		goto page_not_up_to_date;
	/* ... */
page_not_up_to_date:
	lock_page_killable(page);
Therefore explicit unplugging can help.
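For reference, blk_run_backing_dev() itself is only a thin wrapper around
the backing device's unplug hook. The sketch below follows my reading of
the 2.6.30-era include/linux/blkdev.h, so treat it as an illustration
rather than a verbatim copy:

	static inline void blk_run_backing_dev(struct backing_dev_info *bdi,
					       struct page *page)
	{
		/*
		 * For a block device, unplug_io_fn points at the block
		 * layer's unplug callback, which kicks the request queue
		 * so already-queued readahead bios are dispatched without
		 * waiting for a later lock_page() to do it implicitly.
		 */
		if (bdi && bdi->unplug_io_fn)
			bdi->unplug_io_fn(bdi, page);
	}

So the explicit call simply forces dispatch of readahead I/O that would
otherwise sit in the plugged queue until someone blocks on a page.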
The following are the test results with dd:
#dd if=testdir/testfile of=/dev/null bs=16384
-2.6.30-rc6
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s
-2.6.30-rc6-patched
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s
(7-disk RAID-0 array)
-2.6.30-rc6
1054976+0 records in
1054976+0 records out
17284726784 bytes (17 GB) copied, 212.233 seconds, 81.4 MB/s
-2.6.30-rc6-patched
1054976+0 records in
1054976+0 records out
17284726784 bytes (17 GB) copied, 198.878 seconds, 86.9 MB/s
(7-disk RAID-5 array)
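(In relative terms, that is roughly a 9% throughput gain on the RAID-0
array and 7% on RAID-5.)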
Thanks.
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
mm/readahead.c | 10 ++++++++++
1 file changed, 10 insertions(+)
--- linux.orig/mm/readahead.c
+++ linux/mm/readahead.c
@@ -490,5 +490,15 @@ page_cache_async_readahead(struct addres

 	/* do read-ahead */
 	ondemand_readahead(mapping, ra, filp, true, offset, req_size);
+
+	/*
+	 * Normally the current page is !uptodate and lock_page() will be
+	 * immediately called to implicitly unplug the device. However this
+	 * is not always true for RAID configurations, where data arrives
+	 * not strictly in submission order. In this case we need to
+	 * explicitly kick off the IO.
+	 */
+	if (PageUptodate(page))
+		blk_run_backing_dev(mapping->backing_dev_info, NULL);
 }
 EXPORT_SYMBOL_GPL(page_cache_async_readahead);
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
From: Andrew Morton @ 2009-06-01 0:36 UTC
To: Hisashi Hifumi; +Cc: linux-kernel, linux-fsdevel
On Fri, 29 May 2009 14:35:55 +0900 Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> wrote:
> I added blk_run_backing_dev to page_cache_async_readahead
> so readahead I/O is unplugged to improve throughput,
> especially in RAID environments.
I skipped the last version of this because KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> said "Please attach blktrace analysis ;)".
I'm not sure why he asked for that, but he's a smart chap and
presumably had his reasons.
If you think that such an analysis is unneeded, or isn't worth the time
to generate then please tell us that. But please don't just ignore the
request!
Thanks.
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
From: Hisashi Hifumi @ 2009-06-01 1:04 UTC
To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel
At 09:36 09/06/01, Andrew Morton wrote:
>On Fri, 29 May 2009 14:35:55 +0900 Hisashi Hifumi
><hifumi.hisashi@oss.ntt.co.jp> wrote:
>
>> I added blk_run_backing_dev to page_cache_async_readahead
>> so readahead I/O is unplugged to improve throughput,
>> especially in RAID environments.
>
>I skipped the last version of this because KOSAKI Motohiro
><kosaki.motohiro@jp.fujitsu.com> said "Please attach blktrace analysis ;)".
>
>I'm not sure why he asked for that, but he's a smart chap and
>presumably had his reasons.
>
>If you think that such an analysis is unneeded, or isn't worth the time
>to generate then please tell us that. But please don't just ignore the
>request!
Hi Andrew.
Sorry for this.
I did not ignore KOSAKI Motohiro's request.
I've got blktrace output both with and without the patch,
but I could not pinpoint the reason for the throughput improvement
from this result.
I do not notice any difference except around the unplug behavior of dd.
Comments?
-2.6.30-rc6
8,0 3 177784 50.001437357 0 C R 8717567 + 512 [0]
8,0 3 177785 50.001635405 4148 A R 8718079 + 256 <- (8,1) 8718016
8,0 3 177786 50.001635675 4148 Q R 8718079 + 256 [dd]
8,0 3 177787 50.001637517 4148 G R 8718079 + 256 [dd]
8,0 3 177788 50.001638954 4148 P N [dd]
8,0 3 177789 50.001639290 4148 I R 8718079 + 256 [dd]
8,0 3 177790 50.001765339 4148 A R 8718335 + 256 <- (8,1) 8718272
8,0 3 177791 50.001765699 4148 Q R 8718335 + 256 [dd]
8,0 3 177792 50.001766971 4148 M R 8718335 + 256 [dd]
8,0 3 177793 50.001768243 4148 U N [dd] 1
8,0 3 177794 50.001769464 4148 D R 8718079 + 512 [dd]
8,0 3 177795 50.003815034 0 C R 8718079 + 512 [0]
8,0 3 177796 50.004008636 4148 A R 8718591 + 256 <- (8,1) 8718528
8,0 3 177797 50.004008951 4148 Q R 8718591 + 256 [dd]
8,0 3 177798 50.004010787 4148 G R 8718591 + 256 [dd]
8,0 3 177799 50.004012089 4148 P N [dd]
8,0 3 177800 50.004012641 4148 I R 8718591 + 256 [dd]
8,0 3 177801 50.004139944 4148 A R 8718847 + 256 <- (8,1) 8718784
8,0 3 177802 50.004140298 4148 Q R 8718847 + 256 [dd]
8,0 3 177803 50.004141393 4148 M R 8718847 + 256 [dd]
8,0 3 177804 50.004142815 4148 U N [dd] 1
8,0 3 177805 50.004144003 4148 D R 8718591 + 512 [dd]
8,0 3 177806 50.007151480 0 C R 8718591 + 512 [0]
8,0 3 177807 50.007344467 4148 A R 8719103 + 256 <- (8,1) 8719040
8,0 3 177808 50.007344779 4148 Q R 8719103 + 256 [dd]
8,0 3 177809 50.007346636 4148 G R 8719103 + 256 [dd]
8,0 3 177810 50.007347821 4148 P N [dd]
8,0 3 177811 50.007348346 4148 I R 8719103 + 256 [dd]
8,0 3 177812 50.007480827 4148 A R 8719359 + 256 <- (8,1) 8719296
8,0 3 177813 50.007481187 4148 Q R 8719359 + 256 [dd]
8,0 3 177814 50.007482669 4148 M R 8719359 + 256 [dd]
8,0 3 177815 50.007483965 4148 U N [dd] 1
8,0 3 177816 50.007485171 4148 D R 8719103 + 512 [dd]
8,0 3 177817 50.009885672 0 C R 8719103 + 512 [0]
8,0 3 177818 50.010077696 4148 A R 8719615 + 256 <- (8,1) 8719552
8,0 3 177819 50.010078008 4148 Q R 8719615 + 256 [dd]
8,0 3 177820 50.010079841 4148 G R 8719615 + 256 [dd]
8,0 3 177821 50.010081227 4148 P N [dd]
8,0 3 177822 50.010081560 4148 I R 8719615 + 256 [dd]
8,0 3 177823 50.010208686 4148 A R 8719871 + 256 <- (8,1) 8719808
8,0 3 177824 50.010209046 4148 Q R 8719871 + 256 [dd]
8,0 3 177825 50.010210366 4148 M R 8719871 + 256 [dd]
8,0 3 177826 50.010211686 4148 U N [dd] 1
8,0 3 177827 50.010212916 4148 D R 8719615 + 512 [dd]
8,0 3 177828 50.013880081 0 C R 8719615 + 512 [0]
8,0 3 177829 50.014071235 4148 A R 8720127 + 256 <- (8,1) 8720064
8,0 3 177830 50.014071544 4148 Q R 8720127 + 256 [dd]
8,0 3 177831 50.014073332 4148 G R 8720127 + 256 [dd]
8,0 3 177832 50.014074517 4148 P N [dd]
8,0 3 177833 50.014075084 4148 I R 8720127 + 256 [dd]
8,0 3 177834 50.014201763 4148 A R 8720383 + 256 <- (8,1) 8720320
8,0 3 177835 50.014202123 4148 Q R 8720383 + 256 [dd]
8,0 3 177836 50.014203608 4148 M R 8720383 + 256 [dd]
8,0 3 177837 50.014204889 4148 U N [dd] 1
8,0 3 177838 50.014206095 4148 D R 8720127 + 512 [dd]
8,0 3 177839 50.017545281 0 C R 8720127 + 512 [0]
8,0 3 177840 50.017741679 4148 A R 8720639 + 256 <- (8,1) 8720576
8,0 3 177841 50.017742006 4148 Q R 8720639 + 256 [dd]
8,0 3 177842 50.017743848 4148 G R 8720639 + 256 [dd]
8,0 3 177843 50.017745318 4148 P N [dd]
8,0 3 177844 50.017745672 4148 I R 8720639 + 256 [dd]
8,0 3 177845 50.017876956 4148 A R 8720895 + 256 <- (8,1) 8720832
8,0 3 177846 50.017877286 4148 Q R 8720895 + 256 [dd]
8,0 3 177847 50.017878615 4148 M R 8720895 + 256 [dd]
8,0 3 177848 50.017880082 4148 U N [dd] 1
8,0 3 177849 50.017881339 4148 D R 8720639 + 512 [dd]
8,0 3 177850 50.020674534 0 C R 8720639 + 512 [0]
8,0 3 177851 50.020864689 4148 A R 8721151 + 256 <- (8,1) 8721088
8,0 3 177852 50.020865007 4148 Q R 8721151 + 256 [dd]
8,0 3 177853 50.020866900 4148 G R 8721151 + 256 [dd]
8,0 3 177854 50.020868283 4148 P N [dd]
8,0 3 177855 50.020868628 4148 I R 8721151 + 256 [dd]
8,0 3 177856 50.020997302 4148 A R 8721407 + 256 <- (8,1) 8721344
8,0 3 177857 50.020997662 4148 Q R 8721407 + 256 [dd]
8,0 3 177858 50.020998976 4148 M R 8721407 + 256 [dd]
8,0 3 177859 50.021000305 4148 U N [dd] 1
8,0 3 177860 50.021001520 4148 D R 8721151 + 512 [dd]
8,0 3 177861 50.024269136 0 C R 8721151 + 512 [0]
8,0 3 177862 50.024460931 4148 A R 8721663 + 256 <- (8,1) 8721600
8,0 3 177863 50.024461337 4148 Q R 8721663 + 256 [dd]
8,0 3 177864 50.024463175 4148 G R 8721663 + 256 [dd]
8,0 3 177865 50.024464537 4148 P N [dd]
8,0 3 177866 50.024464871 4148 I R 8721663 + 256 [dd]
8,0 3 177867 50.024597943 4148 A R 8721919 + 256 <- (8,1) 8721856
8,0 3 177868 50.024598213 4148 Q R 8721919 + 256 [dd]
8,0 3 177869 50.024599323 4148 M R 8721919 + 256 [dd]
8,0 3 177870 50.024600751 4148 U N [dd] 1
8,0 3 177871 50.024602104 4148 D R 8721663 + 512 [dd]
8,0 3 177872 50.026966145 0 C R 8721663 + 512 [0]
8,0 3 177873 50.027157245 4148 A R 8722175 + 256 <- (8,1) 8722112
8,0 3 177874 50.027157563 4148 Q R 8722175 + 256 [dd]
8,0 3 177875 50.027159351 4148 G R 8722175 + 256 [dd]
8,0 3 177876 50.027160731 4148 P N [dd]
8,0 3 177877 50.027161064 4148 I R 8722175 + 256 [dd]
8,0 3 177878 50.027288745 4148 A R 8722431 + 256 <- (8,1) 8722368
8,0 3 177879 50.027289105 4148 Q R 8722431 + 256 [dd]
8,0 3 177880 50.027290206 4148 M R 8722431 + 256 [dd]
8,0 3 177881 50.027291697 4148 U N [dd] 1
8,0 3 177882 50.027293119 4148 D R 8722175 + 512 [dd]
8,0 3 177883 50.030406105 0 C R 8722175 + 512 [0]
8,0 3 177884 50.030600613 4148 A R 8722687 + 256 <- (8,1) 8722624
8,0 3 177885 50.030601199 4148 Q R 8722687 + 256 [dd]
8,0 3 177886 50.030603269 4148 G R 8722687 + 256 [dd]
8,0 3 177887 50.030604463 4148 P N [dd]
8,0 3 177888 50.030604799 4148 I R 8722687 + 256 [dd]
8,0 3 177889 50.030731757 4148 A R 8722943 + 256 <- (8,1) 8722880
8,0 3 177890 50.030732117 4148 Q R 8722943 + 256 [dd]
8,0 3 177891 50.030733397 4148 M R 8722943 + 256 [dd]
8,0 3 177892 50.030734882 4148 U N [dd] 1
8,0 3 177893 50.030736109 4148 D R 8722687 + 512 [dd]
8,0 3 177894 50.032916699 0 C R 8722687 + 512 [0]
8,0 3 177895 50.033176618 4148 A R 8723199 + 256 <- (8,1) 8723136
8,0 3 177896 50.033177218 4148 Q R 8723199 + 256 [dd]
8,0 3 177897 50.033181433 4148 G R 8723199 + 256 [dd]
8,0 3 177898 50.033184757 4148 P N [dd]
8,0 3 177899 50.033185642 4148 I R 8723199 + 256 [dd]
8,0 3 177900 50.033371264 4148 A R 8723455 + 256 <- (8,1) 8723392
8,0 3 177901 50.033371717 4148 Q R 8723455 + 256 [dd]
8,0 3 177902 50.033374015 4148 M R 8723455 + 256 [dd]
8,0 3 177903 50.033376814 4148 U N [dd] 1
8,0 3 177904 50.033380126 4148 D R 8723199 + 512 [dd]
8,0 3 177905 50.036715133 0 C R 8723199 + 512 [0]
8,0 3 177906 50.036971296 4148 A R 8723711 + 256 <- (8,1) 8723648
8,0 3 177907 50.036972136 4148 Q R 8723711 + 256 [dd]
8,0 3 177908 50.036975673 4148 G R 8723711 + 256 [dd]
8,0 3 177909 50.036978277 4148 P N [dd]
8,0 3 177910 50.036979450 4148 I R 8723711 + 256 [dd]
8,0 3 177911 50.037162429 4148 A R 8723967 + 256 <- (8,1) 8723904
8,0 3 177912 50.037162840 4148 Q R 8723967 + 256 [dd]
8,0 3 177913 50.037164967 4148 M R 8723967 + 256 [dd]
8,0 3 177914 50.037167223 4148 U N [dd] 1
8,0 3 177915 50.037170001 4148 D R 8723711 + 512 [dd]
8,0 3 177916 50.040521790 0 C R 8723711 + 512 [0]
8,0 3 177917 50.040729738 4148 A R 8724223 + 256 <- (8,1) 8724160
8,0 3 177918 50.040730200 4148 Q R 8724223 + 256 [dd]
8,0 3 177919 50.040732060 4148 G R 8724223 + 256 [dd]
8,0 3 177920 50.040733551 4148 P N [dd]
8,0 3 177921 50.040734109 4148 I R 8724223 + 256 [dd]
8,0 3 177922 50.040860173 4148 A R 8724479 + 160 <- (8,1) 8724416
8,0 3 177923 50.040860536 4148 Q R 8724479 + 160 [dd]
8,0 3 177924 50.040861517 4148 M R 8724479 + 160 [dd]
8,0 3 177925 50.040872542 4148 A R 1055943 + 8 <- (8,1) 1055880
8,0 3 177926 50.040872800 4148 Q R 1055943 + 8 [dd]
8,0 3 177927 50.040874849 4148 G R 1055943 + 8 [dd]
8,0 3 177928 50.040875485 4148 I R 1055943 + 8 [dd]
8,0 3 177929 50.040877045 4148 U N [dd] 2
8,0 3 177930 50.040878625 4148 D R 8724223 + 416 [dd]
8,0 3 177931 50.040895335 4148 D R 1055943 + 8 [dd]
8,0 3 177932 50.044383267 0 C R 8724223 + 416 [0]
8,0 3 177933 50.044704725 0 C R 1055943 + 8 [0]
8,0 3 177934 50.044749068 4148 A R 8724639 + 96 <- (8,1) 8724576
8,0 3 177935 50.044749472 4148 Q R 8724639 + 96 [dd]
8,0 3 177936 50.044752184 4148 G R 8724639 + 96 [dd]
8,0 3 177937 50.044753552 4148 P N [dd]
8,0 3 177938 50.044754032 4148 I R 8724639 + 96 [dd]
8,0 3 177939 50.044896095 4148 A R 8724735 + 256 <- (8,1) 8724672
8,0 3 177940 50.044896443 4148 Q R 8724735 + 256 [dd]
8,0 3 177941 50.044897538 4148 M R 8724735 + 256 [dd]
8,0 3 177942 50.044948546 4148 U N [dd] 1
8,0 3 177943 50.044950001 4148 D R 8724639 + 352 [dd]
8,0 3 177944 50.047150137 0 C R 8724639 + 352 [0]
8,0 3 177945 50.047294824 4148 A R 8724991 + 256 <- (8,1) 8724928
8,0 3 177946 50.047295142 4148 Q R 8724991 + 256 [dd]
8,0 3 177947 50.047296978 4148 G R 8724991 + 256 [dd]
8,0 3 177948 50.047298301 4148 P N [dd]
8,0 3 177949 50.047298637 4148 I R 8724991 + 256 [dd]
8,0 3 177950 50.047429027 4148 A R 8725247 + 256 <- (8,1) 8725184
8,0 3 177951 50.047429387 4148 Q R 8725247 + 256 [dd]
8,0 3 177952 50.047430479 4148 M R 8725247 + 256 [dd]
8,0 3 177953 50.047431736 4148 U N [dd] 1
8,0 3 177954 50.047432951 4148 D R 8724991 + 512 [dd]
8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
8,0 3 177956 50.050507961 4148 A R 8725503 + 256 <- (8,1) 8725440
8,0 3 177957 50.050508273 4148 Q R 8725503 + 256 [dd]
8,0 3 177958 50.050510139 4148 G R 8725503 + 256 [dd]
8,0 3 177959 50.050511522 4148 P N [dd]
8,0 3 177960 50.050512062 4148 I R 8725503 + 256 [dd]
8,0 3 177961 50.050645393 4148 A R 8725759 + 256 <- (8,1) 8725696
8,0 3 177962 50.050645867 4148 Q R 8725759 + 256 [dd]
8,0 3 177963 50.050647171 4148 M R 8725759 + 256 [dd]
8,0 3 177964 50.050648593 4148 U N [dd] 1
8,0 3 177965 50.050649985 4148 D R 8725503 + 512 [dd]
8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
8,0 3 177967 50.053576324 4148 A R 8726015 + 256 <- (8,1) 8725952
8,0 3 177968 50.053576615 4148 Q R 8726015 + 256 [dd]
8,0 3 177969 50.053578994 4148 G R 8726015 + 256 [dd]
8,0 3 177970 50.053580173 4148 P N [dd]
8,0 3 177971 50.053580509 4148 I R 8726015 + 256 [dd]
8,0 3 177972 50.053711503 4148 A R 8726271 + 256 <- (8,1) 8726208
8,0 3 177973 50.053712001 4148 Q R 8726271 + 256 [dd]
8,0 3 177974 50.053713332 4148 M R 8726271 + 256 [dd]
8,0 3 177975 50.053714583 4148 U N [dd] 1
8,0 3 177976 50.053715768 4148 D R 8726015 + 512 [dd]
8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
8,0 3 177978 50.057161408 4148 A R 8726527 + 256 <- (8,1) 8726464
8,0 3 177979 50.057161726 4148 Q R 8726527 + 256 [dd]
8,0 3 177980 50.057163718 4148 G R 8726527 + 256 [dd]
8,0 3 177981 50.057165098 4148 P N [dd]
8,0 3 177982 50.057165431 4148 I R 8726527 + 256 [dd]
8,0 3 177983 50.057294630 4148 A R 8726783 + 256 <- (8,1) 8726720
8,0 3 177984 50.057294990 4148 Q R 8726783 + 256 [dd]
8,0 3 177985 50.057296070 4148 M R 8726783 + 256 [dd]
8,0 3 177986 50.057297402 4148 U N [dd] 1
8,0 3 177987 50.057298899 4148 D R 8726527 + 512 [dd]
8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
8,0 3 177989 50.060523768 4148 A R 8727039 + 256 <- (8,1) 8726976
8,0 3 177990 50.060524095 4148 Q R 8727039 + 256 [dd]
8,0 3 177991 50.060525910 4148 G R 8727039 + 256 [dd]
8,0 3 177992 50.060527239 4148 P N [dd]
8,0 3 177993 50.060527575 4148 I R 8727039 + 256 [dd]
8,0 3 177994 50.060662280 4148 A R 8727295 + 256 <- (8,1) 8727232
8,0 3 177995 50.060662778 4148 Q R 8727295 + 256 [dd]
8,0 3 177996 50.060663993 4148 M R 8727295 + 256 [dd]
8,0 3 177997 50.060665403 4148 U N [dd] 1
8,0 3 177998 50.060666999 4148 D R 8727039 + 512 [dd]
8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
8,0 3 178000 50.064113177 4148 A R 8727551 + 256 <- (8,1) 8727488
8,0 3 178001 50.064113492 4148 Q R 8727551 + 256 [dd]
8,0 3 178002 50.064115373 4148 G R 8727551 + 256 [dd]
-2.6.30-rc6-patched
8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
8,0 3 257298 50.000944399 4139 A R 9481215 + 256 <- (8,1) 9481152
8,0 3 257299 50.000944693 4139 Q R 9481215 + 256 [dd]
8,0 3 257300 50.000946541 4139 G R 9481215 + 256 [dd]
8,0 3 257301 50.000947954 4139 P N [dd]
8,0 3 257302 50.000948368 4139 I R 9481215 + 256 [dd]
8,0 3 257303 50.000948920 4139 U N [dd] 2
8,0 3 257304 50.000950003 4139 D R 9481215 + 256 [dd]
8,0 3 257305 50.000962541 4139 U N [dd] 2
8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
8,0 3 257308 50.003258111 4139 A R 9481471 + 256 <- (8,1) 9481408
8,0 3 257309 50.003258402 4139 Q R 9481471 + 256 [dd]
8,0 3 257310 50.003260190 4139 G R 9481471 + 256 [dd]
8,0 3 257311 50.003261399 4139 P N [dd]
8,0 3 257312 50.003261768 4139 I R 9481471 + 256 [dd]
8,0 3 257313 50.003262335 4139 U N [dd] 1
8,0 3 257314 50.003263406 4139 D R 9481471 + 256 [dd]
8,0 3 257315 50.003430472 4139 A R 9481727 + 256 <- (8,1) 9481664
8,0 3 257316 50.003430748 4139 Q R 9481727 + 256 [dd]
8,0 3 257317 50.003433065 4139 G R 9481727 + 256 [dd]
8,0 3 257318 50.003434343 4139 P N [dd]
8,0 3 257319 50.003434658 4139 I R 9481727 + 256 [dd]
8,0 3 257320 50.003435138 4139 U N [dd] 2
8,0 3 257321 50.003436083 4139 D R 9481727 + 256 [dd]
8,0 3 257322 50.003447795 4139 U N [dd] 2
8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
8,0 3 257324 50.004959499 4139 A R 9481983 + 256 <- (8,1) 9481920
8,0 3 257325 50.004959790 4139 Q R 9481983 + 256 [dd]
8,0 3 257326 50.004961590 4139 G R 9481983 + 256 [dd]
8,0 3 257327 50.004962793 4139 P N [dd]
8,0 3 257328 50.004963153 4139 I R 9481983 + 256 [dd]
8,0 3 257329 50.004964098 4139 U N [dd] 2
8,0 3 257330 50.004965184 4139 D R 9481983 + 256 [dd]
8,0 3 257331 50.004978967 4139 U N [dd] 2
8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
8,0 3 257333 50.007052043 4139 A R 9482239 + 256 <- (8,1) 9482176
8,0 3 257334 50.007052331 4139 Q R 9482239 + 256 [dd]
8,0 3 257335 50.007054146 4139 G R 9482239 + 256 [dd]
8,0 3 257336 50.007055355 4139 P N [dd]
8,0 3 257337 50.007055724 4139 I R 9482239 + 256 [dd]
8,0 3 257338 50.007056438 4139 U N [dd] 2
8,0 3 257339 50.007057605 4139 D R 9482239 + 256 [dd]
8,0 3 257340 50.007069963 4139 U N [dd] 2
8,0 3 257341 50.008250294 0 C R 9481983 + 256 [0]
8,0 3 257342 50.008431589 4139 A R 9482495 + 256 <- (8,1) 9482432
8,0 3 257343 50.008431881 4139 Q R 9482495 + 256 [dd]
8,0 3 257344 50.008433921 4139 G R 9482495 + 256 [dd]
8,0 3 257345 50.008435097 4139 P N [dd]
8,0 3 257346 50.008435466 4139 I R 9482495 + 256 [dd]
8,0 3 257347 50.008436213 4139 U N [dd] 2
8,0 3 257348 50.008437296 4139 D R 9482495 + 256 [dd]
8,0 3 257349 50.008450034 4139 U N [dd] 2
8,0 3 257350 50.010008843 0 C R 9482239 + 256 [0]
8,0 3 257351 50.010135287 4139 C R 9482495 + 256 [0]
8,0 3 257352 50.010226816 4139 A R 9482751 + 256 <- (8,1) 9482688
8,0 3 257353 50.010227107 4139 Q R 9482751 + 256 [dd]
8,0 3 257354 50.010229363 4139 G R 9482751 + 256 [dd]
8,0 3 257355 50.010230728 4139 P N [dd]
8,0 3 257356 50.010231097 4139 I R 9482751 + 256 [dd]
8,0 3 257357 50.010231655 4139 U N [dd] 1
8,0 3 257358 50.010232696 4139 D R 9482751 + 256 [dd]
8,0 3 257359 50.010380946 4139 A R 9483007 + 256 <- (8,1) 9482944
8,0 3 257360 50.010381264 4139 Q R 9483007 + 256 [dd]
8,0 3 257361 50.010383358 4139 G R 9483007 + 256 [dd]
8,0 3 257362 50.010384429 4139 P N [dd]
8,0 3 257363 50.010384741 4139 I R 9483007 + 256 [dd]
8,0 3 257364 50.010385395 4139 U N [dd] 2
8,0 3 257365 50.010386364 4139 D R 9483007 + 256 [dd]
8,0 3 257366 50.010397869 4139 U N [dd] 2
8,0 3 257367 50.014210132 0 C R 9482751 + 256 [0]
8,0 3 257368 50.014252938 0 C R 9483007 + 256 [0]
8,0 3 257369 50.014430811 4139 A R 9483263 + 256 <- (8,1) 9483200
8,0 3 257370 50.014431105 4139 Q R 9483263 + 256 [dd]
8,0 3 257371 50.014433139 4139 G R 9483263 + 256 [dd]
8,0 3 257372 50.014434520 4139 P N [dd]
8,0 3 257373 50.014435110 4139 I R 9483263 + 256 [dd]
8,0 3 257374 50.014435674 4139 U N [dd] 1
8,0 3 257375 50.014436770 4139 D R 9483263 + 256 [dd]
8,0 3 257376 50.014592117 4139 A R 9483519 + 256 <- (8,1) 9483456
8,0 3 257377 50.014592573 4139 Q R 9483519 + 256 [dd]
8,0 3 257378 50.014594391 4139 G R 9483519 + 256 [dd]
8,0 3 257379 50.014595504 4139 P N [dd]
8,0 3 257380 50.014595876 4139 I R 9483519 + 256 [dd]
8,0 3 257381 50.014596366 4139 U N [dd] 2
8,0 3 257382 50.014597368 4139 D R 9483519 + 256 [dd]
8,0 3 257383 50.014609521 4139 U N [dd] 2
8,0 3 257384 50.015937813 0 C R 9483263 + 256 [0]
8,0 3 257385 50.016124825 4139 A R 9483775 + 256 <- (8,1) 9483712
8,0 3 257386 50.016125116 4139 Q R 9483775 + 256 [dd]
8,0 3 257387 50.016127162 4139 G R 9483775 + 256 [dd]
8,0 3 257388 50.016128569 4139 P N [dd]
8,0 3 257389 50.016128983 4139 I R 9483775 + 256 [dd]
8,0 3 257390 50.016129538 4139 U N [dd] 2
8,0 3 257391 50.016130627 4139 D R 9483775 + 256 [dd]
8,0 3 257392 50.016143077 4139 U N [dd] 2
8,0 3 257393 50.016925304 0 C R 9483519 + 256 [0]
8,0 3 257394 50.017111307 4139 A R 9484031 + 256 <- (8,1) 9483968
8,0 3 257395 50.017111598 4139 Q R 9484031 + 256 [dd]
8,0 3 257396 50.017113410 4139 G R 9484031 + 256 [dd]
8,0 3 257397 50.017114835 4139 P N [dd]
8,0 3 257398 50.017115213 4139 I R 9484031 + 256 [dd]
8,0 3 257399 50.017115765 4139 U N [dd] 2
8,0 3 257400 50.017116839 4139 D R 9484031 + 256 [dd]
8,0 3 257401 50.017129023 4139 U N [dd] 2
8,0 3 257402 50.017396693 0 C R 9483775 + 256 [0]
8,0 3 257403 50.017584595 4139 A R 9484287 + 256 <- (8,1) 9484224
8,0 3 257404 50.017585018 4139 Q R 9484287 + 256 [dd]
8,0 3 257405 50.017586866 4139 G R 9484287 + 256 [dd]
8,0 3 257406 50.017587997 4139 P N [dd]
8,0 3 257407 50.017588393 4139 I R 9484287 + 256 [dd]
8,0 3 257408 50.017589105 4139 U N [dd] 2
8,0 3 257409 50.017590173 4139 D R 9484287 + 256 [dd]
8,0 3 257410 50.017602614 4139 U N [dd] 2
8,0 3 257411 50.020578876 0 C R 9484031 + 256 [0]
8,0 3 257412 50.020721857 4139 C R 9484287 + 256 [0]
8,0 3 257413 50.020803183 4139 A R 9484543 + 256 <- (8,1) 9484480
8,0 3 257414 50.020803507 4139 Q R 9484543 + 256 [dd]
8,0 3 257415 50.020805256 4139 G R 9484543 + 256 [dd]
8,0 3 257416 50.020806672 4139 P N [dd]
8,0 3 257417 50.020807065 4139 I R 9484543 + 256 [dd]
8,0 3 257418 50.020807668 4139 U N [dd] 1
8,0 3 257419 50.020808733 4139 D R 9484543 + 256 [dd]
8,0 3 257420 50.020957132 4139 A R 9484799 + 256 <- (8,1) 9484736
8,0 3 257421 50.020957423 4139 Q R 9484799 + 256 [dd]
8,0 3 257422 50.020959205 4139 G R 9484799 + 256 [dd]
8,0 3 257423 50.020960276 4139 P N [dd]
8,0 3 257424 50.020960594 4139 I R 9484799 + 256 [dd]
8,0 3 257425 50.020961062 4139 U N [dd] 2
8,0 3 257426 50.020961959 4139 D R 9484799 + 256 [dd]
8,0 3 257427 50.020974191 4139 U N [dd] 2
8,0 3 257428 50.023987847 0 C R 9484543 + 256 [0]
8,0 3 257429 50.024093062 4139 C R 9484799 + 256 [0]
8,0 3 257430 50.024207161 4139 A R 9485055 + 256 <- (8,1) 9484992
8,0 3 257431 50.024207434 4139 Q R 9485055 + 256 [dd]
8,0 3 257432 50.024209567 4139 G R 9485055 + 256 [dd]
8,0 3 257433 50.024210728 4139 P N [dd]
8,0 3 257434 50.024211097 4139 I R 9485055 + 256 [dd]
8,0 3 257435 50.024211661 4139 U N [dd] 1
8,0 3 257436 50.024212693 4139 D R 9485055 + 256 [dd]
8,0 3 257437 50.024359266 4139 A R 9485311 + 256 <- (8,1) 9485248
8,0 3 257438 50.024359584 4139 Q R 9485311 + 256 [dd]
8,0 3 257439 50.024361720 4139 G R 9485311 + 256 [dd]
8,0 3 257440 50.024362794 4139 P N [dd]
8,0 3 257441 50.024363106 4139 I R 9485311 + 256 [dd]
8,0 3 257442 50.024363760 4139 U N [dd] 2
8,0 3 257443 50.024364759 4139 D R 9485311 + 256 [dd]
8,0 3 257444 50.024376535 4139 U N [dd] 2
8,0 3 257445 50.026532544 0 C R 9485055 + 256 [0]
8,0 3 257446 50.026714236 4139 A R 9485567 + 256 <- (8,1) 9485504
8,0 3 257447 50.026714524 4139 Q R 9485567 + 256 [dd]
8,0 3 257448 50.026716354 4139 G R 9485567 + 256 [dd]
8,0 3 257449 50.026717791 4139 P N [dd]
8,0 3 257450 50.026718175 4139 I R 9485567 + 256 [dd]
8,0 3 257451 50.026718778 4139 U N [dd] 2
8,0 3 257452 50.026719876 4139 D R 9485567 + 256 [dd]
8,0 3 257453 50.026736383 4139 U N [dd] 2
8,0 3 257454 50.028531879 0 C R 9485311 + 256 [0]
8,0 3 257455 50.028684347 4139 C R 9485567 + 256 [0]
8,0 3 257456 50.028758787 4139 A R 9485823 + 256 <- (8,1) 9485760
8,0 3 257457 50.028759069 4139 Q R 9485823 + 256 [dd]
8,0 3 257458 50.028760884 4139 G R 9485823 + 256 [dd]
8,0 3 257459 50.028762099 4139 P N [dd]
8,0 3 257460 50.028762447 4139 I R 9485823 + 256 [dd]
8,0 3 257461 50.028763038 4139 U N [dd] 1
8,0 3 257462 50.028764268 4139 D R 9485823 + 256 [dd]
8,0 3 257463 50.028909841 4139 A R 9486079 + 256 <- (8,1) 9486016
8,0 3 257464 50.028910156 4139 Q R 9486079 + 256 [dd]
8,0 3 257465 50.028911896 4139 G R 9486079 + 256 [dd]
8,0 3 257466 50.028912964 4139 P N [dd]
8,0 3 257467 50.028913270 4139 I R 9486079 + 256 [dd]
8,0 3 257468 50.028913912 4139 U N [dd] 2
8,0 3 257469 50.028914878 4139 D R 9486079 + 256 [dd]
8,0 3 257470 50.028927497 4139 U N [dd] 2
8,0 3 257471 50.031158357 0 C R 9485823 + 256 [0]
8,0 3 257472 50.031292365 4139 C R 9486079 + 256 [0]
8,0 3 257473 50.031369697 4139 A R 9486335 + 160 <- (8,1) 9486272
8,0 3 257474 50.031369988 4139 Q R 9486335 + 160 [dd]
8,0 3 257475 50.031371779 4139 G R 9486335 + 160 [dd]
8,0 3 257476 50.031372850 4139 P N [dd]
8,0 3 257477 50.031373198 4139 I R 9486335 + 160 [dd]
8,0 3 257478 50.031384931 4139 A R 1056639 + 8 <- (8,1) 1056576
8,0 3 257479 50.031385201 4139 Q R 1056639 + 8 [dd]
8,0 3 257480 50.031388480 4139 G R 1056639 + 8 [dd]
8,0 3 257481 50.031388904 4139 I R 1056639 + 8 [dd]
8,0 3 257482 50.031390362 4139 U N [dd] 2
8,0 3 257483 50.031391523 4139 D R 9486335 + 160 [dd]
8,0 3 257484 50.031403403 4139 D R 1056639 + 8 [dd]
8,0 3 257485 50.033630747 0 C R 1056639 + 8 [0]
8,0 3 257486 50.033690300 4139 A R 9486495 + 96 <- (8,1) 9486432
8,0 3 257487 50.033690810 4139 Q R 9486495 + 96 [dd]
8,0 3 257488 50.033694581 4139 G R 9486495 + 96 [dd]
8,0 3 257489 50.033696739 4139 P N [dd]
8,0 3 257490 50.033697357 4139 I R 9486495 + 96 [dd]
8,0 3 257491 50.033698611 4139 U N [dd] 2
8,0 3 257492 50.033700945 4139 D R 9486495 + 96 [dd]
8,0 3 257493 50.033727763 4139 C R 9486335 + 160 [0]
8,0 3 257494 50.033996024 4139 A R 9486591 + 256 <- (8,1) 9486528
8,0 3 257495 50.033996396 4139 Q R 9486591 + 256 [dd]
8,0 3 257496 50.034000030 4139 G R 9486591 + 256 [dd]
8,0 3 257497 50.034002268 4139 P N [dd]
8,0 3 257498 50.034002820 4139 I R 9486591 + 256 [dd]
8,0 3 257499 50.034003924 4139 U N [dd] 2
8,0 3 257500 50.034006201 4139 D R 9486591 + 256 [dd]
8,0 3 257501 50.034091438 4139 U N [dd] 2
8,0 3 257502 50.034637372 0 C R 9486495 + 96 [0]
8,0 3 257503 50.034841508 4139 A R 9486847 + 256 <- (8,1) 9486784
8,0 3 257504 50.034842072 4139 Q R 9486847 + 256 [dd]
8,0 3 257505 50.034846117 4139 G R 9486847 + 256 [dd]
8,0 3 257506 50.034848676 4139 P N [dd]
8,0 3 257507 50.034849384 4139 I R 9486847 + 256 [dd]
8,0 3 257508 50.034850545 4139 U N [dd] 2
8,0 3 257509 50.034852795 4139 D R 9486847 + 256 [dd]
8,0 3 257510 50.034875503 4139 U N [dd] 2
8,0 3 257511 50.035370009 0 C R 9486591 + 256 [0]
8,0 3 257512 50.035622315 4139 A R 9487103 + 256 <- (8,1) 9487040
8,0 3 257513 50.035622954 4139 Q R 9487103 + 256 [dd]
8,0 3 257514 50.035627101 4139 G R 9487103 + 256 [dd]
8,0 3 257515 50.035629510 4139 P N [dd]
8,0 3 257516 50.035630143 4139 I R 9487103 + 256 [dd]
8,0 3 257517 50.035631058 4139 U N [dd] 2
8,0 3 257518 50.035632657 4139 D R 9487103 + 256 [dd]
8,0 3 257519 50.035656358 4139 U N [dd] 2
8,0 3 257520 50.036703329 0 C R 9486847 + 256 [0]
8,0 3 257521 50.036963604 4139 A R 9487359 + 256 <- (8,1) 9487296
8,0 3 257522 50.036964057 4139 Q R 9487359 + 256 [dd]
8,0 3 257523 50.036967636 4139 G R 9487359 + 256 [dd]
8,0 3 257524 50.036969710 4139 P N [dd]
8,0 3 257525 50.036970586 4139 I R 9487359 + 256 [dd]
8,0 3 257526 50.036971684 4139 U N [dd] 2
8,0 3 257527 50.036973631 4139 D R 9487359 + 256 [dd]
8,0 3 257528 50.036995034 4139 U N [dd] 2
8,0 3 257529 50.038904428 0 C R 9487103 + 256 [0]
8,0 3 257530 50.039161508 4139 A R 9487615 + 256 <- (8,1) 9487552
8,0 3 257531 50.039161934 4139 Q R 9487615 + 256 [dd]
8,0 3 257532 50.039165834 4139 G R 9487615 + 256 [dd]
8,0 3 257533 50.039168561 4139 P N [dd]
8,0 3 257534 50.039169353 4139 I R 9487615 + 256 [dd]
8,0 3 257535 50.039170343 4139 U N [dd] 2
8,0 3 257536 50.039171645 4139 D R 9487615 + 256 [dd]
8,0 3 257537 50.039193195 4139 U N [dd] 2
8,0 3 257538 50.040570003 0 C R 9487359 + 256 [0]
8,0 3 257539 50.040842161 4139 A R 9487871 + 256 <- (8,1) 9487808
8,0 3 257540 50.040842827 4139 Q R 9487871 + 256 [dd]
8,0 3 257541 50.040846803 4139 G R 9487871 + 256 [dd]
8,0 3 257542 50.040849902 4139 P N [dd]
8,0 3 257543 50.040850715 4139 I R 9487871 + 256 [dd]
8,0 3 257544 50.040851642 4139 U N [dd] 2
8,0 3 257545 50.040853658 4139 D R 9487871 + 256 [dd]
8,0 3 257546 50.040876270 4139 U N [dd] 2
8,0 3 257547 50.042081391 0 C R 9487615 + 256 [0]
8,0 3 257548 50.042215837 4139 C R 9487871 + 256 [0]
8,0 3 257549 50.042316192 4139 A R 9488127 + 256 <- (8,1) 9488064
8,0 3 257550 50.042316633 4139 Q R 9488127 + 256 [dd]
8,0 3 257551 50.042319213 4139 G R 9488127 + 256 [dd]
8,0 3 257552 50.042320803 4139 P N [dd]
8,0 3 257553 50.042321412 4139 I R 9488127 + 256 [dd]
8,0 3 257554 50.042322219 4139 U N [dd] 1
8,0 3 257555 50.042323362 4139 D R 9488127 + 256 [dd]
8,0 3 257556 50.042484350 4139 A R 9488383 + 256 <- (8,1) 9488320
8,0 3 257557 50.042484602 4139 Q R 9488383 + 256 [dd]
8,0 3 257558 50.042486744 4139 G R 9488383 + 256 [dd]
8,0 3 257559 50.042487908 4139 P N [dd]
8,0 3 257560 50.042488223 4139 I R 9488383 + 256 [dd]
8,0 3 257561 50.042488754 4139 U N [dd] 2
8,0 3 257562 50.042489927 4139 D R 9488383 + 256 [dd]
8,0 3 257563 50.042502678 4139 U N [dd] 2
8,0 3 257564 50.045166592 0 C R 9488127 + 256 [0]
8,0 3 257565 50.045355163 4139 A R 9488639 + 256 <- (8,1) 9488576
8,0 3 257566 50.045355493 4139 Q R 9488639 + 256 [dd]
8,0 3 257567 50.045357497 4139 G R 9488639 + 256 [dd]
8,0 3 257568 50.045358673 4139 P N [dd]
8,0 3 257569 50.045359267 4139 I R 9488639 + 256 [dd]
8,0 3 257570 50.045359831 4139 U N [dd] 2
8,0 3 257571 50.045360911 4139 D R 9488639 + 256 [dd]
8,0 3 257572 50.045373959 4139 U N [dd] 2
8,0 3 257573 50.046450730 0 C R 9488383 + 256 [0]
8,0 3 257574 50.046641639 4139 A R 9488895 + 256 <- (8,1) 9488832
8,0 3 257575 50.046642086 4139 Q R 9488895 + 256 [dd]
8,0 3 257576 50.046643937 4139 G R 9488895 + 256 [dd]
8,0 3 257577 50.046645092 4139 P N [dd]
8,0 3 257578 50.046645527 4139 I R 9488895 + 256 [dd]
8,0 3 257579 50.046646244 4139 U N [dd] 2
8,0 3 257580 50.046647327 4139 D R 9488895 + 256 [dd]
8,0 3 257581 50.046660234 4139 U N [dd] 2
8,0 3 257582 50.047826305 0 C R 9488639 + 256 [0]
8,0 3 257583 50.048011468 4139 A R 9489151 + 256 <- (8,1) 9489088
8,0 3 257584 50.048011762 4139 Q R 9489151 + 256 [dd]
8,0 3 257585 50.048013793 4139 G R 9489151 + 256 [dd]
8,0 3 257586 50.048014966 4139 P N [dd]
8,0 3 257587 50.048015380 4139 I R 9489151 + 256 [dd]
8,0 3 257588 50.048016112 4139 U N [dd] 2
8,0 3 257589 50.048017202 4139 D R 9489151 + 256 [dd]
8,0 3 257590 50.048029553 4139 U N [dd] 2
8,0 3 257591 50.049319830 0 C R 9488895 + 256 [0]
8,0 3 257592 50.049446089 4139 C R 9489151 + 256 [0]
8,0 3 257593 50.049545199 4139 A R 9489407 + 256 <- (8,1) 9489344
8,0 3 257594 50.049545628 4139 Q R 9489407 + 256 [dd]
8,0 3 257595 50.049547512 4139 G R 9489407 + 256 [dd]
8,0 3 257596 50.049548886 4139 P N [dd]
8,0 3 257597 50.049549318 4139 I R 9489407 + 256 [dd]
8,0 3 257598 50.049550047 4139 U N [dd] 1
8,0 3 257599 50.049551241 4139 D R 9489407 + 256 [dd]
8,0 3 257600 50.049699283 4139 A R 9489663 + 256 <- (8,1) 9489600
8,0 3 257601 50.049699556 4139 Q R 9489663 + 256 [dd]
8,0 3 257602 50.049701266 4139 G R 9489663 + 256 [dd]
8,0 3 257603 50.049702310 4139 P N [dd]
8,0 3 257604 50.049702656 4139 I R 9489663 + 256 [dd]
8,0 3 257605 50.049703118 4139 U N [dd] 2
8,0 3 257606 50.049704020 4139 D R 9489663 + 256 [dd]
8,0 3 257607 50.049715940 4139 U N [dd] 2
8,0 3 257608 50.052662150 0 C R 9489407 + 256 [0]
8,0 3 257609 50.052853688 4139 A R 9489919 + 256 <- (8,1) 9489856
8,0 3 257610 50.052853985 4139 Q R 9489919 + 256 [dd]
8,0 3 257611 50.052855869 4139 G R 9489919 + 256 [dd]
8,0 3 257612 50.052857057 4139 P N [dd]
8,0 3 257613 50.052857423 4139 I R 9489919 + 256 [dd]
8,0 3 257614 50.052858065 4139 U N [dd] 2
8,0 3 257615 50.052859164 4139 D R 9489919 + 256 [dd]
8,0 3 257616 50.052871806 4139 U N [dd] 2
8,0 3 257617 50.053470795 0 C R 9489663 + 256 [0]
8,0 3 257618 50.053661719 4139 A R 9490175 + 256 <- (8,1) 9490112
8,0 3 257619 50.053662097 4139 Q R 9490175 + 256 [dd]
8,0 3 257620 50.053663891 4139 G R 9490175 + 256 [dd]
8,0 3 257621 50.053665034 4139 P N [dd]
8,0 3 257622 50.053665436 4139 I R 9490175 + 256 [dd]
8,0 3 257623 50.053665982 4139 U N [dd] 2
8,0 3 257624 50.053667077 4139 D R 9490175 + 256 [dd]
8,0 3 257625 50.053679732 4139 U N [dd] 2
8,0 3 257626 50.055776383 0 C R 9489919 + 256 [0]
8,0 3 257627 50.055915017 4139 C R 9490175 + 256 [0]
8,0 3 257628 50.055997812 4139 A R 9490431 + 256 <- (8,1) 9490368
8,0 3 257629 50.055998085 4139 Q R 9490431 + 256 [dd]
8,0 3 257630 50.055999867 4139 G R 9490431 + 256 [dd]
8,0 3 257631 50.056001049 4139 P N [dd]
8,0 3 257632 50.056001451 4139 I R 9490431 + 256 [dd]
8,0 3 257633 50.056002189 4139 U N [dd] 1
8,0 3 257634 50.056003197 4139 D R 9490431 + 256 [dd]
8,0 3 257635 50.056149977 4139 A R 9490687 + 256 <- (8,1) 9490624
8,0 3 257636 50.056150279 4139 Q R 9490687 + 256 [dd]
8,0 3 257637 50.056152047 4139 G R 9490687 + 256 [dd]
8,0 3 257638 50.056153109 4139 P N [dd]
8,0 3 257639 50.056153442 4139 I R 9490687 + 256 [dd]
8,0 3 257640 50.056153904 4139 U N [dd] 2
8,0 3 257641 50.056154852 4139 D R 9490687 + 256 [dd]
8,0 3 257642 50.056166948 4139 U N [dd] 2
8,0 3 257643 50.057600660 0 C R 9490431 + 256 [0]
8,0 3 257644 50.057786753 4139 A R 9490943 + 256 <- (8,1) 9490880
8,0 3 257645 50.057787050 4139 Q R 9490943 + 256 [dd]
8,0 3 257646 50.057788865 4139 G R 9490943 + 256 [dd]
8,0 3 257647 50.057790236 4139 P N [dd]
8,0 3 257648 50.057790614 4139 I R 9490943 + 256 [dd]
8,0 3 257649 50.057791169 4139 U N [dd] 2
8,0 3 257650 50.057792246 4139 D R 9490943 + 256 [dd]
8,0 3 257651 50.057804469 4139 U N [dd] 2
8,0 3 257652 50.060322995 0 C R 9490687 + 256 [0]
8,0 3 257653 50.060464005 4139 C R 9490943 + 256 [0]
8,0 3 257654 50.060548216 4139 A R 9491199 + 256 <- (8,1) 9491136
8,0 3 257655 50.060548696 4139 Q R 9491199 + 256 [dd]
8,0 3 257656 50.060550922 4139 G R 9491199 + 256 [dd]
8,0 3 257657 50.060552096 4139 P N [dd]
8,0 3 257658 50.060552531 4139 I R 9491199 + 256 [dd]
8,0 3 257659 50.060553101 4139 U N [dd] 1
8,0 3 257660 50.060554100 4139 D R 9491199 + 256 [dd]
8,0 3 257661 50.060701569 4139 A R 9491455 + 256 <- (8,1) 9491392
8,0 3 257662 50.060701890 4139 Q R 9491455 + 256 [dd]
8,0 3 257663 50.060703993 4139 G R 9491455 + 256 [dd]
8,0 3 257664 50.060705070 4139 P N [dd]
8,0 3 257665 50.060705385 4139 I R 9491455 + 256 [dd]
8,0 3 257666 50.060706012 4139 U N [dd] 2
8,0 3 257667 50.060706987 4139 D R 9491455 + 256 [dd]
8,0 3 257668 50.060718784 4139 U N [dd] 2
8,0 3 257669 50.062964966 0 C R 9491199 + 256 [0]
8,0 3 257670 50.063102772 4139 C R 9491455 + 256 [0]
8,0 3 257671 50.063182666 4139 A R 9491711 + 256 <- (8,1) 9491648
8,0 3 257672 50.063182939 4139 Q R 9491711 + 256 [dd]
8,0 3 257673 50.063184889 4139 G R 9491711 + 256 [dd]
8,0 3 257674 50.063186074 4139 P N [dd]
8,0 3 257675 50.063186440 4139 I R 9491711 + 256 [dd]
8,0 3 257676 50.063187271 4139 U N [dd] 1
8,0 3 257677 50.063188312 4139 D R 9491711 + 256 [dd]
8,0 3 257678 50.063340467 4139 A R 9491967 + 256 <- (8,1) 9491904
8,0 3 257679 50.063340749 4139 Q R 9491967 + 256 [dd]
8,0 3 257680 50.063342529 4139 G R 9491967 + 256 [dd]
8,0 3 257681 50.063343597 4139 P N [dd]
8,0 3 257682 50.063343915 4139 I R 9491967 + 256 [dd]
8,0 3 257683 50.063344374 4139 U N [dd] 2
8,0 3 257684 50.063345313 4139 D R 9491967 + 256 [dd]
8,0 3 257685 50.063357370 4139 U N [dd] 2
8,0 3 257686 50.066605011 0 C R 9491711 + 256 [0]
8,0 3 257687 50.066643587 0 C R 9491967 + 256 [0]
8,0 3 257688 50.066821310 4139 A R 9492223 + 256 <- (8,1) 9492160
8,0 3 257689 50.066821601 4139 Q R 9492223 + 256 [dd]
8,0 3 257690 50.066823605 4139 G R 9492223 + 256 [dd]
8,0 3 257691 50.066825063 4139 P N [dd]
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
From: Alan D. Brunelle @ 2009-06-05 15:15 UTC
To: Hisashi Hifumi; +Cc: Andrew Morton, linux-kernel, linux-fsdevel
Hisashi Hifumi wrote:
> At 09:36 09/06/01, Andrew Morton wrote:
>> On Fri, 29 May 2009 14:35:55 +0900 Hisashi Hifumi
>> <hifumi.hisashi@oss.ntt.co.jp> wrote:
>>
>>> I added blk_run_backing_dev to page_cache_async_readahead
>>> so readahead I/O is unplugged to improve throughput,
>>> especially in RAID environments.
>> I skipped the last version of this because KOSAKI Motohiro
>> <kosaki.motohiro@jp.fujitsu.com> said "Please attach blktrace analysis ;)".
>>
>> I'm not sure why he asked for that, but he's a smart chap and
>> presumably had his reasons.
>>
>> If you think that such an analysis is unneeded, or isn't worth the time
>> to generate then please tell us that. But please don't just ignore the
>> request!
>
> Hi Andrew.
>
> Sorry for this.
>
> I did not ignore KOSAKI Motohiro's request.
> I've got blktrace output both with and without the patch,
> but I could not pinpoint the reason for the throughput improvement
> from this result.
>
> I do not notice any difference except around the unplug behavior of dd.
> Comments?
Pardon my ignorance on the global issues concerning the patch, but
specifically looking at the traces generated by blktrace leads one to
also note that the patched version may generate inefficiencies in other
places in the kernel by reducing the merging going on. In the unpatched
version it looks like (generally) two incoming bios are able to be
merged to generate a single I/O request. In the patched version -
because of the quicker unplug(?) - no such merging is going on. This
leads to more work lower in the stack (twice as many I/O operations
being managed), perhaps increased interrupts & handling &c. [This may be
acceptable if the goal is to decrease latencies on a per-bio basis...]
Do you have a place where the raw blktrace data can be retrieved for
more in-depth analysis?
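To make the merge effect concrete, below is a toy user-space C model
(illustrative only - not the actual block-layer code, and the sector
numbers are just taken from the traces above) of why a plugged queue
back-merges two contiguous readahead bios while an immediately
unplugged queue dispatches them separately:

	#include <stdio.h>

	/* Toy model: a "request" is a (sector, length) pair, and a
	 * contiguous bio back-merges into the tail request while the
	 * queue stays plugged (the 'M' events in the traces). */
	struct request { unsigned long long sector; unsigned int nr; };

	static struct request queue[16];
	static int nr_requests;

	static void unplug(void)
	{
		for (int i = 0; i < nr_requests; i++)	/* 'D' events */
			printf("D R %llu + %u\n",
			       queue[i].sector, queue[i].nr);
		nr_requests = 0;
	}

	static void submit_bio(unsigned long long sector, unsigned int nr,
			       int plugged)
	{
		if (!plugged) {			/* dispatch at once */
			printf("D R %llu + %u\n", sector, nr);
			return;
		}
		if (nr_requests > 0) {
			struct request *tail = &queue[nr_requests - 1];
			if (tail->sector + tail->nr == sector) {
				tail->nr += nr;	/* back-merge: 'M' */
				return;
			}
		}
		queue[nr_requests].sector = sector;	/* new request: 'G' */
		queue[nr_requests].nr = nr;
		nr_requests++;
	}

	int main(void)
	{
		/* Unpatched: both 256-sector readahead bios queue up
		 * behind the plug and merge into one 512-sector request. */
		submit_bio(8718079, 256, 1);
		submit_bio(8718335, 256, 1);
		unplug();		/* prints: D R 8718079 + 512 */

		/* Patched: the explicit unplug dispatches each 256-sector
		 * request before a merge can happen. */
		submit_bio(9481215, 256, 0);
		submit_bio(9481471, 256, 0);
		return 0;
	}

Two 256-sector bios thus either become one 512-sector dispatch
(unpatched) or two 256-sector dispatches (patched), exactly the request
sizes seen in the completion events.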
Regards,
Alan D. Brunelle
Hewlett-Packard
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
From: KOSAKI Motohiro @ 2009-06-06 14:36 UTC
To: Alan D. Brunelle
Cc: kosaki.motohiro, Hisashi Hifumi, Andrew Morton, linux-kernel,
linux-fsdevel, Wu Fengguang
Sorry for the late response.
I wonder why Wu and I are not on the Cc list in this thread.
> Hisashi Hifumi wrote:
> > At 09:36 09/06/01, Andrew Morton wrote:
> >> On Fri, 29 May 2009 14:35:55 +0900 Hisashi Hifumi
> >> <hifumi.hisashi@oss.ntt.co.jp> wrote:
> >>
> >>> I added blk_run_backing_dev to page_cache_async_readahead
> >>> so readahead I/O is unplugged to improve throughput,
> >>> especially in RAID environments.
> >> I skipped the last version of this because KOSAKI Motohiro
> >> <kosaki.motohiro@jp.fujitsu.com> said "Please attach blktrace analysis ;)".
> >>
> >> I'm not sure why he asked for that, but he's a smart chap and
> >> presumably had his reasons.
> >>
> >> If you think that such an analysis is unneeded, or isn't worth the time
> >> to generate then please tell us that. But please don't just ignore the
> >> request!
> >
> > Hi Andrew.
> >
> > Sorry for this.
> >
> > I did not ignore KOSAKI Motohiro's request.
> > I've got blktrace output both with and without the patch,
> > but I could not pinpoint the reason for the throughput improvement
> > from this result.
> >
> > I do not notice any difference except around the unplug behavior of dd.
> > Comments?
>
> Pardon my ignorance on the global issues concerning the patch, but
> specifically looking at the traces generated by blktrace leads one to
> also note that the patched version may generate inefficiencies in other
> places in the kernel by reducing the merging going on. In the unpatched
> version it looks like (generally) two incoming bios are able to be
> merged to generate a single I/O request. In the patched version -
> because of the quicker unplug(?) - no such merging is going on. This
> leads to more work lower in the stack (twice as many I/O operations
> being managed), perhaps increased interrupts & handling &c. [This may be
> acceptable if the goal is to decrease latencies on a per-bio basis...]
>
> Do you have a place where the raw blktrace data can be retrieved for
> more in-depth analysis?
I think your comment is quite apt. In another thread, Wu Fengguang pointed
out the same issue.
Wu and I are also waiting for his analysis.
Thanks.
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
From: Wu Fengguang @ 2009-06-06 22:45 UTC
To: KOSAKI Motohiro
Cc: Alan D. Brunelle, Hisashi Hifumi, Andrew Morton, linux-kernel,
linux-fsdevel, Jens Axboe, Randy Dunlap
On Sat, Jun 06, 2009 at 10:36:41PM +0800, KOSAKI Motohiro wrote:
>
> Sorry for the late response.
> I wonder why Wu and I are not on the Cc list in this thread.
[restore more CC]
> > Hisashi Hifumi wrote:
> > > At 09:36 09/06/01, Andrew Morton wrote:
> > >> On Fri, 29 May 2009 14:35:55 +0900 Hisashi Hifumi
> > >> <hifumi.hisashi@oss.ntt.co.jp> wrote:
> > >>
> > >>> I added blk_run_backing_dev to page_cache_async_readahead
> > >>> so readahead I/O is unplugged to improve throughput,
> > >>> especially in RAID environments.
> > >> I skipped the last version of this because KOSAKI Motohiro
> > >> <kosaki.motohiro@jp.fujitsu.com> said "Please attach blktrace analysis ;)".
> > >>
> > >> I'm not sure why he asked for that, but he's a smart chap and
> > >> presumably had his reasons.
> > >>
> > >> If you think that such an analysis is unneeded, or isn't worth the time
> > >> to generate then please tell us that. But please don't just ignore the
> > >> request!
> > >
> > > Hi Andrew.
> > >
> > > Sorry for this.
> > >
> > > I did not ignore KOSAKI Motohiro's request.
> > > I've got blktrace output both with and without the patch,
> > > but I could not pinpoint the reason for the throughput improvement
> > > from this result.
> > >
> > > I do not notice any difference except around the unplug behavior of dd.
> > > Comments?
> >
> > Pardon my ignorance on the global issues concerning the patch, but
> > specifically looking at the traces generated by blktrace leads one to
> > also note that the patched version may generate inefficiencies in other
> > places in the kernel by reducing the merging going on. In the unpatched
> > version it looks like (generally) two incoming bios are able to be
> > merged to generate a single I/O request. In the patched version -
> > because of the quicker unplug(?) - no such merging is going on. This
> > leads to more work lower in the stack (twice as many I/O operations
> > being managed), perhaps increased interrupts & handling &c. [This may be
> > acceptable if the goal is to decrease latencies on a per-bio basis...]
> >
> > Do you have a place where the raw blktrace data can be retrieved for
> > more in-depth analysis?
>
> I think your comment is quite apt. In another thread, Wu Fengguang pointed
> out the same issue.
> Wu and I are also waiting for his analysis.
And do it with a large readahead size :)
Alan, this was my analysis:
: Hifumi, can you help retest with some large readahead size?
:
: Your readahead size (128K) is smaller than your max_sectors_kb (256K),
: so two readahead IO requests get merged into one real IO, that means
: half of the readahead requests are delayed.
ie. two readahead requests get merged and complete together, thus the effective
IO size is doubled but at the same time it becomes completely synchronous IO.
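(Worked out in blktrace's 512-byte sectors: a 128 KB readahead request is
128*1024/512 = 256 sectors, while max_sectors_kb = 256 KB allows
512-sector requests, so exactly two readahead requests fit in, and get
merged into, one I/O.)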
:
: The IO completion size goes down from 512 to 256 sectors:
:
: before patch:
: 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
: 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
: 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
: 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
: 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
:
: after patch:
: 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
: 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
: 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
: 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
: 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
Thanks,
Fengguang
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
From: Andrew Morton @ 2009-06-18 19:04 UTC
To: Wu Fengguang
Cc: kosaki.motohiro, Alan.Brunelle, hifumi.hisashi, linux-kernel,
linux-fsdevel, jens.axboe, randy.dunlap
On Sun, 7 Jun 2009 06:45:38 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:
> > > Do you have a place where the raw blktrace data can be retrieved for
> > > more in-depth analysis?
> >
> > I think your comment is quite apt. In another thread, Wu Fengguang pointed
> > out the same issue.
> > Wu and I are also waiting for his analysis.
>
> And do it with a large readahead size :)
>
> Alan, this was my analysis:
>
> : Hifumi, can you help retest with some large readahead size?
> :
> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> : so two readahead IO requests get merged into one real IO, that means
> : half of the readahead requests are delayed.
>
> ie. two readahead requests get merged and complete together, thus the effective
> IO size is doubled but at the same time it becomes completely synchronous IO.
>
> :
> : The IO completion size goes down from 512 to 256 sectors:
> :
> : before patch:
> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
> :
> : after patch:
> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
>
I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
and it's looking like 2.6.32 material, if ever.
If it turns out to be wonderful, we could always ask the -stable
maintainers to put it in 2.6.x.y I guess.
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
From: Wu Fengguang @ 2009-06-20 3:55 UTC
To: Andrew Morton
Cc: kosaki.motohiro, Alan.Brunelle, hifumi.hisashi, linux-kernel,
linux-fsdevel, jens.axboe, randy.dunlap
On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
> On Sun, 7 Jun 2009 06:45:38 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
>
> > > > Do you have a place where the raw blktrace data can be retrieved for
> > > > more in-depth analysis?
> > >
> > > I think your comment is quite apt. In another thread, Wu Fengguang pointed
> > > out the same issue.
> > > Wu and I are also waiting for his analysis.
> >
> > And do it with a large readahead size :)
> >
> > Alan, this was my analysis:
> >
> > : Hifumi, can you help retest with some large readahead size?
> > :
> > : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> > : so two readahead IO requests get merged into one real IO, that means
> > : half of the readahead requests are delayed.
> >
> > ie. two readahead requests get merged and complete together, thus the effective
> > IO size is doubled but at the same time it becomes completely synchronous IO.
> >
> > :
> > : The IO completion size goes down from 512 to 256 sectors:
> > :
> > : before patch:
> > : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
> > : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
> > : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
> > : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
> > : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
> > :
> > : after patch:
> > : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
> > : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
> > : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
> > : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
> > : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
> >
>
> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
> and it's looking like 2.6.32 material, if ever.
>
> If it turns out to be wonderful, we could always ask the -stable
> maintainers to put it in 2.6.x.y I guess.
Agreed. The expected (and interesting) test on a properly configured
HW RAID has not happened yet, hence the theory remains unsupported.
Thanks,
Fengguang
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
From: Vladislav Bolkhovitin @ 2009-06-20 12:29 UTC
To: Wu Fengguang
Cc: Andrew Morton, kosaki.motohiro, Alan.Brunelle, hifumi.hisashi,
linux-kernel, linux-fsdevel, jens.axboe, randy.dunlap,
Beheer InterCommIT
Wu Fengguang, on 06/20/2009 07:55 AM wrote:
> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>> On Sun, 7 Jun 2009 06:45:38 +0800
>> Wu Fengguang <fengguang.wu@intel.com> wrote:
>>
>>>>> Do you have a place where the raw blktrace data can be retrieved for
>>>>> more in-depth analysis?
>>>> I think your comment is quite apt. In another thread, Wu Fengguang pointed
>>>> out the same issue.
>>>> Wu and I are also waiting for his analysis.
>>> And do it with a large readahead size :)
>>>
>>> Alan, this was my analysis:
>>>
>>> : Hifumi, can you help retest with some large readahead size?
>>> :
>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>>> : so two readahead IO requests get merged into one real IO, that means
>>> : half of the readahead requests are delayed.
>>>
>>> ie. two readahead requests get merged and complete together, thus the effective
>>> IO size is doubled but at the same time it becomes completely synchronous IO.
>>>
>>> :
>>> : The IO completion size goes down from 512 to 256 sectors:
>>> :
>>> : before patch:
>>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
>>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
>>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
>>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
>>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
>>> :
>>> : after patch:
>>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
>>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
>>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
>>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
>>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
>>>
>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>> and it's looking like 2.6.32 material, if ever.
>>
>> If it turns out to be wonderful, we could always ask the -stable
>> maintainers to put it in 2.6.x.y I guess.
>
> Agreed. The expected (and interesting) test on a properly configured
> HW RAID has not happened yet, hence the theory remains unsupported.
Hmm, do you see anything improper in Ronald's setup (see
http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
It is HW RAID based.
As I already wrote, we can ask Ronald to perform any needed tests.
> Thanks,
> Fengguang
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
From: Wu Fengguang @ 2009-06-29 9:34 UTC
To: Vladislav Bolkhovitin
Cc: Andrew Morton, kosaki.motohiro, Alan.Brunelle, hifumi.hisashi,
linux-kernel, linux-fsdevel, jens.axboe, randy.dunlap,
Beheer InterCommIT
On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
>
> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
> > On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
> >> On Sun, 7 Jun 2009 06:45:38 +0800
> >> Wu Fengguang <fengguang.wu@intel.com> wrote:
> >>
> >>>>> Do you have a place where the raw blktrace data can be retrieved for
> >>>>> more in-depth analysis?
> >>>> I think your comment is quite apt. In another thread, Wu Fengguang pointed
> >>>> out the same issue.
> >>>> Wu and I are also waiting for his analysis.
> >>> And do it with a large readahead size :)
> >>>
> >>> Alan, this was my analysis:
> >>>
> >>> : Hifumi, can you help retest with some large readahead size?
> >>> :
> >>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> >>> : so two readahead IO requests get merged into one real IO, that means
> >>> : half of the readahead requests are delayed.
> >>>
> >>> ie. two readahead requests get merged and complete together, thus the effective
> >>> IO size is doubled but at the same time it becomes completely synchronous IO.
> >>>
> >>> :
> >>> : The IO completion size goes down from 512 to 256 sectors:
> >>> :
> >>> : before patch:
> >>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
> >>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
> >>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
> >>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
> >>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
> >>> :
> >>> : after patch:
> >>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
> >>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
> >>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
> >>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
> >>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
> >>>
> >> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
> >> and it's looking like 2.6.32 material, if ever.
> >>
> >> If it turns out to be wonderful, we could always ask the -stable
> >> maintainers to put it in 2.6.x.y I guess.
> >
> > Agreed. The expected (and interesting) test on a properly configured
> > HW RAID has not happened yet, hence the theory remains unsupported.
>
> Hmm, do you see anything improper in Ronald's setup (see
> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
> It is HW RAID based.
No. Ronald's HW RAID performance is reasonably good. I meant Hifumi's
RAID performance is rather poor and may be improved by increasing the
readahead size, hehe.
> As I already wrote, we can ask Ronald to perform any needed tests.
Thanks! Ronald's test results are:
231 MB/s HW RAID
69.6 MB/s HW RAID + SCST
89.7 MB/s HW RAID + SCST + this patch
So this patch seems to help SCST, but again it would be better to
improve the SCST throughput first - it is now quite sub-optimal.
(Sorry for the long delay: I currently have no good idea of how
to measure such timing issues.)
And if Ronald could provide the HW RAID performance with this patch,
then we can confirm if this patch really makes a difference for RAID.
Thanks,
Fengguang
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 9:34 ` Wu Fengguang
@ 2009-06-29 10:26 ` Ronald Moesbergen
2009-06-29 10:55 ` Vladislav Bolkhovitin
2009-06-29 10:55 ` Vladislav Bolkhovitin
1 sibling, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-06-29 10:26 UTC (permalink / raw)
To: Wu Fengguang
Cc: Vladislav Bolkhovitin, Andrew Morton, kosaki.motohiro,
Alan.Brunelle, hifumi.hisashi, linux-kernel, linux-fsdevel,
jens.axboe, randy.dunlap
2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
>>
>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
>> > On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>> >> On Sun, 7 Jun 2009 06:45:38 +0800
>> >> Wu Fengguang <fengguang.wu@intel.com> wrote:
>> >>
>> >>>>> Do you have a place where the raw blktrace data can be retrieved for
>> >>>>> more in-depth analysis?
>> >>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
>> >>>> out the same issue.
>> >>> Wu and I are also waiting for his analysis.
>> >>> And do it with a large readahead size :)
>> >>>
>> >>> Alan, this was my analysis:
>> >>>
>> >>> : Hifumi, can you help retest with some large readahead size?
>> >>> :
>> >>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>> >>> : so two readahead IO requests get merged into one real IO, that means
>> >>> : half of the readahead requests are delayed.
>> >>>
>> >>> ie. two readahead requests get merged and complete together, thus the effective
>> >>> IO size is doubled but at the same time it becomes completely synchronous IO.
>> >>>
>> >>> :
>> >>> : The IO completion size goes down from 512 to 256 sectors:
>> >>> :
>> >>> : before patch:
>> >>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
>> >>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
>> >>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
>> >>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
>> >>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
>> >>> :
>> >>> : after patch:
>> >>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
>> >>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
>> >>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
>> >>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
>> >>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
>> >>>
>> >> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>> >> and it's looking like 2.6.32 material, if ever.
>> >>
>> >> If it turns out to be wonderful, we could always ask the -stable
>> >> maintainers to put it in 2.6.x.y I guess.
>> >
>> > Agreed. The expected (and interesting) test on a properly configured
>> > HW RAID has not happened yet, hence the theory remains unsupported.
>>
>> Hmm, do you see anything improper in the Ronald's setup (see
>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
>> It is HW RAID based.
>
> No. Ronald's HW RAID performance is reasonably good. I meant Hifumi's
> RAID performance is too bad and may be improved by increasing the
> readahead size, hehe.
>
>> As I already wrote, we can ask Ronald to perform any needed tests.
>
> Thanks! Ronald's test results are:
>
> 231 MB/s HW RAID
> 69.6 MB/s HW RAID + SCST
> 89.7 MB/s HW RAID + SCST + this patch
>
> So this patch seems to help SCST, but again it would be better to
> improve the SCST throughput first - it is now quite sub-optimal.
> (Sorry for the long delay: I have not yet figured out how to
> measure such timing issues.)
>
> And if Ronald could provide the HW RAID performance with this patch,
> then we can confirm if this patch really makes a difference for RAID.
I just tested raw HW RAID throughput with the patch applied, same
readahead setting (512KB), and it doesn't look promising:
./blockdev-perftest -d -r /dev/cciss/c0d0
blocksize W W W R R R
67108864 -1 -1 -1 5.59686 5.4098 5.45396
33554432 -1 -1 -1 6.18616 6.13232 5.96124
16777216 -1 -1 -1 7.6757 7.32139 7.4966
8388608 -1 -1 -1 8.82793 9.02057 9.01055
4194304 -1 -1 -1 12.2289 12.6804 12.19
2097152 -1 -1 -1 13.3012 13.706 14.7542
1048576 -1 -1 -1 11.7577 12.3609 11.9507
524288 -1 -1 -1 12.4112 12.2383 11.9105
262144 -1 -1 -1 7.30687 7.4417 7.38246
131072 -1 -1 -1 7.95752 7.95053 8.60796
65536 -1 -1 -1 10.1282 10.1286 10.1956
32768 -1 -1 -1 9.91857 9.98597 10.8421
16384 -1 -1 -1 10.8267 10.8899 10.8718
8192 -1 -1 -1 12.0345 12.5275 12.005
4096 -1 -1 -1 15.1537 15.0771 15.1753
2048 -1 -1 -1 25.432 24.8985 25.4303
1024 -1 -1 -1 45.2674 45.2707 45.3504
512 -1 -1 -1 87.9405 88.5047 87.4726
It dropped down to 189 MB/s. :(
Ronald.
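[Editor's note: on where the "189 MB/s" comes from. blockdev-perftest appears
to read 1 GiB per run; this is an assumption, but it is consistent with the
formatted tables later in this thread, where R(avg, MB/s) equals the mean of
1024/seconds. The 67108864-byte row above then converts roughly as:

runs = [5.59686, 5.4098, 5.45396]         # seconds per run, from the table
rates = [1024 / s for s in runs]          # MiB/s, assuming 1 GiB read per run
print(round(sum(rates) / len(rates), 1))  # ~186.7, the ballpark of the quoted 189 MB/s
]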
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 9:34 ` Wu Fengguang
2009-06-29 10:26 ` Ronald Moesbergen
@ 2009-06-29 10:55 ` Vladislav Bolkhovitin
2009-06-29 13:00 ` Wu Fengguang
1 sibling, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-06-29 10:55 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, kosaki.motohiro, Alan.Brunelle, hifumi.hisashi,
linux-kernel, linux-fsdevel, jens.axboe, randy.dunlap,
Beheer InterCommIT
Wu Fengguang, on 06/29/2009 01:34 PM wrote:
> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
>>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>>>> On Sun, 7 Jun 2009 06:45:38 +0800
>>>> Wu Fengguang <fengguang.wu@intel.com> wrote:
>>>>
>>>>>>> Do you have a place where the raw blktrace data can be retrieved for
>>>>>>> more in-depth analysis?
>>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
>>>>>> out the same issue.
>>>>>> Wu and I are also waiting for his analysis.
>>>>> And do it with a large readahead size :)
>>>>>
>>>>> Alan, this was my analysis:
>>>>>
>>>>> : Hifumi, can you help retest with some large readahead size?
>>>>> :
>>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>>>>> : so two readahead IO requests get merged into one real IO, that means
>>>>> : half of the readahead requests are delayed.
>>>>>
>>>>> ie. two readahead requests get merged and complete together, thus the effective
>>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
>>>>>
>>>>> :
>>>>> : The IO completion size goes down from 512 to 256 sectors:
>>>>> :
>>>>> : before patch:
>>>>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
>>>>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
>>>>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
>>>>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
>>>>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
>>>>> :
>>>>> : after patch:
>>>>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
>>>>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
>>>>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
>>>>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
>>>>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
>>>>>
>>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>>>> and it's looking like 2.6.32 material, if ever.
>>>>
>>>> If it turns out to be wonderful, we could always ask the -stable
>>>> maintainers to put it in 2.6.x.y I guess.
>>> Agreed. The expected (and interesting) test on a properly configured
>>> HW RAID has not happened yet, hence the theory remains unsupported.
>> Hmm, do you see anything improper in the Ronald's setup (see
>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
>> It is HW RAID based.
>
> No. Ronald's HW RAID performance is reasonably good. I meant Hifumi's
> RAID performance is too bad and may be improved by increasing the
> readahead size, hehe.
>
>> As I already wrote, we can ask Ronald to perform any needed tests.
>
> Thanks! Ronald's test results are:
>
> 231 MB/s HW RAID
> 69.6 MB/s HW RAID + SCST
> 89.7 MB/s HW RAID + SCST + this patch
>
> > So this patch seems to help SCST, but again it would be better to
> > improve the SCST throughput first - it is now quite sub-optimal.
No, SCST performance isn't an issue here. You simply can't get more than
110 MB/s from iSCSI over 1GbE, hence 231 MB/s fundamentally isn't
possible. There is only room for 20% improvement, which should be
achieved with better client-side-driven pipelining (see our other
discussions, e.g. http://lkml.org/lkml/2009/5/12/370).
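[Editor's note: Vlad's ceiling is simple line-rate arithmetic. A sketch, with
the protocol overhead figure a rough assumption rather than a measured value:

link_bits_per_s = 1_000_000_000              # 1 GbE
raw_mb_per_s = link_bits_per_s / 8 / 1e6     # 125 MB/s on the wire
overhead = 0.10                              # assumed TCP/IP/iSCSI header cost
print(f"practical ceiling ~{raw_mb_per_s * (1 - overhead):.0f} MB/s")  # ~112

which lands right around the 110 MB/s Vlad cites.]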
> > (Sorry for the long delay: I have not yet figured out how to
> > measure such timing issues.)
>
> And if Ronald could provide the HW RAID performance with this patch,
> then we can confirm if this patch really makes a difference for RAID.
>
> Thanks,
> Fengguang
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 10:26 ` Ronald Moesbergen
@ 2009-06-29 10:55 ` Vladislav Bolkhovitin
2009-06-29 12:54 ` Wu Fengguang
0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-06-29 10:55 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: Wu Fengguang, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
randy.dunlap, Bart Van Assche
Ronald Moesbergen, on 06/29/2009 02:26 PM wrote:
> 2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
>> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
>>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
>>>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>>>>> On Sun, 7 Jun 2009 06:45:38 +0800
>>>>> Wu Fengguang <fengguang.wu@intel.com> wrote:
>>>>>
>>>>>>>> Do you have a place where the raw blktrace data can be retrieved for
>>>>>>>> more in-depth analysis?
>>>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
>>>>>>> out the same issue.
>>>>>>> Wu and I are also waiting for his analysis.
>>>>>> And do it with a large readahead size :)
>>>>>>
>>>>>> Alan, this was my analysis:
>>>>>>
>>>>>> : Hifumi, can you help retest with some large readahead size?
>>>>>> :
>>>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>>>>>> : so two readahead IO requests get merged into one real IO, that means
>>>>>> : half of the readahead requests are delayed.
>>>>>>
>>>>>> ie. two readahead requests get merged and complete together, thus the effective
>>>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
>>>>>>
>>>>>> :
>>>>>> : The IO completion size goes down from 512 to 256 sectors:
>>>>>> :
>>>>>> : before patch:
>>>>>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
>>>>>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
>>>>>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
>>>>>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
>>>>>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
>>>>>> :
>>>>>> : after patch:
>>>>>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
>>>>>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
>>>>>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
>>>>>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
>>>>>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
>>>>>>
>>>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>>>>> and it's looking like 2.6.32 material, if ever.
>>>>>
>>>>> If it turns out to be wonderful, we could always ask the -stable
>>>>> maintainers to put it in 2.6.x.y I guess.
>>>> Agreed. The expected (and interesting) test on a properly configured
>>>> HW RAID has not happened yet, hence the theory remains unsupported.
>>> Hmm, do you see anything improper in the Ronald's setup (see
>>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
>>> It is HW RAID based.
>> No. Ronald's HW RAID performance is reasonably good. I meant Hifumi's
>> RAID performance is too bad and may be improved by increasing the
>> readahead size, hehe.
>>
>>> As I already wrote, we can ask Ronald to perform any needed tests.
>> Thanks! Ronald's test results are:
>>
>> 231 MB/s HW RAID
>> 69.6 MB/s HW RAID + SCST
>> 89.7 MB/s HW RAID + SCST + this patch
>>
>> So this patch seems to help SCST, but again it would be better to
>> improve the SCST throughput first - it is now quite sub-optimal.
>> (Sorry for the long delay: I have not yet figured out how to
>> measure such timing issues.)
>>
>> And if Ronald could provide the HW RAID performance with this patch,
>> then we can confirm if this patch really makes a difference for RAID.
>
> I just tested raw HW RAID throughput with the patch applied, same
> readahead setting (512KB), and it doesn't look promising:
>
> ./blockdev-perftest -d -r /dev/cciss/c0d0
> blocksize W W W R R R
> 67108864 -1 -1 -1 5.59686 5.4098 5.45396
> 33554432 -1 -1 -1 6.18616 6.13232 5.96124
> 16777216 -1 -1 -1 7.6757 7.32139 7.4966
> 8388608 -1 -1 -1 8.82793 9.02057 9.01055
> 4194304 -1 -1 -1 12.2289 12.6804 12.19
> 2097152 -1 -1 -1 13.3012 13.706 14.7542
> 1048576 -1 -1 -1 11.7577 12.3609 11.9507
> 524288 -1 -1 -1 12.4112 12.2383 11.9105
> 262144 -1 -1 -1 7.30687 7.4417 7.38246
> 131072 -1 -1 -1 7.95752 7.95053 8.60796
> 65536 -1 -1 -1 10.1282 10.1286 10.1956
> 32768 -1 -1 -1 9.91857 9.98597 10.8421
> 16384 -1 -1 -1 10.8267 10.8899 10.8718
> 8192 -1 -1 -1 12.0345 12.5275 12.005
> 4096 -1 -1 -1 15.1537 15.0771 15.1753
> 2048 -1 -1 -1 25.432 24.8985 25.4303
> 1024 -1 -1 -1 45.2674 45.2707 45.3504
> 512 -1 -1 -1 87.9405 88.5047 87.4726
>
> It dropped down to 189 MB/s. :(
Ronald,
Can you please rerun this test locally on the target with the latest
version of blockdev-perftest, which produces much more readable results,
for the following 6 cases:
1. Default vanilla 2.6.29 kernel, default parameters, including read-ahead
2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
3. Default vanilla 2.6.29 kernel, 512 KB read-ahead, 64 KB
max_sectors_kb, the rest is default
4. Vanilla 2.6.29 kernel patched with Fengguang's patch
(http://lkml.org/lkml/2009/5/21/319), default parameters, including
read-ahead
5. Vanilla 2.6.29 kernel patched with Fengguang's patch, 512 KB
read-ahead, the rest is default
6. Vanilla 2.6.29 kernel patched with Fengguang's patch, 512 KB
read-ahead, 64 KB max_sectors_kb, the rest is default
Thanks,
Vlad
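[Editor's note: for reference, a minimal sketch of how the two knobs that
vary across the six cases above are usually set. The device name is Ronald's
/dev/cciss/c0d0, which sysfs spells "cciss!c0d0"; run as root on the target:

from pathlib import Path

queue = Path("/sys/block/cciss!c0d0/queue")   # assumed sysfs path for the array

def set_knob(name, kb):
    # read_ahead_kb and max_sectors_kb are standard block-layer tunables
    (queue / name).write_text(str(kb))

set_knob("read_ahead_kb", 512)    # cases 2, 3, 5, 6 (roughly: blockdev --setra 1024)
set_knob("max_sectors_kb", 64)    # cases 3 and 6; the other cases keep the default
]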
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 10:55 ` Vladislav Bolkhovitin
@ 2009-06-29 12:54 ` Wu Fengguang
2009-06-29 12:58 ` Bart Van Assche
2009-06-29 13:04 ` Vladislav Bolkhovitin
0 siblings, 2 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 12:54 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: Ronald Moesbergen, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
randy.dunlap, Bart Van Assche
On Mon, Jun 29, 2009 at 06:55:40PM +0800, Vladislav Bolkhovitin wrote:
> Ronald Moesbergen, on 06/29/2009 02:26 PM wrote:
> > 2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
> >> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
> >>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
> >>>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
> >>>>> On Sun, 7 Jun 2009 06:45:38 +0800
> >>>>> Wu Fengguang <fengguang.wu@intel.com> wrote:
> >>>>>
> >>>>>>>> Do you have a place where the raw blktrace data can be retrieved for
> >>>>>>>> more in-depth analysis?
> >>>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
> >>>>>>> out the same issue.
> >>>>>> Wu and I are also waiting for his analysis.
> >>>>>> And do it with a large readahead size :)
> >>>>>>
> >>>>>> Alan, this was my analysis:
> >>>>>>
> >>>>>> : Hifumi, can you help retest with some large readahead size?
> >>>>>> :
> >>>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> >>>>>> : so two readahead IO requests get merged into one real IO, that means
> >>>>>> : half of the readahead requests are delayed.
> >>>>>>
> >>>>>> ie. two readahead requests get merged and complete together, thus the effective
> >>>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
> >>>>>>
> >>>>>> :
> >>>>>> : The IO completion size goes down from 512 to 256 sectors:
> >>>>>> :
> >>>>>> : before patch:
> >>>>>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
> >>>>>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
> >>>>>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
> >>>>>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
> >>>>>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
> >>>>>> :
> >>>>>> : after patch:
> >>>>>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
> >>>>>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
> >>>>>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
> >>>>>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
> >>>>>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
> >>>>>>
> >>>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
> >>>>> and it's looking like 2.6.32 material, if ever.
> >>>>>
> >>>>> If it turns out to be wonderful, we could always ask the -stable
> >>>>> maintainers to put it in 2.6.x.y I guess.
> >>>> Agreed. The expected (and interesting) test on a properly configured
> >>>> HW RAID has not happened yet, hence the theory remains unsupported.
> >>> Hmm, do you see anything improper in the Ronald's setup (see
> >>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
> >>> It is HW RAID based.
> >> No. Ronald's HW RAID performance is reasonably good. I meant Hifumi's
> >> RAID performance is too bad and may be improved by increasing the
> >> readahead size, hehe.
> >>
> >>> As I already wrote, we can ask Ronald to perform any needed tests.
> >> Thanks! Ronald's test results are:
> >>
> >> 231 MB/s HW RAID
> >> 69.6 MB/s HW RAID + SCST
> >> 89.7 MB/s HW RAID + SCST + this patch
> >>
> >> So this patch seems to help SCST, but again it would be better to
> >> improve the SCST throughput first - it is now quite sub-optimal.
> >> (Sorry for the long delay: I have not yet figured out how to
> >> measure such timing issues.)
> >>
> >> And if Ronald could provide the HW RAID performance with this patch,
> >> then we can confirm if this patch really makes a difference for RAID.
> >
> > I just tested raw HW RAID throughput with the patch applied, same
> > readahead setting (512KB), and it doesn't look promising:
> >
> > ./blockdev-perftest -d -r /dev/cciss/c0d0
> > blocksize W W W R R R
> > 67108864 -1 -1 -1 5.59686 5.4098 5.45396
> > 33554432 -1 -1 -1 6.18616 6.13232 5.96124
> > 16777216 -1 -1 -1 7.6757 7.32139 7.4966
> > 8388608 -1 -1 -1 8.82793 9.02057 9.01055
> > 4194304 -1 -1 -1 12.2289 12.6804 12.19
> > 2097152 -1 -1 -1 13.3012 13.706 14.7542
> > 1048576 -1 -1 -1 11.7577 12.3609 11.9507
> > 524288 -1 -1 -1 12.4112 12.2383 11.9105
> > 262144 -1 -1 -1 7.30687 7.4417 7.38246
> > 131072 -1 -1 -1 7.95752 7.95053 8.60796
> > 65536 -1 -1 -1 10.1282 10.1286 10.1956
> > 32768 -1 -1 -1 9.91857 9.98597 10.8421
> > 16384 -1 -1 -1 10.8267 10.8899 10.8718
> > 8192 -1 -1 -1 12.0345 12.5275 12.005
> > 4096 -1 -1 -1 15.1537 15.0771 15.1753
> > 2048 -1 -1 -1 25.432 24.8985 25.4303
> > 1024 -1 -1 -1 45.2674 45.2707 45.3504
> > 512 -1 -1 -1 87.9405 88.5047 87.4726
> >
> > It dropped down to 189 MB/s. :(
>
> Ronald,
>
> Can you please rerun this test locally on the target with the latest
> version of blockdev-perftest, which produces much more readable results,
Is blockdev-perftest publicly available? It's not obvious from a Google search.
> for the following 6 cases:
>
> 1. Default vanilla 2.6.29 kernel, default parameters, including read-ahead
Why not 2.6.30? :)
> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
How about 2MB RAID readahead size? That transforms into about 512KB
per-disk readahead size.
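[Editor's note: the arithmetic behind that suggestion, with the number of
data disks an assumption (Wu's 2MB -> 512KB figures imply four):

raid_readahead_kb = 2048          # the proposed 2MB array readahead
data_disks = 4                    # assumption; depends on the RAID geometry
print(raid_readahead_kb // data_disks, "KB of readahead per member disk")  # 512
]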
> 3. Default vanilla 2.6.29 kernel, 512 KB read-ahead, 64 KB
> max_sectors_kb, the rest is default
>
> 4. Vanilla 2.6.29 kernel patched with Fengguang's patch
> (http://lkml.org/lkml/2009/5/21/319), default parameters, including
> read-ahead
>
> 5. Vanilla 2.6.29 kernel patched with Fengguang's patch, 512 KB
> read-ahead, the rest is default
>
> 6. Vanilla 2.6.29 kernel patched with Fengguang's patch, 512 KB
> read-ahead, 64 KB max_sectors_kb, the rest is default
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 12:54 ` Wu Fengguang
@ 2009-06-29 12:58 ` Bart Van Assche
2009-06-29 13:01 ` Wu Fengguang
2009-06-29 13:04 ` Vladislav Bolkhovitin
1 sibling, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2009-06-29 12:58 UTC (permalink / raw)
To: Wu Fengguang
Cc: Vladislav Bolkhovitin, Ronald Moesbergen, Andrew Morton,
kosaki.motohiro, Alan.Brunelle, hifumi.hisashi, linux-kernel,
linux-fsdevel, jens.axboe, randy.dunlap
On Mon, Jun 29, 2009 at 2:54 PM, Wu Fengguang<fengguang.wu@intel.com> wrote:
> Is blockdev-perftest publicly available? It's not obvious from a Google search.
This script is publicly available. You can retrieve it by running the
following command:
svn co https://scst.svn.sourceforge.net/svnroot/scst/trunk/scripts
Bart.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 10:55 ` Vladislav Bolkhovitin
@ 2009-06-29 13:00 ` Wu Fengguang
0 siblings, 0 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 13:00 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: Andrew Morton, kosaki.motohiro, Alan.Brunelle, hifumi.hisashi,
linux-kernel, linux-fsdevel, jens.axboe, randy.dunlap,
Beheer InterCommIT
On Mon, Jun 29, 2009 at 06:55:21PM +0800, Vladislav Bolkhovitin wrote:
>
>
> Wu Fengguang, on 06/29/2009 01:34 PM wrote:
> > On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
> >> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
> >>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
> >>>> On Sun, 7 Jun 2009 06:45:38 +0800
> >>>> Wu Fengguang <fengguang.wu@intel.com> wrote:
> >>>>
> >>>>>>> Do you have a place where the raw blktrace data can be retrieved for
> >>>>>>> more in-depth analysis?
> >>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
> >>>>>> out the same issue.
> >>>>>> Wu and I are also waiting for his analysis.
> >>>>> And do it with a large readahead size :)
> >>>>>
> >>>>> Alan, this was my analysis:
> >>>>>
> >>>>> : Hifumi, can you help retest with some large readahead size?
> >>>>> :
> >>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> >>>>> : so two readahead IO requests get merged into one real IO, that means
> >>>>> : half of the readahead requests are delayed.
> >>>>>
> >>>>> ie. two readahead requests get merged and complete together, thus the effective
> >>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
> >>>>>
> >>>>> :
> >>>>> : The IO completion size goes down from 512 to 256 sectors:
> >>>>> :
> >>>>> : before patch:
> >>>>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
> >>>>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
> >>>>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
> >>>>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
> >>>>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
> >>>>> :
> >>>>> : after patch:
> >>>>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
> >>>>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
> >>>>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
> >>>>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
> >>>>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
> >>>>>
> >>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
> >>>> and it's looking like 2.6.32 material, if ever.
> >>>>
> >>>> If it turns out to be wonderful, we could always ask the -stable
> >>>> maintainers to put it in 2.6.x.y I guess.
> >>> Agreed. The expected (and interesting) test on a properly configured
> >>> HW RAID has not happened yet, hence the theory remains unsupported.
> >> Hmm, do you see anything improper in the Ronald's setup (see
> >> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
> >> It is HW RAID based.
> >
> > No. Ronald's HW RAID performance is reasonably good. I meant Hifumi's
> > RAID performance is too bad and may be improved by increasing the
> > readahead size, hehe.
> >
> >> As I already wrote, we can ask Ronald to perform any needed tests.
> >
> > Thanks! Ronald's test results are:
> >
> > 231 MB/s HW RAID
> > 69.6 MB/s HW RAID + SCST
> > 89.7 MB/s HW RAID + SCST + this patch
> >
> > So this patch seems to help SCST, but again it would be better to
> > improve the SCST throughput first - it is now quite sub-optimal.
>
> No, SCST performance isn't an issue here. You simply can't get more than
> 110 MB/s from iSCSI over 1GbE, hence 231 MB/s fundamentally isn't
> possible. There is only room for 20% improvement, which should be
Ah yes.
> achieved with better client-side-driven pipelining (see our other
> discussions, e.g. http://lkml.org/lkml/2009/5/12/370)
Yeah, that's exactly what I want to figure out :)
Thanks,
Fengguang
> > (Sorry for the long delay: I have not yet figured out how to
> > measure such timing issues.)
> >
> > And if Ronald could provide the HW RAID performance with this patch,
> > then we can confirm if this patch really makes a difference for RAID.
> >
> > Thanks,
> > Fengguang
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 12:58 ` Bart Van Assche
@ 2009-06-29 13:01 ` Wu Fengguang
0 siblings, 0 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 13:01 UTC (permalink / raw)
To: Bart Van Assche
Cc: Vladislav Bolkhovitin, Ronald Moesbergen, Andrew Morton,
kosaki.motohiro, Alan.Brunelle, hifumi.hisashi, linux-kernel,
linux-fsdevel, jens.axboe, randy.dunlap
On Mon, Jun 29, 2009 at 08:58:24PM +0800, Bart Van Assche wrote:
> On Mon, Jun 29, 2009 at 2:54 PM, Wu Fengguang<fengguang.wu@intel.com> wrote:
> > Is blockdev-perftest publicly available? It's not obvious from a Google search.
>
> This script is publicly available. You can retrieve it by running the
> following command:
> svn co https://scst.svn.sourceforge.net/svnroot/scst/trunk/scripts
Thank you! This is a handy tool :)
Fengguang
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 12:54 ` Wu Fengguang
2009-06-29 12:58 ` Bart Van Assche
@ 2009-06-29 13:04 ` Vladislav Bolkhovitin
2009-06-29 13:13 ` Wu Fengguang
2009-06-29 14:00 ` Ronald Moesbergen
1 sibling, 2 replies; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-06-29 13:04 UTC (permalink / raw)
To: Wu Fengguang, Ronald Moesbergen
Cc: Andrew Morton, kosaki.motohiro, Alan.Brunelle, hifumi.hisashi,
linux-kernel, linux-fsdevel, jens.axboe, randy.dunlap,
Bart Van Assche
Wu Fengguang, on 06/29/2009 04:54 PM wrote:
> On Mon, Jun 29, 2009 at 06:55:40PM +0800, Vladislav Bolkhovitin wrote:
>> Ronald Moesbergen, on 06/29/2009 02:26 PM wrote:
>>> 2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
>>>> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
>>>>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
>>>>>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>>>>>>> On Sun, 7 Jun 2009 06:45:38 +0800
>>>>>>> Wu Fengguang <fengguang.wu@intel.com> wrote:
>>>>>>>
>>>>>>>>>> Do you have a place where the raw blktrace data can be retrieved for
>>>>>>>>>> more in-depth analysis?
>>>>>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
>>>>>>>>> out the same issue.
>>>>>>>>> Wu and I are also waiting for his analysis.
>>>>>>>> And do it with a large readahead size :)
>>>>>>>>
>>>>>>>> Alan, this was my analysis:
>>>>>>>>
>>>>>>>> : Hifumi, can you help retest with some large readahead size?
>>>>>>>> :
>>>>>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>>>>>>>> : so two readahead IO requests get merged into one real IO, that means
>>>>>>>> : half of the readahead requests are delayed.
>>>>>>>>
>>>>>>>> ie. two readahead requests get merged and complete together, thus the effective
>>>>>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
>>>>>>>>
>>>>>>>> :
>>>>>>>> : The IO completion size goes down from 512 to 256 sectors:
>>>>>>>> :
>>>>>>>> : before patch:
>>>>>>>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
>>>>>>>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
>>>>>>>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
>>>>>>>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
>>>>>>>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
>>>>>>>> :
>>>>>>>> : after patch:
>>>>>>>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
>>>>>>>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
>>>>>>>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
>>>>>>>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
>>>>>>>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
>>>>>>>>
>>>>>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>>>>>>> and it's looking like 2.6.32 material, if ever.
>>>>>>>
>>>>>>> If it turns out to be wonderful, we could always ask the -stable
>>>>>>> maintainers to put it in 2.6.x.y I guess.
>>>>>> Agreed. The expected (and interesting) test on a properly configured
>>>>>> HW RAID has not happened yet, hence the theory remains unsupported.
>>>>> Hmm, do you see anything improper in the Ronald's setup (see
>>>>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
>>>>> It is HW RAID based.
>>>> No. Ronald's HW RAID performance is reasonably good. I meant Hifumi's
>>>> RAID performance is too bad and may be improved by increasing the
>>>> readahead size, hehe.
>>>>
>>>>> As I already wrote, we can ask Ronald to perform any needed tests.
>>>> Thanks! Ronald's test results are:
>>>>
>>>> 231 MB/s HW RAID
>>>> 69.6 MB/s HW RAID + SCST
>>>> 89.7 MB/s HW RAID + SCST + this patch
>>>>
>>>> So this patch seems to help SCST, but again it would be better to
>>>> improve the SCST throughput first - it is now quite sub-optimal.
>>>> (Sorry for the long delay: I have not yet figured out how to
>>>> measure such timing issues.)
>>>>
>>>> And if Ronald could provide the HW RAID performance with this patch,
>>>> then we can confirm if this patch really makes a difference for RAID.
>>> I just tested raw HW RAID throughput with the patch applied, same
>>> readahead setting (512KB), and it doesn't look promising:
>>>
>>> ./blockdev-perftest -d -r /dev/cciss/c0d0
>>> blocksize W W W R R R
>>> 67108864 -1 -1 -1 5.59686 5.4098 5.45396
>>> 33554432 -1 -1 -1 6.18616 6.13232 5.96124
>>> 16777216 -1 -1 -1 7.6757 7.32139 7.4966
>>> 8388608 -1 -1 -1 8.82793 9.02057 9.01055
>>> 4194304 -1 -1 -1 12.2289 12.6804 12.19
>>> 2097152 -1 -1 -1 13.3012 13.706 14.7542
>>> 1048576 -1 -1 -1 11.7577 12.3609 11.9507
>>> 524288 -1 -1 -1 12.4112 12.2383 11.9105
>>> 262144 -1 -1 -1 7.30687 7.4417 7.38246
>>> 131072 -1 -1 -1 7.95752 7.95053 8.60796
>>> 65536 -1 -1 -1 10.1282 10.1286 10.1956
>>> 32768 -1 -1 -1 9.91857 9.98597 10.8421
>>> 16384 -1 -1 -1 10.8267 10.8899 10.8718
>>> 8192 -1 -1 -1 12.0345 12.5275 12.005
>>> 4096 -1 -1 -1 15.1537 15.0771 15.1753
>>> 2048 -1 -1 -1 25.432 24.8985 25.4303
>>> 1024 -1 -1 -1 45.2674 45.2707 45.3504
>>> 512 -1 -1 -1 87.9405 88.5047 87.4726
>>>
>>> It dropped down to 189 MB/s. :(
>> Ronald,
>>
>> Can you please rerun this test locally on the target with the latest
>> version of blockdev-perftest, which produces much more readable results,
>
> Is blockdev-perftest publicly available? It's not obvious from a Google search.
>
>> for the following 6 cases:
>>
>> 1. Default vanilla 2.6.29 kernel, default parameters, including read-ahead
>
> Why not 2.6.30? :)
We started with 2.6.29, so why not finish with it (to save Ronald the
additional effort of moving to 2.6.30)?
>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
>
> How about 2MB RAID readahead size? That transforms into about 512KB
> per-disk readahead size.
OK. Ronald, can you run 4 more test cases, please:
7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
max_sectors_kb, the rest is default
9. Vanilla 2.6.29 kernel patched with Fengguang's patch, 2MB
read-ahead, the rest is default
10. Vanilla 2.6.29 kernel patched with Fengguang's patch, 2MB
read-ahead, 64 KB max_sectors_kb, the rest is default
>> 3. Default vanilla 2.6.29 kernel, 512 KB read-ahead, 64 KB
>> max_sectors_kb, the rest is default
>>
>> 4. Vanilla 2.6.29 kernel patched with Fengguang's patch
>> (http://lkml.org/lkml/2009/5/21/319), default parameters, including
>> read-ahead
>>
>> 5. Vanilla 2.6.29 kernel patched with Fengguang's patch, 512 KB
>> read-ahead, the rest is default
>>
>> 6. Vanilla 2.6.29 kernel patched with Fengguang's patch, 512 KB
>> read-ahead, 64 KB max_sectors_kb, the rest is default
>
> Thanks,
> Fengguang
>
>
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 13:04 ` Vladislav Bolkhovitin
@ 2009-06-29 13:13 ` Wu Fengguang
2009-06-29 13:28 ` Wu Fengguang
2009-06-29 14:00 ` Ronald Moesbergen
1 sibling, 1 reply; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 13:13 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: Ronald Moesbergen, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
randy.dunlap, Bart Van Assche
On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
> Wu Fengguang, on 06/29/2009 04:54 PM wrote:
> >
> > Why not 2.6.30? :)
>
> We started with 2.6.29, so why not finish with it (to save Ronald the
> additional effort of moving to 2.6.30)?
OK, that's fair enough.
Fengguang
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 13:13 ` Wu Fengguang
@ 2009-06-29 13:28 ` Wu Fengguang
2009-06-29 14:43 ` Ronald Moesbergen
0 siblings, 1 reply; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 13:28 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: Ronald Moesbergen, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
randy.dunlap, Bart Van Assche
[-- Attachment #1: Type: text/plain, Size: 639 bytes --]
On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
> > Wu Fengguang, on 06/29/2009 04:54 PM wrote:
> > >
> > > Why not 2.6.30? :)
> >
> > We started with 2.6.29, so why not finish with it (to save Ronald the
> > additional effort of moving to 2.6.30)?
>
> OK, that's fair enough.
btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
in case they help SCST performance.
Ronald, if you run context readahead, please make sure that the
server-side readahead size is bigger than the client-side readahead size.
Thanks,
Fengguang
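[Editor's note: a sketch of how that server/client readahead relationship can
be checked. Device names are hypothetical, and each value must of course be
read on its own host; they are compared together here only to make the rule
explicit:

from pathlib import Path

def read_ahead_kb(dev):
    # dev is the sysfs block device name, e.g. "cciss!c0d0" or "sdb"
    return int(Path(f"/sys/block/{dev}/queue/read_ahead_kb").read_text())

server_ra = read_ahead_kb("cciss!c0d0")   # on the SCST target
client_ra = read_ahead_kb("sdb")          # on the iSCSI initiator
assert server_ra > client_ra, "context readahead wants server RA > client RA"
]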
[-- Attachment #2: readahead-context-2.6.29.patch --]
[-- Type: text/x-diff, Size: 6651 bytes --]
--- linux.orig/mm/readahead.c
+++ linux/mm/readahead.c
@@ -337,6 +337,59 @@ static unsigned long get_next_ra_size(st
  */

 /*
+ * Count contiguously cached pages from @offset-1 to @offset-@max,
+ * this count is a conservative estimation of
+ *      - length of the sequential read sequence, or
+ *      - thrashing threshold in memory tight systems
+ */
+static pgoff_t count_history_pages(struct address_space *mapping,
+                                   struct file_ra_state *ra,
+                                   pgoff_t offset, unsigned long max)
+{
+        pgoff_t head;
+
+        rcu_read_lock();
+        head = radix_tree_prev_hole(&mapping->page_tree, offset - 1, max);
+        rcu_read_unlock();
+
+        return offset - 1 - head;
+}
+
+/*
+ * page cache context based read-ahead
+ */
+static int try_context_readahead(struct address_space *mapping,
+                                 struct file_ra_state *ra,
+                                 pgoff_t offset,
+                                 unsigned long req_size,
+                                 unsigned long max)
+{
+        pgoff_t size;
+
+        size = count_history_pages(mapping, ra, offset, max);
+
+        /*
+         * no history pages:
+         * it could be a random read
+         */
+        if (!size)
+                return 0;
+
+        /*
+         * starts from beginning of file:
+         * it is a strong indication of long-run stream (or whole-file-read)
+         */
+        if (size >= offset)
+                size *= 2;
+
+        ra->start = offset;
+        ra->size = get_init_ra_size(size + req_size, max);
+        ra->async_size = ra->size;
+
+        return 1;
+}
+
+/*
  * A minimal readahead algorithm for trivial sequential/random reads.
  */
 static unsigned long
@@ -345,34 +398,26 @@ ondemand_readahead(struct address_space
                    bool hit_readahead_marker, pgoff_t offset,
                    unsigned long req_size)
 {
-        int max = ra->ra_pages;        /* max readahead pages */
-        pgoff_t prev_offset;
-        int sequential;
+        unsigned long max = max_sane_readahead(ra->ra_pages);
+
+        /*
+         * start of file
+         */
+        if (!offset)
+                goto initial_readahead;

         /*
          * It's the expected callback offset, assume sequential access.
          * Ramp up sizes, and push forward the readahead window.
          */
-        if (offset && (offset == (ra->start + ra->size - ra->async_size) ||
-                        offset == (ra->start + ra->size))) {
+        if ((offset == (ra->start + ra->size - ra->async_size) ||
+             offset == (ra->start + ra->size))) {
                 ra->start += ra->size;
                 ra->size = get_next_ra_size(ra, max);
                 ra->async_size = ra->size;
                 goto readit;
         }

-        prev_offset = ra->prev_pos >> PAGE_CACHE_SHIFT;
-        sequential = offset - prev_offset <= 1UL || req_size > max;
-
-        /*
-         * Standalone, small read.
-         * Read as is, and do not pollute the readahead state.
-         */
-        if (!hit_readahead_marker && !sequential) {
-                return __do_page_cache_readahead(mapping, filp,
-                                                 offset, req_size, 0);
-        }
-
         /*
          * Hit a marked page without valid readahead state.
          * E.g. interleaved reads.
@@ -383,7 +428,7 @@ ondemand_readahead(struct address_space
                 pgoff_t start;

                 rcu_read_lock();
-                start = radix_tree_next_hole(&mapping->page_tree, offset,max+1);
+                start = radix_tree_next_hole(&mapping->page_tree, offset+1,max);
                 rcu_read_unlock();

                 if (!start || start - offset > max)
@@ -391,23 +436,53 @@ ondemand_readahead(struct address_space
                 ra->start = start;
                 ra->size = start - offset;        /* old async_size */
+                ra->size += req_size;
                 ra->size = get_next_ra_size(ra, max);
                 ra->async_size = ra->size;
                 goto readit;
         }

         /*
-         * It may be one of
-         *   - first read on start of file
-         *   - sequential cache miss
-         *   - oversize random read
-         * Start readahead for it.
+         * oversize read
+         */
+        if (req_size > max)
+                goto initial_readahead;
+
+        /*
+         * sequential cache miss
          */
+        if (offset - (ra->prev_pos >> PAGE_CACHE_SHIFT) <= 1UL)
+                goto initial_readahead;
+
+        /*
+         * Query the page cache and look for the traces(cached history pages)
+         * that a sequential stream would leave behind.
+         */
+        if (try_context_readahead(mapping, ra, offset, req_size, max))
+                goto readit;
+
+        /*
+         * standalone, small random read
+         * Read as is, and do not pollute the readahead state.
+         */
+        return __do_page_cache_readahead(mapping, filp, offset, req_size, 0);
+
+initial_readahead:
         ra->start = offset;
         ra->size = get_init_ra_size(req_size, max);
         ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;

 readit:
+        /*
+         * Will this read hit the readahead marker made by itself?
+         * If so, trigger the readahead marker hit now, and merge
+         * the resulted next readahead window into the current one.
+         */
+        if (offset == ra->start && ra->size == ra->async_size) {
+                ra->async_size = get_next_ra_size(ra, max);
+                ra->size += ra->async_size;
+        }
+
         return ra_submit(ra, mapping, filp);
 }
--- linux.orig/lib/radix-tree.c
+++ linux/lib/radix-tree.c
@@ -666,6 +666,43 @@ unsigned long radix_tree_next_hole(struc
 }
 EXPORT_SYMBOL(radix_tree_next_hole);

+/**
+ * radix_tree_prev_hole - find the prev hole (not-present entry)
+ * @root:     tree root
+ * @index:    index key
+ * @max_scan: maximum range to search
+ *
+ * Search backwards in the range [max(index-max_scan+1, 0), index]
+ * for the first hole.
+ *
+ * Returns: the index of the hole if found, otherwise returns an index
+ * outside of the set specified (in which case 'index - return >= max_scan'
+ * will be true). In rare cases of wrap-around, LONG_MAX will be returned.
+ *
+ * radix_tree_prev_hole may be called under rcu_read_lock. However, like
+ * radix_tree_gang_lookup, this will not atomically search a snapshot of
+ * the tree at a single point in time. For example, if a hole is created
+ * at index 10, then subsequently a hole is created at index 5,
+ * radix_tree_prev_hole covering both indexes may return 5 if called under
+ * rcu_read_lock.
+ */
+unsigned long radix_tree_prev_hole(struct radix_tree_root *root,
+                                   unsigned long index, unsigned long max_scan)
+{
+        unsigned long i;
+
+        for (i = 0; i < max_scan; i++) {
+                if (!radix_tree_lookup(root, index))
+                        break;
+                index--;
+                if (index == LONG_MAX)
+                        break;
+        }
+
+        return index;
+}
+EXPORT_SYMBOL(radix_tree_prev_hole);
+
 static unsigned int
 __lookup(struct radix_tree_node *slot, void ***results, unsigned long index,
          unsigned int max_items, unsigned long *next_index)
--- linux.orig/include/linux/radix-tree.h
+++ linux/include/linux/radix-tree.h
@@ -167,6 +167,8 @@ radix_tree_gang_lookup_slot(struct radix
                         unsigned long first_index, unsigned int max_items);
 unsigned long radix_tree_next_hole(struct radix_tree_root *root,
                                    unsigned long index, unsigned long max_scan);
+unsigned long radix_tree_prev_hole(struct radix_tree_root *root,
+                                   unsigned long index, unsigned long max_scan);
 int radix_tree_preload(gfp_t gfp_mask);
 void radix_tree_init(void);
 void *radix_tree_tag_set(struct radix_tree_root *root,
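[Editor's note: to make the decision flow of the patch above concrete, here is
a toy Python model of count_history_pages()/try_context_readahead(), with a
plain set of page indexes standing in for the page-cache radix tree and
get_init_ra_size() simplified to a clamp, so the numbers are illustrative only:

def count_history_pages(cached, offset, max_pages):
    # Count contiguously cached pages ending at offset-1 (conservative).
    n = 0
    while n < max_pages and (offset - 1 - n) in cached:
        n += 1
    return n

def try_context_readahead(cached, offset, req_size, max_pages):
    size = count_history_pages(cached, offset, max_pages)
    if not size:                  # no history: treat as a random read
        return None
    if size >= offset:            # cached from start of file: long-run stream
        size *= 2
    return min(size + req_size, max_pages)   # simplified get_init_ra_size()

# Pages 0..7 cached, read at page 8: the history says "sequential stream",
# so a readahead window of 20 pages is chosen here.
print(try_context_readahead(set(range(8)), offset=8, req_size=4, max_pages=32))
]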
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 13:04 ` Vladislav Bolkhovitin
2009-06-29 13:13 ` Wu Fengguang
@ 2009-06-29 14:00 ` Ronald Moesbergen
2009-06-29 14:21 ` Wu Fengguang
2009-06-30 10:22 ` Vladislav Bolkhovitin
1 sibling, 2 replies; 65+ messages in thread
From: Ronald Moesbergen @ 2009-06-29 14:00 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: Wu Fengguang, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
randy.dunlap, Bart Van Assche
... tests ...
> We started with 2.6.29, so why not finish with it (to save Ronald the
> additional effort of moving to 2.6.30)?
>
>>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
>>
>> How about 2MB RAID readahead size? That transforms into about 512KB
>> per-disk readahead size.
>
> OK. Ronald, can you run 4 more test cases, please:
>
> 7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
>
> 8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
> max_sectors_kb, the rest is default
>
> 9. Vanilla 2.6.29 kernel patched with Fengguang's patch, 2MB
> read-ahead, the rest is default
>
> 10. Vanilla 2.6.29 kernel patched with Fengguang's patch, 2MB
> read-ahead, 64 KB max_sectors_kb, the rest is default
The results:
Unpatched, 128KB readahead, 512 max_sectors_kb
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 5.621 5.503 5.419 185.744 2.780 2.902
33554432 6.628 5.897 6.242 164.068 7.827 5.127
16777216 7.312 7.165 7.614 139.148 3.501 8.697
8388608 8.719 8.408 8.694 119.003 1.973 14.875
4194304 11.836 12.192 12.137 84.958 1.111 21.239
2097152 13.452 13.992 14.035 74.090 1.442 37.045
1048576 12.759 11.996 12.195 83.194 2.152 83.194
524288 11.895 12.297 12.587 83.570 1.945 167.140
262144 7.325 7.285 7.444 139.304 1.272 557.214
131072 7.992 8.832 7.952 124.279 5.901 994.228
65536 10.940 10.062 10.122 98.847 3.715 1581.545
32768 9.973 10.012 9.945 102.640 0.281 3284.493
16384 11.377 10.538 10.692 94.316 3.100 6036.222
Unpatched, 512KB readahead, 512 max_sectors_kb
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 5.032 4.770 5.265 204.228 8.271 3.191
33554432 5.569 5.712 5.863 179.263 3.755 5.602
16777216 6.661 6.857 6.550 153.132 2.888 9.571
8388608 8.022 8.000 7.978 127.998 0.288 16.000
4194304 10.959 11.579 12.208 88.586 3.902 22.146
2097152 13.692 12.670 12.625 78.906 2.914 39.453
1048576 11.120 11.144 10.878 92.703 1.018 92.703
524288 11.234 10.915 11.374 91.667 1.587 183.334
262144 6.848 6.678 6.795 151.191 1.594 604.763
131072 7.393 7.367 7.337 139.025 0.428 1112.202
65536 10.003 10.919 10.015 99.466 4.019 1591.462
32768 10.117 10.124 10.169 101.018 0.229 3232.574
16384 11.614 11.027 11.029 91.293 2.207 5842.771
Unpatched, 2MB readahead, 512 max_sectors_kb
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 5.268 5.316 5.418 191.996 2.241 3.000
33554432 5.831 6.459 6.110 167.259 6.977 5.227
16777216 7.313 7.069 7.197 142.385 1.972 8.899
8388608 8.657 8.500 8.498 119.754 1.039 14.969
4194304 11.846 12.116 11.801 85.911 0.994 21.478
2097152 12.917 13.652 13.100 77.484 1.808 38.742
1048576 9.544 10.667 10.807 99.345 5.640 99.345
524288 11.736 7.171 6.599 128.410 29.539 256.821
262144 7.530 7.403 7.416 137.464 1.053 549.857
131072 8.741 8.002 8.022 124.256 5.029 994.051
65536 10.701 10.138 10.090 99.394 2.629 1590.311
32768 9.978 9.950 9.934 102.875 0.188 3291.994
16384 11.435 10.823 10.907 92.684 2.234 5931.749
Unpatched, 512KB readahead, 64 max_sectors_kb
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 3.994 3.991 4.123 253.774 3.838 3.965
33554432 4.100 4.329 4.161 244.111 5.569 7.628
16777216 5.476 4.835 5.079 200.148 10.177 12.509
8388608 5.484 5.258 5.227 192.470 4.084 24.059
4194304 6.429 6.458 6.435 158.989 0.315 39.747
2097152 7.219 7.744 7.306 138.081 4.187 69.040
1048576 6.850 6.897 6.776 149.696 1.089 149.696
524288 6.406 6.393 6.469 159.439 0.814 318.877
262144 6.865 7.508 6.861 144.931 6.041 579.726
131072 8.435 8.482 8.307 121.792 1.076 974.334
65536 9.616 9.610 10.262 104.279 3.176 1668.462
32768 9.682 9.932 10.015 103.701 1.497 3318.428
16384 10.962 10.852 11.565 92.106 2.547 5894.813
Unpatched, 2MB readahead, 64 max_sectors_kb
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 3.730 3.714 3.914 270.615 6.396 4.228
33554432 4.445 3.999 3.989 247.710 12.276 7.741
16777216 4.763 4.712 4.709 216.590 1.122 13.537
8388608 5.001 5.086 5.229 200.649 3.673 25.081
4194304 6.365 6.362 6.905 156.710 5.948 39.178
2097152 7.390 7.367 7.270 139.470 0.992 69.735
1048576 7.038 7.050 7.090 145.052 0.456 145.052
524288 6.862 7.167 7.278 144.272 3.617 288.544
262144 7.266 7.313 7.265 140.635 0.436 562.540
131072 8.677 8.735 8.821 117.108 0.790 936.865
65536 10.865 10.040 10.038 99.418 3.658 1590.685
32768 10.167 10.130 10.177 100.805 0.201 3225.749
16384 11.643 11.017 11.103 91.041 2.203 5826.629
Patched, 128KB readahead, 512 max_sectors_kb
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 5.670 5.188 5.636 186.555 7.671 2.915
33554432 6.069 5.971 6.141 168.992 1.954 5.281
16777216 7.821 7.501 7.372 135.451 3.340 8.466
8388608 9.147 8.618 9.000 114.849 2.908 14.356
4194304 12.199 12.914 12.381 81.981 1.964 20.495
2097152 13.449 13.891 14.288 73.842 1.828 36.921
1048576 11.890 12.182 11.519 86.360 1.984 86.360
524288 11.899 12.706 12.135 83.678 2.287 167.357
262144 7.460 7.559 7.563 136.041 0.864 544.164
131072 7.987 8.003 8.530 125.403 3.792 1003.220
65536 10.179 10.119 10.131 100.957 0.255 1615.312
32768 9.899 9.923 10.589 101.114 3.121 3235.656
16384 10.849 10.835 10.876 94.351 0.150 6038.474
Patched, 512KB readahead, 512 max_sectors_kb
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 5.062 5.111 5.083 201.358 0.795 3.146
33554432 5.589 5.713 5.657 181.165 1.625 5.661
16777216 6.337 7.220 6.457 154.002 8.690 9.625
8388608 7.952 7.880 7.527 131.588 3.192 16.448
4194304 10.695 11.224 10.736 94.119 2.047 23.530
2097152 10.898 12.072 12.358 87.215 4.839 43.607
1048576 10.890 11.347 9.290 98.166 8.664 98.166
524288 10.898 11.032 10.887 93.611 0.560 187.223
262144 6.714 7.230 6.804 148.219 4.724 592.875
131072 7.325 7.342 7.363 139.441 0.295 1115.530
65536 9.773 9.988 10.592 101.327 3.417 1621.227
32768 10.031 9.995 10.086 102.019 0.377 3264.620
16384 11.041 10.987 11.564 91.502 2.093 5856.144
Patched, 2MB readahead, 512 max_sectors_kb
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 4.970 5.097 5.188 201.435 3.559 3.147
33554432 5.588 5.793 5.169 186.042 8.923 5.814
16777216 6.151 6.414 6.526 161.012 4.027 10.063
8388608 7.836 7.299 7.475 135.980 3.989 16.998
4194304 11.792 10.964 10.158 93.683 5.706 23.421
2097152 11.225 11.492 11.357 90.162 0.866 45.081
1048576 12.017 11.258 11.432 88.580 2.449 88.580
524288 5.974 10.883 11.840 117.323 38.361 234.647
262144 6.774 6.765 6.526 153.155 2.661 612.619
131072 8.036 7.324 7.341 135.579 5.766 1084.633
65536 9.964 10.595 9.999 100.608 2.806 1609.735
32768 10.132 10.036 10.190 101.197 0.637 3238.308
16384 11.133 11.568 11.036 91.093 1.850 5829.981
Patched, 512KB readahead, 64 max_sectors_kb
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 3.722 3.698 3.721 275.759 0.809 4.309
33554432 4.058 3.849 3.957 259.063 5.580 8.096
16777216 4.601 4.613 4.738 220.212 2.913 13.763
8388608 5.039 5.534 5.017 197.452 8.791 24.682
4194304 6.302 6.270 6.282 162.942 0.341 40.735
2097152 7.314 7.302 7.069 141.700 2.233 70.850
1048576 6.881 7.655 6.909 143.597 6.951 143.597
524288 7.163 7.025 6.951 145.344 1.803 290.687
262144 7.315 7.233 7.299 140.621 0.689 562.482
131072 9.292 8.756 8.807 114.475 3.036 915.803
65536 9.942 9.985 9.960 102.787 0.181 1644.598
32768 10.721 10.091 10.192 99.154 2.605 3172.935
16384 11.049 11.016 11.065 92.727 0.169 5934.531
Patched, 2MB readahead, 64 max_sectors_kb
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 3.697 3.819 3.741 272.931 3.661 4.265
33554432 3.951 3.905 4.038 258.320 3.586 8.073
16777216 5.595 5.182 4.864 197.044 11.236 12.315
8388608 5.267 5.156 5.116 197.725 2.431 24.716
4194304 6.411 6.335 6.290 161.389 1.267 40.347
2097152 7.329 7.663 7.462 136.860 2.502 68.430
1048576 7.225 7.077 7.215 142.784 1.352 142.784
524288 6.903 7.015 7.095 146.210 1.647 292.419
262144 7.365 7.926 7.278 136.309 5.076 545.237
131072 8.796 8.819 8.814 116.233 0.130 929.862
65536 9.998 10.609 9.995 100.464 2.786 1607.423
32768 10.161 10.124 10.246 100.623 0.505 3219.943
Regards,
Ronald.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 14:00 ` Ronald Moesbergen
@ 2009-06-29 14:21 ` Wu Fengguang
2009-06-29 15:01 ` Wu Fengguang
2009-06-30 10:22 ` Vladislav Bolkhovitin
1 sibling, 1 reply; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 14:21 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: Vladislav Bolkhovitin, Andrew Morton, kosaki.motohiro,
Alan.Brunelle, hifumi.hisashi, linux-kernel, linux-fsdevel,
jens.axboe, randy.dunlap, Bart Van Assche
On Mon, Jun 29, 2009 at 10:00:20PM +0800, Ronald Moesbergen wrote:
> ... tests ...
>
> > We started with 2.6.29, so why not finish with it (to save Ronald the
> > additional effort of moving to 2.6.30)?
> >
> >>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
> >>
> >> How about 2MB RAID readahead size? That transforms into about 512KB
> >> per-disk readahead size.
> >
> > OK. Ronald, can you run 4 more test cases, please:
> >
> > 7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
> >
> > 8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
> > max_sectors_kb, the rest is default
> >
> > 9. Vanilla 2.6.29 kernel patched with Fengguang's patch, 2MB
> > read-ahead, the rest is default
> >
> > 10. Vanilla 2.6.29 kernel patched with Fengguang's patch, 2MB
> > read-ahead, 64 KB max_sectors_kb, the rest is default
>
> The results:
I made a simple blind average across all the block sizes:
N MB/s IOPS case
0 114.859 984.148 Unpatched, 128KB readahead, 512 max_sectors_kb
1 122.960 981.213 Unpatched, 512KB readahead, 512 max_sectors_kb
2 120.709 985.111 Unpatched, 2MB readahead, 512 max_sectors_kb
3 158.732 1004.714 Unpatched, 512KB readahead, 64 max_sectors_kb
4 159.237 979.659 Unpatched, 2MB readahead, 64 max_sectors_kb
5 114.583 982.998 Patched, 128KB readahead, 512 max_sectors_kb
6 124.902 987.523 Patched, 512KB readahead, 512 max_sectors_kb
7 127.373 984.848 Patched, 2MB readahead, 512 max_sectors_kb
8 161.218 986.698 Patched, 512KB readahead, 64 max_sectors_kb
9 163.908 574.651 Patched, 2MB readahead, 64 max_sectors_kb
So before/after patch:
avg throughput 135.299 => 138.397 by +2.3%
avg IOPS 986.969 => 903.344 by -8.5%
The IOPS is a bit weird: case 9's table is missing its final 16384-byte
row (the highest-IOPS row in every other case), which pulls its average
far down.
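[Editor's note: the two summary numbers can be reproduced from the ten
per-case averages above (cases 0-4 unpatched, 5-9 patched):

mbps = [114.859, 122.960, 120.709, 158.732, 159.237,   # unpatched
        114.583, 124.902, 127.373, 161.218, 163.908]   # patched
iops = [984.148, 981.213, 985.111, 1004.714, 979.659,
        982.998, 987.523, 984.848, 986.698, 574.651]

def mean(xs):
    return sum(xs) / len(xs)

b, a = mean(mbps[:5]), mean(mbps[5:])
print(f"{b:.3f} => {a:.3f} MB/s ({(a / b - 1) * 100:+.1f}%)")   # +2.3%
b, a = mean(iops[:5]), mean(iops[5:])
print(f"{b:.3f} => {a:.3f} IOPS ({(a / b - 1) * 100:+.1f}%)")   # -8.5%
]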
Summaries:
- this patch improves RAID throughput by +2.3% on average
- after this patch, 2MB readahead performs slightly better
(by 1-2%) than 512KB readahead
Thanks,
Fengguang
> Unpatched, 128KB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.621 5.503 5.419 185.744 2.780 2.902
> 33554432 6.628 5.897 6.242 164.068 7.827 5.127
> 16777216 7.312 7.165 7.614 139.148 3.501 8.697
> 8388608 8.719 8.408 8.694 119.003 1.973 14.875
> 4194304 11.836 12.192 12.137 84.958 1.111 21.239
> 2097152 13.452 13.992 14.035 74.090 1.442 37.045
> 1048576 12.759 11.996 12.195 83.194 2.152 83.194
> 524288 11.895 12.297 12.587 83.570 1.945 167.140
> 262144 7.325 7.285 7.444 139.304 1.272 557.214
> 131072 7.992 8.832 7.952 124.279 5.901 994.228
> 65536 10.940 10.062 10.122 98.847 3.715 1581.545
> 32768 9.973 10.012 9.945 102.640 0.281 3284.493
> 16384 11.377 10.538 10.692 94.316 3.100 6036.222
>
> Unpatched, 512KB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.032 4.770 5.265 204.228 8.271 3.191
> 33554432 5.569 5.712 5.863 179.263 3.755 5.602
> 16777216 6.661 6.857 6.550 153.132 2.888 9.571
> 8388608 8.022 8.000 7.978 127.998 0.288 16.000
> 4194304 10.959 11.579 12.208 88.586 3.902 22.146
> 2097152 13.692 12.670 12.625 78.906 2.914 39.453
> 1048576 11.120 11.144 10.878 92.703 1.018 92.703
> 524288 11.234 10.915 11.374 91.667 1.587 183.334
> 262144 6.848 6.678 6.795 151.191 1.594 604.763
> 131072 7.393 7.367 7.337 139.025 0.428 1112.202
> 65536 10.003 10.919 10.015 99.466 4.019 1591.462
> 32768 10.117 10.124 10.169 101.018 0.229 3232.574
> 16384 11.614 11.027 11.029 91.293 2.207 5842.771
>
> Unpatched, 2MB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.268 5.316 5.418 191.996 2.241 3.000
> 33554432 5.831 6.459 6.110 167.259 6.977 5.227
> 16777216 7.313 7.069 7.197 142.385 1.972 8.899
> 8388608 8.657 8.500 8.498 119.754 1.039 14.969
> 4194304 11.846 12.116 11.801 85.911 0.994 21.478
> 2097152 12.917 13.652 13.100 77.484 1.808 38.742
> 1048576 9.544 10.667 10.807 99.345 5.640 99.345
> 524288 11.736 7.171 6.599 128.410 29.539 256.821
> 262144 7.530 7.403 7.416 137.464 1.053 549.857
> 131072 8.741 8.002 8.022 124.256 5.029 994.051
> 65536 10.701 10.138 10.090 99.394 2.629 1590.311
> 32768 9.978 9.950 9.934 102.875 0.188 3291.994
> 16384 11.435 10.823 10.907 92.684 2.234 5931.749
>
> Unpatched, 512KB readahead, 64 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 3.994 3.991 4.123 253.774 3.838 3.965
> 33554432 4.100 4.329 4.161 244.111 5.569 7.628
> 16777216 5.476 4.835 5.079 200.148 10.177 12.509
> 8388608 5.484 5.258 5.227 192.470 4.084 24.059
> 4194304 6.429 6.458 6.435 158.989 0.315 39.747
> 2097152 7.219 7.744 7.306 138.081 4.187 69.040
> 1048576 6.850 6.897 6.776 149.696 1.089 149.696
> 524288 6.406 6.393 6.469 159.439 0.814 318.877
> 262144 6.865 7.508 6.861 144.931 6.041 579.726
> 131072 8.435 8.482 8.307 121.792 1.076 974.334
> 65536 9.616 9.610 10.262 104.279 3.176 1668.462
> 32768 9.682 9.932 10.015 103.701 1.497 3318.428
> 16384 10.962 10.852 11.565 92.106 2.547 5894.813
>
> Unpatched, 2MB readahead, 64 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 3.730 3.714 3.914 270.615 6.396 4.228
> 33554432 4.445 3.999 3.989 247.710 12.276 7.741
> 16777216 4.763 4.712 4.709 216.590 1.122 13.537
> 8388608 5.001 5.086 5.229 200.649 3.673 25.081
> 4194304 6.365 6.362 6.905 156.710 5.948 39.178
> 2097152 7.390 7.367 7.270 139.470 0.992 69.735
> 1048576 7.038 7.050 7.090 145.052 0.456 145.052
> 524288 6.862 7.167 7.278 144.272 3.617 288.544
> 262144 7.266 7.313 7.265 140.635 0.436 562.540
> 131072 8.677 8.735 8.821 117.108 0.790 936.865
> 65536 10.865 10.040 10.038 99.418 3.658 1590.685
> 32768 10.167 10.130 10.177 100.805 0.201 3225.749
> 16384 11.643 11.017 11.103 91.041 2.203 5826.629
>
> Patched, 128KB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.670 5.188 5.636 186.555 7.671 2.915
> 33554432 6.069 5.971 6.141 168.992 1.954 5.281
> 16777216 7.821 7.501 7.372 135.451 3.340 8.466
> 8388608 9.147 8.618 9.000 114.849 2.908 14.356
> 4194304 12.199 12.914 12.381 81.981 1.964 20.495
> 2097152 13.449 13.891 14.288 73.842 1.828 36.921
> 1048576 11.890 12.182 11.519 86.360 1.984 86.360
> 524288 11.899 12.706 12.135 83.678 2.287 167.357
> 262144 7.460 7.559 7.563 136.041 0.864 544.164
> 131072 7.987 8.003 8.530 125.403 3.792 1003.220
> 65536 10.179 10.119 10.131 100.957 0.255 1615.312
> 32768 9.899 9.923 10.589 101.114 3.121 3235.656
> 16384 10.849 10.835 10.876 94.351 0.150 6038.474
>
> Patched, 512KB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.062 5.111 5.083 201.358 0.795 3.146
> 33554432 5.589 5.713 5.657 181.165 1.625 5.661
> 16777216 6.337 7.220 6.457 154.002 8.690 9.625
> 8388608 7.952 7.880 7.527 131.588 3.192 16.448
> 4194304 10.695 11.224 10.736 94.119 2.047 23.530
> 2097152 10.898 12.072 12.358 87.215 4.839 43.607
> 1048576 10.890 11.347 9.290 98.166 8.664 98.166
> 524288 10.898 11.032 10.887 93.611 0.560 187.223
> 262144 6.714 7.230 6.804 148.219 4.724 592.875
> 131072 7.325 7.342 7.363 139.441 0.295 1115.530
> 65536 9.773 9.988 10.592 101.327 3.417 1621.227
> 32768 10.031 9.995 10.086 102.019 0.377 3264.620
> 16384 11.041 10.987 11.564 91.502 2.093 5856.144
>
> Patched, 2MB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 4.970 5.097 5.188 201.435 3.559 3.147
> 33554432 5.588 5.793 5.169 186.042 8.923 5.814
> 16777216 6.151 6.414 6.526 161.012 4.027 10.063
> 8388608 7.836 7.299 7.475 135.980 3.989 16.998
> 4194304 11.792 10.964 10.158 93.683 5.706 23.421
> 2097152 11.225 11.492 11.357 90.162 0.866 45.081
> 1048576 12.017 11.258 11.432 88.580 2.449 88.580
> 524288 5.974 10.883 11.840 117.323 38.361 234.647
> 262144 6.774 6.765 6.526 153.155 2.661 612.619
> 131072 8.036 7.324 7.341 135.579 5.766 1084.633
> 65536 9.964 10.595 9.999 100.608 2.806 1609.735
> 32768 10.132 10.036 10.190 101.197 0.637 3238.308
> 16384 11.133 11.568 11.036 91.093 1.850 5829.981
>
> Patched, 512KB readahead, 64 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 3.722 3.698 3.721 275.759 0.809 4.309
> 33554432 4.058 3.849 3.957 259.063 5.580 8.096
> 16777216 4.601 4.613 4.738 220.212 2.913 13.763
> 8388608 5.039 5.534 5.017 197.452 8.791 24.682
> 4194304 6.302 6.270 6.282 162.942 0.341 40.735
> 2097152 7.314 7.302 7.069 141.700 2.233 70.850
> 1048576 6.881 7.655 6.909 143.597 6.951 143.597
> 524288 7.163 7.025 6.951 145.344 1.803 290.687
> 262144 7.315 7.233 7.299 140.621 0.689 562.482
> 131072 9.292 8.756 8.807 114.475 3.036 915.803
> 65536 9.942 9.985 9.960 102.787 0.181 1644.598
> 32768 10.721 10.091 10.192 99.154 2.605 3172.935
> 16384 11.049 11.016 11.065 92.727 0.169 5934.531
>
> Patched, 2MB readahead, 64 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 3.697 3.819 3.741 272.931 3.661 4.265
> 33554432 3.951 3.905 4.038 258.320 3.586 8.073
> 16777216 5.595 5.182 4.864 197.044 11.236 12.315
> 8388608 5.267 5.156 5.116 197.725 2.431 24.716
> 4194304 6.411 6.335 6.290 161.389 1.267 40.347
> 2097152 7.329 7.663 7.462 136.860 2.502 68.430
> 1048576 7.225 7.077 7.215 142.784 1.352 142.784
> 524288 6.903 7.015 7.095 146.210 1.647 292.419
> 262144 7.365 7.926 7.278 136.309 5.076 545.237
> 131072 8.796 8.819 8.814 116.233 0.130 929.862
> 65536 9.998 10.609 9.995 100.464 2.786 1607.423
> 32768 10.161 10.124 10.246 100.623 0.505 3219.943
>
> Regards,
> Ronald.
^ permalink raw reply [flat|nested] 65+ messages in thread
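For reference, the readahead and max_sectors_kb combinations compared above are runtime tunables. Below is a minimal sketch of how one of them (2MB readahead, 64KB max_sectors_kb) would be applied; the device names /dev/md0 and /dev/sdb are placeholders, not taken from this thread:
# Sketch: select one of the combinations benchmarked above.
blockdev --setra 4096 /dev/md0                  # 4096 x 512-byte sectors = 2MB readahead
echo 64 > /sys/block/sdb/queue/max_sectors_kb   # cap request size at 64KB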
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 13:28 ` Wu Fengguang
@ 2009-06-29 14:43 ` Ronald Moesbergen
2009-06-29 14:51 ` Wu Fengguang
0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-06-29 14:43 UTC (permalink / raw)
To: Wu Fengguang
Cc: Vladislav Bolkhovitin, Andrew Morton, kosaki.motohiro,
Alan.Brunelle, hifumi.hisashi, linux-kernel, linux-fsdevel,
jens.axboe, randy.dunlap, Bart Van Assche
2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
> On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
>> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
>> > Wu Fengguang, on 06/29/2009 04:54 PM wrote:
>> > >
>> > > Why not 2.6.30? :)
>> >
>> > We started with 2.6.29, so why not complete with it (to save additional
>> > Ronald's effort to move on 2.6.30)?
>>
>> OK, that's fair enough.
>
> btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
> in case it will help the SCST performance.
>
> Ronald, if you run context readahead, please make sure that the server
> side readahead size is bigger than the client side readahead size.
I tried this patch on a vanilla kernel with no other patches applied,
but it does not seem to help: the iSCSI throughput does not go above
60 MB/s (1 GB in 17 seconds). I have tried several readahead settings
from 128KB up to 4MB, keeping the server readahead at twice the client
readahead, but it never gets above 60 MB/s. This is using SCST on the
server side and open-iscsi on the client. I get much better throughput
(90 MB/s) when using the patches supplied with SCST together with the
blk_run_backing_dev readahead patch.
Ronald.
^ permalink raw reply [flat|nested] 65+ messages in thread
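A minimal sketch of the readahead setup Ronald describes, with the server-side readahead kept at twice the client-side value; host roles and device names are placeholders:
# On the iSCSI client (open-iscsi initiator): 1MB readahead on the imported disk.
blockdev --setra 2048 /dev/sdc
# On the SCST server: 2MB readahead on the exported backing device.
blockdev --setra 4096 /dev/md0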
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 14:43 ` Ronald Moesbergen
@ 2009-06-29 14:51 ` Wu Fengguang
2009-06-29 14:56 ` Ronald Moesbergen
2009-06-29 15:37 ` Vladislav Bolkhovitin
0 siblings, 2 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 14:51 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: Vladislav Bolkhovitin, Andrew Morton, kosaki.motohiro,
Alan.Brunelle, hifumi.hisashi, linux-kernel, linux-fsdevel,
jens.axboe, randy.dunlap, Bart Van Assche
On Mon, Jun 29, 2009 at 10:43:48PM +0800, Ronald Moesbergen wrote:
> 2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
> > On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
> >> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
> >> > Wu Fengguang, on 06/29/2009 04:54 PM wrote:
> >> > >
> >> > > Why not 2.6.30? :)
> >> >
> >> > We started with 2.6.29, so why not complete with it (to save additional
> >> > Ronald's effort to move on 2.6.30)?
> >>
> >> OK, that's fair enough.
> >
> > btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
> > in case it will help the SCST performance.
> >
> > Ronald, if you run context readahead, please make sure that the server
> > side readahead size is bigger than the client side readahead size.
>
> I tried this patch on a vanilla kernel with no other patches applied,
> but it does not seem to help: the iSCSI throughput does not go above
> 60 MB/s (1 GB in 17 seconds). I have tried several readahead settings
> from 128KB up to 4MB, keeping the server readahead at twice the client
> readahead, but it never gets above 60 MB/s. This is using SCST on the
OK, thanks for the tests anyway!
> server side and open-iscsi on the client. I get much better throughput
> (90 MB/s) when using the patches supplied with SCST together with the
What do you mean by "patches supplied with SCST"?
> blk_run_backing_dev readahead patch.
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 14:51 ` Wu Fengguang
@ 2009-06-29 14:56 ` Ronald Moesbergen
2009-06-29 15:37 ` Vladislav Bolkhovitin
1 sibling, 0 replies; 65+ messages in thread
From: Ronald Moesbergen @ 2009-06-29 14:56 UTC (permalink / raw)
To: Wu Fengguang
Cc: Vladislav Bolkhovitin, Andrew Morton, kosaki.motohiro,
Alan.Brunelle, hifumi.hisashi, linux-kernel, linux-fsdevel,
jens.axboe, randy.dunlap, Bart Van Assche
2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
> On Mon, Jun 29, 2009 at 10:43:48PM +0800, Ronald Moesbergen wrote:
>> 2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
>> > On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
>> >> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
>> >> > Wu Fengguang, on 06/29/2009 04:54 PM wrote:
>> >> > >
>> >> > > Why not 2.6.30? :)
>> >> >
>> >> > We started with 2.6.29, so why not complete with it (to save additional
>> >> > Ronald's effort to move on 2.6.30)?
>> >>
>> >> OK, that's fair enough.
>> >
>> > btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
>> > in case it will help the SCST performance.
>> >
>> > Ronald, if you run context readahead, please make sure that the server
>> > side readahead size is bigger than the client side readahead size.
>>
>> I tried this patch on a vanilla kernel with no other patches applied,
>> but it does not seem to help: the iSCSI throughput does not go above
>> 60 MB/s (1 GB in 17 seconds). I have tried several readahead settings
>> from 128KB up to 4MB, keeping the server readahead at twice the client
>> readahead, but it never gets above 60 MB/s. This is using SCST on the
>
> OK, thanks for the tests anyway!
You're welcome.
>> server side and open-iscsi on the client. I get much better throughput
>> (90 MB/s) when using the patches supplied with SCST together with the
>
> What do you mean by "patches supplied with SCST"?
These:
http://scst.svn.sourceforge.net/viewvc/scst/trunk/scst/kernel/
Regards,
Ronald.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 14:21 ` Wu Fengguang
@ 2009-06-29 15:01 ` Wu Fengguang
2009-06-29 15:37 ` Vladislav Bolkhovitin
0 siblings, 1 reply; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 15:01 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: Vladislav Bolkhovitin, Andrew Morton, kosaki.motohiro,
Alan.Brunelle, hifumi.hisashi, linux-kernel, linux-fsdevel,
jens.axboe, randy.dunlap, Bart Van Assche
On Mon, Jun 29, 2009 at 10:21:24PM +0800, Wu Fengguang wrote:
> On Mon, Jun 29, 2009 at 10:00:20PM +0800, Ronald Moesbergen wrote:
> > ... tests ...
> >
> > > We started with 2.6.29, so why not complete with it (to save additional
> > > Ronald's effort to move on 2.6.30)?
> > >
> > >>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
> > >>
> > >> How about 2MB RAID readahead size? That transforms into about 512KB
> > >> per-disk readahead size.
> > >
> > > OK. Ronald, can you 4 more test cases, please:
> > >
> > > 7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
> > >
> > > 8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
> > > max_sectors_kb, the rest is default
> > >
> > > 9. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
> > > read-ahead, the rest is default
> > >
> > > 10. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
> > > read-ahead, 64 KB max_sectors_kb, the rest is default
> >
> > The results:
>
> I made a blindless average:
>
> N MB/s IOPS case
>
> 0 114.859 984.148 Unpatched, 128KB readahead, 512 max_sectors_kb
> 1 122.960 981.213 Unpatched, 512KB readahead, 512 max_sectors_kb
> 2 120.709 985.111 Unpatched, 2MB readahead, 512 max_sectors_kb
> 3 158.732 1004.714 Unpatched, 512KB readahead, 64 max_sectors_kb
> 4 159.237 979.659 Unpatched, 2MB readahead, 64 max_sectors_kb
>
> 5 114.583 982.998 Patched, 128KB readahead, 512 max_sectors_kb
> 6 124.902 987.523 Patched, 512KB readahead, 512 max_sectors_kb
> 7 127.373 984.848 Patched, 2MB readahead, 512 max_sectors_kb
> 8 161.218 986.698 Patched, 512KB readahead, 64 max_sectors_kb
> 9 163.908 574.651 Patched, 2MB readahead, 64 max_sectors_kb
>
> So before/after patch:
>
> avg throughput 135.299 => 138.397 by +2.3%
> avg IOPS 986.969 => 903.344 by -8.5%
>
> The IOPS is a bit weird.
>
> Summaries:
> - this patch improves RAID throughput by +2.3% on average
> - after this patch, 2MB readahead performs slightly better
> (by 1-2%) than 512KB readahead
and the most important one:
- 64 max_sectors_kb performs much better than 512 max_sectors_kb, by ~30%!
Thanks,
Fengguang
> > Unpatched, 128KB readahead, 512 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 5.621 5.503 5.419 185.744 2.780 2.902
> > 33554432 6.628 5.897 6.242 164.068 7.827 5.127
> > 16777216 7.312 7.165 7.614 139.148 3.501 8.697
> > 8388608 8.719 8.408 8.694 119.003 1.973 14.875
> > 4194304 11.836 12.192 12.137 84.958 1.111 21.239
> > 2097152 13.452 13.992 14.035 74.090 1.442 37.045
> > 1048576 12.759 11.996 12.195 83.194 2.152 83.194
> > 524288 11.895 12.297 12.587 83.570 1.945 167.140
> > 262144 7.325 7.285 7.444 139.304 1.272 557.214
> > 131072 7.992 8.832 7.952 124.279 5.901 994.228
> > 65536 10.940 10.062 10.122 98.847 3.715 1581.545
> > 32768 9.973 10.012 9.945 102.640 0.281 3284.493
> > 16384 11.377 10.538 10.692 94.316 3.100 6036.222
> >
> > Unpatched, 512KB readahead, 512 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 5.032 4.770 5.265 204.228 8.271 3.191
> > 33554432 5.569 5.712 5.863 179.263 3.755 5.602
> > 16777216 6.661 6.857 6.550 153.132 2.888 9.571
> > 8388608 8.022 8.000 7.978 127.998 0.288 16.000
> > 4194304 10.959 11.579 12.208 88.586 3.902 22.146
> > 2097152 13.692 12.670 12.625 78.906 2.914 39.453
> > 1048576 11.120 11.144 10.878 92.703 1.018 92.703
> > 524288 11.234 10.915 11.374 91.667 1.587 183.334
> > 262144 6.848 6.678 6.795 151.191 1.594 604.763
> > 131072 7.393 7.367 7.337 139.025 0.428 1112.202
> > 65536 10.003 10.919 10.015 99.466 4.019 1591.462
> > 32768 10.117 10.124 10.169 101.018 0.229 3232.574
> > 16384 11.614 11.027 11.029 91.293 2.207 5842.771
> >
> > Unpatched, 2MB readahead, 512 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 5.268 5.316 5.418 191.996 2.241 3.000
> > 33554432 5.831 6.459 6.110 167.259 6.977 5.227
> > 16777216 7.313 7.069 7.197 142.385 1.972 8.899
> > 8388608 8.657 8.500 8.498 119.754 1.039 14.969
> > 4194304 11.846 12.116 11.801 85.911 0.994 21.478
> > 2097152 12.917 13.652 13.100 77.484 1.808 38.742
> > 1048576 9.544 10.667 10.807 99.345 5.640 99.345
> > 524288 11.736 7.171 6.599 128.410 29.539 256.821
> > 262144 7.530 7.403 7.416 137.464 1.053 549.857
> > 131072 8.741 8.002 8.022 124.256 5.029 994.051
> > 65536 10.701 10.138 10.090 99.394 2.629 1590.311
> > 32768 9.978 9.950 9.934 102.875 0.188 3291.994
> > 16384 11.435 10.823 10.907 92.684 2.234 5931.749
> >
> > Unpatched, 512KB readahead, 64 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 3.994 3.991 4.123 253.774 3.838 3.965
> > 33554432 4.100 4.329 4.161 244.111 5.569 7.628
> > 16777216 5.476 4.835 5.079 200.148 10.177 12.509
> > 8388608 5.484 5.258 5.227 192.470 4.084 24.059
> > 4194304 6.429 6.458 6.435 158.989 0.315 39.747
> > 2097152 7.219 7.744 7.306 138.081 4.187 69.040
> > 1048576 6.850 6.897 6.776 149.696 1.089 149.696
> > 524288 6.406 6.393 6.469 159.439 0.814 318.877
> > 262144 6.865 7.508 6.861 144.931 6.041 579.726
> > 131072 8.435 8.482 8.307 121.792 1.076 974.334
> > 65536 9.616 9.610 10.262 104.279 3.176 1668.462
> > 32768 9.682 9.932 10.015 103.701 1.497 3318.428
> > 16384 10.962 10.852 11.565 92.106 2.547 5894.813
> >
> > Unpatched, 2MB readahead, 64 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 3.730 3.714 3.914 270.615 6.396 4.228
> > 33554432 4.445 3.999 3.989 247.710 12.276 7.741
> > 16777216 4.763 4.712 4.709 216.590 1.122 13.537
> > 8388608 5.001 5.086 5.229 200.649 3.673 25.081
> > 4194304 6.365 6.362 6.905 156.710 5.948 39.178
> > 2097152 7.390 7.367 7.270 139.470 0.992 69.735
> > 1048576 7.038 7.050 7.090 145.052 0.456 145.052
> > 524288 6.862 7.167 7.278 144.272 3.617 288.544
> > 262144 7.266 7.313 7.265 140.635 0.436 562.540
> > 131072 8.677 8.735 8.821 117.108 0.790 936.865
> > 65536 10.865 10.040 10.038 99.418 3.658 1590.685
> > 32768 10.167 10.130 10.177 100.805 0.201 3225.749
> > 16384 11.643 11.017 11.103 91.041 2.203 5826.629
> >
> > Patched, 128KB readahead, 512 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 5.670 5.188 5.636 186.555 7.671 2.915
> > 33554432 6.069 5.971 6.141 168.992 1.954 5.281
> > 16777216 7.821 7.501 7.372 135.451 3.340 8.466
> > 8388608 9.147 8.618 9.000 114.849 2.908 14.356
> > 4194304 12.199 12.914 12.381 81.981 1.964 20.495
> > 2097152 13.449 13.891 14.288 73.842 1.828 36.921
> > 1048576 11.890 12.182 11.519 86.360 1.984 86.360
> > 524288 11.899 12.706 12.135 83.678 2.287 167.357
> > 262144 7.460 7.559 7.563 136.041 0.864 544.164
> > 131072 7.987 8.003 8.530 125.403 3.792 1003.220
> > 65536 10.179 10.119 10.131 100.957 0.255 1615.312
> > 32768 9.899 9.923 10.589 101.114 3.121 3235.656
> > 16384 10.849 10.835 10.876 94.351 0.150 6038.474
> >
> > Patched, 512KB readahead, 512 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 5.062 5.111 5.083 201.358 0.795 3.146
> > 33554432 5.589 5.713 5.657 181.165 1.625 5.661
> > 16777216 6.337 7.220 6.457 154.002 8.690 9.625
> > 8388608 7.952 7.880 7.527 131.588 3.192 16.448
> > 4194304 10.695 11.224 10.736 94.119 2.047 23.530
> > 2097152 10.898 12.072 12.358 87.215 4.839 43.607
> > 1048576 10.890 11.347 9.290 98.166 8.664 98.166
> > 524288 10.898 11.032 10.887 93.611 0.560 187.223
> > 262144 6.714 7.230 6.804 148.219 4.724 592.875
> > 131072 7.325 7.342 7.363 139.441 0.295 1115.530
> > 65536 9.773 9.988 10.592 101.327 3.417 1621.227
> > 32768 10.031 9.995 10.086 102.019 0.377 3264.620
> > 16384 11.041 10.987 11.564 91.502 2.093 5856.144
> >
> > Patched, 2MB readahead, 512 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 4.970 5.097 5.188 201.435 3.559 3.147
> > 33554432 5.588 5.793 5.169 186.042 8.923 5.814
> > 16777216 6.151 6.414 6.526 161.012 4.027 10.063
> > 8388608 7.836 7.299 7.475 135.980 3.989 16.998
> > 4194304 11.792 10.964 10.158 93.683 5.706 23.421
> > 2097152 11.225 11.492 11.357 90.162 0.866 45.081
> > 1048576 12.017 11.258 11.432 88.580 2.449 88.580
> > 524288 5.974 10.883 11.840 117.323 38.361 234.647
> > 262144 6.774 6.765 6.526 153.155 2.661 612.619
> > 131072 8.036 7.324 7.341 135.579 5.766 1084.633
> > 65536 9.964 10.595 9.999 100.608 2.806 1609.735
> > 32768 10.132 10.036 10.190 101.197 0.637 3238.308
> > 16384 11.133 11.568 11.036 91.093 1.850 5829.981
> >
> > Patched, 512KB readahead, 64 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 3.722 3.698 3.721 275.759 0.809 4.309
> > 33554432 4.058 3.849 3.957 259.063 5.580 8.096
> > 16777216 4.601 4.613 4.738 220.212 2.913 13.763
> > 8388608 5.039 5.534 5.017 197.452 8.791 24.682
> > 4194304 6.302 6.270 6.282 162.942 0.341 40.735
> > 2097152 7.314 7.302 7.069 141.700 2.233 70.850
> > 1048576 6.881 7.655 6.909 143.597 6.951 143.597
> > 524288 7.163 7.025 6.951 145.344 1.803 290.687
> > 262144 7.315 7.233 7.299 140.621 0.689 562.482
> > 131072 9.292 8.756 8.807 114.475 3.036 915.803
> > 65536 9.942 9.985 9.960 102.787 0.181 1644.598
> > 32768 10.721 10.091 10.192 99.154 2.605 3172.935
> > 16384 11.049 11.016 11.065 92.727 0.169 5934.531
> >
> > Patched, 2MB readahead, 64 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 3.697 3.819 3.741 272.931 3.661 4.265
> > 33554432 3.951 3.905 4.038 258.320 3.586 8.073
> > 16777216 5.595 5.182 4.864 197.044 11.236 12.315
> > 8388608 5.267 5.156 5.116 197.725 2.431 24.716
> > 4194304 6.411 6.335 6.290 161.389 1.267 40.347
> > 2097152 7.329 7.663 7.462 136.860 2.502 68.430
> > 1048576 7.225 7.077 7.215 142.784 1.352 142.784
> > 524288 6.903 7.015 7.095 146.210 1.647 292.419
> > 262144 7.365 7.926 7.278 136.309 5.076 545.237
> > 131072 8.796 8.819 8.814 116.233 0.130 929.862
> > 65536 9.998 10.609 9.995 100.464 2.786 1607.423
> > 32768 10.161 10.124 10.246 100.623 0.505 3219.943
> >
> > Regards,
> > Ronald.
^ permalink raw reply [flat|nested] 65+ messages in thread
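The per-case averages above can be reproduced mechanically from the tables. Below is a small sketch, assuming one table saved per file in the 7-column blockdev-perftest layout shown (note that the IOPS column is itself derived as average bytes/s divided by blocksize, so averaging it blindly inherits that weighting):
# Sketch: average the R(avg,MB/s) and R(IOPS) columns of one table.
awk 'NF == 7 && $1 ~ /^[0-9]+$/ { mbps += $5; iops += $7; n++ }
END { if (n) printf "%.3f MB/s  %.3f IOPS over %d rows\n", mbps/n, iops/n, n }' table.txt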
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 14:51 ` Wu Fengguang
2009-06-29 14:56 ` Ronald Moesbergen
@ 2009-06-29 15:37 ` Vladislav Bolkhovitin
1 sibling, 0 replies; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-06-29 15:37 UTC (permalink / raw)
To: Wu Fengguang
Cc: Ronald Moesbergen, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
randy.dunlap, Bart Van Assche
Wu Fengguang, on 06/29/2009 06:51 PM wrote:
> On Mon, Jun 29, 2009 at 10:43:48PM +0800, Ronald Moesbergen wrote:
>> 2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
>>> On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
>>>> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
>>>>> Wu Fengguang, on 06/29/2009 04:54 PM wrote:
>>>>>> Why not 2.6.30? :)
>>>>> We started with 2.6.29, so why not complete with it (to save additional
>>>>> Ronald's effort to move on 2.6.30)?
>>>> OK, that's fair enough.
>>> btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
>>> in case it will help the SCST performance.
>>>
>>> Ronald, if you run context readahead, please make sure that the server
>>> side readahead size is bigger than the client side readahead size.
>> I tried this patch on a vanilla kernel with no other patches applied,
>> but it does not seem to help: the iSCSI throughput does not go above
>> 60 MB/s (1 GB in 17 seconds). I have tried several readahead settings
>> from 128KB up to 4MB, keeping the server readahead at twice the client
>> readahead, but it never gets above 60 MB/s. This is using SCST on the
>
> OK, thanks for the tests anyway!
>
>> server side and open-iscsi on the client. I get much better throughput
>> (90 MB/s) when using the patches supplied with SCST together with the
>
> What do you mean by "patches supplied with SCST"?
Ronald means the io_context patch
(http://scst.svn.sourceforge.net/viewvc/scst/trunk/scst/kernel/io_context-2.6.29.patch?revision=717),
which allows SCST's I/O threads to share a single I/O context.
Vlad
^ permalink raw reply [flat|nested] 65+ messages in thread
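For anyone reproducing this setup, a minimal sketch of applying that patch to the server kernel tree; it assumes the raw patch was saved locally, and the tree path is illustrative:
# Sketch: apply the SCST io_context patch before building the 2.6.29 server kernel.
cd /usr/src/linux-2.6.29
patch -p1 --dry-run < /tmp/io_context-2.6.29.patch &&
patch -p1 < /tmp/io_context-2.6.29.patch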
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 15:01 ` Wu Fengguang
@ 2009-06-29 15:37 ` Vladislav Bolkhovitin
[not found] ` <20090630010414.GB31418@localhost>
0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-06-29 15:37 UTC (permalink / raw)
To: Wu Fengguang
Cc: Ronald Moesbergen, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
randy.dunlap, Bart Van Assche
Wu Fengguang, on 06/29/2009 07:01 PM wrote:
> On Mon, Jun 29, 2009 at 10:21:24PM +0800, Wu Fengguang wrote:
>> On Mon, Jun 29, 2009 at 10:00:20PM +0800, Ronald Moesbergen wrote:
>>> ... tests ...
>>>
>>>> We started with 2.6.29, so why not complete with it (to save additional
>>>> Ronald's effort to move on 2.6.30)?
>>>>
>>>>>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
>>>>> How about 2MB RAID readahead size? That transforms into about 512KB
>>>>> per-disk readahead size.
>>>> OK. Ronald, can you 4 more test cases, please:
>>>>
>>>> 7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
>>>>
>>>> 8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
>>>> max_sectors_kb, the rest is default
>>>>
>>>> 9. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
>>>> read-ahead, the rest is default
>>>>
>>>> 10. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
>>>> read-ahead, 64 KB max_sectors_kb, the rest is default
>>> The results:
>> I made a blindless average:
>>
>> N MB/s IOPS case
>>
>> 0 114.859 984.148 Unpatched, 128KB readahead, 512 max_sectors_kb
>> 1 122.960 981.213 Unpatched, 512KB readahead, 512 max_sectors_kb
>> 2 120.709 985.111 Unpatched, 2MB readahead, 512 max_sectors_kb
>> 3 158.732 1004.714 Unpatched, 512KB readahead, 64 max_sectors_kb
>> 4 159.237 979.659 Unpatched, 2MB readahead, 64 max_sectors_kb
>>
>> 5 114.583 982.998 Patched, 128KB readahead, 512 max_sectors_kb
>> 6 124.902 987.523 Patched, 512KB readahead, 512 max_sectors_kb
>> 7 127.373 984.848 Patched, 2MB readahead, 512 max_sectors_kb
>> 8 161.218 986.698 Patched, 512KB readahead, 64 max_sectors_kb
>> 9 163.908 574.651 Patched, 2MB readahead, 64 max_sectors_kb
>>
>> So before/after patch:
>>
>> avg throughput 135.299 => 138.397 by +2.3%
>> avg IOPS 986.969 => 903.344 by -8.5%
>>
>> The IOPS is a bit weird.
>>
>> Summaries:
>> - this patch improves RAID throughput by +2.3% on average
>> - after this patch, 2MB readahead performs slightly better
>> (by 1-2%) than 512KB readahead
>
> and the most important one:
> - 64 max_sectors_kb performs much better than 512 max_sectors_kb, by ~30%!
Yes, I was just about to point that out ;)
> Thanks,
> Fengguang
>
>>> Unpatched, 128KB readahead, 512 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 5.621 5.503 5.419 185.744 2.780 2.902
>>> 33554432 6.628 5.897 6.242 164.068 7.827 5.127
>>> 16777216 7.312 7.165 7.614 139.148 3.501 8.697
>>> 8388608 8.719 8.408 8.694 119.003 1.973 14.875
>>> 4194304 11.836 12.192 12.137 84.958 1.111 21.239
>>> 2097152 13.452 13.992 14.035 74.090 1.442 37.045
>>> 1048576 12.759 11.996 12.195 83.194 2.152 83.194
>>> 524288 11.895 12.297 12.587 83.570 1.945 167.140
>>> 262144 7.325 7.285 7.444 139.304 1.272 557.214
>>> 131072 7.992 8.832 7.952 124.279 5.901 994.228
>>> 65536 10.940 10.062 10.122 98.847 3.715 1581.545
>>> 32768 9.973 10.012 9.945 102.640 0.281 3284.493
>>> 16384 11.377 10.538 10.692 94.316 3.100 6036.222
>>>
>>> Unpatched, 512KB readahead, 512 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 5.032 4.770 5.265 204.228 8.271 3.191
>>> 33554432 5.569 5.712 5.863 179.263 3.755 5.602
>>> 16777216 6.661 6.857 6.550 153.132 2.888 9.571
>>> 8388608 8.022 8.000 7.978 127.998 0.288 16.000
>>> 4194304 10.959 11.579 12.208 88.586 3.902 22.146
>>> 2097152 13.692 12.670 12.625 78.906 2.914 39.453
>>> 1048576 11.120 11.144 10.878 92.703 1.018 92.703
>>> 524288 11.234 10.915 11.374 91.667 1.587 183.334
>>> 262144 6.848 6.678 6.795 151.191 1.594 604.763
>>> 131072 7.393 7.367 7.337 139.025 0.428 1112.202
>>> 65536 10.003 10.919 10.015 99.466 4.019 1591.462
>>> 32768 10.117 10.124 10.169 101.018 0.229 3232.574
>>> 16384 11.614 11.027 11.029 91.293 2.207 5842.771
>>>
>>> Unpatched, 2MB readahead, 512 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 5.268 5.316 5.418 191.996 2.241 3.000
>>> 33554432 5.831 6.459 6.110 167.259 6.977 5.227
>>> 16777216 7.313 7.069 7.197 142.385 1.972 8.899
>>> 8388608 8.657 8.500 8.498 119.754 1.039 14.969
>>> 4194304 11.846 12.116 11.801 85.911 0.994 21.478
>>> 2097152 12.917 13.652 13.100 77.484 1.808 38.742
>>> 1048576 9.544 10.667 10.807 99.345 5.640 99.345
>>> 524288 11.736 7.171 6.599 128.410 29.539 256.821
>>> 262144 7.530 7.403 7.416 137.464 1.053 549.857
>>> 131072 8.741 8.002 8.022 124.256 5.029 994.051
>>> 65536 10.701 10.138 10.090 99.394 2.629 1590.311
>>> 32768 9.978 9.950 9.934 102.875 0.188 3291.994
>>> 16384 11.435 10.823 10.907 92.684 2.234 5931.749
>>>
>>> Unpatched, 512KB readahead, 64 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 3.994 3.991 4.123 253.774 3.838 3.965
>>> 33554432 4.100 4.329 4.161 244.111 5.569 7.628
>>> 16777216 5.476 4.835 5.079 200.148 10.177 12.509
>>> 8388608 5.484 5.258 5.227 192.470 4.084 24.059
>>> 4194304 6.429 6.458 6.435 158.989 0.315 39.747
>>> 2097152 7.219 7.744 7.306 138.081 4.187 69.040
>>> 1048576 6.850 6.897 6.776 149.696 1.089 149.696
>>> 524288 6.406 6.393 6.469 159.439 0.814 318.877
>>> 262144 6.865 7.508 6.861 144.931 6.041 579.726
>>> 131072 8.435 8.482 8.307 121.792 1.076 974.334
>>> 65536 9.616 9.610 10.262 104.279 3.176 1668.462
>>> 32768 9.682 9.932 10.015 103.701 1.497 3318.428
>>> 16384 10.962 10.852 11.565 92.106 2.547 5894.813
>>>
>>> Unpatched, 2MB readahead, 64 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 3.730 3.714 3.914 270.615 6.396 4.228
>>> 33554432 4.445 3.999 3.989 247.710 12.276 7.741
>>> 16777216 4.763 4.712 4.709 216.590 1.122 13.537
>>> 8388608 5.001 5.086 5.229 200.649 3.673 25.081
>>> 4194304 6.365 6.362 6.905 156.710 5.948 39.178
>>> 2097152 7.390 7.367 7.270 139.470 0.992 69.735
>>> 1048576 7.038 7.050 7.090 145.052 0.456 145.052
>>> 524288 6.862 7.167 7.278 144.272 3.617 288.544
>>> 262144 7.266 7.313 7.265 140.635 0.436 562.540
>>> 131072 8.677 8.735 8.821 117.108 0.790 936.865
>>> 65536 10.865 10.040 10.038 99.418 3.658 1590.685
>>> 32768 10.167 10.130 10.177 100.805 0.201 3225.749
>>> 16384 11.643 11.017 11.103 91.041 2.203 5826.629
>>>
>>> Patched, 128KB readahead, 512 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 5.670 5.188 5.636 186.555 7.671 2.915
>>> 33554432 6.069 5.971 6.141 168.992 1.954 5.281
>>> 16777216 7.821 7.501 7.372 135.451 3.340 8.466
>>> 8388608 9.147 8.618 9.000 114.849 2.908 14.356
>>> 4194304 12.199 12.914 12.381 81.981 1.964 20.495
>>> 2097152 13.449 13.891 14.288 73.842 1.828 36.921
>>> 1048576 11.890 12.182 11.519 86.360 1.984 86.360
>>> 524288 11.899 12.706 12.135 83.678 2.287 167.357
>>> 262144 7.460 7.559 7.563 136.041 0.864 544.164
>>> 131072 7.987 8.003 8.530 125.403 3.792 1003.220
>>> 65536 10.179 10.119 10.131 100.957 0.255 1615.312
>>> 32768 9.899 9.923 10.589 101.114 3.121 3235.656
>>> 16384 10.849 10.835 10.876 94.351 0.150 6038.474
>>>
>>> Patched, 512KB readahead, 512 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 5.062 5.111 5.083 201.358 0.795 3.146
>>> 33554432 5.589 5.713 5.657 181.165 1.625 5.661
>>> 16777216 6.337 7.220 6.457 154.002 8.690 9.625
>>> 8388608 7.952 7.880 7.527 131.588 3.192 16.448
>>> 4194304 10.695 11.224 10.736 94.119 2.047 23.530
>>> 2097152 10.898 12.072 12.358 87.215 4.839 43.607
>>> 1048576 10.890 11.347 9.290 98.166 8.664 98.166
>>> 524288 10.898 11.032 10.887 93.611 0.560 187.223
>>> 262144 6.714 7.230 6.804 148.219 4.724 592.875
>>> 131072 7.325 7.342 7.363 139.441 0.295 1115.530
>>> 65536 9.773 9.988 10.592 101.327 3.417 1621.227
>>> 32768 10.031 9.995 10.086 102.019 0.377 3264.620
>>> 16384 11.041 10.987 11.564 91.502 2.093 5856.144
>>>
>>> Patched, 2MB readahead, 512 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 4.970 5.097 5.188 201.435 3.559 3.147
>>> 33554432 5.588 5.793 5.169 186.042 8.923 5.814
>>> 16777216 6.151 6.414 6.526 161.012 4.027 10.063
>>> 8388608 7.836 7.299 7.475 135.980 3.989 16.998
>>> 4194304 11.792 10.964 10.158 93.683 5.706 23.421
>>> 2097152 11.225 11.492 11.357 90.162 0.866 45.081
>>> 1048576 12.017 11.258 11.432 88.580 2.449 88.580
>>> 524288 5.974 10.883 11.840 117.323 38.361 234.647
>>> 262144 6.774 6.765 6.526 153.155 2.661 612.619
>>> 131072 8.036 7.324 7.341 135.579 5.766 1084.633
>>> 65536 9.964 10.595 9.999 100.608 2.806 1609.735
>>> 32768 10.132 10.036 10.190 101.197 0.637 3238.308
>>> 16384 11.133 11.568 11.036 91.093 1.850 5829.981
>>>
>>> Patched, 512KB readahead, 64 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 3.722 3.698 3.721 275.759 0.809 4.309
>>> 33554432 4.058 3.849 3.957 259.063 5.580 8.096
>>> 16777216 4.601 4.613 4.738 220.212 2.913 13.763
>>> 8388608 5.039 5.534 5.017 197.452 8.791 24.682
>>> 4194304 6.302 6.270 6.282 162.942 0.341 40.735
>>> 2097152 7.314 7.302 7.069 141.700 2.233 70.850
>>> 1048576 6.881 7.655 6.909 143.597 6.951 143.597
>>> 524288 7.163 7.025 6.951 145.344 1.803 290.687
>>> 262144 7.315 7.233 7.299 140.621 0.689 562.482
>>> 131072 9.292 8.756 8.807 114.475 3.036 915.803
>>> 65536 9.942 9.985 9.960 102.787 0.181 1644.598
>>> 32768 10.721 10.091 10.192 99.154 2.605 3172.935
>>> 16384 11.049 11.016 11.065 92.727 0.169 5934.531
>>>
>>> Patched, 2MB readahead, 64 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 3.697 3.819 3.741 272.931 3.661 4.265
>>> 33554432 3.951 3.905 4.038 258.320 3.586 8.073
>>> 16777216 5.595 5.182 4.864 197.044 11.236 12.315
>>> 8388608 5.267 5.156 5.116 197.725 2.431 24.716
>>> 4194304 6.411 6.335 6.290 161.389 1.267 40.347
>>> 2097152 7.329 7.663 7.462 136.860 2.502 68.430
>>> 1048576 7.225 7.077 7.215 142.784 1.352 142.784
>>> 524288 6.903 7.015 7.095 146.210 1.647 292.419
>>> 262144 7.365 7.926 7.278 136.309 5.076 545.237
>>> 131072 8.796 8.819 8.814 116.233 0.130 929.862
>>> 65536 9.998 10.609 9.995 100.464 2.786 1607.423
>>> 32768 10.161 10.124 10.246 100.623 0.505 3219.943
>>>
>>> Regards,
>>> Ronald.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-06-29 14:00 ` Ronald Moesbergen
2009-06-29 14:21 ` Wu Fengguang
@ 2009-06-30 10:22 ` Vladislav Bolkhovitin
1 sibling, 0 replies; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-06-30 10:22 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: Wu Fengguang, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
randy.dunlap, Bart Van Assche
[-- Attachment #1: Type: text/plain, Size: 11777 bytes --]
Ronald Moesbergen, on 06/29/2009 06:00 PM wrote:
> ... tests ...
>
>> We started with 2.6.29, so why not complete with it (to save additional
>> Ronald's effort to move on 2.6.30)?
>>
>>>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
>>> How about 2MB RAID readahead size? That transforms into about 512KB
>>> per-disk readahead size.
>> OK. Ronald, can you 4 more test cases, please:
>>
>> 7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
>>
>> 8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
>> max_sectors_kb, the rest is default
>>
>> 9. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
>> read-ahead, the rest is default
>>
>> 10. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
>> read-ahead, 64 KB max_sectors_kb, the rest is default
>
> The results:
>
> Unpatched, 128KB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.621 5.503 5.419 185.744 2.780 2.902
> 33554432 6.628 5.897 6.242 164.068 7.827 5.127
> 16777216 7.312 7.165 7.614 139.148 3.501 8.697
> 8388608 8.719 8.408 8.694 119.003 1.973 14.875
> 4194304 11.836 12.192 12.137 84.958 1.111 21.239
> 2097152 13.452 13.992 14.035 74.090 1.442 37.045
> 1048576 12.759 11.996 12.195 83.194 2.152 83.194
> 524288 11.895 12.297 12.587 83.570 1.945 167.140
> 262144 7.325 7.285 7.444 139.304 1.272 557.214
> 131072 7.992 8.832 7.952 124.279 5.901 994.228
> 65536 10.940 10.062 10.122 98.847 3.715 1581.545
> 32768 9.973 10.012 9.945 102.640 0.281 3284.493
> 16384 11.377 10.538 10.692 94.316 3.100 6036.222
>
> Unpatched, 512KB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.032 4.770 5.265 204.228 8.271 3.191
> 33554432 5.569 5.712 5.863 179.263 3.755 5.602
> 16777216 6.661 6.857 6.550 153.132 2.888 9.571
> 8388608 8.022 8.000 7.978 127.998 0.288 16.000
> 4194304 10.959 11.579 12.208 88.586 3.902 22.146
> 2097152 13.692 12.670 12.625 78.906 2.914 39.453
> 1048576 11.120 11.144 10.878 92.703 1.018 92.703
> 524288 11.234 10.915 11.374 91.667 1.587 183.334
Can somebody explain those big throughput drops (66% in this case, 68%
in the case above)? They happen in nearly all the tests; only the cases
of 64 max_sectors_kb with big RA sizes suffer less from them.
It looks like a possible sign of some not yet understood deficiency in
the I/O submission or read-ahead path.
(blockdev-perftest just runs dd, reading 1 GB at each "bs" 3 times, then
calculates the average throughput and IOPS and prints the results. It's
small, so I attached it.)
> 262144 6.848 6.678 6.795 151.191 1.594 604.763
> 131072 7.393 7.367 7.337 139.025 0.428 1112.202
> 65536 10.003 10.919 10.015 99.466 4.019 1591.462
> 32768 10.117 10.124 10.169 101.018 0.229 3232.574
> 16384 11.614 11.027 11.029 91.293 2.207 5842.771
>
> Unpatched, 2MB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.268 5.316 5.418 191.996 2.241 3.000
> 33554432 5.831 6.459 6.110 167.259 6.977 5.227
> 16777216 7.313 7.069 7.197 142.385 1.972 8.899
> 8388608 8.657 8.500 8.498 119.754 1.039 14.969
> 4194304 11.846 12.116 11.801 85.911 0.994 21.478
> 2097152 12.917 13.652 13.100 77.484 1.808 38.742
> 1048576 9.544 10.667 10.807 99.345 5.640 99.345
> 524288 11.736 7.171 6.599 128.410 29.539 256.821
> 262144 7.530 7.403 7.416 137.464 1.053 549.857
> 131072 8.741 8.002 8.022 124.256 5.029 994.051
> 65536 10.701 10.138 10.090 99.394 2.629 1590.311
> 32768 9.978 9.950 9.934 102.875 0.188 3291.994
> 16384 11.435 10.823 10.907 92.684 2.234 5931.749
>
> Unpatched, 512KB readahead, 64 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 3.994 3.991 4.123 253.774 3.838 3.965
> 33554432 4.100 4.329 4.161 244.111 5.569 7.628
> 16777216 5.476 4.835 5.079 200.148 10.177 12.509
> 8388608 5.484 5.258 5.227 192.470 4.084 24.059
> 4194304 6.429 6.458 6.435 158.989 0.315 39.747
> 2097152 7.219 7.744 7.306 138.081 4.187 69.040
> 1048576 6.850 6.897 6.776 149.696 1.089 149.696
> 524288 6.406 6.393 6.469 159.439 0.814 318.877
> 262144 6.865 7.508 6.861 144.931 6.041 579.726
> 131072 8.435 8.482 8.307 121.792 1.076 974.334
> 65536 9.616 9.610 10.262 104.279 3.176 1668.462
> 32768 9.682 9.932 10.015 103.701 1.497 3318.428
> 16384 10.962 10.852 11.565 92.106 2.547 5894.813
>
> Unpatched, 2MB readahead, 64 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 3.730 3.714 3.914 270.615 6.396 4.228
> 33554432 4.445 3.999 3.989 247.710 12.276 7.741
> 16777216 4.763 4.712 4.709 216.590 1.122 13.537
> 8388608 5.001 5.086 5.229 200.649 3.673 25.081
> 4194304 6.365 6.362 6.905 156.710 5.948 39.178
> 2097152 7.390 7.367 7.270 139.470 0.992 69.735
> 1048576 7.038 7.050 7.090 145.052 0.456 145.052
> 524288 6.862 7.167 7.278 144.272 3.617 288.544
> 262144 7.266 7.313 7.265 140.635 0.436 562.540
> 131072 8.677 8.735 8.821 117.108 0.790 936.865
> 65536 10.865 10.040 10.038 99.418 3.658 1590.685
> 32768 10.167 10.130 10.177 100.805 0.201 3225.749
> 16384 11.643 11.017 11.103 91.041 2.203 5826.629
>
> Patched, 128KB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.670 5.188 5.636 186.555 7.671 2.915
> 33554432 6.069 5.971 6.141 168.992 1.954 5.281
> 16777216 7.821 7.501 7.372 135.451 3.340 8.466
> 8388608 9.147 8.618 9.000 114.849 2.908 14.356
> 4194304 12.199 12.914 12.381 81.981 1.964 20.495
> 2097152 13.449 13.891 14.288 73.842 1.828 36.921
> 1048576 11.890 12.182 11.519 86.360 1.984 86.360
> 524288 11.899 12.706 12.135 83.678 2.287 167.357
> 262144 7.460 7.559 7.563 136.041 0.864 544.164
> 131072 7.987 8.003 8.530 125.403 3.792 1003.220
> 65536 10.179 10.119 10.131 100.957 0.255 1615.312
> 32768 9.899 9.923 10.589 101.114 3.121 3235.656
> 16384 10.849 10.835 10.876 94.351 0.150 6038.474
>
> Patched, 512KB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.062 5.111 5.083 201.358 0.795 3.146
> 33554432 5.589 5.713 5.657 181.165 1.625 5.661
> 16777216 6.337 7.220 6.457 154.002 8.690 9.625
> 8388608 7.952 7.880 7.527 131.588 3.192 16.448
> 4194304 10.695 11.224 10.736 94.119 2.047 23.530
> 2097152 10.898 12.072 12.358 87.215 4.839 43.607
> 1048576 10.890 11.347 9.290 98.166 8.664 98.166
> 524288 10.898 11.032 10.887 93.611 0.560 187.223
> 262144 6.714 7.230 6.804 148.219 4.724 592.875
> 131072 7.325 7.342 7.363 139.441 0.295 1115.530
> 65536 9.773 9.988 10.592 101.327 3.417 1621.227
> 32768 10.031 9.995 10.086 102.019 0.377 3264.620
> 16384 11.041 10.987 11.564 91.502 2.093 5856.144
>
> Patched, 2MB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 4.970 5.097 5.188 201.435 3.559 3.147
> 33554432 5.588 5.793 5.169 186.042 8.923 5.814
> 16777216 6.151 6.414 6.526 161.012 4.027 10.063
> 8388608 7.836 7.299 7.475 135.980 3.989 16.998
> 4194304 11.792 10.964 10.158 93.683 5.706 23.421
> 2097152 11.225 11.492 11.357 90.162 0.866 45.081
> 1048576 12.017 11.258 11.432 88.580 2.449 88.580
> 524288 5.974 10.883 11.840 117.323 38.361 234.647
> 262144 6.774 6.765 6.526 153.155 2.661 612.619
> 131072 8.036 7.324 7.341 135.579 5.766 1084.633
> 65536 9.964 10.595 9.999 100.608 2.806 1609.735
> 32768 10.132 10.036 10.190 101.197 0.637 3238.308
> 16384 11.133 11.568 11.036 91.093 1.850 5829.981
>
> Patched, 512KB readahead, 64 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 3.722 3.698 3.721 275.759 0.809 4.309
> 33554432 4.058 3.849 3.957 259.063 5.580 8.096
> 16777216 4.601 4.613 4.738 220.212 2.913 13.763
> 8388608 5.039 5.534 5.017 197.452 8.791 24.682
> 4194304 6.302 6.270 6.282 162.942 0.341 40.735
> 2097152 7.314 7.302 7.069 141.700 2.233 70.850
> 1048576 6.881 7.655 6.909 143.597 6.951 143.597
> 524288 7.163 7.025 6.951 145.344 1.803 290.687
> 262144 7.315 7.233 7.299 140.621 0.689 562.482
> 131072 9.292 8.756 8.807 114.475 3.036 915.803
> 65536 9.942 9.985 9.960 102.787 0.181 1644.598
> 32768 10.721 10.091 10.192 99.154 2.605 3172.935
> 16384 11.049 11.016 11.065 92.727 0.169 5934.531
>
> Patched, 2MB readahead, 64 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 3.697 3.819 3.741 272.931 3.661 4.265
> 33554432 3.951 3.905 4.038 258.320 3.586 8.073
> 16777216 5.595 5.182 4.864 197.044 11.236 12.315
> 8388608 5.267 5.156 5.116 197.725 2.431 24.716
> 4194304 6.411 6.335 6.290 161.389 1.267 40.347
> 2097152 7.329 7.663 7.462 136.860 2.502 68.430
> 1048576 7.225 7.077 7.215 142.784 1.352 142.784
> 524288 6.903 7.015 7.095 146.210 1.647 292.419
> 262144 7.365 7.926 7.278 136.309 5.076 545.237
> 131072 8.796 8.819 8.814 116.233 0.130 929.862
> 65536 9.998 10.609 9.995 100.464 2.786 1607.423
> 32768 10.161 10.124 10.246 100.623 0.505 3219.943
>
> Regards,
> Ronald.
[-- Attachment #2: blockdev-perftest --]
[-- Type: text/plain, Size: 5399 bytes --]
#!/bin/sh
############################################################################
#
# Script for testing block device I/O performance. Running this script on a
# block device that is connected to a remote SCST target device allows to
# test the performance of the transport protocols implemented in SCST. The
# operation of this script is similar to iozone, while this script is easier
# to use.
#
# Copyright (C) 2009 Bart Van Assche <bart.vanassche@gmail.com>.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation, version 2
# of the License.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
############################################################################
#########################
# Function definitions #
#########################
usage() {
  echo "Usage: $0 [-a] [-d] [-i <i>] [-n] [-r] [-s <l2s>] <dev>"
  echo "  -a    - use asynchronous (buffered) I/O."
  echo "  -d    - use direct (non-buffered) I/O."
  echo "  -i    - number of times each test is iterated."
  echo "  -n    - do not verify the data on <dev> before overwriting it."
  echo "  -r    - only perform the read test."
  echo "  -s    - logarithm base two of the I/O size."
  echo "  <dev> - block device to run the I/O performance test on."
}
# Echo ((2**$1))
pow2() {
  if [ $1 = 0 ]; then
    echo 1
  else
    echo $((2 * $(pow2 $(($1 - 1)) ) ))
  fi
}
drop_caches() {
  sync
  if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches
  fi
}
# Read times in seconds from stdin, one number per line, echo each number
# using format $1, and also echo the average transfer size in MB/s, its
# standard deviation and the number of IOPS using the total I/O size $2 and
# the block transfer size $3.
echo_and_calc_avg() {
  awk -v fmt="$1" -v iosize="$2" -v blocksize="$3" 'BEGIN{pow_2_20=1024*1024}{if ($1 != 0){n++;sum+=iosize/$1;sumsq+=iosize*iosize/($1*$1)};printf fmt, $1} END{d=(n>0?sumsq/n-sum*sum/n/n:0);avg=(n>0?sum/n:0);stddev=(d>0?sqrt(d):0);iops=avg/blocksize;printf fmt fmt fmt,avg/pow_2_20,stddev/pow_2_20,iops}'
}
#########################
# Default settings #
#########################
iterations=3
log2_io_size=30 # 1 GB
log2_min_blocksize=9 # 512 bytes
log2_max_blocksize=26 # 64 MB
iotype=direct
read_test_only=false
verify_device_data=true
#########################
# Argument processing #
#########################
set -- $(/usr/bin/getopt "adhi:nrs:" "$@")
while [ "$1" != "${1#-}" ]
do
case "$1" in
'-a') iotype="buffered"; shift;;
'-d') iotype="direct"; shift;;
'-i') iterations="$2"; shift; shift;;
'-n') verify_device_data="false"; shift;;
'-r') read_test_only="true"; shift;;
'-s') log2_io_size="$2"; shift; shift;;
'--') shift;;
*) usage; exit 1;;
esac
done
if [ "$#" != 1 ]; then
usage
exit 1
fi
device="$1"
####################
# Performance test #
####################
if [ ! -e "${device}" ]; then
  echo "Error: device ${device} does not exist."
  exit 1
fi
if [ "${read_test_only}" = "false" -a ! -w "${device}" ]; then
  echo "Error: device ${device} is not writeable."
  exit 1
fi
if [ "${read_test_only}" = "false" -a "${verify_device_data}" = "true" ] \
   && ! cmp -s -n $(pow2 $log2_io_size) "${device}" /dev/zero
then
  echo "Error: device ${device} still contains data."
  exit 1
fi
if [ "${iotype}" = "direct" ]; then
  dd_oflags="oflag=direct"
  dd_iflags="iflag=direct"
else
  dd_oflags="oflag=sync"
  dd_iflags=""
fi
# Header, line 1
printf "%9s " blocksize
i=0
while [ $i -lt ${iterations} ]
do
printf "%8s " "W"
i=$((i+1))
done
printf "%8s %8s %8s " "W(avg," "W(std," "W"
i=0
while [ $i -lt ${iterations} ]
do
printf "%8s " "R"
i=$((i+1))
done
printf "%8s %8s %8s" "R(avg," "R(std" "R"
printf "\n"
# Header, line 2
printf "%9s " "(bytes)"
i=0
while [ $i -lt ${iterations} ]
do
printf "%8s " "(s)"
i=$((i+1))
done
printf "%8s %8s %8s " "MB/s)" ",MB/s)" "(IOPS)"
i=0
while [ $i -lt ${iterations} ]
do
printf "%8s " "(s)"
i=$((i+1))
done
printf "%8s %8s %8s" "MB/s)" ",MB/s)" "(IOPS)"
printf "\n"
# Measurements
log2_blocksize=${log2_max_blocksize}
while [ ! $log2_blocksize -lt $log2_min_blocksize ]
do
  if [ $log2_blocksize -gt $log2_io_size ]; then
    # Skip block sizes larger than the total I/O size. The counter must
    # be decremented here too, otherwise this would loop forever.
    log2_blocksize=$((log2_blocksize - 1))
    continue
  fi
  iosize=$(pow2 $log2_io_size)
  bs=$(pow2 $log2_blocksize)
  count=$(pow2 $(($log2_io_size - $log2_blocksize)))
  printf "%9d " ${bs}
  i=0
  while [ $i -lt ${iterations} ]
  do
    if [ "${read_test_only}" = "false" ]; then
      drop_caches
      dd if=/dev/zero of="${device}" bs=${bs} count=${count} \
        ${dd_oflags} 2>&1 \
        | sed -n 's/.* \([0-9.]*\) s,.*/\1/p'
    else
      echo 0
    fi
    i=$((i+1))
  done | echo_and_calc_avg "%8.3f " ${iosize} ${bs}
  i=0
  while [ $i -lt ${iterations} ]
  do
    drop_caches
    dd if="${device}" of=/dev/null bs=${bs} count=${count} \
      ${dd_iflags} 2>&1 \
      | sed -n 's/.* \([0-9.]*\) s,.*/\1/p'
    i=$((i+1))
  done | echo_and_calc_avg "%8.3f " ${iosize} ${bs}
  printf "\n"
  log2_blocksize=$((log2_blocksize - 1))
done
^ permalink raw reply [flat|nested] 65+ messages in thread
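Going by the usage text above, a typical invocation matching the read-only buffered tests in this thread would look like the following; the device name is a placeholder:
# Buffered I/O (-a), read test only (-r), default 3 iterations and 1 GB I/O size.
./blockdev-perftest -a -r /dev/sdc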
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
[not found] ` <a0272b440907040819l5289483cp44b37d967440ef73@mail.gmail.com>
@ 2009-07-06 11:12 ` Vladislav Bolkhovitin
2009-07-06 14:37 ` Ronald Moesbergen
0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-06 11:12 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: Wu Fengguang, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
Bart Van Assche
(Restored the original list of recipients in this thread as I was asked.)
Hi Ronald,
Ronald Moesbergen, on 07/04/2009 07:19 PM wrote:
> 2009/7/3 Vladislav Bolkhovitin <vst@vlnb.net>:
>> Ronald Moesbergen, on 07/03/2009 01:14 PM wrote:
>>>>> OK, now I tend to agree on decreasing max_sectors_kb and increasing
>>>>> read_ahead_kb. But before actually trying to push that idea I'd like
>>>>> to
>>>>> - do more benchmarks
>>>>> - figure out why context readahead didn't help SCST performance
>>>>> (previous traces show that context readahead is submitting perfect
>>>>> large io requests, so I wonder if it's some io scheduler bug)
>>>> Because, as we found out, without your http://lkml.org/lkml/2009/5/21/319
>>>> patch read-ahead was nearly disabled, hence there were no difference
>>>> which
>>>> algorithm was used?
>>>>
>>>> Ronald, can you run the following tests, please? This time with 2 hosts,
>>>> initiator (client) and target (server) connected using 1 Gbps iSCSI. It
>>>> would be the best if on the client vanilla 2.6.29 will be ran, but any
>>>> other
>>>> kernel will be fine as well, only specify which. Blockdev-perftest should
>>>> be
>>>> ran as before in buffered mode, i.e. with "-a" switch.
>>>>
>>>> 1. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 patch with all default settings.
>>>>
>>>> 2. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 patch with default RA size and 64KB
>>>> max_sectors_kb.
>>>>
>>>> 3. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 patch with 2MB RA size and default
>>>> max_sectors_kb.
>>>>
>>>> 4. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 patch with 2MB RA size and 64KB
>>>> max_sectors_kb.
>>>>
>>>> 5. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 patch and with context RA patch. RA
>>>> size
>>>> and max_sectors_kb are default. For your convenience I committed the
>>>> backported context RA patches into the SCST SVN repository.
>>>>
>>>> 6. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with default RA
>>>> size and 64KB max_sectors_kb.
>>>>
>>>> 7. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with 2MB RA
>>>> size
>>>> and default max_sectors_kb.
>>>>
>>>> 8. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with 2MB RA
>>>> size
>>>> and 64KB max_sectors_kb.
>>>>
>>>> 9. On the client default RA size and 64KB max_sectors_kb. On the server
>>>> vanilla 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and
>>>> context RA patches with 2MB RA size and 64KB max_sectors_kb.
>>>>
>>>> 10. On the client 2MB RA size and default max_sectors_kb. On the server
>>>> vanilla 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and
>>>> context RA patches with 2MB RA size and 64KB max_sectors_kb.
>>>>
>>>> 11. On the client 2MB RA size and 64KB max_sectors_kb. On the server
>>>> vanilla
>>>> 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and context RA
>>>> patches with 2MB RA size and 64KB max_sectors_kb.
>>> Ok, done. Performance is pretty bad overall :(
>>>
>>> The kernels I used:
>>> client kernel: 2.6.26-15lenny3 (debian)
>>> server kernel: 2.6.29.5 with blk_dev_run patch
>>>
>>> And I adjusted the blockdev-perftest script to drop caches on both the
>>> server (via ssh) and the client.
>>>
>>> The results:
>>>
>
> ... previous results ...
>
>> Those are on the server without io_context-2.6.29 and readahead-2.6.29
>> patches applied and with CFQ scheduler, correct?
>>
>> Then we see how reorder of requests caused by many I/O threads submitting
>> I/O in separate I/O contexts badly affect performance and no RA, especially
>> with default 128KB RA size, can solve it. Less max_sectors_kb on the client
>> => more requests it sends at once => more reorder on the server => worse
>> throughput. Although, Fengguang, in theory, context RA with 2MB RA size
>> should considerably help it, no?
>>
>> Ronald, can you perform those tests again with both io_context-2.6.29 and
>> readahead-2.6.29 patches applied on the server, please?
>
> Hi Vlad,
>
> I have retested with the patches you requested (and got access to the
> systems today :) ) The results are better, but still not great.
>
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with io_context and readahead patch
>
> 5) client: default, server: default
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 18.303 19.867 18.481 54.299 1.961 0.848
> 33554432 18.321 17.681 18.708 56.181 1.314 1.756
> 16777216 17.816 17.406 19.257 56.494 2.410 3.531
> 8388608 18.077 17.727 19.338 55.789 2.056 6.974
> 4194304 17.918 16.601 18.287 58.276 2.454 14.569
> 2097152 17.426 17.334 17.610 58.661 0.384 29.331
> 1048576 19.358 18.764 17.253 55.607 2.734 55.607
> 524288 17.951 18.163 17.440 57.379 0.983 114.757
> 262144 18.196 17.724 17.520 57.499 0.907 229.995
> 131072 18.342 18.259 17.551 56.751 1.131 454.010
> 65536 17.733 18.572 17.134 57.548 1.893 920.766
> 32768 19.081 19.321 17.364 55.213 2.673 1766.818
> 16384 17.181 18.729 17.731 57.343 2.033 3669.932
>
> 6) client: default, server: 64 max_sectors_kb, RA default
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 21.790 20.062 19.534 50.153 2.304 0.784
> 33554432 20.212 19.744 19.564 51.623 0.706 1.613
> 16777216 20.404 19.329 19.738 51.680 1.148 3.230
> 8388608 20.170 20.772 19.509 50.852 1.304 6.356
> 4194304 19.334 18.742 18.522 54.296 0.978 13.574
> 2097152 19.413 18.858 18.884 53.758 0.715 26.879
> 1048576 20.472 18.755 18.476 53.347 2.377 53.347
> 524288 19.120 20.104 18.404 53.378 1.925 106.756
> 262144 20.337 19.213 18.636 52.866 1.901 211.464
> 131072 19.199 18.312 19.970 53.510 1.900 428.083
> 65536 19.855 20.114 19.592 51.584 0.555 825.342
> 32768 20.586 18.724 20.340 51.592 2.204 1650.941
> 16384 21.119 19.834 19.594 50.792 1.651 3250.669
>
> 7) client: default, server: default max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 17.767 16.489 16.949 60.050 1.842 0.938
> 33554432 16.777 17.034 17.102 60.341 0.500 1.886
> 16777216 18.509 16.784 16.971 58.891 2.537 3.681
> 8388608 18.058 17.949 17.599 57.313 0.632 7.164
> 4194304 18.286 17.648 17.026 58.055 1.692 14.514
> 2097152 17.387 18.451 17.875 57.226 1.388 28.613
> 1048576 18.270 17.698 17.570 57.397 0.969 57.397
> 524288 16.708 17.900 17.233 59.306 1.668 118.611
> 262144 18.041 17.381 18.035 57.484 1.011 229.934
> 131072 17.994 17.777 18.146 56.981 0.481 455.844
> 65536 17.097 18.597 17.737 57.563 1.975 921.011
> 32768 17.167 17.035 19.693 57.254 3.721 1832.127
> 16384 17.144 16.664 17.623 59.762 1.367 3824.774
>
> 8) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 20.003 21.133 19.308 50.894 1.881 0.795
> 33554432 19.448 20.015 18.908 52.657 1.222 1.646
> 16777216 19.964 19.350 19.106 52.603 0.967 3.288
> 8388608 18.961 19.213 19.318 53.437 0.419 6.680
> 4194304 18.135 19.508 19.361 53.948 1.788 13.487
> 2097152 18.753 19.471 18.367 54.315 1.306 27.158
> 1048576 19.189 18.586 18.867 54.244 0.707 54.244
> 524288 18.985 19.199 18.840 53.874 0.417 107.749
> 262144 19.064 21.143 19.674 51.398 2.204 205.592
> 131072 18.691 18.664 19.116 54.406 0.594 435.245
> 65536 18.468 20.673 18.554 53.389 2.729 854.229
> 32768 20.401 21.156 19.552 50.323 1.623 1610.331
> 16384 19.532 20.028 20.466 51.196 0.977 3276.567
>
> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 16.458 16.649 17.346 60.919 1.364 0.952
> 33554432 16.479 16.744 17.069 61.096 0.878 1.909
> 16777216 17.128 16.585 17.112 60.456 0.910 3.778
> 8388608 17.322 16.780 16.885 60.262 0.824 7.533
> 4194304 17.530 16.725 16.756 60.250 1.299 15.063
> 2097152 16.580 17.875 16.619 60.221 2.076 30.110
> 1048576 17.550 17.406 17.075 59.049 0.681 59.049
> 524288 16.492 18.211 16.832 59.718 2.519 119.436
> 262144 17.241 17.115 17.365 59.397 0.352 237.588
> 131072 17.430 16.902 17.511 59.271 0.936 474.167
> 65536 16.726 16.894 17.246 60.404 0.768 966.461
> 32768 16.662 17.517 17.052 59.989 1.224 1919.658
> 16384 17.429 16.793 16.753 60.285 1.085 3858.268
>
> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 17.601 18.334 17.379 57.650 1.307 0.901
> 33554432 18.281 18.128 17.169 57.381 1.610 1.793
> 16777216 17.660 17.875 17.356 58.091 0.703 3.631
> 8388608 17.724 17.810 18.383 56.992 0.918 7.124
> 4194304 17.475 17.770 19.003 56.704 2.031 14.176
> 2097152 17.287 17.674 18.492 57.516 1.604 28.758
> 1048576 17.972 17.460 18.777 56.721 1.689 56.721
> 524288 18.680 18.952 19.445 53.837 0.890 107.673
> 262144 18.070 18.337 18.639 55.817 0.707 223.270
> 131072 16.990 16.651 16.862 60.832 0.507 486.657
> 65536 17.707 16.972 17.520 58.870 1.066 941.924
> 32768 17.767 17.208 17.205 58.887 0.885 1884.399
> 16384 18.258 17.252 18.035 57.407 1.407 3674.059
>
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 17.993 18.307 18.718 55.850 0.902 0.873
> 33554432 19.554 18.485 17.902 54.988 1.993 1.718
> 16777216 18.829 18.236 18.748 55.052 0.785 3.441
> 8388608 21.152 19.065 18.738 52.257 2.745 6.532
> 4194304 19.131 19.703 17.850 54.288 2.268 13.572
> 2097152 19.093 19.152 19.509 53.196 0.504 26.598
> 1048576 19.371 18.775 18.804 53.953 0.772 53.953
> 524288 20.003 17.911 18.602 54.470 2.476 108.940
> 262144 19.182 19.460 18.476 53.809 1.183 215.236
> 131072 19.403 19.192 18.907 53.429 0.567 427.435
> 65536 19.502 19.656 18.599 53.219 1.309 851.509
> 32768 18.746 18.747 18.250 55.119 0.701 1763.817
> 16384 20.977 19.437 18.840 51.951 2.319 3324.862
The results look inconsistent with what you had previously (89.7
MB/s). How can you explain it?
I think, most likely, there was some confusion between the tested and
patched versions of the kernel or you forgot to apply the io_context
patch. Please recheck.
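Just to be sure we mean the same knobs: the "RA" and "max_sectors_kb" values
in these tests are the usual block queue tunables in sysfs, set on the server
roughly like this (a sketch; the exported device /dev/sdc is only an example):
echo 64 > /sys/block/sdc/queue/max_sectors_kb
echo 2048 > /sys/block/sdc/queue/read_ahead_kb    # 2MB RA size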
> Ronald.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-06 11:12 ` Vladislav Bolkhovitin
@ 2009-07-06 14:37 ` Ronald Moesbergen
2009-07-06 17:48 ` Vladislav Bolkhovitin
0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-06 14:37 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: Wu Fengguang, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
Bart Van Assche
2009/7/6 Vladislav Bolkhovitin <vst@vlnb.net>:
> (Restored the original list of recipients in this thread as I was asked.)
>
> Hi Ronald,
>
> Ronald Moesbergen, on 07/04/2009 07:19 PM wrote:
>>
>> 2009/7/3 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>
>>> Ronald Moesbergen, on 07/03/2009 01:14 PM wrote:
>>>>>>
>>>>>> OK, now I tend to agree on decreasing max_sectors_kb and increasing
>>>>>> read_ahead_kb. But before actually trying to push that idea I'd like
>>>>>> to
>>>>>> - do more benchmarks
>>>>>> - figure out why context readahead didn't help SCST performance
>>>>>> (previous traces show that context readahead is submitting perfect
>>>>>> large io requests, so I wonder if it's some io scheduler bug)
>>>>>
>>>>> Because, as we found out, without your
>>>>> http://lkml.org/lkml/2009/5/21/319
>>>>> patch read-ahead was nearly disabled, hence there were no difference
>>>>> which
>>>>> algorithm was used?
>>>>>
>>>>> Ronald, can you run the following tests, please? This time with 2
>>>>> hosts,
>>>>> initiator (client) and target (server) connected using 1 Gbps iSCSI. It
>>>>> would be the best if on the client vanilla 2.6.29 will be ran, but any
>>>>> other
>>>>> kernel will be fine as well, only specify which. Blockdev-perftest
>>>>> should
>>>>> be
>>>>> ran as before in buffered mode, i.e. with "-a" switch.
>>>>>
>>>>> 1. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 patch with all default settings.
>>>>>
>>>>> 2. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 patch with default RA size and 64KB
>>>>> max_sectors_kb.
>>>>>
>>>>> 3. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 patch with 2MB RA size and default
>>>>> max_sectors_kb.
>>>>>
>>>>> 4. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 patch with 2MB RA size and 64KB
>>>>> max_sectors_kb.
>>>>>
>>>>> 5. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 patch and with context RA patch. RA
>>>>> size
>>>>> and max_sectors_kb are default. For your convenience I committed the
>>>>> backported context RA patches into the SCST SVN repository.
>>>>>
>>>>> 6. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with default
>>>>> RA
>>>>> size and 64KB max_sectors_kb.
>>>>>
>>>>> 7. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with 2MB RA
>>>>> size
>>>>> and default max_sectors_kb.
>>>>>
>>>>> 8. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with 2MB RA
>>>>> size
>>>>> and 64KB max_sectors_kb.
>>>>>
>>>>> 9. On the client default RA size and 64KB max_sectors_kb. On the server
>>>>> vanilla 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and
>>>>> context RA patches with 2MB RA size and 64KB max_sectors_kb.
>>>>>
>>>>> 10. On the client 2MB RA size and default max_sectors_kb. On the server
>>>>> vanilla 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and
>>>>> context RA patches with 2MB RA size and 64KB max_sectors_kb.
>>>>>
>>>>> 11. On the client 2MB RA size and 64KB max_sectors_kb. On the server
>>>>> vanilla
>>>>> 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and context
>>>>> RA
>>>>> patches with 2MB RA size and 64KB max_sectors_kb.
>>>>
>>>> Ok, done. Performance is pretty bad overall :(
>>>>
>>>> The kernels I used:
>>>> client kernel: 2.6.26-15lenny3 (debian)
>>>> server kernel: 2.6.29.5 with blk_dev_run patch
>>>>
>>>> And I adjusted the blockdev-perftest script to drop caches on both the
>>>> server (via ssh) and the client.
>>>>
>>>> The results:
>>>>
>>
>> ... previous results ...
>>
>>> Those are on the server without io_context-2.6.29 and readahead-2.6.29
>>> patches applied and with CFQ scheduler, correct?
>>>
>>> Then we see how the reordering of requests caused by many I/O threads
>>> submitting I/O in separate I/O contexts badly affects performance, and no
>>> RA, especially with the default 128KB RA size, can solve it. Less
>>> max_sectors_kb on the client => more requests it sends at once => more
>>> reordering on the server => worse throughput. Although, Fengguang, in
>>> theory, context RA with a 2MB RA size should considerably help it, no?
>>>
>>> Ronald, can you perform those tests again with both io_context-2.6.29 and
>>> readahead-2.6.29 patches applied on the server, please?
>>
>> Hi Vlad,
>>
>> I have retested with the patches you requested (and got access to the
>> systems today :) ) The results are better, but still not great.
>>
>> client kernel: 2.6.26-15lenny3 (debian)
>> server kernel: 2.6.29.5 with io_context and readahead patch
>>
>> ... results of tests 5-11 elided (quoted in full above) ...
>
> The results look inconsistent with what you had previously (89.7 MB/s).
> How can you explain it?
I had more patches applied with that test: (scst_exec_req_fifo-2.6.29,
put_page_callback-2.6.29) and I used a different dd command:
dd if=/dev/sdc of=/dev/zero bs=512K count=2000
But all that said, I can't reproduce speeds that high now. Must have
made a mistake back then (maybe I forgot to clear the pagecache).
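To rule that out, the blockdev-perftest runs now drop the caches on both
hosts before every run, roughly like this (a sketch; "server" stands in for
the target's hostname):
ssh server 'sync; echo 3 > /proc/sys/vm/drop_caches'
sync; echo 3 > /proc/sys/vm/drop_caches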
> I think, most likely, there was some confusion between the tested and
> patched versions of the kernel or you forgot to apply the io_context patch.
> Please recheck.
The tests above were definitely done right, I just rechecked the
patches, and I do see an average increase of about 10MB/s over an
unpatched kernel. But overall the performance is still pretty bad.
Ronald.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-06 14:37 ` Ronald Moesbergen
@ 2009-07-06 17:48 ` Vladislav Bolkhovitin
2009-07-07 6:49 ` Ronald Moesbergen
0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-06 17:48 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: Wu Fengguang, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
Bart Van Assche
Ronald Moesbergen, on 07/06/2009 06:37 PM wrote:
> 2009/7/6 Vladislav Bolkhovitin <vst@vlnb.net>:
>> (Restored the original list of recipients in this thread as I was asked.)
>>
>> Hi Ronald,
>>
>> Ronald Moesbergen, on 07/04/2009 07:19 PM wrote:
>>> 2009/7/3 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>> Ronald Moesbergen, on 07/03/2009 01:14 PM wrote:
>>>>> ... test instructions 1-11 and results elided (quoted in full above) ...
>> The results look inconsistent with what you had previously (89.7 MB/s).
>> How can you explain it?
>
> I had more patches applied with that test: (scst_exec_req_fifo-2.6.29,
> put_page_callback-2.6.29) and I used a different dd command:
>
> dd if=/dev/sdc of=/dev/zero bs=512K count=2000
>
> But all that said, I can't reproduce speeds that high now. Must have
> made a mistake back then (maybe I forgot to clear the pagecache).
If you forgot to clear the cache, you would have had the wire throughput
(110 MB/s) or more.
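(Roughly: 1 Gbps is 125 MB/s raw; Ethernet, IP, TCP and iSCSI framing
overhead leaves about 110 MB/s of usable payload.)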
>> I think, most likely, there was some confusion between the tested and
>> patched versions of the kernel or you forgot to apply the io_context patch.
>> Please recheck.
>
> The tests above were definitely done right, I just rechecked the
> patches, and I do see an average increase of about 10MB/s over an
> unpatched kernel. But overall the performance is still pretty bad.
Have you rebuilt and reinstalled SCST after patching the kernel?
> Ronald.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-06 17:48 ` Vladislav Bolkhovitin
@ 2009-07-07 6:49 ` Ronald Moesbergen
[not found] ` <4A5395FD.2040507@vlnb.net>
0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-07 6:49 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: Wu Fengguang, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
Bart Van Assche
2009/7/6 Vladislav Bolkhovitin <vst@vlnb.net>:
> Ronald Moesbergen, on 07/06/2009 06:37 PM wrote:
>>
>> 2009/7/6 Vladislav Bolkhovitin <vst@vlnb.net>:
>>> ... quoted test instructions and results elided (see above) ...
>>>
>>> The results look inconsistent with what you had previously (89.7 MB/s).
>>> How can you explain it?
>>
>> I had more patches applied with that test: (scst_exec_req_fifo-2.6.29,
>> put_page_callback-2.6.29) and I used a different dd command:
>>
>> dd if=/dev/sdc of=/dev/zero bs=512K count=2000
>>
>> But all that said, I can't reproduce speeds that high now. Must have
>> made a mistake back then (maybe I forgot to clear the pagecache).
>
> If you forgot to clear the cache, you would have had the wire throughput
> (110 MB/s) or more.
Maybe. Perhaps just part of what I was transferring was in the cache. I had
done some tests on the filesystem on that same block device too.
>>> I think, most likely, there was some confusion between the tested and
>>> patched versions of the kernel or you forgot to apply the io_context
>>> patch.
>>> Please recheck.
>>
>> The tests above were definitely done right, I just rechecked the
>> patches, and I do see an average increase of about 10MB/s over an
>> unpatched kernel. But overall the performance is still pretty bad.
>
> Have you rebuilt and reinstalled SCST after patching the kernel?
Yes I have. And the warning about missing io_context patches wasn't
there during the compilation.
Ronald.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
[not found] ` <a0272b440907080149j3eeeb9bat13f942520db059a8@mail.gmail.com>
@ 2009-07-08 12:40 ` Vladislav Bolkhovitin
2009-07-10 6:32 ` Ronald Moesbergen
2009-07-15 20:52 ` Kurt Garloff
0 siblings, 2 replies; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-08 12:40 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
Bart Van Assche
Ronald Moesbergen, on 07/08/2009 12:49 PM wrote:
> 2009/7/7 Vladislav Bolkhovitin <vst@vlnb.net>:
>> Ronald Moesbergen, on 07/07/2009 10:49 AM wrote:
>>>>>> I think, most likely, there was some confusion between the tested and
>>>>>> patched versions of the kernel or you forgot to apply the io_context
>>>>>> patch.
>>>>>> Please recheck.
>>>>> The tests above were definitely done right, I just rechecked the
>>>>> patches, and I do see an average increase of about 10MB/s over an
>>>>> unpatched kernel. But overall the performance is still pretty bad.
>>>> Have you rebuilt and reinstalled SCST after patching the kernel?
>>> Yes I have. And the warning about missing io_context patches wasn't
>>> there during the compilation.
>> Can you update to the latest trunk/ and send me the kernel logs from the
>> kernel's boot after one dd with any block size you like >128K and the
>> transfer rate the dd reported, please?
>>
>
> I think I just reproduced the 'wrong' result:
>
> dd if=/dev/sdc of=/dev/null bs=512K count=2000
> 2000+0 records in
> 2000+0 records out
> 1048576000 bytes (1.0 GB) copied, 12.1291 s, 86.5 MB/s
>
> This happens when I do a 'dd' on the device with a mounted filesystem.
> The filesystem mount causes some of the blocks on the device to be
> cached and therefore the results are wrong. This was not the case in
> all the blockdev-perftest runs I did (the filesystem was never
> mounted).
Why do you think the file system (which one, BTW?) has any additional
caching if you did "echo 3 > /proc/sys/vm/drop_caches" before the tests?
All block devices and file systems use the same cache facilities.
I've also long noticed that reading data from block devices is slower
than reading files from the file systems mounted on those block devices.
Can anybody explain it?
Looks like this is strangeness #2 that we uncovered in our tests (the
first one, earlier in this thread, was why the context RA doesn't work
as well as it should with cooperative I/O threads).
Can you rerun the same 11 tests over a file on the file system, please?
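I.e., something like this on the client (a sketch; the mount point and file
name are only examples, and it assumes blockdev-perftest accepts a regular
file in place of a block device):
blockdev-perftest -a /mnt/ocfs2/testfile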
> Ronald.
>
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-08 12:40 ` Vladislav Bolkhovitin
@ 2009-07-10 6:32 ` Ronald Moesbergen
2009-07-10 8:43 ` Vladislav Bolkhovitin
2009-07-15 20:52 ` Kurt Garloff
1 sibling, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-10 6:32 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
Bart Van Assche
2009/7/8 Vladislav Bolkhovitin <vst@vlnb.net>:
> Ronald Moesbergen, on 07/08/2009 12:49 PM wrote:
>> ... previous exchange elided (quoted in full above) ...
>
> Why do you think the file system (which one, BTW?) has any additional
> caching if you did "echo 3 > /proc/sys/vm/drop_caches" before the tests? All
> block devices and file systems use the same cache facilities.
I didn't drop the caches because I just restarted both machines and
thought that would be enough. But because of the mounted filesystem
the results were invalid. (The filesystem is OCFS2, but that doesn't
matter).
> I've also long noticed that reading data from block devices is slower
> than reading files from the file systems mounted on those block devices.
> Can anybody explain it?
>
> Looks like this is strangeness #2 that we uncovered in our tests (the
> first one, earlier in this thread, was why the context RA doesn't work
> as well as it should with cooperative I/O threads).
>
> Can you rerun the same 11 tests over a file on the file system, please?
I'll see what I can do. Just to be sure: you want me to run
blockdev-perftest on a file on the OCFS2 filesystem which is mounted
on the client over iSCSI, right?
Ronald.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-10 6:32 ` Ronald Moesbergen
@ 2009-07-10 8:43 ` Vladislav Bolkhovitin
2009-07-10 9:27 ` Vladislav Bolkhovitin
0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-10 8:43 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
Bart Van Assche
Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>> I've also long noticed that reading data from block devices is slower
>> than reading files from the file systems mounted on those block devices.
>> Can anybody explain it?
>>
>> Looks like this is strangeness #2 that we uncovered in our tests (the
>> first one, earlier in this thread, was why the context RA doesn't work
>> as well as it should with cooperative I/O threads).
>>
>> Can you rerun the same 11 tests over a file on the file system, please?
>
> I'll see what I can do. Just to be sure: you want me to run
> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
> on the client over iSCSI, right?
Yes, please.
> Ronald.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-10 8:43 ` Vladislav Bolkhovitin
@ 2009-07-10 9:27 ` Vladislav Bolkhovitin
2009-07-13 12:12 ` Ronald Moesbergen
0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-10 9:27 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
Bart Van Assche
Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>>> I've also long noticed that reading data from block devices is slower
>>> than reading files from the file systems mounted on those block devices.
>>> Can anybody explain it?
>>>
>>> Looks like this is strangeness #2 that we uncovered in our tests (the
>>> first one, earlier in this thread, was why the context RA doesn't work
>>> as well as it should with cooperative I/O threads).
>>>
>>> Can you rerun the same 11 tests over a file on the file system, please?
>> I'll see what I can do. Just to be sure: you want me to run
>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
>> on the client over iSCSI, right?
>
> Yes, please.
Forgot to mention that you should also configure your backend storage as
a big file on a file system (preferably XFS), not as a direct device like
/dev/vg/db-master.
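E.g., a sketch (the device, mount point and file size are only examples):
mkfs.xfs /dev/vg/db-master
mount /dev/vg/db-master /mnt/backend
dd if=/dev/zero of=/mnt/backend/backing.img bs=1M count=16384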
Thanks,
Vlad
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-10 9:27 ` Vladislav Bolkhovitin
@ 2009-07-13 12:12 ` Ronald Moesbergen
2009-07-13 12:36 ` Wu Fengguang
2009-07-14 18:52 ` Vladislav Bolkhovitin
0 siblings, 2 replies; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-13 12:12 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
Bart Van Assche
2009/7/10 Vladislav Bolkhovitin <vst@vlnb.net>:
>
> Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
>>
>> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>>>>
>>>> I've also long noticed that reading data from block devices is slower
>>>> than reading files from the file systems mounted on those block devices.
>>>> Can anybody explain it?
>>>>
>>>> Looks like this is strangeness #2 that we uncovered in our tests (the
>>>> first one, earlier in this thread, was why the context RA doesn't work
>>>> as well as it should with cooperative I/O threads).
>>>>
>>>> Can you rerun the same 11 tests over a file on the file system, please?
>>>
>>> I'll see what I can do. Just to be sure: you want me to run
>>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
>>> on the client over iSCSI, right?
>>
>> Yes, please.
>
> Forgot to mention that you should also configure your backend storage as
> a big file on a file system (preferably XFS), not as a direct device like
> /dev/vg/db-master.
Ok, here are the results:
client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead patch
Tests were done with XFS on both the target and the initiator. This
confirms your findings: using files instead of block devices is faster,
but only when using the io_context patch.
Without io_context patch:
1) client: default, server: default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.327 18.327 17.740 56.491 0.872 0.883
33554432 18.662 18.311 18.116 55.772 0.683 1.743
16777216 18.900 18.421 18.312 55.229 0.754 3.452
8388608 18.893 18.533 18.281 55.156 0.743 6.895
4194304 18.512 18.097 18.400 55.850 0.536 13.963
2097152 18.635 18.313 18.676 55.232 0.486 27.616
1048576 18.441 18.264 18.245 55.907 0.267 55.907
524288 17.773 18.669 18.459 55.980 1.184 111.960
262144 18.580 18.758 17.483 56.091 1.767 224.365
131072 17.224 18.333 18.765 56.626 2.067 453.006
65536 18.082 19.223 18.238 55.348 1.483 885.567
32768 17.719 18.293 18.198 56.680 0.795 1813.766
16384 17.872 18.322 17.537 57.192 1.024 3660.273
2) client: default, server: 64 max_sectors_kb, RA default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.738 18.435 18.400 55.283 0.451 0.864
33554432 18.046 18.167 17.572 57.128 0.826 1.785
16777216 18.504 18.203 18.377 55.771 0.376 3.486
8388608 22.069 18.554 17.825 53.013 4.766 6.627
4194304 19.211 18.136 18.083 55.465 1.529 13.866
2097152 18.647 17.851 18.511 55.866 1.071 27.933
1048576 19.084 18.177 18.194 55.425 1.249 55.425
524288 18.999 18.553 18.380 54.934 0.763 109.868
262144 18.867 18.273 18.063 55.668 1.020 222.673
131072 17.846 18.966 18.193 55.885 1.412 447.081
65536 18.195 18.616 18.482 55.564 0.530 889.023
32768 17.882 18.841 17.707 56.481 1.525 1807.394
16384 17.073 18.278 17.985 57.646 1.689 3689.369
3) client: default, server: default max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.658 17.830 19.258 55.162 1.750 0.862
33554432 17.193 18.265 18.517 56.974 1.854 1.780
16777216 17.531 17.681 18.776 56.955 1.720 3.560
8388608 18.234 17.547 18.201 56.926 1.014 7.116
4194304 18.057 17.923 17.901 57.015 0.218 14.254
2097152 18.565 17.739 17.658 56.958 1.277 28.479
1048576 18.393 17.433 17.314 57.851 1.550 57.851
524288 18.939 17.835 18.972 55.152 1.600 110.304
262144 18.562 19.005 18.069 55.240 1.141 220.959
131072 19.574 17.562 18.251 55.576 2.476 444.611
65536 19.117 18.019 17.886 55.882 1.647 894.115
32768 18.237 17.415 17.482 57.842 1.200 1850.933
16384 17.760 18.444 18.055 56.631 0.876 3624.391
4) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.368 17.495 18.524 56.520 1.434 0.883
33554432 18.209 17.523 19.146 56.052 2.027 1.752
16777216 18.765 18.053 18.550 55.497 0.903 3.469
8388608 17.878 17.848 18.389 56.778 0.774 7.097
4194304 18.058 17.683 18.567 56.589 1.129 14.147
2097152 18.896 18.384 18.697 54.888 0.623 27.444
1048576 18.505 17.769 17.804 56.826 1.055 56.826
524288 18.319 17.689 17.941 56.955 0.816 113.910
262144 19.227 17.770 18.212 55.704 1.821 222.815
131072 18.738 18.227 17.869 56.044 1.090 448.354
65536 19.319 18.525 18.084 54.969 1.494 879.504
32768 18.321 17.672 17.870 57.047 0.856 1825.495
16384 18.249 17.495 18.146 57.025 1.073 3649.582
With io_context patch:
5) client: default, server: default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.393 11.925 12.627 83.196 1.989 1.300
33554432 11.844 11.855 12.191 85.610 1.142 2.675
16777216 12.729 12.602 12.068 82.187 1.913 5.137
8388608 12.245 12.060 14.081 80.419 5.469 10.052
4194304 13.224 11.866 12.110 82.763 3.833 20.691
2097152 11.585 12.584 11.755 85.623 3.052 42.811
1048576 12.166 12.144 12.321 83.867 0.539 83.867
524288 12.019 12.148 12.160 84.568 0.448 169.137
262144 12.014 12.378 12.074 84.259 1.095 337.036
131072 11.840 12.068 11.849 85.921 0.756 687.369
65536 12.098 11.803 12.312 84.857 1.470 1357.720
32768 11.852 12.635 11.887 84.529 2.465 2704.931
16384 12.443 13.110 11.881 82.197 3.299 5260.620
6) client: default, server: 64 max_sectors_kb, RA default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.033 12.122 11.950 82.911 3.110 1.295
33554432 12.386 13.357 12.082 81.364 3.429 2.543
16777216 12.102 11.542 12.053 86.096 1.860 5.381
8388608 12.240 11.740 11.789 85.917 1.601 10.740
4194304 11.824 12.388 12.042 84.768 1.621 21.192
2097152 11.962 12.283 11.973 84.832 1.036 42.416
1048576 12.639 11.863 12.010 84.197 2.290 84.197
524288 11.809 12.919 11.853 84.121 3.439 168.243
262144 12.105 12.649 12.779 81.894 1.940 327.577
131072 12.441 12.769 12.713 81.017 0.923 648.137
65536 12.490 13.308 12.440 80.414 2.457 1286.630
32768 13.235 11.917 12.300 82.184 3.576 2629.883
16384 12.335 12.394 12.201 83.187 0.549 5323.990
7) client: default, server: default max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.017 12.334 12.151 84.168 0.897 1.315
33554432 12.265 12.200 11.976 84.310 0.864 2.635
16777216 12.356 11.972 12.292 83.903 1.165 5.244
8388608 12.247 12.368 11.769 84.472 1.825 10.559
4194304 11.888 11.974 12.144 85.325 0.754 21.331
2097152 12.433 10.938 11.669 87.911 4.595 43.956
1048576 11.748 12.271 12.498 84.180 2.196 84.180
524288 11.726 11.681 12.322 86.031 2.075 172.062
262144 12.593 12.263 11.939 83.530 1.817 334.119
131072 11.874 12.265 12.441 84.012 1.648 672.093
65536 12.119 11.848 12.037 85.330 0.809 1365.277
32768 12.549 12.080 12.008 83.882 1.625 2684.238
16384 12.369 12.087 12.589 82.949 1.385 5308.766
8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.664 11.793 11.963 84.428 2.575 1.319
33554432 11.825 12.074 12.442 84.571 1.761 2.643
16777216 11.997 11.952 10.905 88.311 3.958 5.519
8388608 11.866 12.270 11.796 85.519 1.476 10.690
4194304 11.754 12.095 12.539 84.483 2.230 21.121
2097152 11.948 11.633 11.886 86.628 1.007 43.314
1048576 12.029 12.519 11.701 84.811 2.345 84.811
524288 11.928 12.011 12.049 85.363 0.361 170.726
262144 12.559 11.827 11.729 85.140 2.566 340.558
131072 12.015 12.356 11.587 85.494 2.253 683.952
65536 11.741 12.113 11.931 85.861 1.093 1373.770
32768 12.655 11.738 12.237 83.945 2.589 2686.246
16384 11.928 12.423 11.875 84.834 1.711 5429.381
9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.570 13.491 14.299 74.326 1.927 1.161
33554432 13.238 13.198 13.255 77.398 0.142 2.419
16777216 13.851 13.199 13.463 75.857 1.497 4.741
8388608 13.339 16.695 13.551 71.223 7.010 8.903
4194304 13.689 13.173 14.258 74.787 2.415 18.697
2097152 13.518 13.543 13.894 75.021 0.934 37.510
1048576 14.119 14.030 13.820 73.202 0.659 73.202
524288 13.747 14.781 13.820 72.621 2.369 145.243
262144 14.168 13.652 14.165 73.189 1.284 292.757
131072 14.112 13.868 14.213 72.817 0.753 582.535
65536 14.604 13.762 13.725 73.045 2.071 1168.728
32768 14.796 15.356 14.486 68.861 1.653 2203.564
16384 13.079 13.525 13.427 76.757 1.111 4912.426
10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 20.372 18.077 17.262 55.411 3.800 0.866
33554432 17.287 17.620 17.828 58.263 0.740 1.821
16777216 16.802 18.154 17.315 58.831 1.865 3.677
8388608 17.510 18.291 17.253 57.939 1.427 7.242
4194304 17.059 17.706 17.352 58.958 0.897 14.740
2097152 17.252 18.064 17.615 58.059 1.090 29.029
1048576 17.082 17.373 17.688 58.927 0.838 58.927
524288 17.129 17.271 17.583 59.103 0.644 118.206
262144 17.411 17.695 18.048 57.808 0.848 231.231
131072 17.937 17.704 18.681 56.581 1.285 452.649
65536 17.927 17.465 17.907 57.646 0.698 922.338
32768 18.494 17.820 17.719 56.875 1.073 1819.985
16384 18.800 17.759 17.575 56.798 1.666 3635.058
11) client: 64 max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 20.045 21.881 20.018 49.680 2.037 0.776
33554432 20.768 20.291 20.464 49.938 0.479 1.561
16777216 21.563 20.714 20.429 49.017 1.116 3.064
8388608 21.290 21.109 21.308 48.221 0.205 6.028
4194304 22.240 20.662 21.088 48.054 1.479 12.013
2097152 20.282 21.098 20.580 49.593 0.806 24.796
1048576 20.367 19.929 20.252 50.741 0.469 50.741
524288 20.885 21.203 20.684 48.945 0.498 97.890
262144 19.982 21.375 20.798 49.463 1.373 197.853
131072 20.744 21.590 19.698 49.593 1.866 396.740
65536 21.586 20.953 21.055 48.314 0.627 773.024
32768 21.228 20.307 21.049 49.104 0.950 1571.327
16384 21.257 21.209 21.150 48.289 0.100 3090.498
Ronald.
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-13 12:12 ` Ronald Moesbergen
@ 2009-07-13 12:36 ` Wu Fengguang
2009-07-13 12:47 ` Ronald Moesbergen
2009-07-14 18:52 ` Vladislav Bolkhovitin
2009-07-14 18:52 ` Vladislav Bolkhovitin
1 sibling, 2 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-07-13 12:36 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: Vladislav Bolkhovitin, linux-kernel, akpm, kosaki.motohiro,
Alan.Brunelle, hifumi.hisashi, linux-fsdevel, jens.axboe,
randy.dunlap, Bart Van Assche
On Mon, Jul 13, 2009 at 08:12:14PM +0800, Ronald Moesbergen wrote:
> 2009/7/10 Vladislav Bolkhovitin <vst@vlnb.net>:
> >
> > Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
> >>
> >> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
> >>>>
> >>>> I've also long ago noticed that reading data from block devices is
> >>>> slower than from files on the file systems mounted on those block
> >>>> devices. Can anybody explain it?
> >>>>
> >>>> Looks like this is strangeness #2 which we uncovered in our tests
> >>>> (the first one, earlier in this thread, was why the context RA
> >>>> doesn't work as well as it should with cooperative I/O threads).
> >>>>
> >>>> Can you rerun the same 11 tests over a file on the file system, please?
> >>>
> >>> I'll see what I can do. Just to be sure: you want me to run
> >>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
> >>> on the client over iSCSI, right?
> >>
> >> Yes, please.
> >
> > Forgot to mention that you should also configure your backend storage
> > as a big file on a file system (preferably XFS) too, not as a direct
> > device like /dev/vg/db-master.
>
> Ok, here are the results:
Ronald, thanks for the numbers!
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead patch
Do you mean the context readahead patch?
> Test done with XFS on both the target and the initiator. This confirms
> your findings: using files instead of block devices is faster, but
> only when using the io_context patch.
It shows that the one that really matters is the io_context patch,
even when context readahead is running. I guess what happened
in the tests is:
- without readahead (or when the readahead algorithm fails to do proper
  sequential readahead), the SCST processes will be submitting
  small but close-to-each-other IOs. CFQ relies on the io_context
  patch to prevent unnecessary idling.
- with proper readahead, the SCST processes will also be submitting
  close readahead IOs. For example, one file's 100-102MB pages are
  read ahead by process A, while its 102-104MB pages may be
  read ahead by process B. In this case CFQ will also idle waiting
  for process A to submit the next IO, but in fact that IO is being
  submitted by process B. So the io_context patch is still necessary
  even when context readahead is working fine. I guess context
  readahead does have the added value of possibly enlarging the IO size
  (however this benchmark seems not to be very sensitive to IO size).
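(To make the second case concrete, here is a hypothetical userspace
sketch of that access pattern -- not the SCST code: two processes issue
POSIX_FADV_WILLNEED readahead hints for interleaved 2MB windows of one
file, so adjacent windows are always requested from different
processes, i.e. from different io_contexts, which is exactly the
situation where CFQ idles waiting for the "wrong" process.)

#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

#define WIN (2 * 1024 * 1024L)	/* 2MB readahead window */
#define NWIN 64L		/* cover the first 128MB */

/* Child 'id' (0 or 1) hints every second window, so window N and
 * window N+1 always come from different processes/io_contexts. */
static void ra_child(int fd, long id)
{
	long i;

	for (i = id; i < NWIN; i += 2)
		posix_fadvise(fd, i * WIN, WIN, POSIX_FADV_WILLNEED);
	_exit(0);
}

int main(int argc, char **argv)
{
	long id;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (id = 0; id < 2; id++)
		if (fork() == 0)
			ra_child(fd, id);
	while (wait(NULL) > 0)
		;
	close(fd);
	return 0;
}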
Thanks,
Fengguang
> [... benchmark tables 1-11 snipped; identical to the results above ...]
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-13 12:36 ` Wu Fengguang
@ 2009-07-13 12:47 ` Ronald Moesbergen
2009-07-13 12:52 ` Wu Fengguang
2009-07-14 18:52 ` Vladislav Bolkhovitin
1 sibling, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-13 12:47 UTC (permalink / raw)
To: Wu Fengguang
Cc: Vladislav Bolkhovitin, linux-kernel, akpm, kosaki.motohiro,
Alan.Brunelle, hifumi.hisashi, linux-fsdevel, jens.axboe,
randy.dunlap, Bart Van Assche
2009/7/13 Wu Fengguang <fengguang.wu@intel.com>:
> On Mon, Jul 13, 2009 at 08:12:14PM +0800, Ronald Moesbergen wrote:
> [... earlier context snipped ...]
> Ronald, thanks for the numbers!
You're welcome.
>> client kernel: 2.6.26-15lenny3 (debian)
>> server kernel: 2.6.29.5 with readahead patch
>
> Do you mean the context readahead patch?
No, I meant the blk_run_backing_dev patch. The patch names are
confusing; I'll be sure to clarify them from now on.
Ronald.
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-13 12:47 ` Ronald Moesbergen
@ 2009-07-13 12:52 ` Wu Fengguang
0 siblings, 0 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-07-13 12:52 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: Vladislav Bolkhovitin, linux-kernel, akpm, kosaki.motohiro,
Alan.Brunelle, hifumi.hisashi, linux-fsdevel, jens.axboe,
randy.dunlap, Bart Van Assche
On Mon, Jul 13, 2009 at 08:47:31PM +0800, Ronald Moesbergen wrote:
> 2009/7/13 Wu Fengguang <fengguang.wu@intel.com>:
> [... earlier context snipped ...]
> >> client kernel: 2.6.26-15lenny3 (debian)
> >> server kernel: 2.6.29.5 with readahead patch
> >
> > Do you mean the context readahead patch?
>
> No, I meant the blk_run_backing_dev patch. The patch names are
> confusing; I'll be sure to clarify them from now on.
That's OK. I did see that the previous benchmarks were not noticeably
helped by context readahead on CFQ, hehe.
Thanks,
Fengguang
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-13 12:12 ` Ronald Moesbergen
2009-07-13 12:36 ` Wu Fengguang
@ 2009-07-14 18:52 ` Vladislav Bolkhovitin
2009-07-15 6:30 ` Vladislav Bolkhovitin
1 sibling, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-14 18:52 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
Bart Van Assche
Ronald Moesbergen, on 07/13/2009 04:12 PM wrote:
> [... earlier context snipped ...]
>
> Ok, here are the results:
>
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead patch
>
> Test done with XFS on both the target and the initiator. This confirms
> your findings: using files instead of block devices is faster, but
> only when using the io_context patch.
Seems correct, except for case (2), which is still 10% faster.
> [... benchmark tables 1-11 snipped; identical to the results above ...]
The drop with 64 max_sectors_kb on the client is a consequence of how
CFQ works. I can't find the exact code responsible for this, but from
all signs, CFQ stops delaying requests if the number of outstanding
requests exceeds some threshold, which is 2 or 3. With 64 max_sectors_kb
and 5 SCST I/O threads this threshold is exceeded, so CFQ doesn't
recover the order of requests, hence the performance drop. With the
default 512 max_sectors_kb and 128K RA the server sees at most 2
requests at a time.
Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
please?
You can limit the number of SCST I/O threads with the num_threads
parameter of the scst_vdisk module.
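For example (the module and parameter names are as above, the value is
illustrative):

# modprobe scst_vdisk num_threads=2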
Thanks,
Vlad
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-13 12:36 ` Wu Fengguang
2009-07-13 12:47 ` Ronald Moesbergen
@ 2009-07-14 18:52 ` Vladislav Bolkhovitin
2009-07-15 7:06 ` Wu Fengguang
1 sibling, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-14 18:52 UTC (permalink / raw)
To: Wu Fengguang
Cc: Ronald Moesbergen, linux-kernel, akpm, kosaki.motohiro,
Alan.Brunelle, hifumi.hisashi, linux-fsdevel, jens.axboe,
randy.dunlap, Bart Van Assche
Wu Fengguang, on 07/13/2009 04:36 PM wrote:
>> Test done with XFS on both the target and the initiator. This confirms
>> your findings: using files instead of block devices is faster, but
>> only when using the io_context patch.
>
> It shows that the one that really matters is the io_context patch,
> even when context readahead is running. I guess what happened
> in the tests is:
> - without readahead (or when the readahead algorithm fails to do proper
>   sequential readahead), the SCST processes will be submitting
>   small but close-to-each-other IOs. CFQ relies on the io_context
>   patch to prevent unnecessary idling.
> - with proper readahead, the SCST processes will also be submitting
>   close readahead IOs. For example, one file's 100-102MB pages are
>   read ahead by process A, while its 102-104MB pages may be
>   read ahead by process B. In this case CFQ will also idle waiting
>   for process A to submit the next IO, but in fact that IO is being
>   submitted by process B. So the io_context patch is still necessary
>   even when context readahead is working fine. I guess context
>   readahead does have the added value of possibly enlarging the IO size
>   (however this benchmark seems not to be very sensitive to IO size).
Looks like the truth. Although with 2MB RA I expect CFQ to idle >10
times less often, which should bring a bigger improvement than a few
percent.
For how long does CFQ idle? For HZ/125, i.e. 8 ms with HZ 250?
> Thanks,
> Fengguang
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-14 18:52 ` Vladislav Bolkhovitin
@ 2009-07-15 6:30 ` Vladislav Bolkhovitin
2009-07-16 7:32 ` Ronald Moesbergen
0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-15 6:30 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
Bart Van Assche
Vladislav Bolkhovitin, on 07/14/2009 10:52 PM wrote:
> [... full quote of the previous message, including benchmark tables 1-11, snipped ...]
>
> The drop with 64 max_sectors_kb on the client is a consequence of how
> CFQ works. I can't find the exact code responsible for this, but from
> all signs, CFQ stops delaying requests if the number of outstanding
> requests exceeds some threshold, which is 2 or 3. With 64 max_sectors_kb
> and 5 SCST I/O threads this threshold is exceeded, so CFQ doesn't
> recover the order of requests, hence the performance drop. With the
> default 512 max_sectors_kb and 128K RA the server sees at most 2
> requests at a time.
>
> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
> please?
With context-RA patch, please, in those and future tests, since it
should make RA for cooperative threads much better.
> You can limit the number of SCST I/O threads with the num_threads
> parameter of the scst_vdisk module.
>
> Thanks,
> Vlad
>
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-14 18:52 ` Vladislav Bolkhovitin
@ 2009-07-15 7:06 ` Wu Fengguang
0 siblings, 0 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-07-15 7:06 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: Ronald Moesbergen, linux-kernel, akpm, kosaki.motohiro,
Alan.Brunelle, hifumi.hisashi, linux-fsdevel, jens.axboe,
randy.dunlap, Bart Van Assche
On Wed, Jul 15, 2009 at 02:52:27AM +0800, Vladislav Bolkhovitin wrote:
>
> Wu Fengguang, on 07/13/2009 04:36 PM wrote:
> >> Test done with XFS on both the target and the initiator. This confirms
> >> your findings: using files instead of block devices is faster, but
> >> only when using the io_context patch.
> >
> > It shows that the one that really matters is the io_context patch,
> > even when context readahead is running. I guess what happened
> > in the tests is:
> > - without readahead (or when the readahead algorithm fails to do proper
> >   sequential readahead), the SCST processes will be submitting
> >   small but close-to-each-other IOs. CFQ relies on the io_context
> >   patch to prevent unnecessary idling.
> > - with proper readahead, the SCST processes will also be submitting
> >   close readahead IOs. For example, one file's 100-102MB pages are
> >   read ahead by process A, while its 102-104MB pages may be
> >   read ahead by process B. In this case CFQ will also idle waiting
> >   for process A to submit the next IO, but in fact that IO is being
> >   submitted by process B. So the io_context patch is still necessary
> >   even when context readahead is working fine. I guess context
> >   readahead does have the added value of possibly enlarging the IO size
> >   (however this benchmark seems not to be very sensitive to IO size).
>
> Looks like the truth. Although with 2MB RA I expect CFQ to idle >10
> times less often, which should bring a bigger improvement than a few
> percent.
>
> For how long does CFQ idle? For HZ/125, i.e. 8 ms with HZ 250?
Yes, 8ms by default. Note that the 8ms idle timer is armed when the
last IO from the current process completes. So it would definitely be a
waste if a cooperative process submitted the next read/readahead
IO within this 8ms idle window (without cfq_coop.patch).
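(As a quick check of the arithmetic, assuming the idle window really
defaults to HZ/125 jiffies as asked above: with HZ=250 that is
250/125 = 2 jiffies, and one jiffy is 1000/250 = 4 ms, so the window is
2 * 4 = 8 ms; with HZ=1000 it would be 8 jiffies of 1 ms each, again
8 ms.)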
Thanks,
Fengguang
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-08 12:40 ` Vladislav Bolkhovitin
2009-07-10 6:32 ` Ronald Moesbergen
@ 2009-07-15 20:52 ` Kurt Garloff
2009-07-16 10:38 ` Vladislav Bolkhovitin
1 sibling, 1 reply; 65+ messages in thread
From: Kurt Garloff @ 2009-07-15 20:52 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel
Hi,
On Wed, Jul 08, 2009 at 04:40:08PM +0400, Vladislav Bolkhovitin wrote:
> I've also long ago noticed that reading data from block devices is slower
> than from files on the file systems mounted on those block devices. Can
> anybody explain it?
Brainstorming:
- block size (reads on the block dev might be done with a smaller size)
- readahead (do we use the same RA algorithm for block devs?)
- the page cache might be better optimized than the buffer cache?
Just guesses from someone who has not looked into that area of the
kernel for a while, so take it with a grain of salt.
Cheers,
--
Kurt Garloff, VP OPS Partner Engineering -- Novell Inc.
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-15 6:30 ` Vladislav Bolkhovitin
@ 2009-07-16 7:32 ` Ronald Moesbergen
2009-07-16 10:36 ` Vladislav Bolkhovitin
0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-16 7:32 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
2009/7/15 Vladislav Bolkhovitin <vst@vlnb.net>:
>> [... CFQ analysis snipped; quoted in full above ...]
>>
>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>> please?
Ok. Should I still use the file-on-xfs testcase for this, or should I
go back to using a regular block device? File-over-iSCSI is quite
uncommon, I suppose; most people will export a block device over iSCSI,
not a file.
> With context-RA patch, please, in those and future tests, since it should
> make RA for cooperative threads much better.
>
>> You can limit the number of SCST I/O threads with the num_threads
>> parameter of the scst_vdisk module.
Ok, I'll try that and include the blk_run_backing_dev,
readahead-context and io_context patches.
Ronald.
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-16 7:32 ` Ronald Moesbergen
@ 2009-07-16 10:36 ` Vladislav Bolkhovitin
2009-07-16 14:54 ` Ronald Moesbergen
2009-07-17 14:15 ` Ronald Moesbergen
0 siblings, 2 replies; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-16 10:36 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
> 2009/7/15 Vladislav Bolkhovitin <vst@vlnb.net>:
>>> [... CFQ analysis snipped; quoted in full above ...]
>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>> please?
>
> Ok. Should I still use the file-on-xfs testcase for this, or should I
> go back to using a regular block device?
Yes, please
> File-over-iSCSI is quite uncommon, I suppose; most people will export
> a block device over iSCSI, not a file.
No, files are common. The main reason people use direct block devices
is the belief, not supported by anything, that compared with files they
"have less overhead", so "should be faster". But it isn't true, and that
can easily be checked.
>> With context-RA patch, please, in those and future tests, since it should
>> make RA for cooperative threads much better.
>>
>>> You can limit the number of SCST I/O threads with the num_threads
>>> parameter of the scst_vdisk module.
>
> Ok, I'll try that and include the blk_run_backing_dev,
> readahead-context and io_context patches.
>
> Ronald.
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-15 20:52 ` Kurt Garloff
@ 2009-07-16 10:38 ` Vladislav Bolkhovitin
0 siblings, 0 replies; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-16 10:38 UTC (permalink / raw)
To: Kurt Garloff, linux-kernel, linux-fsdevel
Kurt Garloff, on 07/16/2009 12:52 AM wrote:
> Hi,
>
> On Wed, Jul 08, 2009 at 04:40:08PM +0400, Vladislav Bolkhovitin wrote:
>> I've also long ago noticed that reading data from block devices is slower
>> than from files on the file systems mounted on those block devices. Can
>> anybody explain it?
>
> Brainstorming:
> - block size (reads on the block dev might be done with a smaller size)
As we already found out in this and other threads, a smaller "block
size", i.e. the size of each request, often means better throughput,
sometimes much better.
> - readahead (do we use the same RA algorithm for block devs?)
> - the page cache might be better optimized than the buffer cache?
>
> Just guesses from someone who has not looked into that area of the
> kernel for a while, so take it with a grain of salt.
>
> Cheers,
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-16 10:36 ` Vladislav Bolkhovitin
@ 2009-07-16 14:54 ` Ronald Moesbergen
2009-07-16 16:03 ` Vladislav Bolkhovitin
2009-07-17 14:15 ` Ronald Moesbergen
1 sibling, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-16 14:54 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
2009/7/16 Vladislav Bolkhovitin <vst@vlnb.net>:
>
> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>
>> 2009/7/15 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>> [... CFQ analysis snipped; quoted in full above ...]
>>>>
>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>> please?
>>
>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>> go back to using a regular block device?
>
> Yes, please
As in: Yes, go back to block device, or Yes use file-on-xfs?
>> File-over-iSCSI is quite uncommon, I suppose; most people will export
>> a block device over iSCSI, not a file.
>
> No, files are common. The main reason people use direct block devices
> is the belief, not supported by anything, that compared with files they
> "have less overhead", so "should be faster". But it isn't true, and that
> can easily be checked.
Well, there are other advantages to using a block device: they are
generally more manageable; for instance, you can use LVM for resizing
instead of strange dd magic to extend a file. When using a file you
have to extend the volume that holds the file first, and then the file
itself (see the illustration below). And you don't lose disk space to
filesystem metadata twice.
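(An illustrative comparison, with hypothetical volume and file names.
The block-device case is one step:

# lvextend -L +10G /dev/vg/db-master

while for a file you have to grow the volume, then the filesystem, then
the file itself:

# lvextend -L +10G /dev/vg/store
# xfs_growfs /store
# dd if=/dev/zero of=/store/backing.img bs=1M seek=20480 count=0

where the dd invocation is the "strange magic": it copies no data and
just moves EOF out to 20GB.)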
Also, I still don't get why reads/writes from a block device differ in
speed from reads/writes from a file on a filesystem. I for one will not
be using files exported over iSCSI, but block devices (LVM volumes).
Ronald.
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-16 14:54 ` Ronald Moesbergen
@ 2009-07-16 16:03 ` Vladislav Bolkhovitin
0 siblings, 0 replies; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-16 16:03 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
Ronald Moesbergen, on 07/16/2009 06:54 PM wrote:
> 2009/7/16 Vladislav Bolkhovitin <vst@vlnb.net>:
>> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>> 2009/7/15 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>>> The drop with 64 max_sectors_kb on the client is a consequence of how
>>>>> CFQ
>>>>> is working. I can't find the exact code responsible for this, but from
>>>>> all
>>>>> signs, CFQ stops delaying requests if amount of outstanding requests
>>>>> exceeds
>>>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>>>> requests, hence the performance drop. With default 512 max_sectors_kb
>>>>> and
>>>>> 128K RA the server sees at max 2 requests at time.
>>>>>
>>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>>> please?
>>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>>> go back to using a regular block device?
>> Yes, please
>
> As in: Yes, go back to block device, or Yes use file-on-xfs?
File-on-xfs :)
>>> The file-over-iscsi is quite
>>> uncommon I suppose, most people will export a block device over iscsi,
>>> not a file.
>> No, files are common. The main reason why people use direct block devices is
>> the unsupported belief that, compared with files, they "have less overhead",
>> so "should be faster". But it isn't true and can easily be checked.
>
> Well, there are other advantages to using a block device: they are
> generally more manageable; for instance, you can use LVM for resizing
> instead of strange dd magic to extend a file. When using a file you
> have to extend the volume that holds the file first, and then the file
> itself.
Files also have advantages. For instance, it's easier to back them up and
move them between servers. On modern systems with fallocate() syscall support
you don't have to do "strange dd magic" to resize files and can make them
bigger nearly instantaneously. Also, with pretty simple modifications,
scst_vdisk could be improved to make a single virtual device from several
files.
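(A minimal sketch of the fallocate() point above, assuming a kernel and
filesystem with fallocate() support; the path and size are made-up examples,
not anything from this thread:)

#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	off_t new_size = 20LL * 1024 * 1024 * 1024;	/* grow to 20 GB */
	int fd = open("/mnt/xfs/vdisk_file", O_RDWR);
	int err;

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* posix_fallocate() uses the fallocate() syscall when the fs
	 * supports it, so allocation is nearly instantaneous; otherwise
	 * glibc falls back to writing the blocks out. */
	err = posix_fallocate(fd, 0, new_size);
	if (err)
		fprintf(stderr, "posix_fallocate: error %d\n", err);
	close(fd);
	return err ? 1 : 0;
}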
> And you don't lose disk space to filesystem metadata twice.
This is negligible (0.05% for XFS).
> Also, I still don't get why reads/writes from a blockdevice are
> different in speed than reads/writes from a file on a filesystem.
Me too, and I'd appreciate it if someone explained it. But I don't want to
introduce one more variable into the task we are solving (how to get
100+MB/s from iSCSI on your system).
> I
> for one will not be using files exported over iscsi, but blockdevices
> (LVM volumes).
Are you sure?
> Ronald.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-16 10:36 ` Vladislav Bolkhovitin
2009-07-16 14:54 ` Ronald Moesbergen
@ 2009-07-17 14:15 ` Ronald Moesbergen
2009-07-17 18:23 ` Vladislav Bolkhovitin
1 sibling, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-17 14:15 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
2009/7/16 Vladislav Bolkhovitin <vst@vlnb.net>:
>
> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>
>> 2009/7/15 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>>
>>>> The drop with 64 max_sectors_kb on the client is a consequence of how
>>>> CFQ
>>>> is working. I can't find the exact code responsible for this, but from
>>>> all
>>>> signs, CFQ stops delaying requests if amount of outstanding requests
>>>> exceeds
>>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>>> requests, hence the performance drop. With default 512 max_sectors_kb
>>>> and
>>>> 128K RA the server sees at max 2 requests at time.
>>>>
>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>> please?
>>
>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>> go back to using a regular block device?
>
> Yes, please
>
>> The file-over-iscsi is quite
>> uncommon I suppose, most people will export a block device over iscsi,
>> not a file.
>
> No, files are common. The main reason why people use direct block devices is
> the unsupported belief that, compared with files, they "have less overhead",
> so "should be faster". But it isn't true and can easily be checked.
>
>>> With context-RA patch, please, in those and future tests, since it should
>>> make RA for cooperative threads much better.
>>>
>>>> You can limit amount of SCST I/O threads by num_threads parameter of
>>>> scst_vdisk module.
>>
>> Ok, I'll try that and include the blk_run_backing_dev,
>> readahead-context and io_context patches.
The results:
client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context
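(For reference: the max_sectors_kb and RA values named in the cases below are
per-device block layer knobs. A minimal sketch of setting them, with a
made-up device name; the usual shell equivalents are in the comments:)

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>		/* BLKRASET */

int main(void)
{
	/* echo 64 > /sys/block/sdb/queue/max_sectors_kb */
	int fd = open("/sys/block/sdb/queue/max_sectors_kb", O_WRONLY);

	if (fd >= 0) {
		write(fd, "64\n", 3);
		close(fd);
	}

	/* blockdev --setra 4096 /dev/sdb: read-ahead is given in
	 * 512-byte sectors, so 4096 sectors = 2 MB RA */
	fd = open("/dev/sdb", O_RDONLY);
	if (fd >= 0) {
		if (ioctl(fd, BLKRASET, 4096UL))
			perror("BLKRASET");
		close(fd);
	}
	return 0;
}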
With one IO thread:
5) client: default, server: default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.990 15.308 16.689 64.097 2.259 1.002
33554432 15.981 16.064 16.221 63.651 0.392 1.989
16777216 15.841 15.660 16.031 64.635 0.619 4.040
6) client: default, server: 64 max_sectors_kb, RA default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.035 16.024 16.654 63.084 1.130 0.986
33554432 15.924 15.975 16.359 63.668 0.762 1.990
16777216 16.168 16.104 15.838 63.858 0.571 3.991
7) client: default, server: default max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 14.895 16.142 15.998 65.398 2.379 1.022
33554432 16.753 16.169 16.067 62.729 1.146 1.960
16777216 16.866 15.912 16.099 62.892 1.570 3.931
8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.923 15.716 16.741 63.545 1.715 0.993
33554432 16.010 16.026 16.113 63.802 0.180 1.994
16777216 16.644 16.239 16.143 62.672 0.827 3.917
9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.753 15.882 15.482 65.207 0.697 1.019
33554432 15.670 16.268 15.669 64.548 1.134 2.017
16777216 15.746 15.519 16.411 64.471 1.516 4.029
10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.639 14.360 13.654 73.795 1.758 1.153
33554432 13.584 13.938 14.538 73.095 2.035 2.284
16777216 13.617 13.510 13.803 75.060 0.665 4.691
11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.428 13.541 14.144 74.760 1.690 1.168
33554432 13.707 13.352 13.462 75.821 0.827 2.369
16777216 14.380 13.504 13.675 73.975 1.991 4.623
With two threads:
5) client: default, server: default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.453 12.173 13.014 81.677 2.254 1.276
33554432 12.066 11.999 12.960 83.073 2.877 2.596
16777216 13.719 11.969 12.569 80.554 4.500 5.035
6) client: default, server: 64 max_sectors_kb, RA default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.886 12.201 12.147 82.564 2.198 1.290
33554432 12.344 12.928 12.007 82.483 2.504 2.578
16777216 12.380 11.951 13.119 82.151 3.141 5.134
7) client: default, server: default max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.824 13.485 13.534 77.148 1.913 1.205
33554432 12.084 13.752 12.111 81.251 4.800 2.539
16777216 12.658 13.035 11.196 83.640 5.612 5.227
8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.253 12.552 11.773 84.044 2.230 1.313
33554432 13.177 12.456 11.604 82.723 4.316 2.585
16777216 12.471 12.318 13.006 81.324 1.878 5.083
9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 14.409 13.311 14.278 73.238 2.624 1.144
33554432 14.665 14.260 14.080 71.455 1.211 2.233
16777216 14.179 14.810 14.640 70.438 1.303 4.402
10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.401 14.107 13.549 74.860 1.642 1.170
33554432 14.575 13.221 14.428 72.894 3.236 2.278
16777216 13.771 14.227 13.594 73.887 1.408 4.618
11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 10.286 12.272 10.245 94.317 7.690 1.474
33554432 10.241 10.415 13.374 91.624 10.670 2.863
16777216 10.499 10.224 10.792 97.526 2.151 6.095
The last result comes close to 100MB/s!
Ronald.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-17 14:15 ` Ronald Moesbergen
@ 2009-07-17 18:23 ` Vladislav Bolkhovitin
2009-07-20 7:20 ` Vladislav Bolkhovitin
0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-17 18:23 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
Ronald Moesbergen, on 07/17/2009 06:15 PM wrote:
> 2009/7/16 Vladislav Bolkhovitin <vst@vlnb.net>:
>> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>> 2009/7/15 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>>> The drop with 64 max_sectors_kb on the client is a consequence of how
>>>>> CFQ
>>>>> is working. I can't find the exact code responsible for this, but from
>>>>> all
>>>>> signs, CFQ stops delaying requests if amount of outstanding requests
>>>>> exceeds
>>>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>>>> requests, hence the performance drop. With default 512 max_sectors_kb
>>>>> and
>>>>> 128K RA the server sees at max 2 requests at time.
>>>>>
>>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>>> please?
>>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>>> go back to using a regular block device?
>> Yes, please
>>
>>> The file-over-iscsi is quite
>>> uncommon I suppose, most people will export a block device over iscsi,
>>> not a file.
>> No, files are common. The main reason why people use direct block devices is
>> the unsupported belief that, compared with files, they "have less overhead",
>> so "should be faster". But it isn't true and can easily be checked.
>>
>>>> With context-RA patch, please, in those and future tests, since it should
>>>> make RA for cooperative threads much better.
>>>>
>>>>> You can limit amount of SCST I/O threads by num_threads parameter of
>>>>> scst_vdisk module.
>>> Ok, I'll try that and include the blk_run_backing_dev,
>>> readahead-context and io_context patches.
>
> The results:
>
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
> and io_context
>
> [... benchmark tables snipped; quoted in full in the previous message ...]
> The last result comes close to 100MB/s!
Good! Although I expected the maximum with a single thread.
Can you do the same set of tests with the deadline scheduler on the server?
Thanks,
Vlad
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-17 18:23 ` Vladislav Bolkhovitin
@ 2009-07-20 7:20 ` Vladislav Bolkhovitin
2009-07-22 8:44 ` Ronald Moesbergen
0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-20 7:20 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
Vladislav Bolkhovitin, on 07/17/2009 10:23 PM wrote:
> Ronald Moesbergen, on 07/17/2009 06:15 PM wrote:
>> 2009/7/16 Vladislav Bolkhovitin <vst@vlnb.net>:
>>> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>>> 2009/7/15 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>>>> The drop with 64 max_sectors_kb on the client is a consequence of how
>>>>>> CFQ
>>>>>> is working. I can't find the exact code responsible for this, but from
>>>>>> all
>>>>>> signs, CFQ stops delaying requests if amount of outstanding requests
>>>>>> exceeds
>>>>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>>>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>>>>> requests, hence the performance drop. With default 512 max_sectors_kb
>>>>>> and
>>>>>> 128K RA the server sees at max 2 requests at time.
>>>>>>
>>>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>>>> please?
>>>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>>>> go back to using a regular block device?
>>> Yes, please
>>>
>>>> The file-over-iscsi is quite
>>>> uncommon I suppose, most people will export a block device over iscsi,
>>>> not a file.
>>> No, files are common. The main reason why people use direct block devices is
>>> the unsupported belief that, compared with files, they "have less overhead",
>>> so "should be faster". But it isn't true and can easily be checked.
>>>
>>>>> With context-RA patch, please, in those and future tests, since it should
>>>>> make RA for cooperative threads much better.
>>>>>
>>>>>> You can limit amount of SCST I/O threads by num_threads parameter of
>>>>>> scst_vdisk module.
>>>> Ok, I'll try that and include the blk_run_backing_dev,
>>>> readahead-context and io_context patches.
>> The results:
>>
>> client kernel: 2.6.26-15lenny3 (debian)
>> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
>> and io_context
>>
>> [... benchmark tables snipped; quoted in full in the previous messages ...]
>> The last result comes close to 100MB/s!
>
> Good! Although I expected the maximum with a single thread.
>
> Can you do the same set of tests with the deadline scheduler on the server?
The case of 5 I/O threads (the default) will also be interesting. I.e.,
overall, the cases of 1, 2 and 5 I/O threads with the deadline scheduler on
the server.
Thanks,
Vlad
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-20 7:20 ` Vladislav Bolkhovitin
@ 2009-07-22 8:44 ` Ronald Moesbergen
2009-07-27 13:11 ` Vladislav Bolkhovitin
0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-22 8:44 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
2009/7/20 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>
>>> The last result comes close to 100MB/s!
>>
>> Good! Although I expected the maximum with a single thread.
>>
>> Can you do the same set of tests with the deadline scheduler on the server?
>
> The case of 5 I/O threads (the default) will also be interesting. I.e.,
> overall, the cases of 1, 2 and 5 I/O threads with the deadline scheduler on
> the server.
Ok. The results:
Cfq seems to perform better in this case.
client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context
server scheduler: deadline
With one IO thread:
5) client: default, server: default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.067 16.883 16.096 62.669 1.427 0.979
33554432 16.034 16.564 16.050 63.161 0.948 1.974
16777216 16.045 15.086 16.709 64.329 2.715 4.021
6) client: default, server: 64 max_sectors_kb, RA default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.851 15.348 16.652 64.271 2.147 1.004
33554432 16.182 16.104 16.170 63.397 0.135 1.981
16777216 15.952 16.085 16.258 63.613 0.493 3.976
7) client: default, server: default max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.814 16.222 16.650 63.126 1.327 0.986
33554432 16.113 15.962 16.340 63.456 0.610 1.983
16777216 16.149 16.098 15.895 63.815 0.438 3.988
8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.032 17.163 15.864 62.695 2.161 0.980
33554432 16.163 15.499 16.466 63.870 1.626 1.996
16777216 16.067 16.133 16.710 62.829 1.099 3.927
9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.498 15.474 15.195 66.547 0.599 1.040
33554432 15.729 15.636 15.758 65.192 0.214 2.037
16777216 15.656 15.481 15.724 65.557 0.430 4.097
10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.480 14.125 13.648 74.497 1.466 1.164
33554432 13.584 13.518 14.272 74.293 1.806 2.322
16777216 13.511 13.585 13.552 75.576 0.170 4.723
11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.356 13.079 13.488 76.960 0.991 1.203
33554432 13.713 13.038 13.030 77.268 1.834 2.415
16777216 13.895 13.032 13.128 76.758 2.178 4.797
With two threads:
5) client: default, server: default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.661 12.773 13.654 78.681 2.622 1.229
33554432 12.709 12.693 12.459 81.145 0.738 2.536
16777216 12.657 14.055 13.237 77.038 3.292 4.815
6) client: default, server: 64 max_sectors_kb, RA default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.300 12.877 13.705 77.078 1.964 1.204
33554432 13.025 14.404 12.833 76.501 3.855 2.391
16777216 13.172 13.220 12.997 77.995 0.570 4.875
7) client: default, server: default max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.365 13.168 12.835 78.053 1.308 1.220
33554432 13.518 13.122 13.366 76.799 0.942 2.400
16777216 13.177 13.146 13.839 76.534 1.797 4.783
8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 14.308 12.669 13.520 76.045 3.788 1.188
33554432 12.586 12.897 13.221 79.405 1.596 2.481
16777216 13.766 12.583 14.176 76.001 3.903 4.750
9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 14.454 12.537 15.058 73.509 5.893 1.149
33554432 15.871 14.201 13.846 70.194 4.083 2.194
16777216 14.721 13.346 14.434 72.410 3.104 4.526
10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.262 13.308 13.416 76.828 0.371 1.200
33554432 13.915 13.182 13.065 76.551 2.114 2.392
16777216 13.223 14.133 13.317 75.596 2.232 4.725
11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.277 17.743 17.534 57.380 0.997 0.897
33554432 18.018 17.728 17.343 57.879 0.907 1.809
16777216 17.600 18.466 17.645 57.223 1.253 3.576
With five threads:
5) client: default, server: default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.915 13.643 12.572 78.598 2.654 1.228
33554432 12.716 12.970 13.283 78.858 1.403 2.464
16777216 14.372 13.282 13.122 75.461 3.002 4.716
6) client: default, server: 64 max_sectors_kb, RA default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.372 13.205 12.468 78.750 2.421 1.230
33554432 13.489 13.352 12.883 77.363 1.533 2.418
16777216 13.127 12.653 14.252 76.928 3.785 4.808
7) client: default, server: default max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.135 13.031 13.824 76.872 1.994 1.201
33554432 13.079 13.590 13.730 76.076 1.600 2.377
16777216 12.707 12.951 13.805 77.942 2.735 4.871
8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.030 12.947 13.538 77.772 1.524 1.215
33554432 12.826 12.973 13.805 77.649 2.482 2.427
16777216 12.751 13.007 12.986 79.295 0.718 4.956
9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.236 13.349 13.833 76.034 1.445 1.188
33554432 13.481 14.259 13.582 74.389 1.836 2.325
16777216 14.394 13.922 13.943 72.712 1.111 4.545
10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.245 18.690 17.342 56.654 1.779 0.885
33554432 17.744 18.122 17.577 57.492 0.731 1.797
16777216 18.280 18.564 17.846 56.186 0.914 3.512
11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.241 16.894 15.853 64.131 2.705 1.002
33554432 14.858 16.904 15.588 65.064 3.435 2.033
16777216 16.777 15.939 15.034 64.465 2.893 4.029
Ronald.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-22 8:44 ` Ronald Moesbergen
@ 2009-07-27 13:11 ` Vladislav Bolkhovitin
2009-07-28 9:51 ` Ronald Moesbergen
0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-27 13:11 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
[-- Attachment #1: Type: text/plain, Size: 9408 bytes --]
Ronald Moesbergen, on 07/22/2009 12:44 PM wrote:
> 2009/7/20 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>> The last result comes close to 100MB/s!
>>> Good! Although I expected the maximum with a single thread.
>>>
>>> Can you do the same set of tests with the deadline scheduler on the server?
>> The case of 5 I/O threads (the default) will also be interesting. I.e.,
>> overall, the cases of 1, 2 and 5 I/O threads with the deadline scheduler on
>> the server.
>
> Ok. The results:
>
> Cfq seems to perform better in this case.
>
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
> and io_context
> server scheduler: deadline
>
> [... benchmark tables (1, 2 and 5 I/O threads, server deadline) snipped;
> quoted in full in the previous message ...]
Hmm, it's really weird that the case of 2 threads is faster. There must
be some command reordering somewhere in SCST which I'm missing, like a
list_add() instead of a list_add_tail().
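(To illustrate the suspected bug class with a self-contained toy example, not
actual SCST code: insertion at the head reverses the dispatch order, while
insertion at the tail preserves it:)

#include <stdio.h>

struct cmd { int sn; struct cmd *next; };

/* like list_add(): push at the head => commands come out in LIFO order */
static struct cmd *add_head(struct cmd *list, struct cmd *c)
{
	c->next = list;
	return c;
}

/* like list_add_tail(): append at the tail => FIFO, order preserved */
static struct cmd *add_tail(struct cmd *list, struct cmd *c)
{
	struct cmd **p = &list;

	while (*p)
		p = &(*p)->next;
	c->next = NULL;
	*p = c;
	return list;
}

int main(void)
{
	struct cmd a[3] = { {1}, {2}, {3} }, b[3] = { {1}, {2}, {3} };
	struct cmd *l1 = NULL, *l2 = NULL;
	int i;

	for (i = 0; i < 3; i++) {
		l1 = add_head(l1, &a[i]);
		l2 = add_tail(l2, &b[i]);
	}
	for (; l1; l1 = l1->next)
		printf("head-insert dispatches sn %d\n", l1->sn); /* 3 2 1 */
	for (; l2; l2 = l2->next)
		printf("tail-insert dispatches sn %d\n", l2->sn); /* 1 2 3 */
	return 0;
}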
Can you apply the attached patch and repeat tests 5, 8 and 11 with 1 and
2 threads, please? The patch will enable forced command ordering, i.e.
with it all the commands will be executed in exactly the same order as
they were received.
Thanks,
Vlad
[-- Attachment #2: forced_order.diff --]
[-- Type: text/x-patch, Size: 747 bytes --]
Index: scst/src/scst_targ.c
===================================================================
--- scst/src/scst_targ.c (revision 971)
+++ scst/src/scst_targ.c (working copy)
@@ -3182,10 +3182,10 @@ static void scst_cmd_set_sn(struct scst_
switch (cmd->queue_type) {
case SCST_CMD_QUEUE_SIMPLE:
case SCST_CMD_QUEUE_UNTAGGED:
-#if 0 /* left for future performance investigations */
- if (scst_cmd_is_expected_set(cmd)) {
+#if 1 /* left for future performance investigations */
+/* if (scst_cmd_is_expected_set(cmd)) {
if ((cmd->expected_data_direction == SCST_DATA_READ) &&
- (atomic_read(&cmd->dev->write_cmd_count) == 0))
+ (atomic_read(&cmd->dev->write_cmd_count) == 0))*/
goto ordered;
} else
goto ordered;
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-27 13:11 ` Vladislav Bolkhovitin
@ 2009-07-28 9:51 ` Ronald Moesbergen
2009-07-28 19:07 ` Vladislav Bolkhovitin
0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-28 9:51 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
2009/7/27 Vladislav Bolkhovitin <vst@vlnb.net>:
>
> Hmm, it's really weird that the case of 2 threads is faster. There must be
> some command reordering somewhere in SCST which I'm missing, like a
> list_add() instead of a list_add_tail().
>
> Can you apply the attached patch and repeat tests 5, 8 and 11 with 1 and 2
> threads, please? The patch will enable forced command ordering, i.e. with it
> all the commands will be executed in exactly the same order as they were
> received.
The patched source doesn't compile (the commented-out branch leaves an
unmatched "} else"). I changed the code to this:
@ line 3184:
case SCST_CMD_QUEUE_UNTAGGED:
#if 1 /* left for future performance investigations */
goto ordered;
#endif
The results:
Overall performance seems lower.
client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context, forced_order
With one IO thread:
5) client: default, server: default (cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.484 16.417 16.068 62.741 0.706 0.980
33554432 15.684 16.348 16.011 63.961 1.083 1.999
16777216 16.044 16.239 15.938 63.710 0.493 3.982
8) client: default, server: 64 max_sectors_kb, RA 2MB (cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.127 15.784 16.210 63.847 0.740 0.998
33554432 16.103 16.072 16.106 63.627 0.061 1.988
16777216 16.637 16.058 16.154 62.902 0.970 3.931
11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB (cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.417 15.219 13.912 72.405 3.785 1.131
33554432 13.868 13.789 14.110 73.558 0.718 2.299
16777216 13.691 13.784 10.280 82.898 11.822 5.181
11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA
2MB (deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.604 13.532 13.978 74.733 1.055 1.168
33554432 13.523 13.166 13.504 76.443 0.945 2.389
16777216 13.434 13.409 13.632 75.902 0.557 4.744
With two threads:
5) client: default, server: default (cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.206 16.001 15.908 63.851 0.493 0.998
33554432 16.927 16.033 15.991 62.799 1.631 1.962
16777216 16.566 15.968 16.212 63.035 0.950 3.940
8) client: default, server: 64 max_sectors_kb, RA 2MB (cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.017 15.849 15.748 64.521 0.450 1.008
33554432 16.652 15.542 16.259 63.454 1.823 1.983
16777216 16.456 16.071 15.943 63.392 0.849 3.962
11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB (cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 14.109 9.985 13.548 83.572 13.478 1.306
33554432 13.698 14.236 13.754 73.711 1.267 2.303
16777216 13.610 12.090 14.136 77.458 5.244 4.841
11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA
2MB (deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.542 13.975 13.978 74.049 1.110 1.157
33554432 9.921 13.272 13.321 85.746 12.349 2.680
16777216 13.850 13.600 13.344 75.324 1.144 4.708
Ronald.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-28 9:51 ` Ronald Moesbergen
@ 2009-07-28 19:07 ` Vladislav Bolkhovitin
2009-07-29 12:48 ` Ronald Moesbergen
0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-28 19:07 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
Ronald Moesbergen, on 07/28/2009 01:51 PM wrote:
> 2009/7/27 Vladislav Bolkhovitin <vst@vlnb.net>:
>> Hmm, it's really weird that the case of 2 threads is faster. There must be
>> some command reordering somewhere in SCST which I'm missing, like a
>> list_add() instead of a list_add_tail().
>>
>> Can you apply the attached patch and repeat tests 5, 8 and 11 with 1 and 2
>> threads, please? The patch will enable forced command ordering, i.e. with it
>> all the commands will be executed in exactly the same order as they were
>> received.
>
> The patched source doesn't compile. I changed the code to this:
>
> @ line 3184:
>
> case SCST_CMD_QUEUE_UNTAGGED:
> #if 1 /* left for future performance investigations */
> goto ordered;
> #endif
>
> The results:
>
> Overall performance seems lower.
>
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
> and io_context, forced_order
>
> [... benchmark tables snipped: tests 5 and 8 were run with cfq only, test
> 11 with both cfq and deadline; quoted in full in the previous message ...]
Can you perform tests 5 and 8 with deadline? I asked for deadline.
What I/O scheduler do you use on the initiator? Can you check if
changing it to deadline or noop makes any difference?
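(For reference, a minimal sketch of switching the elevator at runtime,
equivalent to "echo deadline > /sys/block/sdb/queue/scheduler"; the device
name is a made-up example:)

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *sched = "deadline";	/* or "noop", "cfq" */
	int fd = open("/sys/block/sdb/queue/scheduler", O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, sched, strlen(sched)) < 0)
		perror("write");
	close(fd);
	return 0;
}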
Thanks,
Vlad
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-28 19:07 ` Vladislav Bolkhovitin
@ 2009-07-29 12:48 ` Ronald Moesbergen
2009-07-31 18:32 ` Vladislav Bolkhovitin
0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-29 12:48 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
2009/7/28 Vladislav Bolkhovitin <vst@vlnb.net>:
>
> Can you perform tests 5 and 8 with deadline? I asked for deadline.
>
> What I/O scheduler do you use on the initiator? Can you check if changing it
> to deadline or noop makes any difference?
>
client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context, forced_order
With one IO thread:
5) client: default, server: default (server deadline, client cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.739 15.339 16.511 64.613 1.959 1.010
33554432 15.411 12.384 15.400 71.876 7.646 2.246
16777216 16.564 15.569 16.279 63.498 1.667 3.969
5) client: default, server: default (server deadline, client deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 17.578 20.051 18.010 55.395 3.111 0.866
33554432 19.247 12.607 17.930 63.846 12.390 1.995
16777216 14.587 19.631 18.032 59.718 7.650 3.732
8) client: default, server: 64 max_sectors_kb, RA 2MB (server
deadline, client deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 17.418 19.520 22.050 52.564 5.043 0.821
33554432 21.263 17.623 17.782 54.616 4.571 1.707
16777216 17.896 18.335 19.407 55.278 1.864 3.455
8) client: default, server: 64 max_sectors_kb, RA 2MB (server
deadline, client cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.639 15.216 16.035 64.233 2.365 1.004
33554432 15.750 16.511 16.092 63.557 1.224 1.986
16777216 16.390 15.866 15.331 64.604 1.763 4.038
11) client: 2MB RA, 64 max_sectors_kb, server: 64 max_sectors_kb, RA
2MB (server deadline, client deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 14.117 13.610 13.558 74.435 1.347 1.163
33554432 13.450 10.344 13.556 83.555 10.918 2.611
16777216 13.408 13.319 13.239 76.867 0.398 4.804
With two threads:
5) client: default, server: default (server deadline, client cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.723 16.535 16.189 63.438 1.312 0.991
33554432 16.152 16.363 15.782 63.621 0.954 1.988
16777216 15.174 16.084 16.682 64.178 2.516 4.011
5) client: default, server: default (server deadline, client deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.087 18.082 17.639 57.099 0.674 0.892
33554432 18.377 15.750 17.551 59.694 3.912 1.865
16777216 18.490 15.553 18.778 58.585 5.143 3.662
8) client: default, server: 64 max_sectors_kb, RA 2MB (server
deadline, client deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.140 19.114 17.442 56.244 2.103 0.879
33554432 17.183 17.233 21.367 55.646 5.461 1.739
16777216 19.813 17.965 18.132 55.053 2.393 3.441
8) client: default, server: 64 max_sectors_kb, RA 2MB (server
deadline, client cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.753 16.085 16.522 63.548 1.239 0.993
33554432 13.502 15.912 15.507 68.743 5.065 2.148
16777216 16.584 16.171 15.959 63.077 1.003 3.942
11) client: 2MB RA, 64 max_sectors_kb, server: 64 max_sectors_kb, RA
2MB (server deadline, client deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 14.051 13.427 13.498 75.001 1.510 1.172
33554432 13.397 14.008 13.453 75.217 1.503 2.351
16777216 13.277 9.942 14.318 83.882 13.712 5.243
Ronald.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-29 12:48 ` Ronald Moesbergen
@ 2009-07-31 18:32 ` Vladislav Bolkhovitin
2009-08-03 9:15 ` Ronald Moesbergen
0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-31 18:32 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
Ronald Moesbergen, on 07/29/2009 04:48 PM wrote:
> 2009/7/28 Vladislav Bolkhovitin <vst@vlnb.net>:
>> Can you perform tests 5 and 8 with deadline? I asked for deadline.
>>
>> What I/O scheduler do you use on the initiator? Can you check if changing it
>> to deadline or noop makes any difference?
>>
>
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
> and io_context, forced_order
>
> [... benchmark tables snipped; quoted in full in the previous message ...]
OK, as I expected, on the SCST level everything is clear and the forced
ordering change didn't change anything.
But still, a single read stream should be fastest from a single thread.
Otherwise, there's something wrong somewhere in the I/O path: block
layer, RA, I/O scheduler. And, apparently, this is what we have, so we
should find out the cause.
Can you check if noop on the target and/or initiator makes any
difference? Case 5 with 1 and 2 threads will be sufficient.
Thanks,
Vlad
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-07-31 18:32 ` Vladislav Bolkhovitin
@ 2009-08-03 9:15 ` Ronald Moesbergen
2009-08-03 9:20 ` Vladislav Bolkhovitin
0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-08-03 9:15 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
2009/7/31 Vladislav Bolkhovitin <vst@vlnb.net>:
>
> OK, as I expected, on the SCST level everything is clear and the forced
> ordering change didn't change anything.
>
> But still, a single read stream should be fastest from a single thread.
> Otherwise, there's something wrong somewhere in the I/O path: block layer,
> RA, I/O scheduler. And, apparently, this is what we have, so we should find
> out the cause.
>
> Can you check if noop on the target and/or initiator makes any difference?
> Case 5 with 1 and 2 threads will be sufficient.
That doesn't seem to help:
client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context, forced_order
With one IO thread:
5) client: default, server: default (server noop, client noop)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 17.612 21.113 21.355 51.532 4.680 0.805
33554432 18.329 18.523 19.049 54.969 0.891 1.718
16777216 18.497 18.219 17.042 57.217 2.059 3.576
With two threads:
5) client: default, server: default (server noop, client noop)
blocksize   R1     R2     R3     R(avg)  R(std)  R(IOPS)
(bytes)     (s)    (s)    (s)    (MB/s)  (MB/s)
67108864 17.436 18.376 20.493 54.807 3.634 0.856
33554432 17.466 16.980 18.261 58.337 1.740 1.823
16777216 18.222 17.567 18.077 57.045 0.901 3.565
Ronald.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-08-03 9:15 ` Ronald Moesbergen
@ 2009-08-03 9:20 ` Vladislav Bolkhovitin
2009-08-03 11:44 ` Ronald Moesbergen
0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-08-03 9:20 UTC (permalink / raw)
To: Ronald Moesbergen
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
Ronald Moesbergen, on 08/03/2009 01:15 PM wrote:
> 2009/7/31 Vladislav Bolkhovitin <vst@vlnb.net>:
>> OK, as I expected, at the SCST level everything is clear, and the forced
>> ordering change didn't change anything.
>>
>> But still, a single read stream should be fastest when served by a single
>> thread. Otherwise, something is wrong somewhere in the I/O path: the block
>> layer, readahead, or the I/O scheduler. And, apparently, that is what we
>> are seeing, so we should find out the cause.
>>
>> Can you check if noop on the target and/or initiator makes any difference?
>> Case 5 with 1 and 2 threads will be sufficient.
>
> That doesn't seem to help:
>
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
> and io_context, forced_order
>
> With one IO thread:
> 5) client: default, server: default (server noop, client noop)
> blocksize   R1     R2     R3     R(avg)  R(std)  R(IOPS)
> (bytes)     (s)    (s)    (s)    (MB/s)  (MB/s)
> 67108864 17.612 21.113 21.355 51.532 4.680 0.805
> 33554432 18.329 18.523 19.049 54.969 0.891 1.718
> 16777216 18.497 18.219 17.042 57.217 2.059 3.576
>
> With two threads:
> 5) client: default, server: default (server noop, client noop)
> blocksize   R1     R2     R3     R(avg)  R(std)  R(IOPS)
> (bytes)     (s)    (s)    (s)    (MB/s)  (MB/s)
> 67108864 17.436 18.376 20.493 54.807 3.634 0.856
> 33554432 17.466 16.980 18.261 58.337 1.740 1.823
> 16777216 18.222 17.567 18.077 57.045 0.901 3.565
And with client cfq, server noop?
> Ronald.
>
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-08-03 9:20 ` Vladislav Bolkhovitin
@ 2009-08-03 11:44 ` Ronald Moesbergen
0 siblings, 0 replies; 65+ messages in thread
From: Ronald Moesbergen @ 2009-08-03 11:44 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche
2009/8/3 Vladislav Bolkhovitin <vst@vlnb.net>:
> Ronald Moesbergen, on 08/03/2009 01:15 PM wrote:
>>
>> 2009/7/31 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>
>>> OK, as I expected, at the SCST level everything is clear, and the forced
>>> ordering change didn't change anything.
>>>
>>> But still, a single read stream should be fastest when served by a single
>>> thread. Otherwise, something is wrong somewhere in the I/O path: the block
>>> layer, readahead, or the I/O scheduler. And, apparently, that is what we
>>> are seeing, so we should find out the cause.
>>>
>>> Can you check if noop on the target and/or initiator makes any
>>> difference?
>>> Case 5 with 1 and 2 threads will be sufficient.
>>
>> That doesn't seem to help:
>>
>> client kernel: 2.6.26-15lenny3 (debian)
>> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
>> and io_context, forced_order
>>
>> With one IO thread:
>> 5) client: default, server: default (server noop, client noop)
>> blocksize   R1     R2     R3     R(avg)  R(std)  R(IOPS)
>> (bytes)     (s)    (s)    (s)    (MB/s)  (MB/s)
>> 67108864 17.612 21.113 21.355 51.532 4.680 0.805
>> 33554432 18.329 18.523 19.049 54.969 0.891 1.718
>> 16777216 18.497 18.219 17.042 57.217 2.059 3.576
>>
>> With two threads:
>> 5) client: default, server: default (server noop, client noop)
>> blocksize   R1     R2     R3     R(avg)  R(std)  R(IOPS)
>> (bytes)     (s)    (s)    (s)    (MB/s)  (MB/s)
>> 67108864 17.436 18.376 20.493 54.807 3.634 0.856
>> 33554432 17.466 16.980 18.261 58.337 1.740 1.823
>> 16777216 18.222 17.567 18.077 57.045 0.901 3.565
>
> And with client cfq, server noop?
client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context, forced_order
With one IO thread:
5) client: default, server: default (server noop, client cfq)
blocksize   R1     R2     R3     R(avg)  R(std)  R(IOPS)
(bytes)     (s)    (s)    (s)    (MB/s)  (MB/s)
67108864 16.019 16.434 15.730 63.777 1.144 0.997
33554432 16.020 16.624 15.936 63.258 1.183 1.977
16777216 15.966 15.465 16.115 64.630 1.145 4.039
With two threads:
5) client: default, server: default (server noop, client cfq)
blocksize   R1     R2     R3     R(avg)  R(std)  R(IOPS)
(bytes)     (s)    (s)    (s)    (MB/s)  (MB/s)
67108864 16.504 15.762 14.842 65.335 2.848 1.021
33554432 16.080 16.627 15.766 63.406 1.386 1.981
16777216 15.489 16.627 16.043 63.842 1.846 3.990
Ronald.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
2009-05-29 5:35 [RESEND] [PATCH] readahead:add blk_run_backing_dev Hisashi Hifumi
2009-06-01 0:36 ` Andrew Morton
@ 2009-09-22 20:58 ` Andrew Morton
1 sibling, 0 replies; 65+ messages in thread
From: Andrew Morton @ 2009-09-22 20:58 UTC (permalink / raw)
To: Hisashi Hifumi; +Cc: linux-kernel, linux-fsdevel, linux-mm, Wu Fengguang
On Fri, 29 May 2009 14:35:55 +0900
Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> wrote:
> I added blk_run_backing_dev on page_cache_async_readahead
> so readahead I/O is unplugged to improve throughput,
> especially in RAID environments.
I still haven't sent this upstream; it's unclear to me whether we've
decided that it merits merging.
From: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
I added blk_run_backing_dev on page_cache_async_readahead so readahead I/O
is unplugged to improve throughput, especially in RAID environments.
The normal case is: if page N becomes uptodate at time T(N), then T(N) <=
T(N+1) holds. With RAID (and NFS to some degree), there is no such strict
ordering: the data arrival time depends on the runtime status of the
individual disks, which breaks that formula. So in
do_generic_file_read(), just after submitting the async readahead IO
request, the current page may well be uptodate, so the page won't be
locked, and the block device won't be implicitly unplugged:
	if (PageReadahead(page))
		page_cache_async_readahead()
	if (!PageUptodate(page))
		goto page_not_up_to_date;
	//...
page_not_up_to_date:
	lock_page_killable(page);
Therefore explicit unplugging can help.
Following is the test result with dd.
#dd if=testdir/testfile of=/dev/null bs=16384
-2.6.30-rc6
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s
-2.6.30-rc6-patched
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s
(7Disks RAID-0 Array)
-2.6.30-rc6
1054976+0 records in
1054976+0 records out
17284726784 bytes (17 GB) copied, 212.233 seconds, 81.4 MB/s
-2.6.30-rc6-patched
1054976+0 records in
1054976+0 records out
17284726784 bytes (17 GB) copied, 198.878 seconds, 86.9 MB/s
(7Disks RAID-5 Array)
The patch was found to improve performance with the SCST scsi target
driver. See
http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel
[akpm@linux-foundation.org: unbust comment layout]
[akpm@linux-foundation.org: "fix" CONFIG_BLOCK=n]
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tested-by: Ronald <intercommit@gmail.com>
Cc: Bart Van Assche <bart.vanassche@gmail.com>
Cc: Vladislav Bolkhovitin <vst@vlnb.net>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/readahead.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff -puN mm/readahead.c~readahead-add-blk_run_backing_dev mm/readahead.c
--- a/mm/readahead.c~readahead-add-blk_run_backing_dev
+++ a/mm/readahead.c
@@ -547,5 +547,17 @@ page_cache_async_readahead(struct addres
 	/* do read-ahead */
 	ondemand_readahead(mapping, ra, filp, true, offset, req_size);
+
+#ifdef CONFIG_BLOCK
+	/*
+	 * Normally the current page is !uptodate and lock_page() will be
+	 * immediately called to implicitly unplug the device. However this
+	 * is not always true for RAID configurations, where data arrives
+	 * not strictly in submission order. In this case we need to
+	 * explicitly kick off the IO.
+	 */
+	if (PageUptodate(page))
+		blk_run_backing_dev(mapping->backing_dev_info, NULL);
+#endif
 }
EXPORT_SYMBOL_GPL(page_cache_async_readahead);
_
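For reference, blk_run_backing_dev() in kernels of this vintage is a small
inline helper in include/linux/blkdev.h that just invokes the backing
device's unplug callback; roughly (a sketch from the 2.6.30-era tree, not
a verbatim quote):

static inline void blk_run_backing_dev(struct backing_dev_info *bdi,
				       struct page *page)
{
	/* kick the queue's unplug function, if the bdi provides one */
	if (bdi && bdi->unplug_io_fn)
		bdi->unplug_io_fn(bdi, page);
}

So the PageUptodate() branch added above costs one test plus an indirect
call, and only when the readahead mark was hit with the current page
already uptodate.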
^ permalink raw reply [flat|nested] 65+ messages in thread
* [RESEND][PATCH] readahead:add blk_run_backing_dev
@ 2009-05-22 0:09 Hisashi Hifumi
0 siblings, 0 replies; 65+ messages in thread
From: Hisashi Hifumi @ 2009-05-22 0:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel
Hi Andrew.
The following patch improves sequential read performance and does not harm
other workloads.
Please merge my patch.
Comments?
Thanks.
#dd if=testdir/testfile of=/dev/null bs=16384
-2.6.30-rc6
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s
-2.6.30-rc6-patched
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
diff -Nrup linux-2.6.30-rc6.org/mm/readahead.c linux-2.6.30-rc6.unplug/mm/readahead.c
--- linux-2.6.30-rc6.org/mm/readahead.c 2009-05-18 10:46:15.000000000 +0900
+++ linux-2.6.30-rc6.unplug/mm/readahead.c 2009-05-18 13:00:42.000000000 +0900
@@ -490,5 +490,7 @@ page_cache_async_readahead(struct addres
 	/* do read-ahead */
 	ondemand_readahead(mapping, ra, filp, true, offset, req_size);
+
+	blk_run_backing_dev(mapping->backing_dev_info, NULL);
 }
EXPORT_SYMBOL_GPL(page_cache_async_readahead);
^ permalink raw reply [flat|nested] 65+ messages in thread
Thread overview: 65+ messages
2009-05-29 5:35 [RESEND] [PATCH] readahead:add blk_run_backing_dev Hisashi Hifumi
2009-06-01 0:36 ` Andrew Morton
2009-06-01 1:04 ` Hisashi Hifumi
2009-06-05 15:15 ` Alan D. Brunelle
2009-06-06 14:36 ` KOSAKI Motohiro
2009-06-06 22:45 ` Wu Fengguang
2009-06-18 19:04 ` Andrew Morton
2009-06-20 3:55 ` Wu Fengguang
2009-06-20 12:29 ` Vladislav Bolkhovitin
2009-06-29 9:34 ` Wu Fengguang
2009-06-29 10:26 ` Ronald Moesbergen
2009-06-29 10:55 ` Vladislav Bolkhovitin
2009-06-29 12:54 ` Wu Fengguang
2009-06-29 12:58 ` Bart Van Assche
2009-06-29 13:01 ` Wu Fengguang
2009-06-29 13:04 ` Vladislav Bolkhovitin
2009-06-29 13:13 ` Wu Fengguang
2009-06-29 13:28 ` Wu Fengguang
2009-06-29 14:43 ` Ronald Moesbergen
2009-06-29 14:51 ` Wu Fengguang
2009-06-29 14:56 ` Ronald Moesbergen
2009-06-29 15:37 ` Vladislav Bolkhovitin
2009-06-29 14:00 ` Ronald Moesbergen
2009-06-29 14:21 ` Wu Fengguang
2009-06-29 15:01 ` Wu Fengguang
2009-06-29 15:37 ` Vladislav Bolkhovitin
[not found] ` <20090630010414.GB31418@localhost>
[not found] ` <4A49EEF9.6010205@vlnb.net>
[not found] ` <a0272b440907030214l4016422bxbc98fd003bfe1b3d@mail.gmail.com>
[not found] ` <4A4DE3C1.5080307@vlnb.net>
[not found] ` <a0272b440907040819l5289483cp44b37d967440ef73@mail.gmail.com>
2009-07-06 11:12 ` Vladislav Bolkhovitin
2009-07-06 14:37 ` Ronald Moesbergen
2009-07-06 17:48 ` Vladislav Bolkhovitin
2009-07-07 6:49 ` Ronald Moesbergen
[not found] ` <4A5395FD.2040507@vlnb.net>
[not found] ` <a0272b440907080149j3eeeb9bat13f942520db059a8@mail.gmail.com>
2009-07-08 12:40 ` Vladislav Bolkhovitin
2009-07-10 6:32 ` Ronald Moesbergen
2009-07-10 8:43 ` Vladislav Bolkhovitin
2009-07-10 9:27 ` Vladislav Bolkhovitin
2009-07-13 12:12 ` Ronald Moesbergen
2009-07-13 12:36 ` Wu Fengguang
2009-07-13 12:47 ` Ronald Moesbergen
2009-07-13 12:52 ` Wu Fengguang
2009-07-14 18:52 ` Vladislav Bolkhovitin
2009-07-15 7:06 ` Wu Fengguang
2009-07-14 18:52 ` Vladislav Bolkhovitin
2009-07-15 6:30 ` Vladislav Bolkhovitin
2009-07-16 7:32 ` Ronald Moesbergen
2009-07-16 10:36 ` Vladislav Bolkhovitin
2009-07-16 14:54 ` Ronald Moesbergen
2009-07-16 16:03 ` Vladislav Bolkhovitin
2009-07-17 14:15 ` Ronald Moesbergen
2009-07-17 18:23 ` Vladislav Bolkhovitin
2009-07-20 7:20 ` Vladislav Bolkhovitin
2009-07-22 8:44 ` Ronald Moesbergen
2009-07-27 13:11 ` Vladislav Bolkhovitin
2009-07-28 9:51 ` Ronald Moesbergen
2009-07-28 19:07 ` Vladislav Bolkhovitin
2009-07-29 12:48 ` Ronald Moesbergen
2009-07-31 18:32 ` Vladislav Bolkhovitin
2009-08-03 9:15 ` Ronald Moesbergen
2009-08-03 9:20 ` Vladislav Bolkhovitin
2009-08-03 11:44 ` Ronald Moesbergen
2009-07-15 20:52 ` Kurt Garloff
2009-07-16 10:38 ` Vladislav Bolkhovitin
2009-06-30 10:22 ` Vladislav Bolkhovitin
2009-06-29 10:55 ` Vladislav Bolkhovitin
2009-06-29 13:00 ` Wu Fengguang
2009-09-22 20:58 ` Andrew Morton
-- strict thread matches above, loose matches on Subject: below --
2009-05-22 0:09 [RESEND][PATCH] " Hisashi Hifumi