* [RESEND] [PATCH] readahead:add blk_run_backing_dev
@ 2009-05-29  5:35 Hisashi Hifumi
  2009-06-01  0:36 ` Andrew Morton
  2009-09-22 20:58 ` Andrew Morton
  0 siblings, 2 replies; 65+ messages in thread
From: Hisashi Hifumi @ 2009-05-29  5:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel

Hi Andrew.

I added blk_run_backing_dev to page_cache_async_readahead so that
readahead I/O is unplugged, which improves throughput especially in
RAID environments.

In the normal case, if page N becomes uptodate at time T(N), then
T(N) <= T(N+1) holds. With RAID (and NFS to some degree) there is no
such strict ordering: the data arrival time depends on the runtime
status of the individual disks, which breaks that formula. So in
do_generic_file_read(), just after submitting the async readahead IO
request, the current page may well already be uptodate, in which case
the page won't be locked and the block device won't be implicitly
unplugged:

		if (PageReadahead(page))
			page_cache_async_readahead()
		if (!PageUptodate(page))
			goto page_not_up_to_date;
		//...
page_not_up_to_date:
		lock_page_killable(page);

Therefore explicit unplugging can help.
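
Roughly, the implicit unplug referred to above comes from the page-lock
path (2.6.30-era call chain, abridged sketch):

	lock_page_killable(page)
	  __lock_page_killable()
	    __wait_on_bit_lock(..., sync_page_killable, ...)
	      sync_page()
	        mapping->a_ops->sync_page(page)	/* e.g. block_sync_page() */
	          blk_run_backing_dev(mapping->backing_dev_info, page)

When the current page is already uptodate, none of this runs, so the
queue stays plugged until something else unplugs it.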

Following is the test result with dd.

#dd if=testdir/testfile of=/dev/null bs=16384

-2.6.30-rc6
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s

-2.6.30-rc6-patched
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s

(7Disks RAID-0 Array)

-2.6.30-rc6
1054976+0 records in
1054976+0 records out
17284726784 bytes (17 GB) copied, 212.233 seconds, 81.4 MB/s

-2.6.30-rc6-patched
1054976+0 records in
1054976+0 records out
17284726784 bytes (17 GB) copied, 198.878 seconds, 86.9 MB/s

(7Disks RAID-5 Array)

Thanks.

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Acked-by: Wu Fengguang <fengguang.wu@intel.com> 


 mm/readahead.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

--- linux.orig/mm/readahead.c
+++ linux/mm/readahead.c
@@ -490,5 +490,15 @@ page_cache_async_readahead(struct addres
 
 	/* do read-ahead */
 	ondemand_readahead(mapping, ra, filp, true, offset, req_size);
+
+	/*
+	 * Normally the current page is !uptodate and lock_page() will be
+	 * immediately called to implicitly unplug the device. However this
+	 * is not always true for RAID configurations, where data does not
+	 * arrive strictly in submission order. In this case we need to
+	 * explicitly kick off the IO.
+	 */
+	if (PageUptodate(page))
+		blk_run_backing_dev(mapping->backing_dev_info, NULL);
 }
 EXPORT_SYMBOL_GPL(page_cache_async_readahead); 
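
For reference, blk_run_backing_dev() is (roughly, in this kernel) just a
thin inline in include/linux/backing-dev.h that invokes the backing
device's unplug callback:

	static inline void blk_run_backing_dev(struct backing_dev_info *bdi,
					       struct page *page)
	{
		if (bdi && bdi->unplug_io_fn)
			bdi->unplug_io_fn(bdi, page);
	}

For an ordinary block queue this ends up in the queue's unplug_fn
(generic_unplug_device() by default), i.e. the same unplug that
lock_page() would otherwise have triggered implicitly.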



* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-05-29  5:35 [RESEND] [PATCH] readahead:add blk_run_backing_dev Hisashi Hifumi
@ 2009-06-01  0:36 ` Andrew Morton
  2009-06-01  1:04   ` Hisashi Hifumi
  2009-09-22 20:58 ` Andrew Morton
  1 sibling, 1 reply; 65+ messages in thread
From: Andrew Morton @ 2009-06-01  0:36 UTC (permalink / raw)
  To: Hisashi Hifumi; +Cc: linux-kernel, linux-fsdevel

On Fri, 29 May 2009 14:35:55 +0900 Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> wrote:

> I added blk_run_backing_dev to page_cache_async_readahead so that
> readahead I/O is unplugged, which improves throughput especially in
> RAID environments.

I skipped the last version of this because KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> said "Please attach blktrace analysis ;)".

I'm not sure why he asked for that, but he's a smart chap and
presumably had his reasons.

If you think that such an analysis is unneeded, or isn't worth the time
to generate then please tell us that.  But please don't just ignore the
request!

Thanks.



* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-01  0:36 ` Andrew Morton
@ 2009-06-01  1:04   ` Hisashi Hifumi
  2009-06-05 15:15     ` Alan D. Brunelle
  0 siblings, 1 reply; 65+ messages in thread
From: Hisashi Hifumi @ 2009-06-01  1:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel


At 09:36 09/06/01, Andrew Morton wrote:
>On Fri, 29 May 2009 14:35:55 +0900 Hisashi Hifumi 
><hifumi.hisashi@oss.ntt.co.jp> wrote:
>
>> I added blk_run_backing_dev to page_cache_async_readahead so that
>> readahead I/O is unplugged, which improves throughput especially in
>> RAID environments.
>
>I skipped the last version of this because KOSAKI Motohiro
><kosaki.motohiro@jp.fujitsu.com> said "Please attach blktrace analysis ;)".
>
>I'm not sure why he asked for that, but he's a smart chap and
>presumably had his reasons.
>
>If you think that such an analysis is unneeded, or isn't worth the time
>to generate then please tell us that.  But please don't just ignore the
>request!

Hi Andrew.

Sorry for this.

I did not ignore KOSAKI Motohiro's request.
I have got blktrace output both with and without the patch, but I
could not work out the reason for the throughput improvement from
this result.

I do not notice any difference except around the unplug behavior of dd.
Comments?
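
(For reading the traces below: the blkparse action codes are A = remap,
Q = queued, G = get request, P = plug, I = inserted into the request
queue, M = back merge, U = unplug, D = issued to the driver, and
C = completed; the trailing "+ N" is the I/O size in sectors.)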

-2.6.30-rc6
  8,0    3   177784    50.001437357     0  C   R 8717567 + 512 [0]
  8,0    3   177785    50.001635405  4148  A   R 8718079 + 256 <- (8,1) 8718016
  8,0    3   177786    50.001635675  4148  Q   R 8718079 + 256 [dd]
  8,0    3   177787    50.001637517  4148  G   R 8718079 + 256 [dd]
  8,0    3   177788    50.001638954  4148  P   N [dd]
  8,0    3   177789    50.001639290  4148  I   R 8718079 + 256 [dd]
  8,0    3   177790    50.001765339  4148  A   R 8718335 + 256 <- (8,1) 8718272
  8,0    3   177791    50.001765699  4148  Q   R 8718335 + 256 [dd]
  8,0    3   177792    50.001766971  4148  M   R 8718335 + 256 [dd]
  8,0    3   177793    50.001768243  4148  U   N [dd] 1
  8,0    3   177794    50.001769464  4148  D   R 8718079 + 512 [dd]
  8,0    3   177795    50.003815034     0  C   R 8718079 + 512 [0]
  8,0    3   177796    50.004008636  4148  A   R 8718591 + 256 <- (8,1) 8718528
  8,0    3   177797    50.004008951  4148  Q   R 8718591 + 256 [dd]
  8,0    3   177798    50.004010787  4148  G   R 8718591 + 256 [dd]
  8,0    3   177799    50.004012089  4148  P   N [dd]
  8,0    3   177800    50.004012641  4148  I   R 8718591 + 256 [dd]
  8,0    3   177801    50.004139944  4148  A   R 8718847 + 256 <- (8,1) 8718784
  8,0    3   177802    50.004140298  4148  Q   R 8718847 + 256 [dd]
  8,0    3   177803    50.004141393  4148  M   R 8718847 + 256 [dd]
  8,0    3   177804    50.004142815  4148  U   N [dd] 1
  8,0    3   177805    50.004144003  4148  D   R 8718591 + 512 [dd]
  8,0    3   177806    50.007151480     0  C   R 8718591 + 512 [0]
  8,0    3   177807    50.007344467  4148  A   R 8719103 + 256 <- (8,1) 8719040
  8,0    3   177808    50.007344779  4148  Q   R 8719103 + 256 [dd]
  8,0    3   177809    50.007346636  4148  G   R 8719103 + 256 [dd]
  8,0    3   177810    50.007347821  4148  P   N [dd]
  8,0    3   177811    50.007348346  4148  I   R 8719103 + 256 [dd]
  8,0    3   177812    50.007480827  4148  A   R 8719359 + 256 <- (8,1) 8719296
  8,0    3   177813    50.007481187  4148  Q   R 8719359 + 256 [dd]
  8,0    3   177814    50.007482669  4148  M   R 8719359 + 256 [dd]
  8,0    3   177815    50.007483965  4148  U   N [dd] 1
  8,0    3   177816    50.007485171  4148  D   R 8719103 + 512 [dd]
  8,0    3   177817    50.009885672     0  C   R 8719103 + 512 [0]
  8,0    3   177818    50.010077696  4148  A   R 8719615 + 256 <- (8,1) 8719552
  8,0    3   177819    50.010078008  4148  Q   R 8719615 + 256 [dd]
  8,0    3   177820    50.010079841  4148  G   R 8719615 + 256 [dd]
  8,0    3   177821    50.010081227  4148  P   N [dd]
  8,0    3   177822    50.010081560  4148  I   R 8719615 + 256 [dd]
  8,0    3   177823    50.010208686  4148  A   R 8719871 + 256 <- (8,1) 8719808
  8,0    3   177824    50.010209046  4148  Q   R 8719871 + 256 [dd]
  8,0    3   177825    50.010210366  4148  M   R 8719871 + 256 [dd]
  8,0    3   177826    50.010211686  4148  U   N [dd] 1
  8,0    3   177827    50.010212916  4148  D   R 8719615 + 512 [dd]
  8,0    3   177828    50.013880081     0  C   R 8719615 + 512 [0]
  8,0    3   177829    50.014071235  4148  A   R 8720127 + 256 <- (8,1) 8720064
  8,0    3   177830    50.014071544  4148  Q   R 8720127 + 256 [dd]
  8,0    3   177831    50.014073332  4148  G   R 8720127 + 256 [dd]
  8,0    3   177832    50.014074517  4148  P   N [dd]
  8,0    3   177833    50.014075084  4148  I   R 8720127 + 256 [dd]
  8,0    3   177834    50.014201763  4148  A   R 8720383 + 256 <- (8,1) 8720320
  8,0    3   177835    50.014202123  4148  Q   R 8720383 + 256 [dd]
  8,0    3   177836    50.014203608  4148  M   R 8720383 + 256 [dd]
  8,0    3   177837    50.014204889  4148  U   N [dd] 1
  8,0    3   177838    50.014206095  4148  D   R 8720127 + 512 [dd]
  8,0    3   177839    50.017545281     0  C   R 8720127 + 512 [0]
  8,0    3   177840    50.017741679  4148  A   R 8720639 + 256 <- (8,1) 8720576
  8,0    3   177841    50.017742006  4148  Q   R 8720639 + 256 [dd]
  8,0    3   177842    50.017743848  4148  G   R 8720639 + 256 [dd]
  8,0    3   177843    50.017745318  4148  P   N [dd]
  8,0    3   177844    50.017745672  4148  I   R 8720639 + 256 [dd]
  8,0    3   177845    50.017876956  4148  A   R 8720895 + 256 <- (8,1) 8720832
  8,0    3   177846    50.017877286  4148  Q   R 8720895 + 256 [dd]
  8,0    3   177847    50.017878615  4148  M   R 8720895 + 256 [dd]
  8,0    3   177848    50.017880082  4148  U   N [dd] 1
  8,0    3   177849    50.017881339  4148  D   R 8720639 + 512 [dd]
  8,0    3   177850    50.020674534     0  C   R 8720639 + 512 [0]
  8,0    3   177851    50.020864689  4148  A   R 8721151 + 256 <- (8,1) 8721088
  8,0    3   177852    50.020865007  4148  Q   R 8721151 + 256 [dd]
  8,0    3   177853    50.020866900  4148  G   R 8721151 + 256 [dd]
  8,0    3   177854    50.020868283  4148  P   N [dd]
  8,0    3   177855    50.020868628  4148  I   R 8721151 + 256 [dd]
  8,0    3   177856    50.020997302  4148  A   R 8721407 + 256 <- (8,1) 8721344
  8,0    3   177857    50.020997662  4148  Q   R 8721407 + 256 [dd]
  8,0    3   177858    50.020998976  4148  M   R 8721407 + 256 [dd]
  8,0    3   177859    50.021000305  4148  U   N [dd] 1
  8,0    3   177860    50.021001520  4148  D   R 8721151 + 512 [dd]
  8,0    3   177861    50.024269136     0  C   R 8721151 + 512 [0]
  8,0    3   177862    50.024460931  4148  A   R 8721663 + 256 <- (8,1) 8721600
  8,0    3   177863    50.024461337  4148  Q   R 8721663 + 256 [dd]
  8,0    3   177864    50.024463175  4148  G   R 8721663 + 256 [dd]
  8,0    3   177865    50.024464537  4148  P   N [dd]
  8,0    3   177866    50.024464871  4148  I   R 8721663 + 256 [dd]
  8,0    3   177867    50.024597943  4148  A   R 8721919 + 256 <- (8,1) 8721856
  8,0    3   177868    50.024598213  4148  Q   R 8721919 + 256 [dd]
  8,0    3   177869    50.024599323  4148  M   R 8721919 + 256 [dd]
  8,0    3   177870    50.024600751  4148  U   N [dd] 1
  8,0    3   177871    50.024602104  4148  D   R 8721663 + 512 [dd]
  8,0    3   177872    50.026966145     0  C   R 8721663 + 512 [0]
  8,0    3   177873    50.027157245  4148  A   R 8722175 + 256 <- (8,1) 8722112
  8,0    3   177874    50.027157563  4148  Q   R 8722175 + 256 [dd]
  8,0    3   177875    50.027159351  4148  G   R 8722175 + 256 [dd]
  8,0    3   177876    50.027160731  4148  P   N [dd]
  8,0    3   177877    50.027161064  4148  I   R 8722175 + 256 [dd]
  8,0    3   177878    50.027288745  4148  A   R 8722431 + 256 <- (8,1) 8722368
  8,0    3   177879    50.027289105  4148  Q   R 8722431 + 256 [dd]
  8,0    3   177880    50.027290206  4148  M   R 8722431 + 256 [dd]
  8,0    3   177881    50.027291697  4148  U   N [dd] 1
  8,0    3   177882    50.027293119  4148  D   R 8722175 + 512 [dd]
  8,0    3   177883    50.030406105     0  C   R 8722175 + 512 [0]
  8,0    3   177884    50.030600613  4148  A   R 8722687 + 256 <- (8,1) 8722624
  8,0    3   177885    50.030601199  4148  Q   R 8722687 + 256 [dd]
  8,0    3   177886    50.030603269  4148  G   R 8722687 + 256 [dd]
  8,0    3   177887    50.030604463  4148  P   N [dd]
  8,0    3   177888    50.030604799  4148  I   R 8722687 + 256 [dd]
  8,0    3   177889    50.030731757  4148  A   R 8722943 + 256 <- (8,1) 8722880
  8,0    3   177890    50.030732117  4148  Q   R 8722943 + 256 [dd]
  8,0    3   177891    50.030733397  4148  M   R 8722943 + 256 [dd]
  8,0    3   177892    50.030734882  4148  U   N [dd] 1
  8,0    3   177893    50.030736109  4148  D   R 8722687 + 512 [dd]
  8,0    3   177894    50.032916699     0  C   R 8722687 + 512 [0]
  8,0    3   177895    50.033176618  4148  A   R 8723199 + 256 <- (8,1) 8723136
  8,0    3   177896    50.033177218  4148  Q   R 8723199 + 256 [dd]
  8,0    3   177897    50.033181433  4148  G   R 8723199 + 256 [dd]
  8,0    3   177898    50.033184757  4148  P   N [dd]
  8,0    3   177899    50.033185642  4148  I   R 8723199 + 256 [dd]
  8,0    3   177900    50.033371264  4148  A   R 8723455 + 256 <- (8,1) 8723392
  8,0    3   177901    50.033371717  4148  Q   R 8723455 + 256 [dd]
  8,0    3   177902    50.033374015  4148  M   R 8723455 + 256 [dd]
  8,0    3   177903    50.033376814  4148  U   N [dd] 1
  8,0    3   177904    50.033380126  4148  D   R 8723199 + 512 [dd]
  8,0    3   177905    50.036715133     0  C   R 8723199 + 512 [0]
  8,0    3   177906    50.036971296  4148  A   R 8723711 + 256 <- (8,1) 8723648
  8,0    3   177907    50.036972136  4148  Q   R 8723711 + 256 [dd]
  8,0    3   177908    50.036975673  4148  G   R 8723711 + 256 [dd]
  8,0    3   177909    50.036978277  4148  P   N [dd]
  8,0    3   177910    50.036979450  4148  I   R 8723711 + 256 [dd]
  8,0    3   177911    50.037162429  4148  A   R 8723967 + 256 <- (8,1) 8723904
  8,0    3   177912    50.037162840  4148  Q   R 8723967 + 256 [dd]
  8,0    3   177913    50.037164967  4148  M   R 8723967 + 256 [dd]
  8,0    3   177914    50.037167223  4148  U   N [dd] 1
  8,0    3   177915    50.037170001  4148  D   R 8723711 + 512 [dd]
  8,0    3   177916    50.040521790     0  C   R 8723711 + 512 [0]
  8,0    3   177917    50.040729738  4148  A   R 8724223 + 256 <- (8,1) 8724160
  8,0    3   177918    50.040730200  4148  Q   R 8724223 + 256 [dd]
  8,0    3   177919    50.040732060  4148  G   R 8724223 + 256 [dd]
  8,0    3   177920    50.040733551  4148  P   N [dd]
  8,0    3   177921    50.040734109  4148  I   R 8724223 + 256 [dd]
  8,0    3   177922    50.040860173  4148  A   R 8724479 + 160 <- (8,1) 8724416
  8,0    3   177923    50.040860536  4148  Q   R 8724479 + 160 [dd]
  8,0    3   177924    50.040861517  4148  M   R 8724479 + 160 [dd]
  8,0    3   177925    50.040872542  4148  A   R 1055943 + 8 <- (8,1) 1055880
  8,0    3   177926    50.040872800  4148  Q   R 1055943 + 8 [dd]
  8,0    3   177927    50.040874849  4148  G   R 1055943 + 8 [dd]
  8,0    3   177928    50.040875485  4148  I   R 1055943 + 8 [dd]
  8,0    3   177929    50.040877045  4148  U   N [dd] 2
  8,0    3   177930    50.040878625  4148  D   R 8724223 + 416 [dd]
  8,0    3   177931    50.040895335  4148  D   R 1055943 + 8 [dd]
  8,0    3   177932    50.044383267     0  C   R 8724223 + 416 [0]
  8,0    3   177933    50.044704725     0  C   R 1055943 + 8 [0]
  8,0    3   177934    50.044749068  4148  A   R 8724639 + 96 <- (8,1) 8724576
  8,0    3   177935    50.044749472  4148  Q   R 8724639 + 96 [dd]
  8,0    3   177936    50.044752184  4148  G   R 8724639 + 96 [dd]
  8,0    3   177937    50.044753552  4148  P   N [dd]
  8,0    3   177938    50.044754032  4148  I   R 8724639 + 96 [dd]
  8,0    3   177939    50.044896095  4148  A   R 8724735 + 256 <- (8,1) 8724672
  8,0    3   177940    50.044896443  4148  Q   R 8724735 + 256 [dd]
  8,0    3   177941    50.044897538  4148  M   R 8724735 + 256 [dd]
  8,0    3   177942    50.044948546  4148  U   N [dd] 1
  8,0    3   177943    50.044950001  4148  D   R 8724639 + 352 [dd]
  8,0    3   177944    50.047150137     0  C   R 8724639 + 352 [0]
  8,0    3   177945    50.047294824  4148  A   R 8724991 + 256 <- (8,1) 8724928
  8,0    3   177946    50.047295142  4148  Q   R 8724991 + 256 [dd]
  8,0    3   177947    50.047296978  4148  G   R 8724991 + 256 [dd]
  8,0    3   177948    50.047298301  4148  P   N [dd]
  8,0    3   177949    50.047298637  4148  I   R 8724991 + 256 [dd]
  8,0    3   177950    50.047429027  4148  A   R 8725247 + 256 <- (8,1) 8725184
  8,0    3   177951    50.047429387  4148  Q   R 8725247 + 256 [dd]
  8,0    3   177952    50.047430479  4148  M   R 8725247 + 256 [dd]
  8,0    3   177953    50.047431736  4148  U   N [dd] 1
  8,0    3   177954    50.047432951  4148  D   R 8724991 + 512 [dd]
  8,0    3   177955    50.050313976     0  C   R 8724991 + 512 [0]
  8,0    3   177956    50.050507961  4148  A   R 8725503 + 256 <- (8,1) 8725440
  8,0    3   177957    50.050508273  4148  Q   R 8725503 + 256 [dd]
  8,0    3   177958    50.050510139  4148  G   R 8725503 + 256 [dd]
  8,0    3   177959    50.050511522  4148  P   N [dd]
  8,0    3   177960    50.050512062  4148  I   R 8725503 + 256 [dd]
  8,0    3   177961    50.050645393  4148  A   R 8725759 + 256 <- (8,1) 8725696
  8,0    3   177962    50.050645867  4148  Q   R 8725759 + 256 [dd]
  8,0    3   177963    50.050647171  4148  M   R 8725759 + 256 [dd]
  8,0    3   177964    50.050648593  4148  U   N [dd] 1
  8,0    3   177965    50.050649985  4148  D   R 8725503 + 512 [dd]
  8,0    3   177966    50.053380250     0  C   R 8725503 + 512 [0]
  8,0    3   177967    50.053576324  4148  A   R 8726015 + 256 <- (8,1) 8725952
  8,0    3   177968    50.053576615  4148  Q   R 8726015 + 256 [dd]
  8,0    3   177969    50.053578994  4148  G   R 8726015 + 256 [dd]
  8,0    3   177970    50.053580173  4148  P   N [dd]
  8,0    3   177971    50.053580509  4148  I   R 8726015 + 256 [dd]
  8,0    3   177972    50.053711503  4148  A   R 8726271 + 256 <- (8,1) 8726208
  8,0    3   177973    50.053712001  4148  Q   R 8726271 + 256 [dd]
  8,0    3   177974    50.053713332  4148  M   R 8726271 + 256 [dd]
  8,0    3   177975    50.053714583  4148  U   N [dd] 1
  8,0    3   177976    50.053715768  4148  D   R 8726015 + 512 [dd]
  8,0    3   177977    50.056970395     0  C   R 8726015 + 512 [0]
  8,0    3   177978    50.057161408  4148  A   R 8726527 + 256 <- (8,1) 8726464
  8,0    3   177979    50.057161726  4148  Q   R 8726527 + 256 [dd]
  8,0    3   177980    50.057163718  4148  G   R 8726527 + 256 [dd]
  8,0    3   177981    50.057165098  4148  P   N [dd]
  8,0    3   177982    50.057165431  4148  I   R 8726527 + 256 [dd]
  8,0    3   177983    50.057294630  4148  A   R 8726783 + 256 <- (8,1) 8726720
  8,0    3   177984    50.057294990  4148  Q   R 8726783 + 256 [dd]
  8,0    3   177985    50.057296070  4148  M   R 8726783 + 256 [dd]
  8,0    3   177986    50.057297402  4148  U   N [dd] 1
  8,0    3   177987    50.057298899  4148  D   R 8726527 + 512 [dd]
  8,0    3   177988    50.060326743     0  C   R 8726527 + 512 [0]
  8,0    3   177989    50.060523768  4148  A   R 8727039 + 256 <- (8,1) 8726976
  8,0    3   177990    50.060524095  4148  Q   R 8727039 + 256 [dd]
  8,0    3   177991    50.060525910  4148  G   R 8727039 + 256 [dd]
  8,0    3   177992    50.060527239  4148  P   N [dd]
  8,0    3   177993    50.060527575  4148  I   R 8727039 + 256 [dd]
  8,0    3   177994    50.060662280  4148  A   R 8727295 + 256 <- (8,1) 8727232
  8,0    3   177995    50.060662778  4148  Q   R 8727295 + 256 [dd]
  8,0    3   177996    50.060663993  4148  M   R 8727295 + 256 [dd]
  8,0    3   177997    50.060665403  4148  U   N [dd] 1
  8,0    3   177998    50.060666999  4148  D   R 8727039 + 512 [dd]
  8,0    3   177999    50.063922341     0  C   R 8727039 + 512 [0]
  8,0    3   178000    50.064113177  4148  A   R 8727551 + 256 <- (8,1) 8727488
  8,0    3   178001    50.064113492  4148  Q   R 8727551 + 256 [dd]
  8,0    3   178002    50.064115373  4148  G   R 8727551 + 256 [dd]

-2.6.30-rc6-patched
  8,0    3   257297    50.000760847     0  C   R 9480703 + 256 [0]
  8,0    3   257298    50.000944399  4139  A   R 9481215 + 256 <- (8,1) 9481152
  8,0    3   257299    50.000944693  4139  Q   R 9481215 + 256 [dd]
  8,0    3   257300    50.000946541  4139  G   R 9481215 + 256 [dd]
  8,0    3   257301    50.000947954  4139  P   N [dd]
  8,0    3   257302    50.000948368  4139  I   R 9481215 + 256 [dd]
  8,0    3   257303    50.000948920  4139  U   N [dd] 2
  8,0    3   257304    50.000950003  4139  D   R 9481215 + 256 [dd]
  8,0    3   257305    50.000962541  4139  U   N [dd] 2
  8,0    3   257306    50.003034240     0  C   R 9480959 + 256 [0]
  8,0    3   257307    50.003076338     0  C   R 9481215 + 256 [0]
  8,0    3   257308    50.003258111  4139  A   R 9481471 + 256 <- (8,1) 9481408
  8,0    3   257309    50.003258402  4139  Q   R 9481471 + 256 [dd]
  8,0    3   257310    50.003260190  4139  G   R 9481471 + 256 [dd]
  8,0    3   257311    50.003261399  4139  P   N [dd]
  8,0    3   257312    50.003261768  4139  I   R 9481471 + 256 [dd]
  8,0    3   257313    50.003262335  4139  U   N [dd] 1
  8,0    3   257314    50.003263406  4139  D   R 9481471 + 256 [dd]
  8,0    3   257315    50.003430472  4139  A   R 9481727 + 256 <- (8,1) 9481664
  8,0    3   257316    50.003430748  4139  Q   R 9481727 + 256 [dd]
  8,0    3   257317    50.003433065  4139  G   R 9481727 + 256 [dd]
  8,0    3   257318    50.003434343  4139  P   N [dd]
  8,0    3   257319    50.003434658  4139  I   R 9481727 + 256 [dd]
  8,0    3   257320    50.003435138  4139  U   N [dd] 2
  8,0    3   257321    50.003436083  4139  D   R 9481727 + 256 [dd]
  8,0    3   257322    50.003447795  4139  U   N [dd] 2
  8,0    3   257323    50.004774693     0  C   R 9481471 + 256 [0]
  8,0    3   257324    50.004959499  4139  A   R 9481983 + 256 <- (8,1) 9481920
  8,0    3   257325    50.004959790  4139  Q   R 9481983 + 256 [dd]
  8,0    3   257326    50.004961590  4139  G   R 9481983 + 256 [dd]
  8,0    3   257327    50.004962793  4139  P   N [dd]
  8,0    3   257328    50.004963153  4139  I   R 9481983 + 256 [dd]
  8,0    3   257329    50.004964098  4139  U   N [dd] 2
  8,0    3   257330    50.004965184  4139  D   R 9481983 + 256 [dd]
  8,0    3   257331    50.004978967  4139  U   N [dd] 2
  8,0    3   257332    50.006865854     0  C   R 9481727 + 256 [0]
  8,0    3   257333    50.007052043  4139  A   R 9482239 + 256 <- (8,1) 9482176
  8,0    3   257334    50.007052331  4139  Q   R 9482239 + 256 [dd]
  8,0    3   257335    50.007054146  4139  G   R 9482239 + 256 [dd]
  8,0    3   257336    50.007055355  4139  P   N [dd]
  8,0    3   257337    50.007055724  4139  I   R 9482239 + 256 [dd]
  8,0    3   257338    50.007056438  4139  U   N [dd] 2
  8,0    3   257339    50.007057605  4139  D   R 9482239 + 256 [dd]
  8,0    3   257340    50.007069963  4139  U   N [dd] 2
  8,0    3   257341    50.008250294     0  C   R 9481983 + 256 [0]
  8,0    3   257342    50.008431589  4139  A   R 9482495 + 256 <- (8,1) 9482432
  8,0    3   257343    50.008431881  4139  Q   R 9482495 + 256 [dd]
  8,0    3   257344    50.008433921  4139  G   R 9482495 + 256 [dd]
  8,0    3   257345    50.008435097  4139  P   N [dd]
  8,0    3   257346    50.008435466  4139  I   R 9482495 + 256 [dd]
  8,0    3   257347    50.008436213  4139  U   N [dd] 2
  8,0    3   257348    50.008437296  4139  D   R 9482495 + 256 [dd]
  8,0    3   257349    50.008450034  4139  U   N [dd] 2
  8,0    3   257350    50.010008843     0  C   R 9482239 + 256 [0]
  8,0    3   257351    50.010135287  4139  C   R 9482495 + 256 [0]
  8,0    3   257352    50.010226816  4139  A   R 9482751 + 256 <- (8,1) 9482688
  8,0    3   257353    50.010227107  4139  Q   R 9482751 + 256 [dd]
  8,0    3   257354    50.010229363  4139  G   R 9482751 + 256 [dd]
  8,0    3   257355    50.010230728  4139  P   N [dd]
  8,0    3   257356    50.010231097  4139  I   R 9482751 + 256 [dd]
  8,0    3   257357    50.010231655  4139  U   N [dd] 1
  8,0    3   257358    50.010232696  4139  D   R 9482751 + 256 [dd]
  8,0    3   257359    50.010380946  4139  A   R 9483007 + 256 <- (8,1) 9482944
  8,0    3   257360    50.010381264  4139  Q   R 9483007 + 256 [dd]
  8,0    3   257361    50.010383358  4139  G   R 9483007 + 256 [dd]
  8,0    3   257362    50.010384429  4139  P   N [dd]
  8,0    3   257363    50.010384741  4139  I   R 9483007 + 256 [dd]
  8,0    3   257364    50.010385395  4139  U   N [dd] 2
  8,0    3   257365    50.010386364  4139  D   R 9483007 + 256 [dd]
  8,0    3   257366    50.010397869  4139  U   N [dd] 2
  8,0    3   257367    50.014210132     0  C   R 9482751 + 256 [0]
  8,0    3   257368    50.014252938     0  C   R 9483007 + 256 [0]
  8,0    3   257369    50.014430811  4139  A   R 9483263 + 256 <- (8,1) 9483200
  8,0    3   257370    50.014431105  4139  Q   R 9483263 + 256 [dd]
  8,0    3   257371    50.014433139  4139  G   R 9483263 + 256 [dd]
  8,0    3   257372    50.014434520  4139  P   N [dd]
  8,0    3   257373    50.014435110  4139  I   R 9483263 + 256 [dd]
  8,0    3   257374    50.014435674  4139  U   N [dd] 1
  8,0    3   257375    50.014436770  4139  D   R 9483263 + 256 [dd]
  8,0    3   257376    50.014592117  4139  A   R 9483519 + 256 <- (8,1) 9483456
  8,0    3   257377    50.014592573  4139  Q   R 9483519 + 256 [dd]
  8,0    3   257378    50.014594391  4139  G   R 9483519 + 256 [dd]
  8,0    3   257379    50.014595504  4139  P   N [dd]
  8,0    3   257380    50.014595876  4139  I   R 9483519 + 256 [dd]
  8,0    3   257381    50.014596366  4139  U   N [dd] 2
  8,0    3   257382    50.014597368  4139  D   R 9483519 + 256 [dd]
  8,0    3   257383    50.014609521  4139  U   N [dd] 2
  8,0    3   257384    50.015937813     0  C   R 9483263 + 256 [0]
  8,0    3   257385    50.016124825  4139  A   R 9483775 + 256 <- (8,1) 9483712
  8,0    3   257386    50.016125116  4139  Q   R 9483775 + 256 [dd]
  8,0    3   257387    50.016127162  4139  G   R 9483775 + 256 [dd]
  8,0    3   257388    50.016128569  4139  P   N [dd]
  8,0    3   257389    50.016128983  4139  I   R 9483775 + 256 [dd]
  8,0    3   257390    50.016129538  4139  U   N [dd] 2
  8,0    3   257391    50.016130627  4139  D   R 9483775 + 256 [dd]
  8,0    3   257392    50.016143077  4139  U   N [dd] 2
  8,0    3   257393    50.016925304     0  C   R 9483519 + 256 [0]
  8,0    3   257394    50.017111307  4139  A   R 9484031 + 256 <- (8,1) 9483968
  8,0    3   257395    50.017111598  4139  Q   R 9484031 + 256 [dd]
  8,0    3   257396    50.017113410  4139  G   R 9484031 + 256 [dd]
  8,0    3   257397    50.017114835  4139  P   N [dd]
  8,0    3   257398    50.017115213  4139  I   R 9484031 + 256 [dd]
  8,0    3   257399    50.017115765  4139  U   N [dd] 2
  8,0    3   257400    50.017116839  4139  D   R 9484031 + 256 [dd]
  8,0    3   257401    50.017129023  4139  U   N [dd] 2
  8,0    3   257402    50.017396693     0  C   R 9483775 + 256 [0]
  8,0    3   257403    50.017584595  4139  A   R 9484287 + 256 <- (8,1) 9484224
  8,0    3   257404    50.017585018  4139  Q   R 9484287 + 256 [dd]
  8,0    3   257405    50.017586866  4139  G   R 9484287 + 256 [dd]
  8,0    3   257406    50.017587997  4139  P   N [dd]
  8,0    3   257407    50.017588393  4139  I   R 9484287 + 256 [dd]
  8,0    3   257408    50.017589105  4139  U   N [dd] 2
  8,0    3   257409    50.017590173  4139  D   R 9484287 + 256 [dd]
  8,0    3   257410    50.017602614  4139  U   N [dd] 2
  8,0    3   257411    50.020578876     0  C   R 9484031 + 256 [0]
  8,0    3   257412    50.020721857  4139  C   R 9484287 + 256 [0]
  8,0    3   257413    50.020803183  4139  A   R 9484543 + 256 <- (8,1) 9484480
  8,0    3   257414    50.020803507  4139  Q   R 9484543 + 256 [dd]
  8,0    3   257415    50.020805256  4139  G   R 9484543 + 256 [dd]
  8,0    3   257416    50.020806672  4139  P   N [dd]
  8,0    3   257417    50.020807065  4139  I   R 9484543 + 256 [dd]
  8,0    3   257418    50.020807668  4139  U   N [dd] 1
  8,0    3   257419    50.020808733  4139  D   R 9484543 + 256 [dd]
  8,0    3   257420    50.020957132  4139  A   R 9484799 + 256 <- (8,1) 9484736
  8,0    3   257421    50.020957423  4139  Q   R 9484799 + 256 [dd]
  8,0    3   257422    50.020959205  4139  G   R 9484799 + 256 [dd]
  8,0    3   257423    50.020960276  4139  P   N [dd]
  8,0    3   257424    50.020960594  4139  I   R 9484799 + 256 [dd]
  8,0    3   257425    50.020961062  4139  U   N [dd] 2
  8,0    3   257426    50.020961959  4139  D   R 9484799 + 256 [dd]
  8,0    3   257427    50.020974191  4139  U   N [dd] 2
  8,0    3   257428    50.023987847     0  C   R 9484543 + 256 [0]
  8,0    3   257429    50.024093062  4139  C   R 9484799 + 256 [0]
  8,0    3   257430    50.024207161  4139  A   R 9485055 + 256 <- (8,1) 9484992
  8,0    3   257431    50.024207434  4139  Q   R 9485055 + 256 [dd]
  8,0    3   257432    50.024209567  4139  G   R 9485055 + 256 [dd]
  8,0    3   257433    50.024210728  4139  P   N [dd]
  8,0    3   257434    50.024211097  4139  I   R 9485055 + 256 [dd]
  8,0    3   257435    50.024211661  4139  U   N [dd] 1
  8,0    3   257436    50.024212693  4139  D   R 9485055 + 256 [dd]
  8,0    3   257437    50.024359266  4139  A   R 9485311 + 256 <- (8,1) 9485248
  8,0    3   257438    50.024359584  4139  Q   R 9485311 + 256 [dd]
  8,0    3   257439    50.024361720  4139  G   R 9485311 + 256 [dd]
  8,0    3   257440    50.024362794  4139  P   N [dd]
  8,0    3   257441    50.024363106  4139  I   R 9485311 + 256 [dd]
  8,0    3   257442    50.024363760  4139  U   N [dd] 2
  8,0    3   257443    50.024364759  4139  D   R 9485311 + 256 [dd]
  8,0    3   257444    50.024376535  4139  U   N [dd] 2
  8,0    3   257445    50.026532544     0  C   R 9485055 + 256 [0]
  8,0    3   257446    50.026714236  4139  A   R 9485567 + 256 <- (8,1) 9485504
  8,0    3   257447    50.026714524  4139  Q   R 9485567 + 256 [dd]
  8,0    3   257448    50.026716354  4139  G   R 9485567 + 256 [dd]
  8,0    3   257449    50.026717791  4139  P   N [dd]
  8,0    3   257450    50.026718175  4139  I   R 9485567 + 256 [dd]
  8,0    3   257451    50.026718778  4139  U   N [dd] 2
  8,0    3   257452    50.026719876  4139  D   R 9485567 + 256 [dd]
  8,0    3   257453    50.026736383  4139  U   N [dd] 2
  8,0    3   257454    50.028531879     0  C   R 9485311 + 256 [0]
  8,0    3   257455    50.028684347  4139  C   R 9485567 + 256 [0]
  8,0    3   257456    50.028758787  4139  A   R 9485823 + 256 <- (8,1) 9485760
  8,0    3   257457    50.028759069  4139  Q   R 9485823 + 256 [dd]
  8,0    3   257458    50.028760884  4139  G   R 9485823 + 256 [dd]
  8,0    3   257459    50.028762099  4139  P   N [dd]
  8,0    3   257460    50.028762447  4139  I   R 9485823 + 256 [dd]
  8,0    3   257461    50.028763038  4139  U   N [dd] 1
  8,0    3   257462    50.028764268  4139  D   R 9485823 + 256 [dd]
  8,0    3   257463    50.028909841  4139  A   R 9486079 + 256 <- (8,1) 9486016
  8,0    3   257464    50.028910156  4139  Q   R 9486079 + 256 [dd]
  8,0    3   257465    50.028911896  4139  G   R 9486079 + 256 [dd]
  8,0    3   257466    50.028912964  4139  P   N [dd]
  8,0    3   257467    50.028913270  4139  I   R 9486079 + 256 [dd]
  8,0    3   257468    50.028913912  4139  U   N [dd] 2
  8,0    3   257469    50.028914878  4139  D   R 9486079 + 256 [dd]
  8,0    3   257470    50.028927497  4139  U   N [dd] 2
  8,0    3   257471    50.031158357     0  C   R 9485823 + 256 [0]
  8,0    3   257472    50.031292365  4139  C   R 9486079 + 256 [0]
  8,0    3   257473    50.031369697  4139  A   R 9486335 + 160 <- (8,1) 9486272
  8,0    3   257474    50.031369988  4139  Q   R 9486335 + 160 [dd]
  8,0    3   257475    50.031371779  4139  G   R 9486335 + 160 [dd]
  8,0    3   257476    50.031372850  4139  P   N [dd]
  8,0    3   257477    50.031373198  4139  I   R 9486335 + 160 [dd]
  8,0    3   257478    50.031384931  4139  A   R 1056639 + 8 <- (8,1) 1056576
  8,0    3   257479    50.031385201  4139  Q   R 1056639 + 8 [dd]
  8,0    3   257480    50.031388480  4139  G   R 1056639 + 8 [dd]
  8,0    3   257481    50.031388904  4139  I   R 1056639 + 8 [dd]
  8,0    3   257482    50.031390362  4139  U   N [dd] 2
  8,0    3   257483    50.031391523  4139  D   R 9486335 + 160 [dd]
  8,0    3   257484    50.031403403  4139  D   R 1056639 + 8 [dd]
  8,0    3   257485    50.033630747     0  C   R 1056639 + 8 [0]
  8,0    3   257486    50.033690300  4139  A   R 9486495 + 96 <- (8,1) 9486432
  8,0    3   257487    50.033690810  4139  Q   R 9486495 + 96 [dd]
  8,0    3   257488    50.033694581  4139  G   R 9486495 + 96 [dd]
  8,0    3   257489    50.033696739  4139  P   N [dd]
  8,0    3   257490    50.033697357  4139  I   R 9486495 + 96 [dd]
  8,0    3   257491    50.033698611  4139  U   N [dd] 2
  8,0    3   257492    50.033700945  4139  D   R 9486495 + 96 [dd]
  8,0    3   257493    50.033727763  4139  C   R 9486335 + 160 [0]
  8,0    3   257494    50.033996024  4139  A   R 9486591 + 256 <- (8,1) 9486528
  8,0    3   257495    50.033996396  4139  Q   R 9486591 + 256 [dd]
  8,0    3   257496    50.034000030  4139  G   R 9486591 + 256 [dd]
  8,0    3   257497    50.034002268  4139  P   N [dd]
  8,0    3   257498    50.034002820  4139  I   R 9486591 + 256 [dd]
  8,0    3   257499    50.034003924  4139  U   N [dd] 2
  8,0    3   257500    50.034006201  4139  D   R 9486591 + 256 [dd]
  8,0    3   257501    50.034091438  4139  U   N [dd] 2
  8,0    3   257502    50.034637372     0  C   R 9486495 + 96 [0]
  8,0    3   257503    50.034841508  4139  A   R 9486847 + 256 <- (8,1) 9486784
  8,0    3   257504    50.034842072  4139  Q   R 9486847 + 256 [dd]
  8,0    3   257505    50.034846117  4139  G   R 9486847 + 256 [dd]
  8,0    3   257506    50.034848676  4139  P   N [dd]
  8,0    3   257507    50.034849384  4139  I   R 9486847 + 256 [dd]
  8,0    3   257508    50.034850545  4139  U   N [dd] 2
  8,0    3   257509    50.034852795  4139  D   R 9486847 + 256 [dd]
  8,0    3   257510    50.034875503  4139  U   N [dd] 2
  8,0    3   257511    50.035370009     0  C   R 9486591 + 256 [0]
  8,0    3   257512    50.035622315  4139  A   R 9487103 + 256 <- (8,1) 9487040
  8,0    3   257513    50.035622954  4139  Q   R 9487103 + 256 [dd]
  8,0    3   257514    50.035627101  4139  G   R 9487103 + 256 [dd]
  8,0    3   257515    50.035629510  4139  P   N [dd]
  8,0    3   257516    50.035630143  4139  I   R 9487103 + 256 [dd]
  8,0    3   257517    50.035631058  4139  U   N [dd] 2
  8,0    3   257518    50.035632657  4139  D   R 9487103 + 256 [dd]
  8,0    3   257519    50.035656358  4139  U   N [dd] 2
  8,0    3   257520    50.036703329     0  C   R 9486847 + 256 [0]
  8,0    3   257521    50.036963604  4139  A   R 9487359 + 256 <- (8,1) 9487296
  8,0    3   257522    50.036964057  4139  Q   R 9487359 + 256 [dd]
  8,0    3   257523    50.036967636  4139  G   R 9487359 + 256 [dd]
  8,0    3   257524    50.036969710  4139  P   N [dd]
  8,0    3   257525    50.036970586  4139  I   R 9487359 + 256 [dd]
  8,0    3   257526    50.036971684  4139  U   N [dd] 2
  8,0    3   257527    50.036973631  4139  D   R 9487359 + 256 [dd]
  8,0    3   257528    50.036995034  4139  U   N [dd] 2
  8,0    3   257529    50.038904428     0  C   R 9487103 + 256 [0]
  8,0    3   257530    50.039161508  4139  A   R 9487615 + 256 <- (8,1) 9487552
  8,0    3   257531    50.039161934  4139  Q   R 9487615 + 256 [dd]
  8,0    3   257532    50.039165834  4139  G   R 9487615 + 256 [dd]
  8,0    3   257533    50.039168561  4139  P   N [dd]
  8,0    3   257534    50.039169353  4139  I   R 9487615 + 256 [dd]
  8,0    3   257535    50.039170343  4139  U   N [dd] 2
  8,0    3   257536    50.039171645  4139  D   R 9487615 + 256 [dd]
  8,0    3   257537    50.039193195  4139  U   N [dd] 2
  8,0    3   257538    50.040570003     0  C   R 9487359 + 256 [0]
  8,0    3   257539    50.040842161  4139  A   R 9487871 + 256 <- (8,1) 9487808
  8,0    3   257540    50.040842827  4139  Q   R 9487871 + 256 [dd]
  8,0    3   257541    50.040846803  4139  G   R 9487871 + 256 [dd]
  8,0    3   257542    50.040849902  4139  P   N [dd]
  8,0    3   257543    50.040850715  4139  I   R 9487871 + 256 [dd]
  8,0    3   257544    50.040851642  4139  U   N [dd] 2
  8,0    3   257545    50.040853658  4139  D   R 9487871 + 256 [dd]
  8,0    3   257546    50.040876270  4139  U   N [dd] 2
  8,0    3   257547    50.042081391     0  C   R 9487615 + 256 [0]
  8,0    3   257548    50.042215837  4139  C   R 9487871 + 256 [0]
  8,0    3   257549    50.042316192  4139  A   R 9488127 + 256 <- (8,1) 9488064
  8,0    3   257550    50.042316633  4139  Q   R 9488127 + 256 [dd]
  8,0    3   257551    50.042319213  4139  G   R 9488127 + 256 [dd]
  8,0    3   257552    50.042320803  4139  P   N [dd]
  8,0    3   257553    50.042321412  4139  I   R 9488127 + 256 [dd]
  8,0    3   257554    50.042322219  4139  U   N [dd] 1
  8,0    3   257555    50.042323362  4139  D   R 9488127 + 256 [dd]
  8,0    3   257556    50.042484350  4139  A   R 9488383 + 256 <- (8,1) 9488320
  8,0    3   257557    50.042484602  4139  Q   R 9488383 + 256 [dd]
  8,0    3   257558    50.042486744  4139  G   R 9488383 + 256 [dd]
  8,0    3   257559    50.042487908  4139  P   N [dd]
  8,0    3   257560    50.042488223  4139  I   R 9488383 + 256 [dd]
  8,0    3   257561    50.042488754  4139  U   N [dd] 2
  8,0    3   257562    50.042489927  4139  D   R 9488383 + 256 [dd]
  8,0    3   257563    50.042502678  4139  U   N [dd] 2
  8,0    3   257564    50.045166592     0  C   R 9488127 + 256 [0]
  8,0    3   257565    50.045355163  4139  A   R 9488639 + 256 <- (8,1) 9488576
  8,0    3   257566    50.045355493  4139  Q   R 9488639 + 256 [dd]
  8,0    3   257567    50.045357497  4139  G   R 9488639 + 256 [dd]
  8,0    3   257568    50.045358673  4139  P   N [dd]
  8,0    3   257569    50.045359267  4139  I   R 9488639 + 256 [dd]
  8,0    3   257570    50.045359831  4139  U   N [dd] 2
  8,0    3   257571    50.045360911  4139  D   R 9488639 + 256 [dd]
  8,0    3   257572    50.045373959  4139  U   N [dd] 2
  8,0    3   257573    50.046450730     0  C   R 9488383 + 256 [0]
  8,0    3   257574    50.046641639  4139  A   R 9488895 + 256 <- (8,1) 9488832
  8,0    3   257575    50.046642086  4139  Q   R 9488895 + 256 [dd]
  8,0    3   257576    50.046643937  4139  G   R 9488895 + 256 [dd]
  8,0    3   257577    50.046645092  4139  P   N [dd]
  8,0    3   257578    50.046645527  4139  I   R 9488895 + 256 [dd]
  8,0    3   257579    50.046646244  4139  U   N [dd] 2
  8,0    3   257580    50.046647327  4139  D   R 9488895 + 256 [dd]
  8,0    3   257581    50.046660234  4139  U   N [dd] 2
  8,0    3   257582    50.047826305     0  C   R 9488639 + 256 [0]
  8,0    3   257583    50.048011468  4139  A   R 9489151 + 256 <- (8,1) 9489088
  8,0    3   257584    50.048011762  4139  Q   R 9489151 + 256 [dd]
  8,0    3   257585    50.048013793  4139  G   R 9489151 + 256 [dd]
  8,0    3   257586    50.048014966  4139  P   N [dd]
  8,0    3   257587    50.048015380  4139  I   R 9489151 + 256 [dd]
  8,0    3   257588    50.048016112  4139  U   N [dd] 2
  8,0    3   257589    50.048017202  4139  D   R 9489151 + 256 [dd]
  8,0    3   257590    50.048029553  4139  U   N [dd] 2
  8,0    3   257591    50.049319830     0  C   R 9488895 + 256 [0]
  8,0    3   257592    50.049446089  4139  C   R 9489151 + 256 [0]
  8,0    3   257593    50.049545199  4139  A   R 9489407 + 256 <- (8,1) 9489344
  8,0    3   257594    50.049545628  4139  Q   R 9489407 + 256 [dd]
  8,0    3   257595    50.049547512  4139  G   R 9489407 + 256 [dd]
  8,0    3   257596    50.049548886  4139  P   N [dd]
  8,0    3   257597    50.049549318  4139  I   R 9489407 + 256 [dd]
  8,0    3   257598    50.049550047  4139  U   N [dd] 1
  8,0    3   257599    50.049551241  4139  D   R 9489407 + 256 [dd]
  8,0    3   257600    50.049699283  4139  A   R 9489663 + 256 <- (8,1) 9489600
  8,0    3   257601    50.049699556  4139  Q   R 9489663 + 256 [dd]
  8,0    3   257602    50.049701266  4139  G   R 9489663 + 256 [dd]
  8,0    3   257603    50.049702310  4139  P   N [dd]
  8,0    3   257604    50.049702656  4139  I   R 9489663 + 256 [dd]
  8,0    3   257605    50.049703118  4139  U   N [dd] 2
  8,0    3   257606    50.049704020  4139  D   R 9489663 + 256 [dd]
  8,0    3   257607    50.049715940  4139  U   N [dd] 2
  8,0    3   257608    50.052662150     0  C   R 9489407 + 256 [0]
  8,0    3   257609    50.052853688  4139  A   R 9489919 + 256 <- (8,1) 9489856
  8,0    3   257610    50.052853985  4139  Q   R 9489919 + 256 [dd]
  8,0    3   257611    50.052855869  4139  G   R 9489919 + 256 [dd]
  8,0    3   257612    50.052857057  4139  P   N [dd]
  8,0    3   257613    50.052857423  4139  I   R 9489919 + 256 [dd]
  8,0    3   257614    50.052858065  4139  U   N [dd] 2
  8,0    3   257615    50.052859164  4139  D   R 9489919 + 256 [dd]
  8,0    3   257616    50.052871806  4139  U   N [dd] 2
  8,0    3   257617    50.053470795     0  C   R 9489663 + 256 [0]
  8,0    3   257618    50.053661719  4139  A   R 9490175 + 256 <- (8,1) 9490112
  8,0    3   257619    50.053662097  4139  Q   R 9490175 + 256 [dd]
  8,0    3   257620    50.053663891  4139  G   R 9490175 + 256 [dd]
  8,0    3   257621    50.053665034  4139  P   N [dd]
  8,0    3   257622    50.053665436  4139  I   R 9490175 + 256 [dd]
  8,0    3   257623    50.053665982  4139  U   N [dd] 2
  8,0    3   257624    50.053667077  4139  D   R 9490175 + 256 [dd]
  8,0    3   257625    50.053679732  4139  U   N [dd] 2
  8,0    3   257626    50.055776383     0  C   R 9489919 + 256 [0]
  8,0    3   257627    50.055915017  4139  C   R 9490175 + 256 [0]
  8,0    3   257628    50.055997812  4139  A   R 9490431 + 256 <- (8,1) 9490368
  8,0    3   257629    50.055998085  4139  Q   R 9490431 + 256 [dd]
  8,0    3   257630    50.055999867  4139  G   R 9490431 + 256 [dd]
  8,0    3   257631    50.056001049  4139  P   N [dd]
  8,0    3   257632    50.056001451  4139  I   R 9490431 + 256 [dd]
  8,0    3   257633    50.056002189  4139  U   N [dd] 1
  8,0    3   257634    50.056003197  4139  D   R 9490431 + 256 [dd]
  8,0    3   257635    50.056149977  4139  A   R 9490687 + 256 <- (8,1) 9490624
  8,0    3   257636    50.056150279  4139  Q   R 9490687 + 256 [dd]
  8,0    3   257637    50.056152047  4139  G   R 9490687 + 256 [dd]
  8,0    3   257638    50.056153109  4139  P   N [dd]
  8,0    3   257639    50.056153442  4139  I   R 9490687 + 256 [dd]
  8,0    3   257640    50.056153904  4139  U   N [dd] 2
  8,0    3   257641    50.056154852  4139  D   R 9490687 + 256 [dd]
  8,0    3   257642    50.056166948  4139  U   N [dd] 2
  8,0    3   257643    50.057600660     0  C   R 9490431 + 256 [0]
  8,0    3   257644    50.057786753  4139  A   R 9490943 + 256 <- (8,1) 9490880
  8,0    3   257645    50.057787050  4139  Q   R 9490943 + 256 [dd]
  8,0    3   257646    50.057788865  4139  G   R 9490943 + 256 [dd]
  8,0    3   257647    50.057790236  4139  P   N [dd]
  8,0    3   257648    50.057790614  4139  I   R 9490943 + 256 [dd]
  8,0    3   257649    50.057791169  4139  U   N [dd] 2
  8,0    3   257650    50.057792246  4139  D   R 9490943 + 256 [dd]
  8,0    3   257651    50.057804469  4139  U   N [dd] 2
  8,0    3   257652    50.060322995     0  C   R 9490687 + 256 [0]
  8,0    3   257653    50.060464005  4139  C   R 9490943 + 256 [0]
  8,0    3   257654    50.060548216  4139  A   R 9491199 + 256 <- (8,1) 9491136
  8,0    3   257655    50.060548696  4139  Q   R 9491199 + 256 [dd]
  8,0    3   257656    50.060550922  4139  G   R 9491199 + 256 [dd]
  8,0    3   257657    50.060552096  4139  P   N [dd]
  8,0    3   257658    50.060552531  4139  I   R 9491199 + 256 [dd]
  8,0    3   257659    50.060553101  4139  U   N [dd] 1
  8,0    3   257660    50.060554100  4139  D   R 9491199 + 256 [dd]
  8,0    3   257661    50.060701569  4139  A   R 9491455 + 256 <- (8,1) 9491392
  8,0    3   257662    50.060701890  4139  Q   R 9491455 + 256 [dd]
  8,0    3   257663    50.060703993  4139  G   R 9491455 + 256 [dd]
  8,0    3   257664    50.060705070  4139  P   N [dd]
  8,0    3   257665    50.060705385  4139  I   R 9491455 + 256 [dd]
  8,0    3   257666    50.060706012  4139  U   N [dd] 2
  8,0    3   257667    50.060706987  4139  D   R 9491455 + 256 [dd]
  8,0    3   257668    50.060718784  4139  U   N [dd] 2
  8,0    3   257669    50.062964966     0  C   R 9491199 + 256 [0]
  8,0    3   257670    50.063102772  4139  C   R 9491455 + 256 [0]
  8,0    3   257671    50.063182666  4139  A   R 9491711 + 256 <- (8,1) 9491648
  8,0    3   257672    50.063182939  4139  Q   R 9491711 + 256 [dd]
  8,0    3   257673    50.063184889  4139  G   R 9491711 + 256 [dd]
  8,0    3   257674    50.063186074  4139  P   N [dd]
  8,0    3   257675    50.063186440  4139  I   R 9491711 + 256 [dd]
  8,0    3   257676    50.063187271  4139  U   N [dd] 1
  8,0    3   257677    50.063188312  4139  D   R 9491711 + 256 [dd]
  8,0    3   257678    50.063340467  4139  A   R 9491967 + 256 <- (8,1) 9491904
  8,0    3   257679    50.063340749  4139  Q   R 9491967 + 256 [dd]
  8,0    3   257680    50.063342529  4139  G   R 9491967 + 256 [dd]
  8,0    3   257681    50.063343597  4139  P   N [dd]
  8,0    3   257682    50.063343915  4139  I   R 9491967 + 256 [dd]
  8,0    3   257683    50.063344374  4139  U   N [dd] 2
  8,0    3   257684    50.063345313  4139  D   R 9491967 + 256 [dd]
  8,0    3   257685    50.063357370  4139  U   N [dd] 2
  8,0    3   257686    50.066605011     0  C   R 9491711 + 256 [0]
  8,0    3   257687    50.066643587     0  C   R 9491967 + 256 [0]
  8,0    3   257688    50.066821310  4139  A   R 9492223 + 256 <- (8,1) 9492160
  8,0    3   257689    50.066821601  4139  Q   R 9492223 + 256 [dd]
  8,0    3   257690    50.066823605  4139  G   R 9492223 + 256 [dd]
  8,0    3   257691    50.066825063  4139  P   N [dd]



* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-01  1:04   ` Hisashi Hifumi
@ 2009-06-05 15:15     ` Alan D. Brunelle
  2009-06-06 14:36       ` KOSAKI Motohiro
  0 siblings, 1 reply; 65+ messages in thread
From: Alan D. Brunelle @ 2009-06-05 15:15 UTC (permalink / raw)
  To: Hisashi Hifumi; +Cc: Andrew Morton, linux-kernel, linux-fsdevel

Hisashi Hifumi wrote:
> At 09:36 09/06/01, Andrew Morton wrote:
>> On Fri, 29 May 2009 14:35:55 +0900 Hisashi Hifumi 
>> <hifumi.hisashi@oss.ntt.co.jp> wrote:
>>
>>> I added blk_run_backing_dev to page_cache_async_readahead so that
>>> readahead I/O is unplugged, which improves throughput especially in
>>> RAID environments.
>> I skipped the last version of this because KOSAKI Motohiro
>> <kosaki.motohiro@jp.fujitsu.com> said "Please attach blktrace analysis ;)".
>>
>> I'm not sure why he asked for that, but he's a smart chap and
>> presumably had his reasons.
>>
>> If you think that such an analysis is unneeded, or isn't worth the time
>> to generate then please tell us that.  But please don't just ignore the
>> request!
> 
> Hi Andrew.
> 
> Sorry for this.
> 
> I did not ignore KOSAKI Motohiro's request.
> I have got blktrace output both with and without the patch, but I
> could not work out the reason for the throughput improvement from
> this result.
> 
> I do not notice any difference except around the unplug behavior of dd.
> Comments?

Pardon my ignorance on the global issues concerning the patch, but 
specifically looking at the traces generated by blktrace leads one to 
also note that the patched version may generate inefficiencies in other 
places in the kernel by reducing the merging going on. In the unpatched 
version it looks like (generally) that two incoming bio's are able to be 
merged to generate a single I/O request. In the patched version - 
because of the quicker unplug(?) - no such merging is going on. This 
leads to more work lower in the stack (twice as many I/O operations 
being managed), perhaps increased interrupts & handling &c. [This may be 
acceptable if the goal is to decrease latencies on a per-bio basis...]

Do you have a place where the raw blktrace data can be retrieved for 
more in-depth analysis?

Regards,
Alan D. Brunelle
Hewlett-Packard


* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-05 15:15     ` Alan D. Brunelle
@ 2009-06-06 14:36       ` KOSAKI Motohiro
  2009-06-06 22:45         ` Wu Fengguang
  0 siblings, 1 reply; 65+ messages in thread
From: KOSAKI Motohiro @ 2009-06-06 14:36 UTC (permalink / raw)
  To: Alan D. Brunelle
  Cc: kosaki.motohiro, Hisashi Hifumi, Andrew Morton, linux-kernel,
	linux-fsdevel, Wu Fengguang


Sorry for the late response.
I wonder why Wu and I were not on the Cc list for this thread.


> Hisashi Hifumi wrote:
> > At 09:36 09/06/01, Andrew Morton wrote:
> >> On Fri, 29 May 2009 14:35:55 +0900 Hisashi Hifumi 
> >> <hifumi.hisashi@oss.ntt.co.jp> wrote:
> >>
> >>> I added blk_run_backing_dev to page_cache_async_readahead so that
> >>> readahead I/O is unplugged, which improves throughput especially in
> >>> RAID environments.
> >> I skipped the last version of this because KOSAKI Motohiro
> >> <kosaki.motohiro@jp.fujitsu.com> said "Please attach blktrace analysis ;)".
> >>
> >> I'm not sure why he asked for that, but he's a smart chap and
> >> presumably had his reasons.
> >>
> >> If you think that such an analysis is unneeded, or isn't worth the time
> >> to generate then please tell us that.  But please don't just ignore the
> >> request!
> > 
> > Hi Andrew.
> > 
> > Sorry for this.
> > 
> > I did not ignore KOSAKI Motohiro's request.
> > I have got blktrace output both with and without the patch, but I
> > could not work out the reason for the throughput improvement from
> > this result.
> > 
> > I do not notice any difference except around the unplug behavior of dd.
> > Comments?
> 
> Pardon my ignorance on the global issues concerning the patch, but 
> specifically looking at the traces generated by blktrace leads one to 
> also note that the patched version may generate inefficiencies in other 
> places in the kernel by reducing the merging going on. In the unpatched 
> version it looks like (generally) that two incoming bio's are able to be 
> merged to generate a single I/O request. In the patched version - 
> because of the quicker unplug(?) - no such merging is going on. This 
> leads to more work lower in the stack (twice as many I/O operations 
> being managed), perhaps increased interrupts & handling &c. [This may be 
> acceptable if the goal is to decrease latencies on a per-bio basis...]
> 
> Do you have a place where the raw blktrace data can be retrieved for 
> more in-depth analysis?

I think your comment is quite apt. In another thread, Wu Fengguang
pointed out the same issue.
Wu and I are also waiting for his analysis.

Thanks.


* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-06 14:36       ` KOSAKI Motohiro
@ 2009-06-06 22:45         ` Wu Fengguang
  2009-06-18 19:04           ` Andrew Morton
  0 siblings, 1 reply; 65+ messages in thread
From: Wu Fengguang @ 2009-06-06 22:45 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Alan D. Brunelle, Hisashi Hifumi, Andrew Morton, linux-kernel,
	linux-fsdevel, Jens Axboe, Randy Dunlap

On Sat, Jun 06, 2009 at 10:36:41PM +0800, KOSAKI Motohiro wrote:
>
> Sorry for the late response.
> I wonder why Wu and I were not on the Cc list for this thread.

[restore more CC]

> > Hisashi Hifumi wrote:
> > > At 09:36 09/06/01, Andrew Morton wrote:
> > >> On Fri, 29 May 2009 14:35:55 +0900 Hisashi Hifumi
> > >> <hifumi.hisashi@oss.ntt.co.jp> wrote:
> > >>
> > >>> I added blk_run_backing_dev to page_cache_async_readahead so that
> > >>> readahead I/O is unplugged, which improves throughput especially in
> > >>> RAID environments.
> > >> I skipped the last version of this because KOSAKI Motohiro
> > >> <kosaki.motohiro@jp.fujitsu.com> said "Please attach blktrace analysis ;)".
> > >>
> > >> I'm not sure why he asked for that, but he's a smart chap and
> > >> presumably had his reasons.
> > >>
> > >> If you think that such an analysis is unneeded, or isn't worth the time
> > >> to generate then please tell us that.  But please don't just ignore the
> > >> request!
> > >
> > > Hi Andrew.
> > >
> > > Sorry for this.
> > >
> > > I did not ignore KOSAKI Motohiro's request.
> > > I have got blktrace output both with and without the patch, but I
> > > could not work out the reason for the throughput improvement from
> > > this result.
> > >
> > > I do not notice any difference except around the unplug behavior of dd.
> > > Comments?
> >
> > Pardon my ignorance on the global issues concerning the patch, but
> > specifically looking at the traces generated by blktrace leads one to
> > also note that the patched version may generate inefficiencies in other
> > places in the kernel by reducing the merging going on. In the unpatched
> > version it looks like (generally) that two incoming bio's are able to be
> > merged to generate a single I/O request. In the patched version -
> > because of the quicker unplug(?) - no such merging is going on. This
> > leads to more work lower in the stack (twice as many I/O operations
> > being managed), perhaps increased interrupts & handling &c. [This may be
> > acceptable if the goal is to decrease latencies on a per-bio basis...]
> >
> > Do you have a place where the raw blktrace data can be retrieved for
> > more in-depth analysis?
>
> I think your comment is quite apt. In another thread, Wu Fengguang
> pointed out the same issue.
> Wu and I are also waiting for his analysis.

And do it with a large readahead size :)

Alan, this was my analysis:

: Hifumi, can you help retest with some large readahead size?
:
: Your readahead size (128K) is smaller than your max_sectors_kb (256K),
: so two readahead IO requests get merged into one real IO, that means
: half of the readahead requests are delayed.

ie. two readahead requests get merged and complete together, thus the effective
IO size is doubled but at the same time it becomes completely synchronous IO.

:
: The IO completion size goes down from 512 to 256 sectors:
:
: before patch:
:   8,0    3   177955    50.050313976     0  C   R 8724991 + 512 [0]
:   8,0    3   177966    50.053380250     0  C   R 8725503 + 512 [0]
:   8,0    3   177977    50.056970395     0  C   R 8726015 + 512 [0]
:   8,0    3   177988    50.060326743     0  C   R 8726527 + 512 [0]
:   8,0    3   177999    50.063922341     0  C   R 8727039 + 512 [0]
:
: after patch:
:   8,0    3   257297    50.000760847     0  C   R 9480703 + 256 [0]
:   8,0    3   257306    50.003034240     0  C   R 9480959 + 256 [0]
:   8,0    3   257307    50.003076338     0  C   R 9481215 + 256 [0]
:   8,0    3   257323    50.004774693     0  C   R 9481471 + 256 [0]
:   8,0    3   257332    50.006865854     0  C   R 9481727 + 256 [0]
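
(In sector terms, assuming 512-byte sectors: a 128 KB readahead window
is 256 sectors and a 256 KB max_sectors_kb is 512 sectors, which matches
the completions above: 512-sector completions before the patch, i.e. two
readahead requests merged and completing together, versus separate
256-sector completions after it.)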

Thanks,
Fengguang


* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-06 22:45         ` Wu Fengguang
@ 2009-06-18 19:04           ` Andrew Morton
  2009-06-20  3:55             ` Wu Fengguang
  0 siblings, 1 reply; 65+ messages in thread
From: Andrew Morton @ 2009-06-18 19:04 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: kosaki.motohiro, Alan.Brunelle, hifumi.hisashi, linux-kernel,
	linux-fsdevel, jens.axboe, randy.dunlap

On Sun, 7 Jun 2009 06:45:38 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> > > Do you have a place where the raw blktrace data can be retrieved for
> > > more in-depth analysis?
> >
> > I think your comment is quite apt. In another thread, Wu Fengguang
> > pointed out the same issue.
> > Wu and I are also waiting for his analysis.
> 
> And do it with a large readahead size :)
> 
> Alan, this was my analysis:
> 
> : Hifumi, can you help retest with some large readahead size?
> :
> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> : so two readahead IO requests get merged into one real IO, that means
> : half of the readahead requests are delayed.
> 
> ie. two readahead requests get merged and complete together, thus the effective
> IO size is doubled but at the same time it becomes completely synchronous IO.
> 
> :
> : The IO completion size goes down from 512 to 256 sectors:
> :
> : before patch:
> :   8,0    3   177955    50.050313976     0  C   R 8724991 + 512 [0]
> :   8,0    3   177966    50.053380250     0  C   R 8725503 + 512 [0]
> :   8,0    3   177977    50.056970395     0  C   R 8726015 + 512 [0]
> :   8,0    3   177988    50.060326743     0  C   R 8726527 + 512 [0]
> :   8,0    3   177999    50.063922341     0  C   R 8727039 + 512 [0]
> :
> : after patch:
> :   8,0    3   257297    50.000760847     0  C   R 9480703 + 256 [0]
> :   8,0    3   257306    50.003034240     0  C   R 9480959 + 256 [0]
> :   8,0    3   257307    50.003076338     0  C   R 9481215 + 256 [0]
> :   8,0    3   257323    50.004774693     0  C   R 9481471 + 256 [0]
> :   8,0    3   257332    50.006865854     0  C   R 9481727 + 256 [0]
> 

I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
and it's looking like 2.6.32 material, if ever.

If it turns out to be wonderful, we could always ask the -stable
maintainers to put it in 2.6.x.y I guess.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-18 19:04           ` Andrew Morton
@ 2009-06-20  3:55             ` Wu Fengguang
  2009-06-20 12:29               ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 65+ messages in thread
From: Wu Fengguang @ 2009-06-20  3:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kosaki.motohiro, Alan.Brunelle, hifumi.hisashi, linux-kernel,
	linux-fsdevel, jens.axboe, randy.dunlap

On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
> On Sun, 7 Jun 2009 06:45:38 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > > > Do you have a place where the raw blktrace data can be retrieved for
> > > > more in-depth analysis?
> > >
> > > I think your comment is really adequate. In another thread, Wu Fengguang pointed
> > > out the same issue.
> > > I and Wu also wait his analysis.
> > 
> > And do it with a large readahead size :)
> > 
> > Alan, this was my analysis:
> > 
> > : Hifumi, can you help retest with some large readahead size?
> > :
> > : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> > : so two readahead IO requests get merged into one real IO, that means
> > : half of the readahead requests are delayed.
> > 
> > ie. two readahead requests get merged and complete together, thus the effective
> > IO size is doubled but at the same time it becomes completely synchronous IO.
> > 
> > :
> > : The IO completion size goes down from 512 to 256 sectors:
> > :
> > : before patch:
> > :   8,0    3   177955    50.050313976     0  C   R 8724991 + 512 [0]
> > :   8,0    3   177966    50.053380250     0  C   R 8725503 + 512 [0]
> > :   8,0    3   177977    50.056970395     0  C   R 8726015 + 512 [0]
> > :   8,0    3   177988    50.060326743     0  C   R 8726527 + 512 [0]
> > :   8,0    3   177999    50.063922341     0  C   R 8727039 + 512 [0]
> > :
> > : after patch:
> > :   8,0    3   257297    50.000760847     0  C   R 9480703 + 256 [0]
> > :   8,0    3   257306    50.003034240     0  C   R 9480959 + 256 [0]
> > :   8,0    3   257307    50.003076338     0  C   R 9481215 + 256 [0]
> > :   8,0    3   257323    50.004774693     0  C   R 9481471 + 256 [0]
> > :   8,0    3   257332    50.006865854     0  C   R 9481727 + 256 [0]
> > 
> 
> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
> and it's looking like 2.6.32 material, if ever.
> 
> If it turns out to be wonderful, we could always ask the -stable
> maintainers to put it in 2.6.x.y I guess.

Agreed. The expected (and interesting) test on a properly configured
HW RAID has not happened yet, hence the theory remains unsupported.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-20  3:55             ` Wu Fengguang
@ 2009-06-20 12:29               ` Vladislav Bolkhovitin
  2009-06-29  9:34                 ` Wu Fengguang
  0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-06-20 12:29 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, kosaki.motohiro, Alan.Brunelle, hifumi.hisashi,
	linux-kernel, linux-fsdevel, jens.axboe, randy.dunlap,
	Beheer InterCommIT


Wu Fengguang, on 06/20/2009 07:55 AM wrote:
> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>> On Sun, 7 Jun 2009 06:45:38 +0800
>> Wu Fengguang <fengguang.wu@intel.com> wrote:
>>
>>>>> Do you have a place where the raw blktrace data can be retrieved for
>>>>> more in-depth analysis?
>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
>>>> out the same issue.
>>>> I and Wu also wait his analysis.
>>> And do it with a large readahead size :)
>>>
>>> Alan, this was my analysis:
>>>
>>> : Hifumi, can you help retest with some large readahead size?
>>> :
>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>>> : so two readahead IO requests get merged into one real IO, that means
>>> : half of the readahead requests are delayed.
>>>
>>> ie. two readahead requests get merged and complete together, thus the effective
>>> IO size is doubled but at the same time it becomes completely synchronous IO.
>>>
>>> :
>>> : The IO completion size goes down from 512 to 256 sectors:
>>> :
>>> : before patch:
>>> :   8,0    3   177955    50.050313976     0  C   R 8724991 + 512 [0]
>>> :   8,0    3   177966    50.053380250     0  C   R 8725503 + 512 [0]
>>> :   8,0    3   177977    50.056970395     0  C   R 8726015 + 512 [0]
>>> :   8,0    3   177988    50.060326743     0  C   R 8726527 + 512 [0]
>>> :   8,0    3   177999    50.063922341     0  C   R 8727039 + 512 [0]
>>> :
>>> : after patch:
>>> :   8,0    3   257297    50.000760847     0  C   R 9480703 + 256 [0]
>>> :   8,0    3   257306    50.003034240     0  C   R 9480959 + 256 [0]
>>> :   8,0    3   257307    50.003076338     0  C   R 9481215 + 256 [0]
>>> :   8,0    3   257323    50.004774693     0  C   R 9481471 + 256 [0]
>>> :   8,0    3   257332    50.006865854     0  C   R 9481727 + 256 [0]
>>>
>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>> and it's looking like 2.6.32 material, if ever.
>>
>> If it turns out to be wonderful, we could always ask the -stable
>> maintainers to put it in 2.6.x.y I guess.
> 
> Agreed. The expected (and interesting) test on a properly configured
> HW RAID has not happened yet, hence the theory remains unsupported.

Hmm, do you see anything improper in Ronald's setup (see 
http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)? 
It is HW RAID based.

As I already wrote, we can ask Ronald to perform any needed tests.

> Thanks,
> Fengguang

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-20 12:29               ` Vladislav Bolkhovitin
@ 2009-06-29  9:34                 ` Wu Fengguang
  2009-06-29 10:26                   ` Ronald Moesbergen
  2009-06-29 10:55                   ` Vladislav Bolkhovitin
  0 siblings, 2 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29  9:34 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: Andrew Morton, kosaki.motohiro, Alan.Brunelle, hifumi.hisashi,
	linux-kernel, linux-fsdevel, jens.axboe, randy.dunlap,
	Beheer InterCommIT

On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
> 
> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
> > On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
> >> On Sun, 7 Jun 2009 06:45:38 +0800
> >> Wu Fengguang <fengguang.wu@intel.com> wrote:
> >>
> >>>>> Do you have a place where the raw blktrace data can be retrieved for
> >>>>> more in-depth analysis?
> >>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
> >>>> out the same issue.
> >>>> I and Wu also wait his analysis.
> >>> And do it with a large readahead size :)
> >>>
> >>> Alan, this was my analysis:
> >>>
> >>> : Hifumi, can you help retest with some large readahead size?
> >>> :
> >>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> >>> : so two readahead IO requests get merged into one real IO, that means
> >>> : half of the readahead requests are delayed.
> >>>
> >>> ie. two readahead requests get merged and complete together, thus the effective
> >>> IO size is doubled but at the same time it becomes completely synchronous IO.
> >>>
> >>> :
> >>> : The IO completion size goes down from 512 to 256 sectors:
> >>> :
> >>> : before patch:
> >>> :   8,0    3   177955    50.050313976     0  C   R 8724991 + 512 [0]
> >>> :   8,0    3   177966    50.053380250     0  C   R 8725503 + 512 [0]
> >>> :   8,0    3   177977    50.056970395     0  C   R 8726015 + 512 [0]
> >>> :   8,0    3   177988    50.060326743     0  C   R 8726527 + 512 [0]
> >>> :   8,0    3   177999    50.063922341     0  C   R 8727039 + 512 [0]
> >>> :
> >>> : after patch:
> >>> :   8,0    3   257297    50.000760847     0  C   R 9480703 + 256 [0]
> >>> :   8,0    3   257306    50.003034240     0  C   R 9480959 + 256 [0]
> >>> :   8,0    3   257307    50.003076338     0  C   R 9481215 + 256 [0]
> >>> :   8,0    3   257323    50.004774693     0  C   R 9481471 + 256 [0]
> >>> :   8,0    3   257332    50.006865854     0  C   R 9481727 + 256 [0]
> >>>
> >> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
> >> and it's looking like 2.6.32 material, if ever.
> >>
> >> If it turns out to be wonderful, we could always ask the -stable
> >> maintainers to put it in 2.6.x.y I guess.
> > 
> > Agreed. The expected (and interesting) test on a properly configured
> > HW RAID has not happened yet, hence the theory remains unsupported.
> 
> Hmm, do you see anything improper in the Ronald's setup (see
> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
> It is HW RAID based.

No. Ronald's HW RAID performance is reasonably good.  I meant that Hifumi's
RAID performance is rather poor and may be improved by increasing the
readahead size, hehe.

> As I already wrote, we can ask Ronald to perform any needed tests.

Thanks!  Ronald's test results are:

231   MB/s   HW RAID                        
 69.6 MB/s   HW RAID + SCST                 
 89.7 MB/s   HW RAID + SCST + this patch
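
(That is roughly a 29% gain from the patch in the SCST case, though still well
below what the array delivers locally.)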

So this patch seems to help SCST, but again it would be better to
improve the SCST throughput first - it is currently quite sub-optimal.
(Sorry for the long delay: I have not yet come up with a good way to
 measure such timing issues.)

And if Ronald could provide the HW RAID performance with this patch,
then we can confirm whether this patch really makes a difference for RAID.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29  9:34                 ` Wu Fengguang
@ 2009-06-29 10:26                   ` Ronald Moesbergen
  2009-06-29 10:55                     ` Vladislav Bolkhovitin
  2009-06-29 10:55                   ` Vladislav Bolkhovitin
  1 sibling, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-06-29 10:26 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Vladislav Bolkhovitin, Andrew Morton, kosaki.motohiro,
	Alan.Brunelle, hifumi.hisashi, linux-kernel, linux-fsdevel,
	jens.axboe, randy.dunlap

2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
>>
>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
>> > On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>> >> On Sun, 7 Jun 2009 06:45:38 +0800
>> >> Wu Fengguang <fengguang.wu@intel.com> wrote:
>> >>
>> >>>>> Do you have a place where the raw blktrace data can be retrieved for
>> >>>>> more in-depth analysis?
>> >>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
>> >>>> out the same issue.
>> >>>> I and Wu also wait his analysis.
>> >>> And do it with a large readahead size :)
>> >>>
>> >>> Alan, this was my analysis:
>> >>>
>> >>> : Hifumi, can you help retest with some large readahead size?
>> >>> :
>> >>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>> >>> : so two readahead IO requests get merged into one real IO, that means
>> >>> : half of the readahead requests are delayed.
>> >>>
>> >>> ie. two readahead requests get merged and complete together, thus the effective
>> >>> IO size is doubled but at the same time it becomes completely synchronous IO.
>> >>>
>> >>> :
>> >>> : The IO completion size goes down from 512 to 256 sectors:
>> >>> :
>> >>> : before patch:
>> >>> :   8,0    3   177955    50.050313976     0  C   R 8724991 + 512 [0]
>> >>> :   8,0    3   177966    50.053380250     0  C   R 8725503 + 512 [0]
>> >>> :   8,0    3   177977    50.056970395     0  C   R 8726015 + 512 [0]
>> >>> :   8,0    3   177988    50.060326743     0  C   R 8726527 + 512 [0]
>> >>> :   8,0    3   177999    50.063922341     0  C   R 8727039 + 512 [0]
>> >>> :
>> >>> : after patch:
>> >>> :   8,0    3   257297    50.000760847     0  C   R 9480703 + 256 [0]
>> >>> :   8,0    3   257306    50.003034240     0  C   R 9480959 + 256 [0]
>> >>> :   8,0    3   257307    50.003076338     0  C   R 9481215 + 256 [0]
>> >>> :   8,0    3   257323    50.004774693     0  C   R 9481471 + 256 [0]
>> >>> :   8,0    3   257332    50.006865854     0  C   R 9481727 + 256 [0]
>> >>>
>> >> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>> >> and it's looking like 2.6.32 material, if ever.
>> >>
>> >> If it turns out to be wonderful, we could always ask the -stable
>> >> maintainers to put it in 2.6.x.y I guess.
>> >
>> > Agreed. The expected (and interesting) test on a properly configured
>> > HW RAID has not happened yet, hence the theory remains unsupported.
>>
>> Hmm, do you see anything improper in the Ronald's setup (see
>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
>> It is HW RAID based.
>
> No. Ronald's HW RAID performance is reasonably good.  I meant Hifumi's
> RAID performance is too bad and may be improved by increasing the
> readahead size, hehe.
>
>> As I already wrote, we can ask Ronald to perform any needed tests.
>
> Thanks!  Ronald's test results are:
>
> 231   MB/s   HW RAID
>  69.6 MB/s   HW RAID + SCST
>  89.7 MB/s   HW RAID + SCST + this patch
>
> So this patch seem to help SCST, but again it would be better to
> improve the SCST throughput first - it is now quite sub-optimal.
> (Sorry for the long delay: currently I have not got an idea on
>  how to measure such timing issues.)
>
> And if Ronald could provide the HW RAID performance with this patch,
> then we can confirm if this patch really makes a difference for RAID.

I just tested raw HW RAID throughput with the patch applied, same
readahead setting (512KB), and it doesn't look promising:

./blockdev-perftest -d -r /dev/cciss/c0d0
blocksize        W        W        W        R        R        R
 67108864       -1       -1       -1  5.59686   5.4098  5.45396
 33554432       -1       -1       -1  6.18616  6.13232  5.96124
 16777216       -1       -1       -1   7.6757  7.32139   7.4966
  8388608       -1       -1       -1  8.82793  9.02057  9.01055
  4194304       -1       -1       -1  12.2289  12.6804    12.19
  2097152       -1       -1       -1  13.3012   13.706  14.7542
  1048576       -1       -1       -1  11.7577  12.3609  11.9507
   524288       -1       -1       -1  12.4112  12.2383  11.9105
   262144       -1       -1       -1  7.30687   7.4417  7.38246
   131072       -1       -1       -1  7.95752  7.95053  8.60796
    65536       -1       -1       -1  10.1282  10.1286  10.1956
    32768       -1       -1       -1  9.91857  9.98597  10.8421
    16384       -1       -1       -1  10.8267  10.8899  10.8718
     8192       -1       -1       -1  12.0345  12.5275   12.005
     4096       -1       -1       -1  15.1537  15.0771  15.1753
     2048       -1       -1       -1   25.432  24.8985  25.4303
     1024       -1       -1       -1  45.2674  45.2707  45.3504
      512       -1       -1       -1  87.9405  88.5047  87.4726

It dropped down to 189 MB/s. :(

Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29  9:34                 ` Wu Fengguang
  2009-06-29 10:26                   ` Ronald Moesbergen
@ 2009-06-29 10:55                   ` Vladislav Bolkhovitin
  2009-06-29 13:00                     ` Wu Fengguang
  1 sibling, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-06-29 10:55 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, kosaki.motohiro, Alan.Brunelle, hifumi.hisashi,
	linux-kernel, linux-fsdevel, jens.axboe, randy.dunlap,
	Beheer InterCommIT



Wu Fengguang, on 06/29/2009 01:34 PM wrote:
> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
>>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>>>> On Sun, 7 Jun 2009 06:45:38 +0800
>>>> Wu Fengguang <fengguang.wu@intel.com> wrote:
>>>>
>>>>>>> Do you have a place where the raw blktrace data can be retrieved for
>>>>>>> more in-depth analysis?
>>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
>>>>>> out the same issue.
>>>>>> I and Wu also wait his analysis.
>>>>> And do it with a large readahead size :)
>>>>>
>>>>> Alan, this was my analysis:
>>>>>
>>>>> : Hifumi, can you help retest with some large readahead size?
>>>>> :
>>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>>>>> : so two readahead IO requests get merged into one real IO, that means
>>>>> : half of the readahead requests are delayed.
>>>>>
>>>>> ie. two readahead requests get merged and complete together, thus the effective
>>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
>>>>>
>>>>> :
>>>>> : The IO completion size goes down from 512 to 256 sectors:
>>>>> :
>>>>> : before patch:
>>>>> :   8,0    3   177955    50.050313976     0  C   R 8724991 + 512 [0]
>>>>> :   8,0    3   177966    50.053380250     0  C   R 8725503 + 512 [0]
>>>>> :   8,0    3   177977    50.056970395     0  C   R 8726015 + 512 [0]
>>>>> :   8,0    3   177988    50.060326743     0  C   R 8726527 + 512 [0]
>>>>> :   8,0    3   177999    50.063922341     0  C   R 8727039 + 512 [0]
>>>>> :
>>>>> : after patch:
>>>>> :   8,0    3   257297    50.000760847     0  C   R 9480703 + 256 [0]
>>>>> :   8,0    3   257306    50.003034240     0  C   R 9480959 + 256 [0]
>>>>> :   8,0    3   257307    50.003076338     0  C   R 9481215 + 256 [0]
>>>>> :   8,0    3   257323    50.004774693     0  C   R 9481471 + 256 [0]
>>>>> :   8,0    3   257332    50.006865854     0  C   R 9481727 + 256 [0]
>>>>>
>>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>>>> and it's looking like 2.6.32 material, if ever.
>>>>
>>>> If it turns out to be wonderful, we could always ask the -stable
>>>> maintainers to put it in 2.6.x.y I guess.
>>> Agreed. The expected (and interesting) test on a properly configured
>>> HW RAID has not happened yet, hence the theory remains unsupported.
>> Hmm, do you see anything improper in the Ronald's setup (see
>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
>> It is HW RAID based.
> 
> No. Ronald's HW RAID performance is reasonably good.  I meant Hifumi's
> RAID performance is too bad and may be improved by increasing the
> readahead size, hehe.
> 
>> As I already wrote, we can ask Ronald to perform any needed tests.
> 
> Thanks!  Ronald's test results are:
> 
> 231   MB/s   HW RAID                        
>  69.6 MB/s   HW RAID + SCST                 
>  89.7 MB/s   HW RAID + SCST + this patch
> 
> So this patch seem to help SCST, but again it would be better to
> improve the SCST throughput first - it is now quite sub-optimal.

No, SCST performance isn't an issue here. You simply can't get more than 
110 MB/s from iSCSI over 1GbE, hence 231 MB/s fundamentally isn't 
possible. There is only room for 20% improvement, which should be 
achieved with better client-side-driven pipelining (see our other 
discussions, e.g. http://lkml.org/lkml/2009/5/12/370).
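
(As a rough sanity check of that ceiling: 1 GbE is 1000 Mbit/s, i.e. 125 MB/s
of raw line rate; after Ethernet, IP, TCP and iSCSI header overhead roughly
110-115 MB/s of payload bandwidth remains, which is where the ~110 MB/s figure
comes from.)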

> (Sorry for the long delay: currently I have not got an idea on
>  how to measure such timing issues.)
> 
> And if Ronald could provide the HW RAID performance with this patch,
> then we can confirm if this patch really makes a difference for RAID.
> 
> Thanks,
> Fengguang


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 10:26                   ` Ronald Moesbergen
@ 2009-06-29 10:55                     ` Vladislav Bolkhovitin
  2009-06-29 12:54                       ` Wu Fengguang
  0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-06-29 10:55 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: Wu Fengguang, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
	randy.dunlap, Bart Van Assche

Ronald Moesbergen, on 06/29/2009 02:26 PM wrote:
> 2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
>> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
>>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
>>>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>>>>> On Sun, 7 Jun 2009 06:45:38 +0800
>>>>> Wu Fengguang <fengguang.wu@intel.com> wrote:
>>>>>
>>>>>>>> Do you have a place where the raw blktrace data can be retrieved for
>>>>>>>> more in-depth analysis?
>>>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
>>>>>>> out the same issue.
>>>>>>> I and Wu also wait his analysis.
>>>>>> And do it with a large readahead size :)
>>>>>>
>>>>>> Alan, this was my analysis:
>>>>>>
>>>>>> : Hifumi, can you help retest with some large readahead size?
>>>>>> :
>>>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>>>>>> : so two readahead IO requests get merged into one real IO, that means
>>>>>> : half of the readahead requests are delayed.
>>>>>>
>>>>>> ie. two readahead requests get merged and complete together, thus the effective
>>>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
>>>>>>
>>>>>> :
>>>>>> : The IO completion size goes down from 512 to 256 sectors:
>>>>>> :
>>>>>> : before patch:
>>>>>> :   8,0    3   177955    50.050313976     0  C   R 8724991 + 512 [0]
>>>>>> :   8,0    3   177966    50.053380250     0  C   R 8725503 + 512 [0]
>>>>>> :   8,0    3   177977    50.056970395     0  C   R 8726015 + 512 [0]
>>>>>> :   8,0    3   177988    50.060326743     0  C   R 8726527 + 512 [0]
>>>>>> :   8,0    3   177999    50.063922341     0  C   R 8727039 + 512 [0]
>>>>>> :
>>>>>> : after patch:
>>>>>> :   8,0    3   257297    50.000760847     0  C   R 9480703 + 256 [0]
>>>>>> :   8,0    3   257306    50.003034240     0  C   R 9480959 + 256 [0]
>>>>>> :   8,0    3   257307    50.003076338     0  C   R 9481215 + 256 [0]
>>>>>> :   8,0    3   257323    50.004774693     0  C   R 9481471 + 256 [0]
>>>>>> :   8,0    3   257332    50.006865854     0  C   R 9481727 + 256 [0]
>>>>>>
>>>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>>>>> and it's looking like 2.6.32 material, if ever.
>>>>>
>>>>> If it turns out to be wonderful, we could always ask the -stable
>>>>> maintainers to put it in 2.6.x.y I guess.
>>>> Agreed. The expected (and interesting) test on a properly configured
>>>> HW RAID has not happened yet, hence the theory remains unsupported.
>>> Hmm, do you see anything improper in the Ronald's setup (see
>>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
>>> It is HW RAID based.
>> No. Ronald's HW RAID performance is reasonably good.  I meant Hifumi's
>> RAID performance is too bad and may be improved by increasing the
>> readahead size, hehe.
>>
>>> As I already wrote, we can ask Ronald to perform any needed tests.
>> Thanks!  Ronald's test results are:
>>
>> 231   MB/s   HW RAID
>>  69.6 MB/s   HW RAID + SCST
>>  89.7 MB/s   HW RAID + SCST + this patch
>>
>> So this patch seem to help SCST, but again it would be better to
>> improve the SCST throughput first - it is now quite sub-optimal.
>> (Sorry for the long delay: currently I have not got an idea on
>>  how to measure such timing issues.)
>>
>> And if Ronald could provide the HW RAID performance with this patch,
>> then we can confirm if this patch really makes a difference for RAID.
> 
> I just tested raw HW RAID throughput with the patch applied, same
> readahead setting (512KB), and it doesn't look promising:
> 
> ./blockdev-perftest -d -r /dev/cciss/c0d0
> blocksize        W        W        W        R        R        R
>  67108864       -1       -1       -1  5.59686   5.4098  5.45396
>  33554432       -1       -1       -1  6.18616  6.13232  5.96124
>  16777216       -1       -1       -1   7.6757  7.32139   7.4966
>   8388608       -1       -1       -1  8.82793  9.02057  9.01055
>   4194304       -1       -1       -1  12.2289  12.6804    12.19
>   2097152       -1       -1       -1  13.3012   13.706  14.7542
>   1048576       -1       -1       -1  11.7577  12.3609  11.9507
>    524288       -1       -1       -1  12.4112  12.2383  11.9105
>    262144       -1       -1       -1  7.30687   7.4417  7.38246
>    131072       -1       -1       -1  7.95752  7.95053  8.60796
>     65536       -1       -1       -1  10.1282  10.1286  10.1956
>     32768       -1       -1       -1  9.91857  9.98597  10.8421
>     16384       -1       -1       -1  10.8267  10.8899  10.8718
>      8192       -1       -1       -1  12.0345  12.5275   12.005
>      4096       -1       -1       -1  15.1537  15.0771  15.1753
>      2048       -1       -1       -1   25.432  24.8985  25.4303
>      1024       -1       -1       -1  45.2674  45.2707  45.3504
>       512       -1       -1       -1  87.9405  88.5047  87.4726
> 
> It dropped down to 189 MB/s. :(

Ronald,

Can you, please, rerun this test locally on the target with the latest 
version of blockdev-perftest, which produces much more readable results, 
for the following 6 cases:

1. Default vanilla 2.6.29 kernel, default parameters, including read-ahead

2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default

3. Default vanilla 2.6.29 kernel, 512 KB read-ahead, 64 KB 
max_sectors_kb, the rest is default

4. Patched by the Fengguang's patch http://lkml.org/lkml/2009/5/21/319 
vanilla 2.6.29 kernel, default parameters, including read-ahead

5. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 512 KB 
read-ahead, the rest is default

6. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 512 KB 
read-ahead, 64 KB max_sectors_kb, the rest is default
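
For reference, the non-default settings in the cases above can be applied
through sysfs, e.g. for the cciss array used in the earlier runs (a sketch
only; the exact sysfs device name may differ):

    echo 512 > /sys/block/cciss!c0d0/queue/read_ahead_kb
    echo 64  > /sys/block/cciss!c0d0/queue/max_sectors_kb

Both read_ahead_kb and max_sectors_kb are in kilobytes.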

Thanks,
Vlad

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 10:55                     ` Vladislav Bolkhovitin
@ 2009-06-29 12:54                       ` Wu Fengguang
  2009-06-29 12:58                         ` Bart Van Assche
  2009-06-29 13:04                         ` Vladislav Bolkhovitin
  0 siblings, 2 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 12:54 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: Ronald Moesbergen, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
	randy.dunlap, Bart Van Assche

On Mon, Jun 29, 2009 at 06:55:40PM +0800, Vladislav Bolkhovitin wrote:
> Ronald Moesbergen, on 06/29/2009 02:26 PM wrote:
> > 2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
> >> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
> >>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
> >>>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
> >>>>> On Sun, 7 Jun 2009 06:45:38 +0800
> >>>>> Wu Fengguang <fengguang.wu@intel.com> wrote:
> >>>>>
> >>>>>>>> Do you have a place where the raw blktrace data can be retrieved for
> >>>>>>>> more in-depth analysis?
> >>>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
> >>>>>>> out the same issue.
> >>>>>>> I and Wu also wait his analysis.
> >>>>>> And do it with a large readahead size :)
> >>>>>>
> >>>>>> Alan, this was my analysis:
> >>>>>>
> >>>>>> : Hifumi, can you help retest with some large readahead size?
> >>>>>> :
> >>>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> >>>>>> : so two readahead IO requests get merged into one real IO, that means
> >>>>>> : half of the readahead requests are delayed.
> >>>>>>
> >>>>>> ie. two readahead requests get merged and complete together, thus the effective
> >>>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
> >>>>>>
> >>>>>> :
> >>>>>> : The IO completion size goes down from 512 to 256 sectors:
> >>>>>> :
> >>>>>> : before patch:
> >>>>>> :   8,0    3   177955    50.050313976     0  C   R 8724991 + 512 [0]
> >>>>>> :   8,0    3   177966    50.053380250     0  C   R 8725503 + 512 [0]
> >>>>>> :   8,0    3   177977    50.056970395     0  C   R 8726015 + 512 [0]
> >>>>>> :   8,0    3   177988    50.060326743     0  C   R 8726527 + 512 [0]
> >>>>>> :   8,0    3   177999    50.063922341     0  C   R 8727039 + 512 [0]
> >>>>>> :
> >>>>>> : after patch:
> >>>>>> :   8,0    3   257297    50.000760847     0  C   R 9480703 + 256 [0]
> >>>>>> :   8,0    3   257306    50.003034240     0  C   R 9480959 + 256 [0]
> >>>>>> :   8,0    3   257307    50.003076338     0  C   R 9481215 + 256 [0]
> >>>>>> :   8,0    3   257323    50.004774693     0  C   R 9481471 + 256 [0]
> >>>>>> :   8,0    3   257332    50.006865854     0  C   R 9481727 + 256 [0]
> >>>>>>
> >>>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
> >>>>> and it's looking like 2.6.32 material, if ever.
> >>>>>
> >>>>> If it turns out to be wonderful, we could always ask the -stable
> >>>>> maintainers to put it in 2.6.x.y I guess.
> >>>> Agreed. The expected (and interesting) test on a properly configured
> >>>> HW RAID has not happened yet, hence the theory remains unsupported.
> >>> Hmm, do you see anything improper in the Ronald's setup (see
> >>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
> >>> It is HW RAID based.
> >> No. Ronald's HW RAID performance is reasonably good.  I meant Hifumi's
> >> RAID performance is too bad and may be improved by increasing the
> >> readahead size, hehe.
> >>
> >>> As I already wrote, we can ask Ronald to perform any needed tests.
> >> Thanks!  Ronald's test results are:
> >>
> >> 231   MB/s   HW RAID
> >>  69.6 MB/s   HW RAID + SCST
> >>  89.7 MB/s   HW RAID + SCST + this patch
> >>
> >> So this patch seem to help SCST, but again it would be better to
> >> improve the SCST throughput first - it is now quite sub-optimal.
> >> (Sorry for the long delay: currently I have not got an idea on
> >>  how to measure such timing issues.)
> >>
> >> And if Ronald could provide the HW RAID performance with this patch,
> >> then we can confirm if this patch really makes a difference for RAID.
> > 
> > I just tested raw HW RAID throughput with the patch applied, same
> > readahead setting (512KB), and it doesn't look promising:
> > 
> > ./blockdev-perftest -d -r /dev/cciss/c0d0
> > blocksize        W        W        W        R        R        R
> >  67108864       -1       -1       -1  5.59686   5.4098  5.45396
> >  33554432       -1       -1       -1  6.18616  6.13232  5.96124
> >  16777216       -1       -1       -1   7.6757  7.32139   7.4966
> >   8388608       -1       -1       -1  8.82793  9.02057  9.01055
> >   4194304       -1       -1       -1  12.2289  12.6804    12.19
> >   2097152       -1       -1       -1  13.3012   13.706  14.7542
> >   1048576       -1       -1       -1  11.7577  12.3609  11.9507
> >    524288       -1       -1       -1  12.4112  12.2383  11.9105
> >    262144       -1       -1       -1  7.30687   7.4417  7.38246
> >    131072       -1       -1       -1  7.95752  7.95053  8.60796
> >     65536       -1       -1       -1  10.1282  10.1286  10.1956
> >     32768       -1       -1       -1  9.91857  9.98597  10.8421
> >     16384       -1       -1       -1  10.8267  10.8899  10.8718
> >      8192       -1       -1       -1  12.0345  12.5275   12.005
> >      4096       -1       -1       -1  15.1537  15.0771  15.1753
> >      2048       -1       -1       -1   25.432  24.8985  25.4303
> >      1024       -1       -1       -1  45.2674  45.2707  45.3504
> >       512       -1       -1       -1  87.9405  88.5047  87.4726
> > 
> > It dropped down to 189 MB/s. :(
> 
> Ronald,
> 
> Can you, please, rerun this test locally on the target with the latest 
> version of blockdev-perftest, which produces much more readable results, 

Is blockdev-perftest publicly available? It's not obvious from a Google search.

> for the following 6 cases:
> 
> 1. Default vanilla 2.6.29 kernel, default parameters, including read-ahead

Why not 2.6.30? :)

> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default

How about 2MB RAID readahead size? That transforms into about 512KB
per-disk readahead size.
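
(Back-of-the-envelope arithmetic, assuming the array stripes data over roughly
four disks: a 2 MB array-level readahead window split across 4 members is
about 512 KB per disk.)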

> 3. Default vanilla 2.6.29 kernel, 512 KB read-ahead, 64 KB 
> max_sectors_kb, the rest is default
> 
> 4. Patched by the Fengguang's patch http://lkml.org/lkml/2009/5/21/319 
> vanilla 2.6.29 kernel, default parameters, including read-ahead
> 
> 5. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 512 KB 
> read-ahead, the rest is default
> 
> 6. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 512 KB 
> read-ahead, 64 KB max_sectors_kb, the rest is default

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 12:54                       ` Wu Fengguang
@ 2009-06-29 12:58                         ` Bart Van Assche
  2009-06-29 13:01                           ` Wu Fengguang
  2009-06-29 13:04                         ` Vladislav Bolkhovitin
  1 sibling, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2009-06-29 12:58 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Vladislav Bolkhovitin, Ronald Moesbergen, Andrew Morton,
	kosaki.motohiro, Alan.Brunelle, hifumi.hisashi, linux-kernel,
	linux-fsdevel, jens.axboe, randy.dunlap

On Mon, Jun 29, 2009 at 2:54 PM, Wu Fengguang<fengguang.wu@intel.com> wrote:
> Is blockdev-perftest public available? It's not obvious from google search.

This script is publicly available. You can retrieve it by running the
following command:
svn co https://scst.svn.sourceforge.net/svnroot/scst/trunk/scripts
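
That checkout should leave the script in a local scripts/ directory; it can
then be run against the raw device in the same way as earlier in this thread,
e.g.:

    ./scripts/blockdev-perftest -d -r /dev/cciss/c0d0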

Bart.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 10:55                   ` Vladislav Bolkhovitin
@ 2009-06-29 13:00                     ` Wu Fengguang
  0 siblings, 0 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 13:00 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: Andrew Morton, kosaki.motohiro, Alan.Brunelle, hifumi.hisashi,
	linux-kernel, linux-fsdevel, jens.axboe, randy.dunlap,
	Beheer InterCommIT

On Mon, Jun 29, 2009 at 06:55:21PM +0800, Vladislav Bolkhovitin wrote:
> 
> 
> Wu Fengguang, on 06/29/2009 01:34 PM wrote:
> > On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
> >> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
> >>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
> >>>> On Sun, 7 Jun 2009 06:45:38 +0800
> >>>> Wu Fengguang <fengguang.wu@intel.com> wrote:
> >>>>
> >>>>>>> Do you have a place where the raw blktrace data can be retrieved for
> >>>>>>> more in-depth analysis?
> >>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
> >>>>>> out the same issue.
> >>>>>> I and Wu also wait his analysis.
> >>>>> And do it with a large readahead size :)
> >>>>>
> >>>>> Alan, this was my analysis:
> >>>>>
> >>>>> : Hifumi, can you help retest with some large readahead size?
> >>>>> :
> >>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> >>>>> : so two readahead IO requests get merged into one real IO, that means
> >>>>> : half of the readahead requests are delayed.
> >>>>>
> >>>>> ie. two readahead requests get merged and complete together, thus the effective
> >>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
> >>>>>
> >>>>> :
> >>>>> : The IO completion size goes down from 512 to 256 sectors:
> >>>>> :
> >>>>> : before patch:
> >>>>> :   8,0    3   177955    50.050313976     0  C   R 8724991 + 512 [0]
> >>>>> :   8,0    3   177966    50.053380250     0  C   R 8725503 + 512 [0]
> >>>>> :   8,0    3   177977    50.056970395     0  C   R 8726015 + 512 [0]
> >>>>> :   8,0    3   177988    50.060326743     0  C   R 8726527 + 512 [0]
> >>>>> :   8,0    3   177999    50.063922341     0  C   R 8727039 + 512 [0]
> >>>>> :
> >>>>> : after patch:
> >>>>> :   8,0    3   257297    50.000760847     0  C   R 9480703 + 256 [0]
> >>>>> :   8,0    3   257306    50.003034240     0  C   R 9480959 + 256 [0]
> >>>>> :   8,0    3   257307    50.003076338     0  C   R 9481215 + 256 [0]
> >>>>> :   8,0    3   257323    50.004774693     0  C   R 9481471 + 256 [0]
> >>>>> :   8,0    3   257332    50.006865854     0  C   R 9481727 + 256 [0]
> >>>>>
> >>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
> >>>> and it's looking like 2.6.32 material, if ever.
> >>>>
> >>>> If it turns out to be wonderful, we could always ask the -stable
> >>>> maintainers to put it in 2.6.x.y I guess.
> >>> Agreed. The expected (and interesting) test on a properly configured
> >>> HW RAID has not happened yet, hence the theory remains unsupported.
> >> Hmm, do you see anything improper in the Ronald's setup (see
> >> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
> >> It is HW RAID based.
> > 
> > No. Ronald's HW RAID performance is reasonably good.  I meant Hifumi's
> > RAID performance is too bad and may be improved by increasing the
> > readahead size, hehe.
> > 
> >> As I already wrote, we can ask Ronald to perform any needed tests.
> > 
> > Thanks!  Ronald's test results are:
> > 
> > 231   MB/s   HW RAID                        
> >  69.6 MB/s   HW RAID + SCST                 
> >  89.7 MB/s   HW RAID + SCST + this patch
> > 
> > So this patch seem to help SCST, but again it would be better to
> > improve the SCST throughput first - it is now quite sub-optimal.
> 
> No, SCST performance isn't an issue here. You simply can't get more than 
> 110 MB/s from iSCSI over 1GbE, hence 231 MB/s fundamentally isn't 
> possible. There is only room for 20% improvement, which should be 

Ah yes.

> achieved with better client-side-driven pipelining (see our other 
> discussions, e.g. http://lkml.org/lkml/2009/5/12/370)

Yeah, that's exactly what I want to figure out - why :)

Thanks,
Fengguang

> > (Sorry for the long delay: currently I have not got an idea on
> >  how to measure such timing issues.)
> > 
> > And if Ronald could provide the HW RAID performance with this patch,
> > then we can confirm if this patch really makes a difference for RAID.
> > 
> > Thanks,
> > Fengguang

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 12:58                         ` Bart Van Assche
@ 2009-06-29 13:01                           ` Wu Fengguang
  0 siblings, 0 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 13:01 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Vladislav Bolkhovitin, Ronald Moesbergen, Andrew Morton,
	kosaki.motohiro, Alan.Brunelle, hifumi.hisashi, linux-kernel,
	linux-fsdevel, jens.axboe, randy.dunlap

On Mon, Jun 29, 2009 at 08:58:24PM +0800, Bart Van Assche wrote:
> On Mon, Jun 29, 2009 at 2:54 PM, Wu Fengguang<fengguang.wu@intel.com> wrote:
> > Is blockdev-perftest public available? It's not obvious from google search.
> 
> This script is publicly available. You can retrieve it by running the
> following command:
> svn co https://scst.svn.sourceforge.net/svnroot/scst/trunk/scripts

Thank you! This is a handy tool :)

Fengguang

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 12:54                       ` Wu Fengguang
  2009-06-29 12:58                         ` Bart Van Assche
@ 2009-06-29 13:04                         ` Vladislav Bolkhovitin
  2009-06-29 13:13                           ` Wu Fengguang
  2009-06-29 14:00                           ` Ronald Moesbergen
  1 sibling, 2 replies; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-06-29 13:04 UTC (permalink / raw)
  To: Wu Fengguang, Ronald Moesbergen
  Cc: Andrew Morton, kosaki.motohiro, Alan.Brunelle, hifumi.hisashi,
	linux-kernel, linux-fsdevel, jens.axboe, randy.dunlap,
	Bart Van Assche

Wu Fengguang, on 06/29/2009 04:54 PM wrote:
> On Mon, Jun 29, 2009 at 06:55:40PM +0800, Vladislav Bolkhovitin wrote:
>> Ronald Moesbergen, on 06/29/2009 02:26 PM wrote:
>>> 2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
>>>> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
>>>>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
>>>>>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>>>>>>> On Sun, 7 Jun 2009 06:45:38 +0800
>>>>>>> Wu Fengguang <fengguang.wu@intel.com> wrote:
>>>>>>>
>>>>>>>>>> Do you have a place where the raw blktrace data can be retrieved for
>>>>>>>>>> more in-depth analysis?
>>>>>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
>>>>>>>>> out the same issue.
>>>>>>>>> I and Wu also wait his analysis.
>>>>>>>> And do it with a large readahead size :)
>>>>>>>>
>>>>>>>> Alan, this was my analysis:
>>>>>>>>
>>>>>>>> : Hifumi, can you help retest with some large readahead size?
>>>>>>>> :
>>>>>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>>>>>>>> : so two readahead IO requests get merged into one real IO, that means
>>>>>>>> : half of the readahead requests are delayed.
>>>>>>>>
>>>>>>>> ie. two readahead requests get merged and complete together, thus the effective
>>>>>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
>>>>>>>>
>>>>>>>> :
>>>>>>>> : The IO completion size goes down from 512 to 256 sectors:
>>>>>>>> :
>>>>>>>> : before patch:
>>>>>>>> :   8,0    3   177955    50.050313976     0  C   R 8724991 + 512 [0]
>>>>>>>> :   8,0    3   177966    50.053380250     0  C   R 8725503 + 512 [0]
>>>>>>>> :   8,0    3   177977    50.056970395     0  C   R 8726015 + 512 [0]
>>>>>>>> :   8,0    3   177988    50.060326743     0  C   R 8726527 + 512 [0]
>>>>>>>> :   8,0    3   177999    50.063922341     0  C   R 8727039 + 512 [0]
>>>>>>>> :
>>>>>>>> : after patch:
>>>>>>>> :   8,0    3   257297    50.000760847     0  C   R 9480703 + 256 [0]
>>>>>>>> :   8,0    3   257306    50.003034240     0  C   R 9480959 + 256 [0]
>>>>>>>> :   8,0    3   257307    50.003076338     0  C   R 9481215 + 256 [0]
>>>>>>>> :   8,0    3   257323    50.004774693     0  C   R 9481471 + 256 [0]
>>>>>>>> :   8,0    3   257332    50.006865854     0  C   R 9481727 + 256 [0]
>>>>>>>>
>>>>>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>>>>>>> and it's looking like 2.6.32 material, if ever.
>>>>>>>
>>>>>>> If it turns out to be wonderful, we could always ask the -stable
>>>>>>> maintainers to put it in 2.6.x.y I guess.
>>>>>> Agreed. The expected (and interesting) test on a properly configured
>>>>>> HW RAID has not happened yet, hence the theory remains unsupported.
>>>>> Hmm, do you see anything improper in the Ronald's setup (see
>>>>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
>>>>> It is HW RAID based.
>>>> No. Ronald's HW RAID performance is reasonably good.  I meant Hifumi's
>>>> RAID performance is too bad and may be improved by increasing the
>>>> readahead size, hehe.
>>>>
>>>>> As I already wrote, we can ask Ronald to perform any needed tests.
>>>> Thanks!  Ronald's test results are:
>>>>
>>>> 231   MB/s   HW RAID
>>>>  69.6 MB/s   HW RAID + SCST
>>>>  89.7 MB/s   HW RAID + SCST + this patch
>>>>
>>>> So this patch seem to help SCST, but again it would be better to
>>>> improve the SCST throughput first - it is now quite sub-optimal.
>>>> (Sorry for the long delay: currently I have not got an idea on
>>>>  how to measure such timing issues.)
>>>>
>>>> And if Ronald could provide the HW RAID performance with this patch,
>>>> then we can confirm if this patch really makes a difference for RAID.
>>> I just tested raw HW RAID throughput with the patch applied, same
>>> readahead setting (512KB), and it doesn't look promising:
>>>
>>> ./blockdev-perftest -d -r /dev/cciss/c0d0
>>> blocksize        W        W        W        R        R        R
>>>  67108864       -1       -1       -1  5.59686   5.4098  5.45396
>>>  33554432       -1       -1       -1  6.18616  6.13232  5.96124
>>>  16777216       -1       -1       -1   7.6757  7.32139   7.4966
>>>   8388608       -1       -1       -1  8.82793  9.02057  9.01055
>>>   4194304       -1       -1       -1  12.2289  12.6804    12.19
>>>   2097152       -1       -1       -1  13.3012   13.706  14.7542
>>>   1048576       -1       -1       -1  11.7577  12.3609  11.9507
>>>    524288       -1       -1       -1  12.4112  12.2383  11.9105
>>>    262144       -1       -1       -1  7.30687   7.4417  7.38246
>>>    131072       -1       -1       -1  7.95752  7.95053  8.60796
>>>     65536       -1       -1       -1  10.1282  10.1286  10.1956
>>>     32768       -1       -1       -1  9.91857  9.98597  10.8421
>>>     16384       -1       -1       -1  10.8267  10.8899  10.8718
>>>      8192       -1       -1       -1  12.0345  12.5275   12.005
>>>      4096       -1       -1       -1  15.1537  15.0771  15.1753
>>>      2048       -1       -1       -1   25.432  24.8985  25.4303
>>>      1024       -1       -1       -1  45.2674  45.2707  45.3504
>>>       512       -1       -1       -1  87.9405  88.5047  87.4726
>>>
>>> It dropped down to 189 MB/s. :(
>> Ronald,
>>
>> Can you, please, rerun this test locally on the target with the latest 
>> version of blockdev-perftest, which produces much more readable results, 
> 
> Is blockdev-perftest public available? It's not obvious from google search.
> 
>> for the following 6 cases:
>>
>> 1. Default vanilla 2.6.29 kernel, default parameters, including read-ahead
> 
> Why not 2.6.30? :)

We started with 2.6.29, so why not finish with it (to save Ronald the
additional effort of moving to 2.6.30)?

>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
> 
> How about 2MB RAID readahead size? That transforms into about 512KB
> per-disk readahead size.

OK. Ronald, can you run 4 more test cases, please:

7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default

8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
max_sectors_kb, the rest is default

9. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
read-ahead, the rest is default

10. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
read-ahead, 64 KB max_sectors_kb, the rest is default

>> 3. Default vanilla 2.6.29 kernel, 512 KB read-ahead, 64 KB 
>> max_sectors_kb, the rest is default
>>
>> 4. Patched by the Fengguang's patch http://lkml.org/lkml/2009/5/21/319 
>> vanilla 2.6.29 kernel, default parameters, including read-ahead
>>
>> 5. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 512 KB 
>> read-ahead, the rest is default
>>
>> 6. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 512 KB 
>> read-ahead, 64 KB max_sectors_kb, the rest is default
> 
> Thanks,
> Fengguang
> 
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 13:04                         ` Vladislav Bolkhovitin
@ 2009-06-29 13:13                           ` Wu Fengguang
  2009-06-29 13:28                             ` Wu Fengguang
  2009-06-29 14:00                           ` Ronald Moesbergen
  1 sibling, 1 reply; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 13:13 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: Ronald Moesbergen, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
	randy.dunlap, Bart Van Assche

On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
> Wu Fengguang, on 06/29/2009 04:54 PM wrote:
> > 
> > Why not 2.6.30? :)
> 
> We started with 2.6.29, so why not complete with it (to save additional 
> Ronald's effort to move on 2.6.30)?

OK, that's fair enough.

Fengguang

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 13:13                           ` Wu Fengguang
@ 2009-06-29 13:28                             ` Wu Fengguang
  2009-06-29 14:43                               ` Ronald Moesbergen
  0 siblings, 1 reply; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 13:28 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: Ronald Moesbergen, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
	randy.dunlap, Bart Van Assche

[-- Attachment #1: Type: text/plain, Size: 639 bytes --]

On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
> > Wu Fengguang, on 06/29/2009 04:54 PM wrote:
> > > 
> > > Why not 2.6.30? :)
> > 
> > We started with 2.6.29, so why not complete with it (to save additional 
> > Ronald's effort to move on 2.6.30)?
> 
> OK, that's fair enough.

btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
in case it helps the SCST performance.

Ronald, if you run context readahead, please make sure that the server-side
readahead size is bigger than the client-side readahead size.
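
A concrete way to set that up (just a sketch; sdX stands for the iSCSI disk as
seen on the initiator, c0d0 for the backing device on the target):

    blockdev --setra 4096 /dev/cciss/c0d0     # target side: 4096 sectors = 2 MB
    blockdev --setra 1024 /dev/sdX            # initiator side: 1024 sectors = 512 KB

blockdev --setra takes its argument in 512-byte sectors.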

Thanks,
Fengguang

[-- Attachment #2: readahead-context-2.6.29.patch --]
[-- Type: text/x-diff, Size: 6651 bytes --]

--- linux.orig/mm/readahead.c
+++ linux/mm/readahead.c
@@ -337,6 +337,59 @@ static unsigned long get_next_ra_size(st
  */
 
 /*
+ * Count contiguously cached pages from @offset-1 to @offset-@max,
+ * this count is a conservative estimation of
+ * 	- length of the sequential read sequence, or
+ * 	- thrashing threshold in memory tight systems
+ */
+static pgoff_t count_history_pages(struct address_space *mapping,
+				   struct file_ra_state *ra,
+				   pgoff_t offset, unsigned long max)
+{
+	pgoff_t head;
+
+	rcu_read_lock();
+	head = radix_tree_prev_hole(&mapping->page_tree, offset - 1, max);
+	rcu_read_unlock();
+
+	return offset - 1 - head;
+}
+
+/*
+ * page cache context based read-ahead
+ */
+static int try_context_readahead(struct address_space *mapping,
+				 struct file_ra_state *ra,
+				 pgoff_t offset,
+				 unsigned long req_size,
+				 unsigned long max)
+{
+	pgoff_t size;
+
+	size = count_history_pages(mapping, ra, offset, max);
+
+	/*
+	 * no history pages:
+	 * it could be a random read
+	 */
+	if (!size)
+		return 0;
+
+	/*
+	 * starts from beginning of file:
+	 * it is a strong indication of long-run stream (or whole-file-read)
+	 */
+	if (size >= offset)
+		size *= 2;
+
+	ra->start = offset;
+	ra->size = get_init_ra_size(size + req_size, max);
+	ra->async_size = ra->size;
+
+	return 1;
+}
+
+/*
  * A minimal readahead algorithm for trivial sequential/random reads.
  */
 static unsigned long
@@ -345,34 +398,26 @@ ondemand_readahead(struct address_space 
 		   bool hit_readahead_marker, pgoff_t offset,
 		   unsigned long req_size)
 {
-	int	max = ra->ra_pages;	/* max readahead pages */
-	pgoff_t prev_offset;
-	int	sequential;
+	unsigned long max = max_sane_readahead(ra->ra_pages);
+
+	/*
+	 * start of file
+	 */
+	if (!offset)
+		goto initial_readahead;
 
 	/*
 	 * It's the expected callback offset, assume sequential access.
 	 * Ramp up sizes, and push forward the readahead window.
 	 */
-	if (offset && (offset == (ra->start + ra->size - ra->async_size) ||
-			offset == (ra->start + ra->size))) {
+	if ((offset == (ra->start + ra->size - ra->async_size) ||
+	     offset == (ra->start + ra->size))) {
 		ra->start += ra->size;
 		ra->size = get_next_ra_size(ra, max);
 		ra->async_size = ra->size;
 		goto readit;
 	}
 
-	prev_offset = ra->prev_pos >> PAGE_CACHE_SHIFT;
-	sequential = offset - prev_offset <= 1UL || req_size > max;
-
-	/*
-	 * Standalone, small read.
-	 * Read as is, and do not pollute the readahead state.
-	 */
-	if (!hit_readahead_marker && !sequential) {
-		return __do_page_cache_readahead(mapping, filp,
-						offset, req_size, 0);
-	}
-
 	/*
 	 * Hit a marked page without valid readahead state.
 	 * E.g. interleaved reads.
@@ -383,7 +428,7 @@ ondemand_readahead(struct address_space 
 		pgoff_t start;
 
 		rcu_read_lock();
-		start = radix_tree_next_hole(&mapping->page_tree, offset,max+1);
+		start = radix_tree_next_hole(&mapping->page_tree, offset+1,max);
 		rcu_read_unlock();
 
 		if (!start || start - offset > max)
@@ -391,23 +436,53 @@ ondemand_readahead(struct address_space 
 
 		ra->start = start;
 		ra->size = start - offset;	/* old async_size */
+		ra->size += req_size;
 		ra->size = get_next_ra_size(ra, max);
 		ra->async_size = ra->size;
 		goto readit;
 	}
 
 	/*
-	 * It may be one of
-	 * 	- first read on start of file
-	 * 	- sequential cache miss
-	 * 	- oversize random read
-	 * Start readahead for it.
+	 * oversize read
+	 */
+	if (req_size > max)
+		goto initial_readahead;
+
+	/*
+	 * sequential cache miss
 	 */
+	if (offset - (ra->prev_pos >> PAGE_CACHE_SHIFT) <= 1UL)
+		goto initial_readahead;
+
+	/*
+	 * Query the page cache and look for the traces(cached history pages)
+	 * that a sequential stream would leave behind.
+	 */
+	if (try_context_readahead(mapping, ra, offset, req_size, max))
+		goto readit;
+
+	/*
+	 * standalone, small random read
+	 * Read as is, and do not pollute the readahead state.
+	 */
+	return __do_page_cache_readahead(mapping, filp, offset, req_size, 0);
+
+initial_readahead:
 	ra->start = offset;
 	ra->size = get_init_ra_size(req_size, max);
 	ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
 
 readit:
+	/*
+	 * Will this read hit the readahead marker made by itself?
+	 * If so, trigger the readahead marker hit now, and merge
+	 * the resulted next readahead window into the current one.
+	 */
+	if (offset == ra->start && ra->size == ra->async_size) {
+		ra->async_size = get_next_ra_size(ra, max);
+		ra->size += ra->async_size;
+	}
+
 	return ra_submit(ra, mapping, filp);
 }
 
--- linux.orig/lib/radix-tree.c
+++ linux/lib/radix-tree.c
@@ -666,6 +666,43 @@ unsigned long radix_tree_next_hole(struc
 }
 EXPORT_SYMBOL(radix_tree_next_hole);
 
+/**
+ *	radix_tree_prev_hole    -    find the prev hole (not-present entry)
+ *	@root:		tree root
+ *	@index:		index key
+ *	@max_scan:	maximum range to search
+ *
+ *	Search backwards in the range [max(index-max_scan+1, 0), index]
+ *	for the first hole.
+ *
+ *	Returns: the index of the hole if found, otherwise returns an index
+ *	outside of the set specified (in which case 'index - return >= max_scan'
+ *	will be true). In rare cases of wrap-around, LONG_MAX will be returned.
+ *
+ *	radix_tree_next_hole may be called under rcu_read_lock. However, like
+ *	radix_tree_gang_lookup, this will not atomically search a snapshot of
+ *	the tree at a single point in time. For example, if a hole is created
+ *	at index 10, then subsequently a hole is created at index 5,
+ *	radix_tree_prev_hole covering both indexes may return 5 if called under
+ *	rcu_read_lock.
+ */
+unsigned long radix_tree_prev_hole(struct radix_tree_root *root,
+				   unsigned long index, unsigned long max_scan)
+{
+	unsigned long i;
+
+	for (i = 0; i < max_scan; i++) {
+		if (!radix_tree_lookup(root, index))
+			break;
+		index--;
+		if (index == LONG_MAX)
+			break;
+	}
+
+	return index;
+}
+EXPORT_SYMBOL(radix_tree_prev_hole);
+
 static unsigned int
 __lookup(struct radix_tree_node *slot, void ***results, unsigned long index,
 	unsigned int max_items, unsigned long *next_index)
--- linux.orig/include/linux/radix-tree.h
+++ linux/include/linux/radix-tree.h
@@ -167,6 +167,8 @@ radix_tree_gang_lookup_slot(struct radix
 			unsigned long first_index, unsigned int max_items);
 unsigned long radix_tree_next_hole(struct radix_tree_root *root,
 				unsigned long index, unsigned long max_scan);
+unsigned long radix_tree_prev_hole(struct radix_tree_root *root,
+				unsigned long index, unsigned long max_scan);
 int radix_tree_preload(gfp_t gfp_mask);
 void radix_tree_init(void);
 void *radix_tree_tag_set(struct radix_tree_root *root,
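
For orientation, the sketch below (not part of the patch; the function name and
heuristic are illustrative assumptions) shows the kind of helper the
try_context_readahead() path referenced above can build on radix_tree_prev_hole():
it measures the run of already-cached pages immediately before the current
offset, i.e. the trace a sequential stream leaves behind once its readahead
state has been lost.

/*
 * Illustrative sketch only -- not taken from the patch above.
 * Count how many consecutive pages immediately before @offset are already
 * cached, scanning back at most @max pages.  A long run suggests an
 * interleaved sequential stream, so readahead can be restarted instead of
 * falling through to the small random read case.
 */
static unsigned long count_cached_history(struct address_space *mapping,
					  pgoff_t offset, unsigned long max)
{
	pgoff_t head;

	rcu_read_lock();
	/* first hole at or before offset - 1 */
	head = radix_tree_prev_hole(&mapping->page_tree, offset - 1, max);
	rcu_read_unlock();

	/* all pages in (head, offset) are present in the page cache */
	return offset - 1 - head;
}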

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 13:04                         ` Vladislav Bolkhovitin
  2009-06-29 13:13                           ` Wu Fengguang
@ 2009-06-29 14:00                           ` Ronald Moesbergen
  2009-06-29 14:21                             ` Wu Fengguang
  2009-06-30 10:22                             ` Vladislav Bolkhovitin
  1 sibling, 2 replies; 65+ messages in thread
From: Ronald Moesbergen @ 2009-06-29 14:00 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: Wu Fengguang, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
	randy.dunlap, Bart Van Assche

... tests ...

> We started with 2.6.29, so why not complete with it (to save additional
> Ronald's effort to move on 2.6.30)?
>
>>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
>>
>> How about 2MB RAID readahead size? That transforms into about 512KB
>> per-disk readahead size.
>
> OK. Ronald, can you 4 more test cases, please:
>
> 7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
>
> 8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
> max_sectors_kb, the rest is default
>
> 9. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
> read-ahead, the rest is default
>
> 10. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
> read-ahead, 64 KB max_sectors_kb, the rest is default

The results:

Unpatched, 128KB readahead, 512 max_sectors_kb
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864   5.621    5.503    5.419  185.744    2.780    2.902
 33554432   6.628    5.897    6.242  164.068    7.827    5.127
 16777216   7.312    7.165    7.614  139.148    3.501    8.697
  8388608   8.719    8.408    8.694  119.003    1.973   14.875
  4194304  11.836   12.192   12.137   84.958    1.111   21.239
  2097152  13.452   13.992   14.035   74.090    1.442   37.045
  1048576  12.759   11.996   12.195   83.194    2.152   83.194
   524288  11.895   12.297   12.587   83.570    1.945  167.140
   262144   7.325    7.285    7.444  139.304    1.272  557.214
   131072   7.992    8.832    7.952  124.279    5.901  994.228
    65536  10.940   10.062   10.122   98.847    3.715 1581.545
    32768   9.973   10.012    9.945  102.640    0.281 3284.493
    16384  11.377   10.538   10.692   94.316    3.100 6036.222

Unpatched, 512KB readahead, 512 max_sectors_kb
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864   5.032    4.770    5.265  204.228    8.271    3.191
 33554432   5.569    5.712    5.863  179.263    3.755    5.602
 16777216   6.661    6.857    6.550  153.132    2.888    9.571
  8388608   8.022    8.000    7.978  127.998    0.288   16.000
  4194304  10.959   11.579   12.208   88.586    3.902   22.146
  2097152  13.692   12.670   12.625   78.906    2.914   39.453
  1048576  11.120   11.144   10.878   92.703    1.018   92.703
   524288  11.234   10.915   11.374   91.667    1.587  183.334
   262144   6.848    6.678    6.795  151.191    1.594  604.763
   131072   7.393    7.367    7.337  139.025    0.428 1112.202
    65536  10.003   10.919   10.015   99.466    4.019 1591.462
    32768  10.117   10.124   10.169  101.018    0.229 3232.574
    16384  11.614   11.027   11.029   91.293    2.207 5842.771

Unpatched, 2MB readahead, 512 max_sectors_kb
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864   5.268    5.316    5.418  191.996    2.241    3.000
 33554432   5.831    6.459    6.110  167.259    6.977    5.227
 16777216   7.313    7.069    7.197  142.385    1.972    8.899
  8388608   8.657    8.500    8.498  119.754    1.039   14.969
  4194304  11.846   12.116   11.801   85.911    0.994   21.478
  2097152  12.917   13.652   13.100   77.484    1.808   38.742
  1048576   9.544   10.667   10.807   99.345    5.640   99.345
   524288  11.736    7.171    6.599  128.410   29.539  256.821
   262144   7.530    7.403    7.416  137.464    1.053  549.857
   131072   8.741    8.002    8.022  124.256    5.029  994.051
    65536  10.701   10.138   10.090   99.394    2.629 1590.311
    32768   9.978    9.950    9.934  102.875    0.188 3291.994
    16384  11.435   10.823   10.907   92.684    2.234 5931.749

Unpatched, 512KB readahead, 64 max_sectors_kb
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864   3.994    3.991    4.123  253.774    3.838    3.965
 33554432   4.100    4.329    4.161  244.111    5.569    7.628
 16777216   5.476    4.835    5.079  200.148   10.177   12.509
  8388608   5.484    5.258    5.227  192.470    4.084   24.059
  4194304   6.429    6.458    6.435  158.989    0.315   39.747
  2097152   7.219    7.744    7.306  138.081    4.187   69.040
  1048576   6.850    6.897    6.776  149.696    1.089  149.696
   524288   6.406    6.393    6.469  159.439    0.814  318.877
   262144   6.865    7.508    6.861  144.931    6.041  579.726
   131072   8.435    8.482    8.307  121.792    1.076  974.334
    65536   9.616    9.610   10.262  104.279    3.176 1668.462
    32768   9.682    9.932   10.015  103.701    1.497 3318.428
    16384  10.962   10.852   11.565   92.106    2.547 5894.813

Unpatched, 2MB readahead, 64 max_sectors_kb
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864   3.730    3.714    3.914  270.615    6.396    4.228
 33554432   4.445    3.999    3.989  247.710   12.276    7.741
 16777216   4.763    4.712    4.709  216.590    1.122   13.537
  8388608   5.001    5.086    5.229  200.649    3.673   25.081
  4194304   6.365    6.362    6.905  156.710    5.948   39.178
  2097152   7.390    7.367    7.270  139.470    0.992   69.735
  1048576   7.038    7.050    7.090  145.052    0.456  145.052
   524288   6.862    7.167    7.278  144.272    3.617  288.544
   262144   7.266    7.313    7.265  140.635    0.436  562.540
   131072   8.677    8.735    8.821  117.108    0.790  936.865
    65536  10.865   10.040   10.038   99.418    3.658 1590.685
    32768  10.167   10.130   10.177  100.805    0.201 3225.749
    16384  11.643   11.017   11.103   91.041    2.203 5826.629

Patched, 128KB readahead, 512 max_sectors_kb
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864   5.670    5.188    5.636  186.555    7.671    2.915
 33554432   6.069    5.971    6.141  168.992    1.954    5.281
 16777216   7.821    7.501    7.372  135.451    3.340    8.466
  8388608   9.147    8.618    9.000  114.849    2.908   14.356
  4194304  12.199   12.914   12.381   81.981    1.964   20.495
  2097152  13.449   13.891   14.288   73.842    1.828   36.921
  1048576  11.890   12.182   11.519   86.360    1.984   86.360
   524288  11.899   12.706   12.135   83.678    2.287  167.357
   262144   7.460    7.559    7.563  136.041    0.864  544.164
   131072   7.987    8.003    8.530  125.403    3.792 1003.220
    65536  10.179   10.119   10.131  100.957    0.255 1615.312
    32768   9.899    9.923   10.589  101.114    3.121 3235.656
    16384  10.849   10.835   10.876   94.351    0.150 6038.474

Patched, 512KB readahead, 512 max_sectors_kb
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864   5.062    5.111    5.083  201.358    0.795    3.146
 33554432   5.589    5.713    5.657  181.165    1.625    5.661
 16777216   6.337    7.220    6.457  154.002    8.690    9.625
  8388608   7.952    7.880    7.527  131.588    3.192   16.448
  4194304  10.695   11.224   10.736   94.119    2.047   23.530
  2097152  10.898   12.072   12.358   87.215    4.839   43.607
  1048576  10.890   11.347    9.290   98.166    8.664   98.166
   524288  10.898   11.032   10.887   93.611    0.560  187.223
   262144   6.714    7.230    6.804  148.219    4.724  592.875
   131072   7.325    7.342    7.363  139.441    0.295 1115.530
    65536   9.773    9.988   10.592  101.327    3.417 1621.227
    32768  10.031    9.995   10.086  102.019    0.377 3264.620
    16384  11.041   10.987   11.564   91.502    2.093 5856.144

Patched, 2MB readahead, 512 max_sectors_kb
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864   4.970    5.097    5.188  201.435    3.559    3.147
 33554432   5.588    5.793    5.169  186.042    8.923    5.814
 16777216   6.151    6.414    6.526  161.012    4.027   10.063
  8388608   7.836    7.299    7.475  135.980    3.989   16.998
  4194304  11.792   10.964   10.158   93.683    5.706   23.421
  2097152  11.225   11.492   11.357   90.162    0.866   45.081
  1048576  12.017   11.258   11.432   88.580    2.449   88.580
   524288   5.974   10.883   11.840  117.323   38.361  234.647
   262144   6.774    6.765    6.526  153.155    2.661  612.619
   131072   8.036    7.324    7.341  135.579    5.766 1084.633
    65536   9.964   10.595    9.999  100.608    2.806 1609.735
    32768  10.132   10.036   10.190  101.197    0.637 3238.308
    16384  11.133   11.568   11.036   91.093    1.850 5829.981

Patched, 512KB readahead, 64 max_sectors_kb
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864   3.722    3.698    3.721  275.759    0.809    4.309
 33554432   4.058    3.849    3.957  259.063    5.580    8.096
 16777216   4.601    4.613    4.738  220.212    2.913   13.763
  8388608   5.039    5.534    5.017  197.452    8.791   24.682
  4194304   6.302    6.270    6.282  162.942    0.341   40.735
  2097152   7.314    7.302    7.069  141.700    2.233   70.850
  1048576   6.881    7.655    6.909  143.597    6.951  143.597
   524288   7.163    7.025    6.951  145.344    1.803  290.687
   262144   7.315    7.233    7.299  140.621    0.689  562.482
   131072   9.292    8.756    8.807  114.475    3.036  915.803
    65536   9.942    9.985    9.960  102.787    0.181 1644.598
    32768  10.721   10.091   10.192   99.154    2.605 3172.935
    16384  11.049   11.016   11.065   92.727    0.169 5934.531

Patched, 2MB readahead, 64 max_sectors_kb
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864   3.697    3.819    3.741  272.931    3.661    4.265
 33554432   3.951    3.905    4.038  258.320    3.586    8.073
 16777216   5.595    5.182    4.864  197.044   11.236   12.315
  8388608   5.267    5.156    5.116  197.725    2.431   24.716
  4194304   6.411    6.335    6.290  161.389    1.267   40.347
  2097152   7.329    7.663    7.462  136.860    2.502   68.430
  1048576   7.225    7.077    7.215  142.784    1.352  142.784
   524288   6.903    7.015    7.095  146.210    1.647  292.419
   262144   7.365    7.926    7.278  136.309    5.076  545.237
   131072   8.796    8.819    8.814  116.233    0.130  929.862
    65536   9.998   10.609    9.995  100.464    2.786 1607.423
    32768  10.161   10.124   10.246  100.623    0.505 3219.943

Regards,
Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 14:00                           ` Ronald Moesbergen
@ 2009-06-29 14:21                             ` Wu Fengguang
  2009-06-29 15:01                               ` Wu Fengguang
  2009-06-30 10:22                             ` Vladislav Bolkhovitin
  1 sibling, 1 reply; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 14:21 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: Vladislav Bolkhovitin, Andrew Morton, kosaki.motohiro,
	Alan.Brunelle, hifumi.hisashi, linux-kernel, linux-fsdevel,
	jens.axboe, randy.dunlap, Bart Van Assche

On Mon, Jun 29, 2009 at 10:00:20PM +0800, Ronald Moesbergen wrote:
> ... tests ...
> 
> > We started with 2.6.29, so why not complete with it (to save additional
> > Ronald's effort to move on 2.6.30)?
> >
> >>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
> >>
> >> How about 2MB RAID readahead size? That transforms into about 512KB
> >> per-disk readahead size.
> >
> > OK. Ronald, can you 4 more test cases, please:
> >
> > 7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
> >
> > 8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
> > max_sectors_kb, the rest is default
> >
> > 9. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
> > read-ahead, the rest is default
> >
> > 10. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
> > read-ahead, 64 KB max_sectors_kb, the rest is default
> 
> The results:

I made a blind average:

N       MB/s          IOPS      case

0      114.859       984.148    Unpatched, 128KB readahead, 512 max_sectors_kb
1      122.960       981.213    Unpatched, 512KB readahead, 512 max_sectors_kb
2      120.709       985.111    Unpatched, 2MB readahead, 512 max_sectors_kb
3      158.732      1004.714    Unpatched, 512KB readahead, 64 max_sectors_kb
4      159.237       979.659    Unpatched, 2MB readahead, 64 max_sectors_kb

5      114.583       982.998    Patched, 128KB readahead, 512 max_sectors_kb
6      124.902       987.523    Patched, 512KB readahead, 512 max_sectors_kb
7      127.373       984.848    Patched, 2MB readahead, 512 max_sectors_kb
8      161.218       986.698    Patched, 512KB readahead, 64 max_sectors_kb
9      163.908       574.651    Patched, 2MB readahead, 64 max_sectors_kb

So before/after patch:

        avg throughput      135.299 => 138.397  by +2.3%
        avg IOPS            986.969 => 903.344  by -8.5%     

The IOPS is a bit weird.
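
The per-case figures above are the plain arithmetic means of the 13
per-blocksize rows in each table. The low patched IOPS average is mostly an
artifact: the case 9 table is missing its 16384-byte row, so its mean covers
only 12 block sizes and omits the highest-IOPS entry the other cases have.
Below is a small self-contained sketch (C for illustration, data copied from
the per-case means above) that reproduces the before/after numbers:

#include <stdio.h>

int main(void)
{
	/* per-case means from the table above: indexes 0-4 unpatched, 5-9 patched */
	double mbps[10] = { 114.859, 122.960, 120.709, 158.732, 159.237,
			    114.583, 124.902, 127.373, 161.218, 163.908 };
	double iops[10] = { 984.148, 981.213, 985.111, 1004.714, 979.659,
			    982.998, 987.523, 984.848, 986.698, 574.651 };
	double un_mb = 0, pa_mb = 0, un_io = 0, pa_io = 0;
	int i;

	for (i = 0; i < 5; i++) {
		un_mb += mbps[i] / 5;
		pa_mb += mbps[i + 5] / 5;
		un_io += iops[i] / 5;
		pa_io += iops[i + 5] / 5;
	}

	/* prints 135.299 -> 138.397 (+2.3%) and 986.969 -> 903.344 (-8.5%) */
	printf("throughput %.3f -> %.3f MB/s (%+.1f%%)\n",
	       un_mb, pa_mb, 100 * (pa_mb / un_mb - 1));
	printf("IOPS       %.3f -> %.3f      (%+.1f%%)\n",
	       un_io, pa_io, 100 * (pa_io / un_io - 1));
	return 0;
}

With case 9's incomplete table excluded (together with its unpatched
counterpart, case 4), the IOPS averages are essentially unchanged before and
after the patch.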

Summaries:
- this patch improves RAID throughput by +2.3% on average
- after this patch, 2MB readahead performs slightly better
  (by 1-2%) than 512KB readahead

Thanks,
Fengguang

> Unpatched, 128KB readahead, 512 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   5.621    5.503    5.419  185.744    2.780    2.902
>  33554432   6.628    5.897    6.242  164.068    7.827    5.127
>  16777216   7.312    7.165    7.614  139.148    3.501    8.697
>   8388608   8.719    8.408    8.694  119.003    1.973   14.875
>   4194304  11.836   12.192   12.137   84.958    1.111   21.239
>   2097152  13.452   13.992   14.035   74.090    1.442   37.045
>   1048576  12.759   11.996   12.195   83.194    2.152   83.194
>    524288  11.895   12.297   12.587   83.570    1.945  167.140
>    262144   7.325    7.285    7.444  139.304    1.272  557.214
>    131072   7.992    8.832    7.952  124.279    5.901  994.228
>     65536  10.940   10.062   10.122   98.847    3.715 1581.545
>     32768   9.973   10.012    9.945  102.640    0.281 3284.493
>     16384  11.377   10.538   10.692   94.316    3.100 6036.222
> 
> Unpatched, 512KB readahead, 512 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   5.032    4.770    5.265  204.228    8.271    3.191
>  33554432   5.569    5.712    5.863  179.263    3.755    5.602
>  16777216   6.661    6.857    6.550  153.132    2.888    9.571
>   8388608   8.022    8.000    7.978  127.998    0.288   16.000
>   4194304  10.959   11.579   12.208   88.586    3.902   22.146
>   2097152  13.692   12.670   12.625   78.906    2.914   39.453
>   1048576  11.120   11.144   10.878   92.703    1.018   92.703
>    524288  11.234   10.915   11.374   91.667    1.587  183.334
>    262144   6.848    6.678    6.795  151.191    1.594  604.763
>    131072   7.393    7.367    7.337  139.025    0.428 1112.202
>     65536  10.003   10.919   10.015   99.466    4.019 1591.462
>     32768  10.117   10.124   10.169  101.018    0.229 3232.574
>     16384  11.614   11.027   11.029   91.293    2.207 5842.771
> 
> Unpatched, 2MB readahead, 512 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   5.268    5.316    5.418  191.996    2.241    3.000
>  33554432   5.831    6.459    6.110  167.259    6.977    5.227
>  16777216   7.313    7.069    7.197  142.385    1.972    8.899
>   8388608   8.657    8.500    8.498  119.754    1.039   14.969
>   4194304  11.846   12.116   11.801   85.911    0.994   21.478
>   2097152  12.917   13.652   13.100   77.484    1.808   38.742
>   1048576   9.544   10.667   10.807   99.345    5.640   99.345
>    524288  11.736    7.171    6.599  128.410   29.539  256.821
>    262144   7.530    7.403    7.416  137.464    1.053  549.857
>    131072   8.741    8.002    8.022  124.256    5.029  994.051
>     65536  10.701   10.138   10.090   99.394    2.629 1590.311
>     32768   9.978    9.950    9.934  102.875    0.188 3291.994
>     16384  11.435   10.823   10.907   92.684    2.234 5931.749
> 
> Unpatched, 512KB readahead, 64 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   3.994    3.991    4.123  253.774    3.838    3.965
>  33554432   4.100    4.329    4.161  244.111    5.569    7.628
>  16777216   5.476    4.835    5.079  200.148   10.177   12.509
>   8388608   5.484    5.258    5.227  192.470    4.084   24.059
>   4194304   6.429    6.458    6.435  158.989    0.315   39.747
>   2097152   7.219    7.744    7.306  138.081    4.187   69.040
>   1048576   6.850    6.897    6.776  149.696    1.089  149.696
>    524288   6.406    6.393    6.469  159.439    0.814  318.877
>    262144   6.865    7.508    6.861  144.931    6.041  579.726
>    131072   8.435    8.482    8.307  121.792    1.076  974.334
>     65536   9.616    9.610   10.262  104.279    3.176 1668.462
>     32768   9.682    9.932   10.015  103.701    1.497 3318.428
>     16384  10.962   10.852   11.565   92.106    2.547 5894.813
> 
> Unpatched, 2MB readahead, 64 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   3.730    3.714    3.914  270.615    6.396    4.228
>  33554432   4.445    3.999    3.989  247.710   12.276    7.741
>  16777216   4.763    4.712    4.709  216.590    1.122   13.537
>   8388608   5.001    5.086    5.229  200.649    3.673   25.081
>   4194304   6.365    6.362    6.905  156.710    5.948   39.178
>   2097152   7.390    7.367    7.270  139.470    0.992   69.735
>   1048576   7.038    7.050    7.090  145.052    0.456  145.052
>    524288   6.862    7.167    7.278  144.272    3.617  288.544
>    262144   7.266    7.313    7.265  140.635    0.436  562.540
>    131072   8.677    8.735    8.821  117.108    0.790  936.865
>     65536  10.865   10.040   10.038   99.418    3.658 1590.685
>     32768  10.167   10.130   10.177  100.805    0.201 3225.749
>     16384  11.643   11.017   11.103   91.041    2.203 5826.629
> 
> Patched, 128KB readahead, 512 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   5.670    5.188    5.636  186.555    7.671    2.915
>  33554432   6.069    5.971    6.141  168.992    1.954    5.281
>  16777216   7.821    7.501    7.372  135.451    3.340    8.466
>   8388608   9.147    8.618    9.000  114.849    2.908   14.356
>   4194304  12.199   12.914   12.381   81.981    1.964   20.495
>   2097152  13.449   13.891   14.288   73.842    1.828   36.921
>   1048576  11.890   12.182   11.519   86.360    1.984   86.360
>    524288  11.899   12.706   12.135   83.678    2.287  167.357
>    262144   7.460    7.559    7.563  136.041    0.864  544.164
>    131072   7.987    8.003    8.530  125.403    3.792 1003.220
>     65536  10.179   10.119   10.131  100.957    0.255 1615.312
>     32768   9.899    9.923   10.589  101.114    3.121 3235.656
>     16384  10.849   10.835   10.876   94.351    0.150 6038.474
> 
> Patched, 512KB readahead, 512 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   5.062    5.111    5.083  201.358    0.795    3.146
>  33554432   5.589    5.713    5.657  181.165    1.625    5.661
>  16777216   6.337    7.220    6.457  154.002    8.690    9.625
>   8388608   7.952    7.880    7.527  131.588    3.192   16.448
>   4194304  10.695   11.224   10.736   94.119    2.047   23.530
>   2097152  10.898   12.072   12.358   87.215    4.839   43.607
>   1048576  10.890   11.347    9.290   98.166    8.664   98.166
>    524288  10.898   11.032   10.887   93.611    0.560  187.223
>    262144   6.714    7.230    6.804  148.219    4.724  592.875
>    131072   7.325    7.342    7.363  139.441    0.295 1115.530
>     65536   9.773    9.988   10.592  101.327    3.417 1621.227
>     32768  10.031    9.995   10.086  102.019    0.377 3264.620
>     16384  11.041   10.987   11.564   91.502    2.093 5856.144
> 
> Patched, 2MB readahead, 512 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   4.970    5.097    5.188  201.435    3.559    3.147
>  33554432   5.588    5.793    5.169  186.042    8.923    5.814
>  16777216   6.151    6.414    6.526  161.012    4.027   10.063
>   8388608   7.836    7.299    7.475  135.980    3.989   16.998
>   4194304  11.792   10.964   10.158   93.683    5.706   23.421
>   2097152  11.225   11.492   11.357   90.162    0.866   45.081
>   1048576  12.017   11.258   11.432   88.580    2.449   88.580
>    524288   5.974   10.883   11.840  117.323   38.361  234.647
>    262144   6.774    6.765    6.526  153.155    2.661  612.619
>    131072   8.036    7.324    7.341  135.579    5.766 1084.633
>     65536   9.964   10.595    9.999  100.608    2.806 1609.735
>     32768  10.132   10.036   10.190  101.197    0.637 3238.308
>     16384  11.133   11.568   11.036   91.093    1.850 5829.981
> 
> Patched, 512KB readahead, 64 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   3.722    3.698    3.721  275.759    0.809    4.309
>  33554432   4.058    3.849    3.957  259.063    5.580    8.096
>  16777216   4.601    4.613    4.738  220.212    2.913   13.763
>   8388608   5.039    5.534    5.017  197.452    8.791   24.682
>   4194304   6.302    6.270    6.282  162.942    0.341   40.735
>   2097152   7.314    7.302    7.069  141.700    2.233   70.850
>   1048576   6.881    7.655    6.909  143.597    6.951  143.597
>    524288   7.163    7.025    6.951  145.344    1.803  290.687
>    262144   7.315    7.233    7.299  140.621    0.689  562.482
>    131072   9.292    8.756    8.807  114.475    3.036  915.803
>     65536   9.942    9.985    9.960  102.787    0.181 1644.598
>     32768  10.721   10.091   10.192   99.154    2.605 3172.935
>     16384  11.049   11.016   11.065   92.727    0.169 5934.531
> 
> Patched, 2MB readahead, 64 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   3.697    3.819    3.741  272.931    3.661    4.265
>  33554432   3.951    3.905    4.038  258.320    3.586    8.073
>  16777216   5.595    5.182    4.864  197.044   11.236   12.315
>   8388608   5.267    5.156    5.116  197.725    2.431   24.716
>   4194304   6.411    6.335    6.290  161.389    1.267   40.347
>   2097152   7.329    7.663    7.462  136.860    2.502   68.430
>   1048576   7.225    7.077    7.215  142.784    1.352  142.784
>    524288   6.903    7.015    7.095  146.210    1.647  292.419
>    262144   7.365    7.926    7.278  136.309    5.076  545.237
>    131072   8.796    8.819    8.814  116.233    0.130  929.862
>     65536   9.998   10.609    9.995  100.464    2.786 1607.423
>     32768  10.161   10.124   10.246  100.623    0.505 3219.943
> 
> Regards,
> Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 13:28                             ` Wu Fengguang
@ 2009-06-29 14:43                               ` Ronald Moesbergen
  2009-06-29 14:51                                 ` Wu Fengguang
  0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-06-29 14:43 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Vladislav Bolkhovitin, Andrew Morton, kosaki.motohiro,
	Alan.Brunelle, hifumi.hisashi, linux-kernel, linux-fsdevel,
	jens.axboe, randy.dunlap, Bart Van Assche

2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
> On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
>> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
>> > Wu Fengguang, on 06/29/2009 04:54 PM wrote:
>> > >
>> > > Why not 2.6.30? :)
>> >
>> > We started with 2.6.29, so why not complete with it (to save additional
>> > Ronald's effort to move on 2.6.30)?
>>
>> OK, that's fair enough.
>
> btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
> in case it will help the SCST performance.
>
> Ronald, if you run context readahead, please make sure that the server
> side readahead size is bigger than the client side readahead size.

I tried this patch on a vanilla kernel and no other patches applied,
but it does not seem to help. The iSCSI throughput does not go above
60MB/s. (1GB in 17 seconds). I have tried several readahead settings
from 128KB up to 4MB and kept the server readahead at twice the client
readahead, but it never comes above 60MB/s. This is using SCST on the
serverside and openiscsi on the client. I get much better throughput
(90 MB/s) when using the patches supplied with SCST, together with the
blk_run_backing_dev readahead patch.

Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 14:43                               ` Ronald Moesbergen
@ 2009-06-29 14:51                                 ` Wu Fengguang
  2009-06-29 14:56                                   ` Ronald Moesbergen
  2009-06-29 15:37                                   ` Vladislav Bolkhovitin
  0 siblings, 2 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 14:51 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: Vladislav Bolkhovitin, Andrew Morton, kosaki.motohiro,
	Alan.Brunelle, hifumi.hisashi, linux-kernel, linux-fsdevel,
	jens.axboe, randy.dunlap, Bart Van Assche

On Mon, Jun 29, 2009 at 10:43:48PM +0800, Ronald Moesbergen wrote:
> 2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
> > On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
> >> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
> >> > Wu Fengguang, on 06/29/2009 04:54 PM wrote:
> >> > >
> >> > > Why not 2.6.30? :)
> >> >
> >> > We started with 2.6.29, so why not complete with it (to save additional
> >> > Ronald's effort to move on 2.6.30)?
> >>
> >> OK, that's fair enough.
> >
> > btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
> > in case it will help the SCST performance.
> >
> > Ronald, if you run context readahead, please make sure that the server
> > side readahead size is bigger than the client side readahead size.
> 
> I tried this patch on a vanilla kernel and no other patches applied,
> but it does not seem to help. The iSCSI throughput does not go above
> 60MB/s. (1GB in 17 seconds). I have tried several readahead settings
> from 128KB up to 4MB and kept the server readahead at twice the client
> readahead, but it never comes above 60MB/s. This is using SCST on the

OK, thanks for the tests anyway!

> serverside and openiscsi on the client. I get much better throughput
> (90 MB/s) when using the patches supplied with SCST, together with the

What do you mean by "patches supplied with SCST"?

> blk_run_backing_dev readahead patch.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 14:51                                 ` Wu Fengguang
@ 2009-06-29 14:56                                   ` Ronald Moesbergen
  2009-06-29 15:37                                   ` Vladislav Bolkhovitin
  1 sibling, 0 replies; 65+ messages in thread
From: Ronald Moesbergen @ 2009-06-29 14:56 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Vladislav Bolkhovitin, Andrew Morton, kosaki.motohiro,
	Alan.Brunelle, hifumi.hisashi, linux-kernel, linux-fsdevel,
	jens.axboe, randy.dunlap, Bart Van Assche

2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
> On Mon, Jun 29, 2009 at 10:43:48PM +0800, Ronald Moesbergen wrote:
>> 2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
>> > On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
>> >> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
>> >> > Wu Fengguang, on 06/29/2009 04:54 PM wrote:
>> >> > >
>> >> > > Why not 2.6.30? :)
>> >> >
>> >> > We started with 2.6.29, so why not complete with it (to save additional
>> >> > Ronald's effort to move on 2.6.30)?
>> >>
>> >> OK, that's fair enough.
>> >
>> > btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
>> > in case it will help the SCST performance.
>> >
>> > Ronald, if you run context readahead, please make sure that the server
>> > side readahead size is bigger than the client side readahead size.
>>
>> I tried this patch on a vanilla kernel and no other patches applied,
>> but it does not seem to help. The iSCSI throughput does not go above
>> 60MB/s. (1GB in 17 seconds). I have tried several readahead settings
>> from 128KB up to 4MB and kept the server readahead at twice the client
>> readahead, but it never comes above 60MB/s. This is using SCST on the
>
> OK, thanks for the tests anyway!

You're welcome.

>> serverside and openiscsi on the client. I get much better throughput
>> (90 MB/s) when using the patches supplied with SCST, together with the
>
> What do you mean by "patches supplied with SCST"?

These:
http://scst.svn.sourceforge.net/viewvc/scst/trunk/scst/kernel/

Regards,
Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 14:21                             ` Wu Fengguang
@ 2009-06-29 15:01                               ` Wu Fengguang
  2009-06-29 15:37                                 ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 65+ messages in thread
From: Wu Fengguang @ 2009-06-29 15:01 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: Vladislav Bolkhovitin, Andrew Morton, kosaki.motohiro,
	Alan.Brunelle, hifumi.hisashi, linux-kernel, linux-fsdevel,
	jens.axboe, randy.dunlap, Bart Van Assche

On Mon, Jun 29, 2009 at 10:21:24PM +0800, Wu Fengguang wrote:
> On Mon, Jun 29, 2009 at 10:00:20PM +0800, Ronald Moesbergen wrote:
> > ... tests ...
> > 
> > > We started with 2.6.29, so why not complete with it (to save additional
> > > Ronald's effort to move on 2.6.30)?
> > >
> > >>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
> > >>
> > >> How about 2MB RAID readahead size? That transforms into about 512KB
> > >> per-disk readahead size.
> > >
> > > OK. Ronald, can you 4 more test cases, please:
> > >
> > > 7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
> > >
> > > 8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
> > > max_sectors_kb, the rest is default
> > >
> > > 9. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
> > > read-ahead, the rest is default
> > >
> > > 10. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
> > > read-ahead, 64 KB max_sectors_kb, the rest is default
> > 
> > The results:
> 
> I made a blind average:
> 
> N       MB/s          IOPS      case
> 
> 0      114.859       984.148    Unpatched, 128KB readahead, 512 max_sectors_kb
> 1      122.960       981.213    Unpatched, 512KB readahead, 512 max_sectors_kb
> 2      120.709       985.111    Unpatched, 2MB readahead, 512 max_sectors_kb
> 3      158.732      1004.714    Unpatched, 512KB readahead, 64 max_sectors_kb
> 4      159.237       979.659    Unpatched, 2MB readahead, 64 max_sectors_kb
> 
> 5      114.583       982.998    Patched, 128KB readahead, 512 max_sectors_kb
> 6      124.902       987.523    Patched, 512KB readahead, 512 max_sectors_kb
> 7      127.373       984.848    Patched, 2MB readahead, 512 max_sectors_kb
> 8      161.218       986.698    Patched, 512KB readahead, 64 max_sectors_kb
> 9      163.908       574.651    Patched, 2MB readahead, 64 max_sectors_kb
> 
> So before/after patch:
> 
>         avg throughput      135.299 => 138.397  by +2.3%
>         avg IOPS            986.969 => 903.344  by -8.5%     
> 
> The IOPS is a bit weird.
> 
> Summaries:
> - this patch improves RAID throughput by +2.3% on average
> - after this patch, 2MB readahead performs slightly better
>   (by 1-2%) than 512KB readahead

and the most important one:
- 64 max_sectors_kb performs much better than 512 max_sectors_kb, by ~30%!

Thanks,
Fengguang

> > Unpatched, 128KB readahead, 512 max_sectors_kb
> > blocksize       R        R        R   R(avg,    R(std        R
> >   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
> >  67108864   5.621    5.503    5.419  185.744    2.780    2.902
> >  33554432   6.628    5.897    6.242  164.068    7.827    5.127
> >  16777216   7.312    7.165    7.614  139.148    3.501    8.697
> >   8388608   8.719    8.408    8.694  119.003    1.973   14.875
> >   4194304  11.836   12.192   12.137   84.958    1.111   21.239
> >   2097152  13.452   13.992   14.035   74.090    1.442   37.045
> >   1048576  12.759   11.996   12.195   83.194    2.152   83.194
> >    524288  11.895   12.297   12.587   83.570    1.945  167.140
> >    262144   7.325    7.285    7.444  139.304    1.272  557.214
> >    131072   7.992    8.832    7.952  124.279    5.901  994.228
> >     65536  10.940   10.062   10.122   98.847    3.715 1581.545
> >     32768   9.973   10.012    9.945  102.640    0.281 3284.493
> >     16384  11.377   10.538   10.692   94.316    3.100 6036.222
> > 
> > Unpatched, 512KB readahead, 512 max_sectors_kb
> > blocksize       R        R        R   R(avg,    R(std        R
> >   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
> >  67108864   5.032    4.770    5.265  204.228    8.271    3.191
> >  33554432   5.569    5.712    5.863  179.263    3.755    5.602
> >  16777216   6.661    6.857    6.550  153.132    2.888    9.571
> >   8388608   8.022    8.000    7.978  127.998    0.288   16.000
> >   4194304  10.959   11.579   12.208   88.586    3.902   22.146
> >   2097152  13.692   12.670   12.625   78.906    2.914   39.453
> >   1048576  11.120   11.144   10.878   92.703    1.018   92.703
> >    524288  11.234   10.915   11.374   91.667    1.587  183.334
> >    262144   6.848    6.678    6.795  151.191    1.594  604.763
> >    131072   7.393    7.367    7.337  139.025    0.428 1112.202
> >     65536  10.003   10.919   10.015   99.466    4.019 1591.462
> >     32768  10.117   10.124   10.169  101.018    0.229 3232.574
> >     16384  11.614   11.027   11.029   91.293    2.207 5842.771
> > 
> > Unpatched, 2MB readahead, 512 max_sectors_kb
> > blocksize       R        R        R   R(avg,    R(std        R
> >   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
> >  67108864   5.268    5.316    5.418  191.996    2.241    3.000
> >  33554432   5.831    6.459    6.110  167.259    6.977    5.227
> >  16777216   7.313    7.069    7.197  142.385    1.972    8.899
> >   8388608   8.657    8.500    8.498  119.754    1.039   14.969
> >   4194304  11.846   12.116   11.801   85.911    0.994   21.478
> >   2097152  12.917   13.652   13.100   77.484    1.808   38.742
> >   1048576   9.544   10.667   10.807   99.345    5.640   99.345
> >    524288  11.736    7.171    6.599  128.410   29.539  256.821
> >    262144   7.530    7.403    7.416  137.464    1.053  549.857
> >    131072   8.741    8.002    8.022  124.256    5.029  994.051
> >     65536  10.701   10.138   10.090   99.394    2.629 1590.311
> >     32768   9.978    9.950    9.934  102.875    0.188 3291.994
> >     16384  11.435   10.823   10.907   92.684    2.234 5931.749
> > 
> > Unpatched, 512KB readahead, 64 max_sectors_kb
> > blocksize       R        R        R   R(avg,    R(std        R
> >   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
> >  67108864   3.994    3.991    4.123  253.774    3.838    3.965
> >  33554432   4.100    4.329    4.161  244.111    5.569    7.628
> >  16777216   5.476    4.835    5.079  200.148   10.177   12.509
> >   8388608   5.484    5.258    5.227  192.470    4.084   24.059
> >   4194304   6.429    6.458    6.435  158.989    0.315   39.747
> >   2097152   7.219    7.744    7.306  138.081    4.187   69.040
> >   1048576   6.850    6.897    6.776  149.696    1.089  149.696
> >    524288   6.406    6.393    6.469  159.439    0.814  318.877
> >    262144   6.865    7.508    6.861  144.931    6.041  579.726
> >    131072   8.435    8.482    8.307  121.792    1.076  974.334
> >     65536   9.616    9.610   10.262  104.279    3.176 1668.462
> >     32768   9.682    9.932   10.015  103.701    1.497 3318.428
> >     16384  10.962   10.852   11.565   92.106    2.547 5894.813
> > 
> > Unpatched, 2MB readahead, 64 max_sectors_kb
> > blocksize       R        R        R   R(avg,    R(std        R
> >   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
> >  67108864   3.730    3.714    3.914  270.615    6.396    4.228
> >  33554432   4.445    3.999    3.989  247.710   12.276    7.741
> >  16777216   4.763    4.712    4.709  216.590    1.122   13.537
> >   8388608   5.001    5.086    5.229  200.649    3.673   25.081
> >   4194304   6.365    6.362    6.905  156.710    5.948   39.178
> >   2097152   7.390    7.367    7.270  139.470    0.992   69.735
> >   1048576   7.038    7.050    7.090  145.052    0.456  145.052
> >    524288   6.862    7.167    7.278  144.272    3.617  288.544
> >    262144   7.266    7.313    7.265  140.635    0.436  562.540
> >    131072   8.677    8.735    8.821  117.108    0.790  936.865
> >     65536  10.865   10.040   10.038   99.418    3.658 1590.685
> >     32768  10.167   10.130   10.177  100.805    0.201 3225.749
> >     16384  11.643   11.017   11.103   91.041    2.203 5826.629
> > 
> > Patched, 128KB readahead, 512 max_sectors_kb
> > blocksize       R        R        R   R(avg,    R(std        R
> >   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
> >  67108864   5.670    5.188    5.636  186.555    7.671    2.915
> >  33554432   6.069    5.971    6.141  168.992    1.954    5.281
> >  16777216   7.821    7.501    7.372  135.451    3.340    8.466
> >   8388608   9.147    8.618    9.000  114.849    2.908   14.356
> >   4194304  12.199   12.914   12.381   81.981    1.964   20.495
> >   2097152  13.449   13.891   14.288   73.842    1.828   36.921
> >   1048576  11.890   12.182   11.519   86.360    1.984   86.360
> >    524288  11.899   12.706   12.135   83.678    2.287  167.357
> >    262144   7.460    7.559    7.563  136.041    0.864  544.164
> >    131072   7.987    8.003    8.530  125.403    3.792 1003.220
> >     65536  10.179   10.119   10.131  100.957    0.255 1615.312
> >     32768   9.899    9.923   10.589  101.114    3.121 3235.656
> >     16384  10.849   10.835   10.876   94.351    0.150 6038.474
> > 
> > Patched, 512KB readahead, 512 max_sectors_kb
> > blocksize       R        R        R   R(avg,    R(std        R
> >   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
> >  67108864   5.062    5.111    5.083  201.358    0.795    3.146
> >  33554432   5.589    5.713    5.657  181.165    1.625    5.661
> >  16777216   6.337    7.220    6.457  154.002    8.690    9.625
> >   8388608   7.952    7.880    7.527  131.588    3.192   16.448
> >   4194304  10.695   11.224   10.736   94.119    2.047   23.530
> >   2097152  10.898   12.072   12.358   87.215    4.839   43.607
> >   1048576  10.890   11.347    9.290   98.166    8.664   98.166
> >    524288  10.898   11.032   10.887   93.611    0.560  187.223
> >    262144   6.714    7.230    6.804  148.219    4.724  592.875
> >    131072   7.325    7.342    7.363  139.441    0.295 1115.530
> >     65536   9.773    9.988   10.592  101.327    3.417 1621.227
> >     32768  10.031    9.995   10.086  102.019    0.377 3264.620
> >     16384  11.041   10.987   11.564   91.502    2.093 5856.144
> > 
> > Patched, 2MB readahead, 512 max_sectors_kb
> > blocksize       R        R        R   R(avg,    R(std        R
> >   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
> >  67108864   4.970    5.097    5.188  201.435    3.559    3.147
> >  33554432   5.588    5.793    5.169  186.042    8.923    5.814
> >  16777216   6.151    6.414    6.526  161.012    4.027   10.063
> >   8388608   7.836    7.299    7.475  135.980    3.989   16.998
> >   4194304  11.792   10.964   10.158   93.683    5.706   23.421
> >   2097152  11.225   11.492   11.357   90.162    0.866   45.081
> >   1048576  12.017   11.258   11.432   88.580    2.449   88.580
> >    524288   5.974   10.883   11.840  117.323   38.361  234.647
> >    262144   6.774    6.765    6.526  153.155    2.661  612.619
> >    131072   8.036    7.324    7.341  135.579    5.766 1084.633
> >     65536   9.964   10.595    9.999  100.608    2.806 1609.735
> >     32768  10.132   10.036   10.190  101.197    0.637 3238.308
> >     16384  11.133   11.568   11.036   91.093    1.850 5829.981
> > 
> > Patched, 512KB readahead, 64 max_sectors_kb
> > blocksize       R        R        R   R(avg,    R(std        R
> >   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
> >  67108864   3.722    3.698    3.721  275.759    0.809    4.309
> >  33554432   4.058    3.849    3.957  259.063    5.580    8.096
> >  16777216   4.601    4.613    4.738  220.212    2.913   13.763
> >   8388608   5.039    5.534    5.017  197.452    8.791   24.682
> >   4194304   6.302    6.270    6.282  162.942    0.341   40.735
> >   2097152   7.314    7.302    7.069  141.700    2.233   70.850
> >   1048576   6.881    7.655    6.909  143.597    6.951  143.597
> >    524288   7.163    7.025    6.951  145.344    1.803  290.687
> >    262144   7.315    7.233    7.299  140.621    0.689  562.482
> >    131072   9.292    8.756    8.807  114.475    3.036  915.803
> >     65536   9.942    9.985    9.960  102.787    0.181 1644.598
> >     32768  10.721   10.091   10.192   99.154    2.605 3172.935
> >     16384  11.049   11.016   11.065   92.727    0.169 5934.531
> > 
> > Patched, 2MB readahead, 64 max_sectors_kb
> > blocksize       R        R        R   R(avg,    R(std        R
> >   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
> >  67108864   3.697    3.819    3.741  272.931    3.661    4.265
> >  33554432   3.951    3.905    4.038  258.320    3.586    8.073
> >  16777216   5.595    5.182    4.864  197.044   11.236   12.315
> >   8388608   5.267    5.156    5.116  197.725    2.431   24.716
> >   4194304   6.411    6.335    6.290  161.389    1.267   40.347
> >   2097152   7.329    7.663    7.462  136.860    2.502   68.430
> >   1048576   7.225    7.077    7.215  142.784    1.352  142.784
> >    524288   6.903    7.015    7.095  146.210    1.647  292.419
> >    262144   7.365    7.926    7.278  136.309    5.076  545.237
> >    131072   8.796    8.819    8.814  116.233    0.130  929.862
> >     65536   9.998   10.609    9.995  100.464    2.786 1607.423
> >     32768  10.161   10.124   10.246  100.623    0.505 3219.943
> > 
> > Regards,
> > Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 14:51                                 ` Wu Fengguang
  2009-06-29 14:56                                   ` Ronald Moesbergen
@ 2009-06-29 15:37                                   ` Vladislav Bolkhovitin
  1 sibling, 0 replies; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-06-29 15:37 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Ronald Moesbergen, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
	randy.dunlap, Bart Van Assche


Wu Fengguang, on 06/29/2009 06:51 PM wrote:
> On Mon, Jun 29, 2009 at 10:43:48PM +0800, Ronald Moesbergen wrote:
>> 2009/6/29 Wu Fengguang <fengguang.wu@intel.com>:
>>> On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
>>>> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
>>>>> Wu Fengguang, on 06/29/2009 04:54 PM wrote:
>>>>>> Why not 2.6.30? :)
>>>>> We started with 2.6.29, so why not complete with it (to save additional
>>>>> Ronald's effort to move on 2.6.30)?
>>>> OK, that's fair enough.
>>> btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
>>> in case it will help the SCST performance.
>>>
>>> Ronald, if you run context readahead, please make sure that the server
>>> side readahead size is bigger than the client side readahead size.
>> I tried this patch on a vanilla kernel and no other patches applied,
>> but it does not seem to help. The iSCSI throughput does not go above
>> 60MB/s. (1GB in 17 seconds). I have tried several readahead settings
>> from 128KB up to 4MB and kept the server readahead at twice the client
>> readahead, but it never comes above 60MB/s. This is using SCST on the
> 
> OK, thanks for the tests anyway!
> 
>> serverside and openiscsi on the client. I get much better throughput
>> (90 MB/s) when using the patches supplied with SCST, together with the
> 
> What do you mean by "patches supplied with SCST"?

Ronald means the io_context patch 
(http://scst.svn.sourceforge.net/viewvc/scst/trunk/scst/kernel/io_context-2.6.29.patch?revision=717), 
which allows SCST's I/O threads to share a single IO context.

Vlad


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 15:01                               ` Wu Fengguang
@ 2009-06-29 15:37                                 ` Vladislav Bolkhovitin
       [not found]                                   ` <20090630010414.GB31418@localhost>
  0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-06-29 15:37 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Ronald Moesbergen, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
	randy.dunlap, Bart Van Assche


Wu Fengguang, on 06/29/2009 07:01 PM wrote:
> On Mon, Jun 29, 2009 at 10:21:24PM +0800, Wu Fengguang wrote:
>> On Mon, Jun 29, 2009 at 10:00:20PM +0800, Ronald Moesbergen wrote:
>>> ... tests ...
>>>
>>>> We started with 2.6.29, so why not complete with it (to save additional
>>>> Ronald's effort to move on 2.6.30)?
>>>>
>>>>>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
>>>>> How about 2MB RAID readahead size? That transforms into about 512KB
>>>>> per-disk readahead size.
>>>> OK. Ronald, can you 4 more test cases, please:
>>>>
>>>> 7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
>>>>
>>>> 8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
>>>> max_sectors_kb, the rest is default
>>>>
>>>> 9. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
>>>> read-ahead, the rest is default
>>>>
>>>> 10. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
>>>> read-ahead, 64 KB max_sectors_kb, the rest is default
>>> The results:
>> I made a blind average:
>>
>> N       MB/s          IOPS      case
>>
>> 0      114.859       984.148    Unpatched, 128KB readahead, 512 max_sectors_kb
>> 1      122.960       981.213    Unpatched, 512KB readahead, 512 max_sectors_kb
>> 2      120.709       985.111    Unpatched, 2MB readahead, 512 max_sectors_kb
>> 3      158.732      1004.714    Unpatched, 512KB readahead, 64 max_sectors_kb
>> 4      159.237       979.659    Unpatched, 2MB readahead, 64 max_sectors_kb
>>
>> 5      114.583       982.998    Patched, 128KB readahead, 512 max_sectors_kb
>> 6      124.902       987.523    Patched, 512KB readahead, 512 max_sectors_kb
>> 7      127.373       984.848    Patched, 2MB readahead, 512 max_sectors_kb
>> 8      161.218       986.698    Patched, 512KB readahead, 64 max_sectors_kb
>> 9      163.908       574.651    Patched, 2MB readahead, 64 max_sectors_kb
>>
>> So before/after patch:
>>
>>         avg throughput      135.299 => 138.397  by +2.3%
>>         avg IOPS            986.969 => 903.344  by -8.5%     
>>
>> The IOPS is a bit weird.
>>
>> Summaries:
>> - this patch improves RAID throughput by +2.3% on average
>> - after this patch, 2MB readahead performs slightly better
>>   (by 1-2%) than 512KB readahead
> 
> and the most important one:
> - 64 max_sectors_kb performs much better than 512 max_sectors_kb, by ~30%!

Yes, I just wanted to point that out ;)

> Thanks,
> Fengguang
> 
>>> Unpatched, 128KB readahead, 512 max_sectors_kb
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864   5.621    5.503    5.419  185.744    2.780    2.902
>>>  33554432   6.628    5.897    6.242  164.068    7.827    5.127
>>>  16777216   7.312    7.165    7.614  139.148    3.501    8.697
>>>   8388608   8.719    8.408    8.694  119.003    1.973   14.875
>>>   4194304  11.836   12.192   12.137   84.958    1.111   21.239
>>>   2097152  13.452   13.992   14.035   74.090    1.442   37.045
>>>   1048576  12.759   11.996   12.195   83.194    2.152   83.194
>>>    524288  11.895   12.297   12.587   83.570    1.945  167.140
>>>    262144   7.325    7.285    7.444  139.304    1.272  557.214
>>>    131072   7.992    8.832    7.952  124.279    5.901  994.228
>>>     65536  10.940   10.062   10.122   98.847    3.715 1581.545
>>>     32768   9.973   10.012    9.945  102.640    0.281 3284.493
>>>     16384  11.377   10.538   10.692   94.316    3.100 6036.222
>>>
>>> Unpatched, 512KB readahead, 512 max_sectors_kb
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864   5.032    4.770    5.265  204.228    8.271    3.191
>>>  33554432   5.569    5.712    5.863  179.263    3.755    5.602
>>>  16777216   6.661    6.857    6.550  153.132    2.888    9.571
>>>   8388608   8.022    8.000    7.978  127.998    0.288   16.000
>>>   4194304  10.959   11.579   12.208   88.586    3.902   22.146
>>>   2097152  13.692   12.670   12.625   78.906    2.914   39.453
>>>   1048576  11.120   11.144   10.878   92.703    1.018   92.703
>>>    524288  11.234   10.915   11.374   91.667    1.587  183.334
>>>    262144   6.848    6.678    6.795  151.191    1.594  604.763
>>>    131072   7.393    7.367    7.337  139.025    0.428 1112.202
>>>     65536  10.003   10.919   10.015   99.466    4.019 1591.462
>>>     32768  10.117   10.124   10.169  101.018    0.229 3232.574
>>>     16384  11.614   11.027   11.029   91.293    2.207 5842.771
>>>
>>> Unpatched, 2MB readahead, 512 max_sectors_kb
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864   5.268    5.316    5.418  191.996    2.241    3.000
>>>  33554432   5.831    6.459    6.110  167.259    6.977    5.227
>>>  16777216   7.313    7.069    7.197  142.385    1.972    8.899
>>>   8388608   8.657    8.500    8.498  119.754    1.039   14.969
>>>   4194304  11.846   12.116   11.801   85.911    0.994   21.478
>>>   2097152  12.917   13.652   13.100   77.484    1.808   38.742
>>>   1048576   9.544   10.667   10.807   99.345    5.640   99.345
>>>    524288  11.736    7.171    6.599  128.410   29.539  256.821
>>>    262144   7.530    7.403    7.416  137.464    1.053  549.857
>>>    131072   8.741    8.002    8.022  124.256    5.029  994.051
>>>     65536  10.701   10.138   10.090   99.394    2.629 1590.311
>>>     32768   9.978    9.950    9.934  102.875    0.188 3291.994
>>>     16384  11.435   10.823   10.907   92.684    2.234 5931.749
>>>
>>> Unpatched, 512KB readahead, 64 max_sectors_kb
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864   3.994    3.991    4.123  253.774    3.838    3.965
>>>  33554432   4.100    4.329    4.161  244.111    5.569    7.628
>>>  16777216   5.476    4.835    5.079  200.148   10.177   12.509
>>>   8388608   5.484    5.258    5.227  192.470    4.084   24.059
>>>   4194304   6.429    6.458    6.435  158.989    0.315   39.747
>>>   2097152   7.219    7.744    7.306  138.081    4.187   69.040
>>>   1048576   6.850    6.897    6.776  149.696    1.089  149.696
>>>    524288   6.406    6.393    6.469  159.439    0.814  318.877
>>>    262144   6.865    7.508    6.861  144.931    6.041  579.726
>>>    131072   8.435    8.482    8.307  121.792    1.076  974.334
>>>     65536   9.616    9.610   10.262  104.279    3.176 1668.462
>>>     32768   9.682    9.932   10.015  103.701    1.497 3318.428
>>>     16384  10.962   10.852   11.565   92.106    2.547 5894.813
>>>
>>> Unpatched, 2MB readahead, 64 max_sectors_kb
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864   3.730    3.714    3.914  270.615    6.396    4.228
>>>  33554432   4.445    3.999    3.989  247.710   12.276    7.741
>>>  16777216   4.763    4.712    4.709  216.590    1.122   13.537
>>>   8388608   5.001    5.086    5.229  200.649    3.673   25.081
>>>   4194304   6.365    6.362    6.905  156.710    5.948   39.178
>>>   2097152   7.390    7.367    7.270  139.470    0.992   69.735
>>>   1048576   7.038    7.050    7.090  145.052    0.456  145.052
>>>    524288   6.862    7.167    7.278  144.272    3.617  288.544
>>>    262144   7.266    7.313    7.265  140.635    0.436  562.540
>>>    131072   8.677    8.735    8.821  117.108    0.790  936.865
>>>     65536  10.865   10.040   10.038   99.418    3.658 1590.685
>>>     32768  10.167   10.130   10.177  100.805    0.201 3225.749
>>>     16384  11.643   11.017   11.103   91.041    2.203 5826.629
>>>
>>> Patched, 128KB readahead, 512 max_sectors_kb
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864   5.670    5.188    5.636  186.555    7.671    2.915
>>>  33554432   6.069    5.971    6.141  168.992    1.954    5.281
>>>  16777216   7.821    7.501    7.372  135.451    3.340    8.466
>>>   8388608   9.147    8.618    9.000  114.849    2.908   14.356
>>>   4194304  12.199   12.914   12.381   81.981    1.964   20.495
>>>   2097152  13.449   13.891   14.288   73.842    1.828   36.921
>>>   1048576  11.890   12.182   11.519   86.360    1.984   86.360
>>>    524288  11.899   12.706   12.135   83.678    2.287  167.357
>>>    262144   7.460    7.559    7.563  136.041    0.864  544.164
>>>    131072   7.987    8.003    8.530  125.403    3.792 1003.220
>>>     65536  10.179   10.119   10.131  100.957    0.255 1615.312
>>>     32768   9.899    9.923   10.589  101.114    3.121 3235.656
>>>     16384  10.849   10.835   10.876   94.351    0.150 6038.474
>>>
>>> Patched, 512KB readahead, 512 max_sectors_kb
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864   5.062    5.111    5.083  201.358    0.795    3.146
>>>  33554432   5.589    5.713    5.657  181.165    1.625    5.661
>>>  16777216   6.337    7.220    6.457  154.002    8.690    9.625
>>>   8388608   7.952    7.880    7.527  131.588    3.192   16.448
>>>   4194304  10.695   11.224   10.736   94.119    2.047   23.530
>>>   2097152  10.898   12.072   12.358   87.215    4.839   43.607
>>>   1048576  10.890   11.347    9.290   98.166    8.664   98.166
>>>    524288  10.898   11.032   10.887   93.611    0.560  187.223
>>>    262144   6.714    7.230    6.804  148.219    4.724  592.875
>>>    131072   7.325    7.342    7.363  139.441    0.295 1115.530
>>>     65536   9.773    9.988   10.592  101.327    3.417 1621.227
>>>     32768  10.031    9.995   10.086  102.019    0.377 3264.620
>>>     16384  11.041   10.987   11.564   91.502    2.093 5856.144
>>>
>>> Patched, 2MB readahead, 512 max_sectors_kb
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864   4.970    5.097    5.188  201.435    3.559    3.147
>>>  33554432   5.588    5.793    5.169  186.042    8.923    5.814
>>>  16777216   6.151    6.414    6.526  161.012    4.027   10.063
>>>   8388608   7.836    7.299    7.475  135.980    3.989   16.998
>>>   4194304  11.792   10.964   10.158   93.683    5.706   23.421
>>>   2097152  11.225   11.492   11.357   90.162    0.866   45.081
>>>   1048576  12.017   11.258   11.432   88.580    2.449   88.580
>>>    524288   5.974   10.883   11.840  117.323   38.361  234.647
>>>    262144   6.774    6.765    6.526  153.155    2.661  612.619
>>>    131072   8.036    7.324    7.341  135.579    5.766 1084.633
>>>     65536   9.964   10.595    9.999  100.608    2.806 1609.735
>>>     32768  10.132   10.036   10.190  101.197    0.637 3238.308
>>>     16384  11.133   11.568   11.036   91.093    1.850 5829.981
>>>
>>> Patched, 512KB readahead, 64 max_sectors_kb
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864   3.722    3.698    3.721  275.759    0.809    4.309
>>>  33554432   4.058    3.849    3.957  259.063    5.580    8.096
>>>  16777216   4.601    4.613    4.738  220.212    2.913   13.763
>>>   8388608   5.039    5.534    5.017  197.452    8.791   24.682
>>>   4194304   6.302    6.270    6.282  162.942    0.341   40.735
>>>   2097152   7.314    7.302    7.069  141.700    2.233   70.850
>>>   1048576   6.881    7.655    6.909  143.597    6.951  143.597
>>>    524288   7.163    7.025    6.951  145.344    1.803  290.687
>>>    262144   7.315    7.233    7.299  140.621    0.689  562.482
>>>    131072   9.292    8.756    8.807  114.475    3.036  915.803
>>>     65536   9.942    9.985    9.960  102.787    0.181 1644.598
>>>     32768  10.721   10.091   10.192   99.154    2.605 3172.935
>>>     16384  11.049   11.016   11.065   92.727    0.169 5934.531
>>>
>>> Patched, 2MB readahead, 64 max_sectors_kb
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864   3.697    3.819    3.741  272.931    3.661    4.265
>>>  33554432   3.951    3.905    4.038  258.320    3.586    8.073
>>>  16777216   5.595    5.182    4.864  197.044   11.236   12.315
>>>   8388608   5.267    5.156    5.116  197.725    2.431   24.716
>>>   4194304   6.411    6.335    6.290  161.389    1.267   40.347
>>>   2097152   7.329    7.663    7.462  136.860    2.502   68.430
>>>   1048576   7.225    7.077    7.215  142.784    1.352  142.784
>>>    524288   6.903    7.015    7.095  146.210    1.647  292.419
>>>    262144   7.365    7.926    7.278  136.309    5.076  545.237
>>>    131072   8.796    8.819    8.814  116.233    0.130  929.862
>>>     65536   9.998   10.609    9.995  100.464    2.786 1607.423
>>>     32768  10.161   10.124   10.246  100.623    0.505 3219.943
>>>
>>> Regards,
>>> Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-06-29 14:00                           ` Ronald Moesbergen
  2009-06-29 14:21                             ` Wu Fengguang
@ 2009-06-30 10:22                             ` Vladislav Bolkhovitin
  1 sibling, 0 replies; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-06-30 10:22 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: Wu Fengguang, Andrew Morton, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-kernel, linux-fsdevel, jens.axboe,
	randy.dunlap, Bart Van Assche

[-- Attachment #1: Type: text/plain, Size: 11777 bytes --]


Ronald Moesbergen, on 06/29/2009 06:00 PM wrote:
> ... tests ...
> 
>> We started with 2.6.29, so why not finish with it (to save Ronald the
>> additional effort of moving to 2.6.30)?
>>
>>>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
>>> How about 2MB RAID readahead size? That transforms into about 512KB
>>> per-disk readahead size.
>> OK. Ronald, can you run 4 more test cases, please:
>>
>> 7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
>>
>> 8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
>> max_sectors_kb, the rest is default
>>
>> 9. Vanilla 2.6.29 kernel patched with Fengguang's patch, 2MB
>> read-ahead, the rest is default
>>
>> 10. Vanilla 2.6.29 kernel patched with Fengguang's patch, 2MB
>> read-ahead, 64 KB max_sectors_kb, the rest is default
> 
> The results:
> 
> Unpatched, 128KB readahead, 512 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   5.621    5.503    5.419  185.744    2.780    2.902
>  33554432   6.628    5.897    6.242  164.068    7.827    5.127
>  16777216   7.312    7.165    7.614  139.148    3.501    8.697
>   8388608   8.719    8.408    8.694  119.003    1.973   14.875
>   4194304  11.836   12.192   12.137   84.958    1.111   21.239
>   2097152  13.452   13.992   14.035   74.090    1.442   37.045
>   1048576  12.759   11.996   12.195   83.194    2.152   83.194
>    524288  11.895   12.297   12.587   83.570    1.945  167.140
>    262144   7.325    7.285    7.444  139.304    1.272  557.214
>    131072   7.992    8.832    7.952  124.279    5.901  994.228
>     65536  10.940   10.062   10.122   98.847    3.715 1581.545
>     32768   9.973   10.012    9.945  102.640    0.281 3284.493
>     16384  11.377   10.538   10.692   94.316    3.100 6036.222
> 
> Unpatched, 512KB readahead, 512 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   5.032    4.770    5.265  204.228    8.271    3.191
>  33554432   5.569    5.712    5.863  179.263    3.755    5.602
>  16777216   6.661    6.857    6.550  153.132    2.888    9.571
>   8388608   8.022    8.000    7.978  127.998    0.288   16.000
>   4194304  10.959   11.579   12.208   88.586    3.902   22.146
>   2097152  13.692   12.670   12.625   78.906    2.914   39.453
>   1048576  11.120   11.144   10.878   92.703    1.018   92.703
>    524288  11.234   10.915   11.374   91.667    1.587  183.334

Can somebody explain those big throughput drops (66% in this case, 68%
in the case above)? They happen in nearly all the tests; only the 64
max_sectors_kb cases with big RA sizes suffer less from them.

It looks like a possible sign of some not-yet-understood deficiency in
the I/O submission or read-ahead path.

(blockdev-perftest just runs dd to read 1 GB three times for each "bs",
then calculates the average throughput and IOPS and prints the results.
It's small, so I attached it.)
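
For reference, in buffered mode each data point boils down to roughly the
following two steps, repeated 3 times per block size (/dev/sdb here is only
an example device name):

sync && echo 3 > /proc/sys/vm/drop_caches          # drop the page cache
dd if=/dev/sdb of=/dev/null bs=524288 count=2048   # one 1 GB read at bs=512KB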

>    262144   6.848    6.678    6.795  151.191    1.594  604.763
>    131072   7.393    7.367    7.337  139.025    0.428 1112.202
>     65536  10.003   10.919   10.015   99.466    4.019 1591.462
>     32768  10.117   10.124   10.169  101.018    0.229 3232.574
>     16384  11.614   11.027   11.029   91.293    2.207 5842.771
> 
> Unpatched, 2MB readahead, 512 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   5.268    5.316    5.418  191.996    2.241    3.000
>  33554432   5.831    6.459    6.110  167.259    6.977    5.227
>  16777216   7.313    7.069    7.197  142.385    1.972    8.899
>   8388608   8.657    8.500    8.498  119.754    1.039   14.969
>   4194304  11.846   12.116   11.801   85.911    0.994   21.478
>   2097152  12.917   13.652   13.100   77.484    1.808   38.742
>   1048576   9.544   10.667   10.807   99.345    5.640   99.345
>    524288  11.736    7.171    6.599  128.410   29.539  256.821
>    262144   7.530    7.403    7.416  137.464    1.053  549.857
>    131072   8.741    8.002    8.022  124.256    5.029  994.051
>     65536  10.701   10.138   10.090   99.394    2.629 1590.311
>     32768   9.978    9.950    9.934  102.875    0.188 3291.994
>     16384  11.435   10.823   10.907   92.684    2.234 5931.749
> 
> Unpatched, 512KB readahead, 64 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   3.994    3.991    4.123  253.774    3.838    3.965
>  33554432   4.100    4.329    4.161  244.111    5.569    7.628
>  16777216   5.476    4.835    5.079  200.148   10.177   12.509
>   8388608   5.484    5.258    5.227  192.470    4.084   24.059
>   4194304   6.429    6.458    6.435  158.989    0.315   39.747
>   2097152   7.219    7.744    7.306  138.081    4.187   69.040
>   1048576   6.850    6.897    6.776  149.696    1.089  149.696
>    524288   6.406    6.393    6.469  159.439    0.814  318.877
>    262144   6.865    7.508    6.861  144.931    6.041  579.726
>    131072   8.435    8.482    8.307  121.792    1.076  974.334
>     65536   9.616    9.610   10.262  104.279    3.176 1668.462
>     32768   9.682    9.932   10.015  103.701    1.497 3318.428
>     16384  10.962   10.852   11.565   92.106    2.547 5894.813
> 
> Unpatched, 2MB readahead, 64 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   3.730    3.714    3.914  270.615    6.396    4.228
>  33554432   4.445    3.999    3.989  247.710   12.276    7.741
>  16777216   4.763    4.712    4.709  216.590    1.122   13.537
>   8388608   5.001    5.086    5.229  200.649    3.673   25.081
>   4194304   6.365    6.362    6.905  156.710    5.948   39.178
>   2097152   7.390    7.367    7.270  139.470    0.992   69.735
>   1048576   7.038    7.050    7.090  145.052    0.456  145.052
>    524288   6.862    7.167    7.278  144.272    3.617  288.544
>    262144   7.266    7.313    7.265  140.635    0.436  562.540
>    131072   8.677    8.735    8.821  117.108    0.790  936.865
>     65536  10.865   10.040   10.038   99.418    3.658 1590.685
>     32768  10.167   10.130   10.177  100.805    0.201 3225.749
>     16384  11.643   11.017   11.103   91.041    2.203 5826.629
> 
> Patched, 128KB readahead, 512 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   5.670    5.188    5.636  186.555    7.671    2.915
>  33554432   6.069    5.971    6.141  168.992    1.954    5.281
>  16777216   7.821    7.501    7.372  135.451    3.340    8.466
>   8388608   9.147    8.618    9.000  114.849    2.908   14.356
>   4194304  12.199   12.914   12.381   81.981    1.964   20.495
>   2097152  13.449   13.891   14.288   73.842    1.828   36.921
>   1048576  11.890   12.182   11.519   86.360    1.984   86.360
>    524288  11.899   12.706   12.135   83.678    2.287  167.357
>    262144   7.460    7.559    7.563  136.041    0.864  544.164
>    131072   7.987    8.003    8.530  125.403    3.792 1003.220
>     65536  10.179   10.119   10.131  100.957    0.255 1615.312
>     32768   9.899    9.923   10.589  101.114    3.121 3235.656
>     16384  10.849   10.835   10.876   94.351    0.150 6038.474
> 
> Patched, 512KB readahead, 512 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   5.062    5.111    5.083  201.358    0.795    3.146
>  33554432   5.589    5.713    5.657  181.165    1.625    5.661
>  16777216   6.337    7.220    6.457  154.002    8.690    9.625
>   8388608   7.952    7.880    7.527  131.588    3.192   16.448
>   4194304  10.695   11.224   10.736   94.119    2.047   23.530
>   2097152  10.898   12.072   12.358   87.215    4.839   43.607
>   1048576  10.890   11.347    9.290   98.166    8.664   98.166
>    524288  10.898   11.032   10.887   93.611    0.560  187.223
>    262144   6.714    7.230    6.804  148.219    4.724  592.875
>    131072   7.325    7.342    7.363  139.441    0.295 1115.530
>     65536   9.773    9.988   10.592  101.327    3.417 1621.227
>     32768  10.031    9.995   10.086  102.019    0.377 3264.620
>     16384  11.041   10.987   11.564   91.502    2.093 5856.144
> 
> Patched, 2MB readahead, 512 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   4.970    5.097    5.188  201.435    3.559    3.147
>  33554432   5.588    5.793    5.169  186.042    8.923    5.814
>  16777216   6.151    6.414    6.526  161.012    4.027   10.063
>   8388608   7.836    7.299    7.475  135.980    3.989   16.998
>   4194304  11.792   10.964   10.158   93.683    5.706   23.421
>   2097152  11.225   11.492   11.357   90.162    0.866   45.081
>   1048576  12.017   11.258   11.432   88.580    2.449   88.580
>    524288   5.974   10.883   11.840  117.323   38.361  234.647
>    262144   6.774    6.765    6.526  153.155    2.661  612.619
>    131072   8.036    7.324    7.341  135.579    5.766 1084.633
>     65536   9.964   10.595    9.999  100.608    2.806 1609.735
>     32768  10.132   10.036   10.190  101.197    0.637 3238.308
>     16384  11.133   11.568   11.036   91.093    1.850 5829.981
> 
> Patched, 512KB readahead, 64 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   3.722    3.698    3.721  275.759    0.809    4.309
>  33554432   4.058    3.849    3.957  259.063    5.580    8.096
>  16777216   4.601    4.613    4.738  220.212    2.913   13.763
>   8388608   5.039    5.534    5.017  197.452    8.791   24.682
>   4194304   6.302    6.270    6.282  162.942    0.341   40.735
>   2097152   7.314    7.302    7.069  141.700    2.233   70.850
>   1048576   6.881    7.655    6.909  143.597    6.951  143.597
>    524288   7.163    7.025    6.951  145.344    1.803  290.687
>    262144   7.315    7.233    7.299  140.621    0.689  562.482
>    131072   9.292    8.756    8.807  114.475    3.036  915.803
>     65536   9.942    9.985    9.960  102.787    0.181 1644.598
>     32768  10.721   10.091   10.192   99.154    2.605 3172.935
>     16384  11.049   11.016   11.065   92.727    0.169 5934.531
> 
> Patched, 2MB readahead, 64 max_sectors_kb
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864   3.697    3.819    3.741  272.931    3.661    4.265
>  33554432   3.951    3.905    4.038  258.320    3.586    8.073
>  16777216   5.595    5.182    4.864  197.044   11.236   12.315
>   8388608   5.267    5.156    5.116  197.725    2.431   24.716
>   4194304   6.411    6.335    6.290  161.389    1.267   40.347
>   2097152   7.329    7.663    7.462  136.860    2.502   68.430
>   1048576   7.225    7.077    7.215  142.784    1.352  142.784
>    524288   6.903    7.015    7.095  146.210    1.647  292.419
>    262144   7.365    7.926    7.278  136.309    5.076  545.237
>    131072   8.796    8.819    8.814  116.233    0.130  929.862
>     65536   9.998   10.609    9.995  100.464    2.786 1607.423
>     32768  10.161   10.124   10.246  100.623    0.505 3219.943
> 
> Regards,
> Ronald.

[-- Attachment #2: blockdev-perftest --]
[-- Type: text/plain, Size: 5399 bytes --]

#!/bin/sh

############################################################################
#
# Script for testing block device I/O performance. Running this script on a
# block device that is connected to a remote SCST target device allows to
# test the performance of the transport protocols implemented in SCST. The
# operation of this script is similar to iozone, while this script is easier
# to use.
#
# Copyright (C) 2009 Bart Van Assche <bart.vanassche@gmail.com>.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation, version 2
# of the License.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
############################################################################

#########################
# Function definitions  #
#########################

usage() {
  echo "Usage: $0 [-a] [-d] [-i <i>] [-n] [-r] [-s <l2s>] <dev>"
  echo "        -a - use asynchronous (buffered) I/O."
  echo "        -d - use direct (non-buffered) I/O."
  echo "        -i - number times each test is iterated."
  echo "        -n - do not verify the data on <dev> before overwriting it."
  echo "        -r - only perform the read test."
  echo "        -s - logarithm base two of the I/O size."
  echo "        <dev> - block device to run the I/O performance test on."
}

# Echo ((2**$1))
pow2() {
  if [ $1 = 0 ]; then
    echo 1
  else
    echo $((2 * $(pow2 $(($1 - 1)) ) ))
  fi
}

drop_caches() {
  sync
  if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches
  fi
}

# Read times in seconds from stdin, one number per line, echo each number
# using format $1, and also echo the average transfer size in MB/s, its
# standard deviation and the number of IOPS using the total I/O size $2 and
# the block transfer size $3.
echo_and_calc_avg() {
  awk -v fmt="$1" -v iosize="$2" -v blocksize="$3" 'BEGIN{pow_2_20=1024*1024}{if ($1 != 0){n++;sum+=iosize/$1;sumsq+=iosize*iosize/($1*$1)};printf fmt, $1} END{d=(n>0?sumsq/n-sum*sum/n/n:0);avg=(n>0?sum/n:0);stddev=(d>0?sqrt(d):0);iops=avg/blocksize;printf fmt fmt fmt,avg/pow_2_20,stddev/pow_2_20,iops}'
}

#########################
# Default settings      #
#########################

iterations=3
log2_io_size=30       # 1 GB
log2_min_blocksize=9  # 512 bytes
log2_max_blocksize=26 # 64 MB
iotype=direct
read_test_only=false
verify_device_data=true


#########################
# Argument processing   #
#########################

set -- $(/usr/bin/getopt "adhi:nrs:" "$@")
while [ "$1" != "${1#-}" ]
do
  case "$1" in
    '-a') iotype="buffered"; shift;;
    '-d') iotype="direct"; shift;;
    '-i') iterations="$2"; shift; shift;;
    '-n') verify_device_data="false"; shift;;
    '-r') read_test_only="true"; shift;;
    '-s') log2_io_size="$2"; shift; shift;;
    '--') shift;;
    *)    usage; exit 1;;
  esac
done

if [ "$#" != 1 ]; then
  usage
  exit 1
fi

device="$1"


####################
# Performance test #
####################

if [ ! -e "${device}" ]; then
  echo "Error: device ${device} does not exist."
  exit 1
fi

if [ "${read_test_only}" = "false" -a ! -w "${device}" ]; then
  echo "Error: device ${device} is not writeable."
  exit 1
fi

if [ "${read_test_only}" = "false" -a "${verify_device_data}" = "true" ] \
   && ! cmp -s -n $(pow2 $log2_io_size) "${device}" /dev/zero
then
  echo "Error: device ${device} still contains data."
  exit 1
fi

if [ "${iotype}" = "direct" ]; then
  dd_oflags="oflag=direct"
  dd_iflags="iflag=direct"
else
  dd_oflags="oflag=sync"
  dd_iflags=""
fi

# Header, line 1
printf "%9s " blocksize
i=0
while [ $i -lt ${iterations} ]
do
  printf "%8s " "W"
  i=$((i+1))
done
printf "%8s %8s %8s " "W(avg," "W(std," "W"
i=0
while [ $i -lt ${iterations} ]
do
  printf "%8s " "R"
  i=$((i+1))
done
printf "%8s %8s %8s" "R(avg," "R(std" "R"
printf "\n"

# Header, line 2
printf "%9s " "(bytes)"
i=0
while [ $i -lt ${iterations} ]
do
  printf "%8s " "(s)"
  i=$((i+1))
done
printf "%8s %8s %8s " "MB/s)" ",MB/s)" "(IOPS)"
i=0
while [ $i -lt ${iterations} ]
do
  printf "%8s " "(s)"
  i=$((i+1))
done
printf "%8s %8s %8s" "MB/s)" ",MB/s)" "(IOPS)"
printf "\n"

# Measurements
log2_blocksize=${log2_max_blocksize}
while [ ! $log2_blocksize -lt $log2_min_blocksize ]
do
  # Decrement before 'continue' so block sizes larger than the total I/O
  # size are skipped instead of looping forever.
  if [ $log2_blocksize -gt $log2_io_size ]; then
    log2_blocksize=$((log2_blocksize - 1))
    continue
  fi
  iosize=$(pow2 $log2_io_size)
  bs=$(pow2 $log2_blocksize)
  count=$(pow2 $(($log2_io_size - $log2_blocksize)))
  printf "%9d " ${bs}
  i=0
  while [ $i -lt ${iterations} ]
  do
    if [ "${read_test_only}" = "false" ]; then
      drop_caches
      dd if=/dev/zero of="${device}" bs=${bs} count=${count} \
                    ${dd_oflags} 2>&1 \
                 | sed -n 's/.* \([0-9.]*\) s,.*/\1/p'
    else
      echo 0
    fi
    i=$((i+1))
  done | echo_and_calc_avg "%8.3f " ${iosize} ${bs}

  i=0
  while [ $i -lt ${iterations} ]
  do
    drop_caches
    dd if="${device}" of=/dev/null bs=${bs} count=${count} \
                  ${dd_iflags} 2>&1 \
               | sed -n 's/.* \([0-9.]*\) s,.*/\1/p'
    i=$((i+1))
  done | echo_and_calc_avg "%8.3f " ${iosize} ${bs}
  printf "\n"
  log2_blocksize=$((log2_blocksize - 1))
done
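
# Example invocation (assumptions: /dev/sdc is the block device under test
# and only the non-destructive read test is wanted):
#
#   ./blockdev-perftest -a -r -i 3 /dev/sdc
#
# -a selects buffered I/O, -r skips the (destructive) write pass, and -i 3
# reads each block size three times, as in the tables quoted above.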

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
       [not found]                                           ` <a0272b440907040819l5289483cp44b37d967440ef73@mail.gmail.com>
@ 2009-07-06 11:12                                             ` Vladislav Bolkhovitin
  2009-07-06 14:37                                               ` Ronald Moesbergen
  0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-06 11:12 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: Wu Fengguang, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
	Bart Van Assche

(Restored the original list of recipients in this thread as I was asked.)

Hi Ronald,

Ronald Moesbergen, on 07/04/2009 07:19 PM wrote:
> 2009/7/3 Vladislav Bolkhovitin <vst@vlnb.net>:
>> Ronald Moesbergen, on 07/03/2009 01:14 PM wrote:
>>>>> OK, now I tend to agree on decreasing max_sectors_kb and increasing
>>>>> read_ahead_kb. But before actually trying to push that idea I'd like
>>>>> to
>>>>> - do more benchmarks
>>>>> - figure out why context readahead didn't help SCST performance
>>>>>  (previous traces show that context readahead is submitting perfect
>>>>>  large io requests, so I wonder if it's some io scheduler bug)
>>>> Because, as we found out, without your http://lkml.org/lkml/2009/5/21/319
>>>> patch read-ahead was nearly disabled, hence there was no difference in
>>>> which algorithm was used?
>>>>
>>>> Ronald, can you run the following tests, please? This time with 2 hosts,
>>>> initiator (client) and target (server) connected using 1 Gbps iSCSI. It
>>>> would be best if vanilla 2.6.29 were run on the client, but any other
>>>> kernel will be fine as well; just specify which. Blockdev-perftest should
>>>> be run as before in buffered mode, i.e. with the "-a" switch.
>>>>
>>>> 1. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 patch with all default settings.
>>>>
>>>> 2. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 patch with default RA size and 64KB
>>>> max_sectors_kb.
>>>>
>>>> 3. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 patch with 2MB RA size and default
>>>> max_sectors_kb.
>>>>
>>>> 4. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 patch with 2MB RA size and 64KB
>>>> max_sectors_kb.
>>>>
>>>> 5. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 patch and with context RA patch. RA
>>>> size
>>>> and max_sectors_kb are default. For your convenience I committed the
>>>> backported context RA patches into the SCST SVN repository.
>>>>
>>>> 6. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with default RA
>>>> size and 64KB max_sectors_kb.
>>>>
>>>> 7. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with 2MB RA
>>>> size
>>>> and default max_sectors_kb.
>>>>
>>>> 8. All defaults on the client, on the server vanilla 2.6.29 with
>>>> Fengguang's
>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with 2MB RA
>>>> size
>>>> and 64KB max_sectors_kb.
>>>>
>>>> 9. On the client default RA size and 64KB max_sectors_kb. On the server
>>>> vanilla 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and
>>>> context RA patches with 2MB RA size and 64KB max_sectors_kb.
>>>>
>>>> 10. On the client 2MB RA size and default max_sectors_kb. On the server
>>>> vanilla 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and
>>>> context RA patches with 2MB RA size and 64KB max_sectors_kb.
>>>>
>>>> 11. On the client 2MB RA size and 64KB max_sectors_kb. On the server
>>>> vanilla
>>>> 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and context RA
>>>> patches with 2MB RA size and 64KB max_sectors_kb.
>>> Ok, done. Performance is pretty bad overall :(
>>>
>>> The kernels I used:
>>> client kernel: 2.6.26-15lenny3 (debian)
>>> server kernel: 2.6.29.5 with blk_dev_run patch
>>>
>>> And I adjusted the blockdev-perftest script to drop caches on both the
>>> server (via ssh) and the client.
>>>
>>> The results:
>>>
> 
> ... previous results ...
> 
>> Those are on the server without io_context-2.6.29 and readahead-2.6.29
>> patches applied and with CFQ scheduler, correct?
>>
>> Then we see how the reordering of requests caused by many I/O threads
>> submitting I/O in separate I/O contexts badly affects performance, and no
>> RA, especially with the default 128KB RA size, can solve it. Less
>> max_sectors_kb on the client => more requests it sends at once => more
>> reordering on the server => worse throughput. Although, Fengguang, in
>> theory context RA with a 2MB RA size should considerably help it, no?
>>
>> Ronald, can you perform those tests again with both io_context-2.6.29 and
>> readahead-2.6.29 patches applied on the server, please?
> 
> Hi Vlad,
> 
> I have retested with the patches you requested (and got access to the
> systems today :) ) The results are better, but still not great.
>
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with io_context and readahead patch
> 
> 5) client: default, server: default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  18.303   19.867   18.481   54.299    1.961    0.848
>  33554432  18.321   17.681   18.708   56.181    1.314    1.756
>  16777216  17.816   17.406   19.257   56.494    2.410    3.531
>   8388608  18.077   17.727   19.338   55.789    2.056    6.974
>   4194304  17.918   16.601   18.287   58.276    2.454   14.569
>   2097152  17.426   17.334   17.610   58.661    0.384   29.331
>   1048576  19.358   18.764   17.253   55.607    2.734   55.607
>    524288  17.951   18.163   17.440   57.379    0.983  114.757
>    262144  18.196   17.724   17.520   57.499    0.907  229.995
>    131072  18.342   18.259   17.551   56.751    1.131  454.010
>     65536  17.733   18.572   17.134   57.548    1.893  920.766
>     32768  19.081   19.321   17.364   55.213    2.673 1766.818
>     16384  17.181   18.729   17.731   57.343    2.033 3669.932
> 
> 6) client: default, server: 64 max_sectors_kb, RA default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  21.790   20.062   19.534   50.153    2.304    0.784
>  33554432  20.212   19.744   19.564   51.623    0.706    1.613
>  16777216  20.404   19.329   19.738   51.680    1.148    3.230
>   8388608  20.170   20.772   19.509   50.852    1.304    6.356
>   4194304  19.334   18.742   18.522   54.296    0.978   13.574
>   2097152  19.413   18.858   18.884   53.758    0.715   26.879
>   1048576  20.472   18.755   18.476   53.347    2.377   53.347
>    524288  19.120   20.104   18.404   53.378    1.925  106.756
>    262144  20.337   19.213   18.636   52.866    1.901  211.464
>    131072  19.199   18.312   19.970   53.510    1.900  428.083
>     65536  19.855   20.114   19.592   51.584    0.555  825.342
>     32768  20.586   18.724   20.340   51.592    2.204 1650.941
>     16384  21.119   19.834   19.594   50.792    1.651 3250.669
> 
> 7) client: default, server: default max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  17.767   16.489   16.949   60.050    1.842    0.938
>  33554432  16.777   17.034   17.102   60.341    0.500    1.886
>  16777216  18.509   16.784   16.971   58.891    2.537    3.681
>   8388608  18.058   17.949   17.599   57.313    0.632    7.164
>   4194304  18.286   17.648   17.026   58.055    1.692   14.514
>   2097152  17.387   18.451   17.875   57.226    1.388   28.613
>   1048576  18.270   17.698   17.570   57.397    0.969   57.397
>    524288  16.708   17.900   17.233   59.306    1.668  118.611
>    262144  18.041   17.381   18.035   57.484    1.011  229.934
>    131072  17.994   17.777   18.146   56.981    0.481  455.844
>     65536  17.097   18.597   17.737   57.563    1.975  921.011
>     32768  17.167   17.035   19.693   57.254    3.721 1832.127
>     16384  17.144   16.664   17.623   59.762    1.367 3824.774
> 
> 8) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  20.003   21.133   19.308   50.894    1.881    0.795
>  33554432  19.448   20.015   18.908   52.657    1.222    1.646
>  16777216  19.964   19.350   19.106   52.603    0.967    3.288
>   8388608  18.961   19.213   19.318   53.437    0.419    6.680
>   4194304  18.135   19.508   19.361   53.948    1.788   13.487
>   2097152  18.753   19.471   18.367   54.315    1.306   27.158
>   1048576  19.189   18.586   18.867   54.244    0.707   54.244
>    524288  18.985   19.199   18.840   53.874    0.417  107.749
>    262144  19.064   21.143   19.674   51.398    2.204  205.592
>    131072  18.691   18.664   19.116   54.406    0.594  435.245
>     65536  18.468   20.673   18.554   53.389    2.729  854.229
>     32768  20.401   21.156   19.552   50.323    1.623 1610.331
>     16384  19.532   20.028   20.466   51.196    0.977 3276.567
> 
> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  16.458   16.649   17.346   60.919    1.364    0.952
>  33554432  16.479   16.744   17.069   61.096    0.878    1.909
>  16777216  17.128   16.585   17.112   60.456    0.910    3.778
>   8388608  17.322   16.780   16.885   60.262    0.824    7.533
>   4194304  17.530   16.725   16.756   60.250    1.299   15.063
>   2097152  16.580   17.875   16.619   60.221    2.076   30.110
>   1048576  17.550   17.406   17.075   59.049    0.681   59.049
>    524288  16.492   18.211   16.832   59.718    2.519  119.436
>    262144  17.241   17.115   17.365   59.397    0.352  237.588
>    131072  17.430   16.902   17.511   59.271    0.936  474.167
>     65536  16.726   16.894   17.246   60.404    0.768  966.461
>     32768  16.662   17.517   17.052   59.989    1.224 1919.658
>     16384  17.429   16.793   16.753   60.285    1.085 3858.268
> 
> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  17.601   18.334   17.379   57.650    1.307    0.901
>  33554432  18.281   18.128   17.169   57.381    1.610    1.793
>  16777216  17.660   17.875   17.356   58.091    0.703    3.631
>   8388608  17.724   17.810   18.383   56.992    0.918    7.124
>   4194304  17.475   17.770   19.003   56.704    2.031   14.176
>   2097152  17.287   17.674   18.492   57.516    1.604   28.758
>   1048576  17.972   17.460   18.777   56.721    1.689   56.721
>    524288  18.680   18.952   19.445   53.837    0.890  107.673
>    262144  18.070   18.337   18.639   55.817    0.707  223.270
>    131072  16.990   16.651   16.862   60.832    0.507  486.657
>     65536  17.707   16.972   17.520   58.870    1.066  941.924
>     32768  17.767   17.208   17.205   58.887    0.885 1884.399
>     16384  18.258   17.252   18.035   57.407    1.407 3674.059
> 
> 11) client: 64 max_sectors_kb, RA 2MB. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  17.993   18.307   18.718   55.850    0.902    0.873
>  33554432  19.554   18.485   17.902   54.988    1.993    1.718
>  16777216  18.829   18.236   18.748   55.052    0.785    3.441
>   8388608  21.152   19.065   18.738   52.257    2.745    6.532
>   4194304  19.131   19.703   17.850   54.288    2.268   13.572
>   2097152  19.093   19.152   19.509   53.196    0.504   26.598
>   1048576  19.371   18.775   18.804   53.953    0.772   53.953
>    524288  20.003   17.911   18.602   54.470    2.476  108.940
>    262144  19.182   19.460   18.476   53.809    1.183  215.236
>    131072  19.403   19.192   18.907   53.429    0.567  427.435
>     65536  19.502   19.656   18.599   53.219    1.309  851.509
>     32768  18.746   18.747   18.250   55.119    0.701 1763.817
>     16384  20.977   19.437   18.840   51.951    2.319 3324.862

The results look inconsistent with what you had previously (89.7
MB/s). How do you explain that?

I think, most likely, there was some confusion between the tested and 
patched versions of the kernel or you forgot to apply the io_context 
patch. Please recheck.
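
Something like the following, run on the server right before the test, would
confirm which kernel and which settings are actually in effect (sdc here is
only an example device name):

uname -r                                  # running kernel version
cat /sys/block/sdc/queue/read_ahead_kb    # effective RA size, in KB
cat /sys/block/sdc/queue/max_sectors_kb   # effective max_sectors_kb
cat /sys/block/sdc/queue/scheduler        # active I/O scheduler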

> Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-06 11:12                                             ` Vladislav Bolkhovitin
@ 2009-07-06 14:37                                               ` Ronald Moesbergen
  2009-07-06 17:48                                                 ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-06 14:37 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: Wu Fengguang, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
	Bart Van Assche

2009/7/6 Vladislav Bolkhovitin <vst@vlnb.net>:
> (Restored the original list of recipients in this thread as I was asked.)
>
> Hi Ronald,
>
> Ronald Moesbergen, on 07/04/2009 07:19 PM wrote:
>>
>> 2009/7/3 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>
>>> Ronald Moesbergen, on 07/03/2009 01:14 PM wrote:
>>>>>>
>>>>>> OK, now I tend to agree on decreasing max_sectors_kb and increasing
>>>>>> read_ahead_kb. But before actually trying to push that idea I'd like
>>>>>> to
>>>>>> - do more benchmarks
>>>>>> - figure out why context readahead didn't help SCST performance
>>>>>>  (previous traces show that context readahead is submitting perfect
>>>>>>  large io requests, so I wonder if it's some io scheduler bug)
>>>>>
>>>>> Because, as we found out, without your
>>>>> http://lkml.org/lkml/2009/5/21/319
>>>>>> patch read-ahead was nearly disabled, hence there was no difference in
>>>>>> which algorithm was used?
>>>>>
>>>>> Ronald, can you run the following tests, please? This time with 2
>>>>> hosts,
>>>>> initiator (client) and target (server) connected using 1 Gbps iSCSI. It
>>>>> would be best if vanilla 2.6.29 were run on the client, but any other
>>>>> kernel will be fine as well; just specify which. Blockdev-perftest should
>>>>> be run as before in buffered mode, i.e. with the "-a" switch.
>>>>>
>>>>> 1. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 patch with all default settings.
>>>>>
>>>>> 2. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 patch with default RA size and 64KB
>>>>> max_sectors_kb.
>>>>>
>>>>> 3. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 patch with 2MB RA size and default
>>>>> max_sectors_kb.
>>>>>
>>>>> 4. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 patch with 2MB RA size and 64KB
>>>>> max_sectors_kb.
>>>>>
>>>>> 5. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 patch and with context RA patch. RA
>>>>> size
>>>>> and max_sectors_kb are default. For your convenience I committed the
>>>>> backported context RA patches into the SCST SVN repository.
>>>>>
>>>>> 6. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with default
>>>>> RA
>>>>> size and 64KB max_sectors_kb.
>>>>>
>>>>> 7. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with 2MB RA
>>>>> size
>>>>> and default max_sectors_kb.
>>>>>
>>>>> 8. All defaults on the client, on the server vanilla 2.6.29 with
>>>>> Fengguang's
>>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with 2MB RA
>>>>> size
>>>>> and 64KB max_sectors_kb.
>>>>>
>>>>> 9. On the client default RA size and 64KB max_sectors_kb. On the server
>>>>> vanilla 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and
>>>>> context RA patches with 2MB RA size and 64KB max_sectors_kb.
>>>>>
>>>>> 10. On the client 2MB RA size and default max_sectors_kb. On the server
>>>>> vanilla 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and
>>>>> context RA patches with 2MB RA size and 64KB max_sectors_kb.
>>>>>
>>>>> 11. On the client 2MB RA size and 64KB max_sectors_kb. On the server
>>>>> vanilla
>>>>> 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and context
>>>>> RA
>>>>> patches with 2MB RA size and 64KB max_sectors_kb.
>>>>
>>>> Ok, done. Performance is pretty bad overall :(
>>>>
>>>> The kernels I used:
>>>> client kernel: 2.6.26-15lenny3 (debian)
>>>> server kernel: 2.6.29.5 with blk_dev_run patch
>>>>
>>>> And I adjusted the blockdev-perftest script to drop caches on both the
>>>> server (via ssh) and the client.
>>>>
>>>> The results:
>>>>
>>
>> ... previous results ...
>>
>>> Those are on the server without io_context-2.6.29 and readahead-2.6.29
>>> patches applied and with CFQ scheduler, correct?
>>>
>>> Then we see how the reordering of requests caused by many I/O threads
>>> submitting I/O in separate I/O contexts badly affects performance, and no
>>> RA, especially with the default 128KB RA size, can solve it. Less
>>> max_sectors_kb on the client => more requests it sends at once => more
>>> reordering on the server => worse throughput. Although, Fengguang, in
>>> theory context RA with a 2MB RA size should considerably help it, no?
>>>
>>> Ronald, can you perform those tests again with both io_context-2.6.29 and
>>> readahead-2.6.29 patches applied on the server, please?
>>
>> Hi Vlad,
>>
>> I have retested with the patches you requested (and got access to the
>> systems today :) ) The results are better, but still not great.
>>
>> client kernel: 2.6.26-15lenny3 (debian)
>> server kernel: 2.6.29.5 with io_context and readahead patch
>>
>> 5) client: default, server: default
>> blocksize       R        R        R   R(avg,    R(std        R
>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  18.303   19.867   18.481   54.299    1.961    0.848
>>  33554432  18.321   17.681   18.708   56.181    1.314    1.756
>>  16777216  17.816   17.406   19.257   56.494    2.410    3.531
>>  8388608  18.077   17.727   19.338   55.789    2.056    6.974
>>  4194304  17.918   16.601   18.287   58.276    2.454   14.569
>>  2097152  17.426   17.334   17.610   58.661    0.384   29.331
>>  1048576  19.358   18.764   17.253   55.607    2.734   55.607
>>   524288  17.951   18.163   17.440   57.379    0.983  114.757
>>   262144  18.196   17.724   17.520   57.499    0.907  229.995
>>   131072  18.342   18.259   17.551   56.751    1.131  454.010
>>    65536  17.733   18.572   17.134   57.548    1.893  920.766
>>    32768  19.081   19.321   17.364   55.213    2.673 1766.818
>>    16384  17.181   18.729   17.731   57.343    2.033 3669.932
>>
>> 6) client: default, server: 64 max_sectors_kb, RA default
>> blocksize       R        R        R   R(avg,    R(std        R
>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  21.790   20.062   19.534   50.153    2.304    0.784
>>  33554432  20.212   19.744   19.564   51.623    0.706    1.613
>>  16777216  20.404   19.329   19.738   51.680    1.148    3.230
>>  8388608  20.170   20.772   19.509   50.852    1.304    6.356
>>  4194304  19.334   18.742   18.522   54.296    0.978   13.574
>>  2097152  19.413   18.858   18.884   53.758    0.715   26.879
>>  1048576  20.472   18.755   18.476   53.347    2.377   53.347
>>   524288  19.120   20.104   18.404   53.378    1.925  106.756
>>   262144  20.337   19.213   18.636   52.866    1.901  211.464
>>   131072  19.199   18.312   19.970   53.510    1.900  428.083
>>    65536  19.855   20.114   19.592   51.584    0.555  825.342
>>    32768  20.586   18.724   20.340   51.592    2.204 1650.941
>>    16384  21.119   19.834   19.594   50.792    1.651 3250.669
>>
>> 7) client: default, server: default max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  17.767   16.489   16.949   60.050    1.842    0.938
>>  33554432  16.777   17.034   17.102   60.341    0.500    1.886
>>  16777216  18.509   16.784   16.971   58.891    2.537    3.681
>>  8388608  18.058   17.949   17.599   57.313    0.632    7.164
>>  4194304  18.286   17.648   17.026   58.055    1.692   14.514
>>  2097152  17.387   18.451   17.875   57.226    1.388   28.613
>>  1048576  18.270   17.698   17.570   57.397    0.969   57.397
>>   524288  16.708   17.900   17.233   59.306    1.668  118.611
>>   262144  18.041   17.381   18.035   57.484    1.011  229.934
>>   131072  17.994   17.777   18.146   56.981    0.481  455.844
>>    65536  17.097   18.597   17.737   57.563    1.975  921.011
>>    32768  17.167   17.035   19.693   57.254    3.721 1832.127
>>    16384  17.144   16.664   17.623   59.762    1.367 3824.774
>>
>> 8) client: default, server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  20.003   21.133   19.308   50.894    1.881    0.795
>>  33554432  19.448   20.015   18.908   52.657    1.222    1.646
>>  16777216  19.964   19.350   19.106   52.603    0.967    3.288
>>  8388608  18.961   19.213   19.318   53.437    0.419    6.680
>>  4194304  18.135   19.508   19.361   53.948    1.788   13.487
>>  2097152  18.753   19.471   18.367   54.315    1.306   27.158
>>  1048576  19.189   18.586   18.867   54.244    0.707   54.244
>>   524288  18.985   19.199   18.840   53.874    0.417  107.749
>>   262144  19.064   21.143   19.674   51.398    2.204  205.592
>>   131072  18.691   18.664   19.116   54.406    0.594  435.245
>>    65536  18.468   20.673   18.554   53.389    2.729  854.229
>>    32768  20.401   21.156   19.552   50.323    1.623 1610.331
>>    16384  19.532   20.028   20.466   51.196    0.977 3276.567
>>
>> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA
>> 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  16.458   16.649   17.346   60.919    1.364    0.952
>>  33554432  16.479   16.744   17.069   61.096    0.878    1.909
>>  16777216  17.128   16.585   17.112   60.456    0.910    3.778
>>  8388608  17.322   16.780   16.885   60.262    0.824    7.533
>>  4194304  17.530   16.725   16.756   60.250    1.299   15.063
>>  2097152  16.580   17.875   16.619   60.221    2.076   30.110
>>  1048576  17.550   17.406   17.075   59.049    0.681   59.049
>>   524288  16.492   18.211   16.832   59.718    2.519  119.436
>>   262144  17.241   17.115   17.365   59.397    0.352  237.588
>>   131072  17.430   16.902   17.511   59.271    0.936  474.167
>>    65536  16.726   16.894   17.246   60.404    0.768  966.461
>>    32768  16.662   17.517   17.052   59.989    1.224 1919.658
>>    16384  17.429   16.793   16.753   60.285    1.085 3858.268
>>
>> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA
>> 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  17.601   18.334   17.379   57.650    1.307    0.901
>>  33554432  18.281   18.128   17.169   57.381    1.610    1.793
>>  16777216  17.660   17.875   17.356   58.091    0.703    3.631
>>  8388608  17.724   17.810   18.383   56.992    0.918    7.124
>>  4194304  17.475   17.770   19.003   56.704    2.031   14.176
>>  2097152  17.287   17.674   18.492   57.516    1.604   28.758
>>  1048576  17.972   17.460   18.777   56.721    1.689   56.721
>>   524288  18.680   18.952   19.445   53.837    0.890  107.673
>>   262144  18.070   18.337   18.639   55.817    0.707  223.270
>>   131072  16.990   16.651   16.862   60.832    0.507  486.657
>>    65536  17.707   16.972   17.520   58.870    1.066  941.924
>>    32768  17.767   17.208   17.205   58.887    0.885 1884.399
>>    16384  18.258   17.252   18.035   57.407    1.407 3674.059
>>
>> 11) client: 64 max_sectors_kb, RA 2MB. server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  17.993   18.307   18.718   55.850    0.902    0.873
>>  33554432  19.554   18.485   17.902   54.988    1.993    1.718
>>  16777216  18.829   18.236   18.748   55.052    0.785    3.441
>>  8388608  21.152   19.065   18.738   52.257    2.745    6.532
>>  4194304  19.131   19.703   17.850   54.288    2.268   13.572
>>  2097152  19.093   19.152   19.509   53.196    0.504   26.598
>>  1048576  19.371   18.775   18.804   53.953    0.772   53.953
>>   524288  20.003   17.911   18.602   54.470    2.476  108.940
>>   262144  19.182   19.460   18.476   53.809    1.183  215.236
>>   131072  19.403   19.192   18.907   53.429    0.567  427.435
>>    65536  19.502   19.656   18.599   53.219    1.309  851.509
>>    32768  18.746   18.747   18.250   55.119    0.701 1763.817
>>    16384  20.977   19.437   18.840   51.951    2.319 3324.862
>
> The results look inconsistent with what you had previously (89.7 MB/s).
> How do you explain that?

I had more patches applied for that test (scst_exec_req_fifo-2.6.29,
put_page_callback-2.6.29), and I used a different dd command:

dd if=/dev/sdc of=/dev/zero bs=512K count=2000

But all that said, I can't reproduce speeds that high now. I must have
made a mistake back then (maybe I forgot to clear the page cache).
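
For completeness, this is roughly the cache handling I do now before every dd
run in the adjusted blockdev-perftest ("server" is a placeholder for the
target's hostname):

ssh root@server 'sync; echo 3 > /proc/sys/vm/drop_caches'   # target side
sync; echo 3 > /proc/sys/vm/drop_caches                     # initiator side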

> I think, most likely, there was some confusion between the tested and
> patched versions of the kernel or you forgot to apply the io_context patch.
> Please recheck.

The tests above were definitely done right; I just rechecked the
patches, and I do see an average increase of about 10 MB/s over an
unpatched kernel. But overall the performance is still pretty bad.

Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-06 14:37                                               ` Ronald Moesbergen
@ 2009-07-06 17:48                                                 ` Vladislav Bolkhovitin
  2009-07-07  6:49                                                   ` Ronald Moesbergen
  0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-06 17:48 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: Wu Fengguang, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
	Bart Van Assche

Ronald Moesbergen, on 07/06/2009 06:37 PM wrote:
> 2009/7/6 Vladislav Bolkhovitin <vst@vlnb.net>:
>> (Restored the original list of recipients in this thread as I was asked.)
>>
>> Hi Ronald,
>>
>> Ronald Moesbergen, on 07/04/2009 07:19 PM wrote:
>>> 2009/7/3 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>> Ronald Moesbergen, on 07/03/2009 01:14 PM wrote:
>>>>>>> OK, now I tend to agree on decreasing max_sectors_kb and increasing
>>>>>>> read_ahead_kb. But before actually trying to push that idea I'd like
>>>>>>> to
>>>>>>> - do more benchmarks
>>>>>>> - figure out why context readahead didn't help SCST performance
>>>>>>>  (previous traces show that context readahead is submitting perfect
>>>>>>>  large io requests, so I wonder if it's some io scheduler bug)
>>>>>> Because, as we found out, without your
>>>>>> http://lkml.org/lkml/2009/5/21/319
>>>>>> patch read-ahead was nearly disabled, hence there was no difference in
>>>>>> which algorithm was used?
>>>>>>
>>>>>> Ronald, can you run the following tests, please? This time with 2
>>>>>> hosts,
>>>>>> initiator (client) and target (server) connected using 1 Gbps iSCSI. It
>>>>>> would be best if vanilla 2.6.29 were run on the client, but any other
>>>>>> kernel will be fine as well; just specify which. Blockdev-perftest should
>>>>>> be run as before in buffered mode, i.e. with the "-a" switch.
>>>>>>
>>>>>> 1. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>> Fengguang's
>>>>>> http://lkml.org/lkml/2009/5/21/319 patch with all default settings.
>>>>>>
>>>>>> 2. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>> Fengguang's
>>>>>> http://lkml.org/lkml/2009/5/21/319 patch with default RA size and 64KB
>>>>>> max_sectors_kb.
>>>>>>
>>>>>> 3. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>> Fengguang's
>>>>>> http://lkml.org/lkml/2009/5/21/319 patch with 2MB RA size and default
>>>>>> max_sectors_kb.
>>>>>>
>>>>>> 4. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>> Fengguang's
>>>>>> http://lkml.org/lkml/2009/5/21/319 patch with 2MB RA size and 64KB
>>>>>> max_sectors_kb.
>>>>>>
>>>>>> 5. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>> Fengguang's
>>>>>> http://lkml.org/lkml/2009/5/21/319 patch and with context RA patch. RA
>>>>>> size
>>>>>> and max_sectors_kb are default. For your convenience I committed the
>>>>>> backported context RA patches into the SCST SVN repository.
>>>>>>
>>>>>> 6. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>> Fengguang's
>>>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with default
>>>>>> RA
>>>>>> size and 64KB max_sectors_kb.
>>>>>>
>>>>>> 7. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>> Fengguang's
>>>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with 2MB RA
>>>>>> size
>>>>>> and default max_sectors_kb.
>>>>>>
>>>>>> 8. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>> Fengguang's
>>>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with 2MB RA
>>>>>> size
>>>>>> and 64KB max_sectors_kb.
>>>>>>
>>>>>> 9. On the client default RA size and 64KB max_sectors_kb. On the server
>>>>>> vanilla 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and
>>>>>> context RA patches with 2MB RA size and 64KB max_sectors_kb.
>>>>>>
>>>>>> 10. On the client 2MB RA size and default max_sectors_kb. On the server
>>>>>> vanilla 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and
>>>>>> context RA patches with 2MB RA size and 64KB max_sectors_kb.
>>>>>>
>>>>>> 11. On the client 2MB RA size and 64KB max_sectors_kb. On the server
>>>>>> vanilla
>>>>>> 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and context
>>>>>> RA
>>>>>> patches with 2MB RA size and 64KB max_sectors_kb.
>>>>> Ok, done. Performance is pretty bad overall :(
>>>>>
>>>>> The kernels I used:
>>>>> client kernel: 2.6.26-15lenny3 (debian)
>>>>> server kernel: 2.6.29.5 with blk_dev_run patch
>>>>>
>>>>> And I adjusted the blockdev-perftest script to drop caches on both the
>>>>> server (via ssh) and the client.
>>>>>
>>>>> The results:
>>>>>
>>> ... previous results ...
>>>
>>>> Those are on the server without io_context-2.6.29 and readahead-2.6.29
>>>> patches applied and with CFQ scheduler, correct?
>>>>
>>>> Then we see how the reordering of requests caused by many I/O threads
>>>> submitting I/O in separate I/O contexts badly affects performance, and no
>>>> RA, especially with the default 128KB RA size, can solve it. Less
>>>> max_sectors_kb on the client => more requests it sends at once => more
>>>> reordering on the server => worse throughput. Although, Fengguang, in
>>>> theory context RA with a 2MB RA size should considerably help it, no?
>>>>
>>>> Ronald, can you perform those tests again with both io_context-2.6.29 and
>>>> readahead-2.6.29 patches applied on the server, please?
>>> Hi Vlad,
>>>
>>> I have retested with the patches you requested (and got access to the
>>> systems today :) ) The results are better, but still not great.
>>>
>>> client kernel: 2.6.26-15lenny3 (debian)
>>> server kernel: 2.6.29.5 with io_context and readahead patch
>>>
>>> 5) client: default, server: default
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864  18.303   19.867   18.481   54.299    1.961    0.848
>>>  33554432  18.321   17.681   18.708   56.181    1.314    1.756
>>>  16777216  17.816   17.406   19.257   56.494    2.410    3.531
>>>  8388608  18.077   17.727   19.338   55.789    2.056    6.974
>>>  4194304  17.918   16.601   18.287   58.276    2.454   14.569
>>>  2097152  17.426   17.334   17.610   58.661    0.384   29.331
>>>  1048576  19.358   18.764   17.253   55.607    2.734   55.607
>>>   524288  17.951   18.163   17.440   57.379    0.983  114.757
>>>   262144  18.196   17.724   17.520   57.499    0.907  229.995
>>>   131072  18.342   18.259   17.551   56.751    1.131  454.010
>>>    65536  17.733   18.572   17.134   57.548    1.893  920.766
>>>    32768  19.081   19.321   17.364   55.213    2.673 1766.818
>>>    16384  17.181   18.729   17.731   57.343    2.033 3669.932
>>>
>>> 6) client: default, server: 64 max_sectors_kb, RA default
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864  21.790   20.062   19.534   50.153    2.304    0.784
>>>  33554432  20.212   19.744   19.564   51.623    0.706    1.613
>>>  16777216  20.404   19.329   19.738   51.680    1.148    3.230
>>>  8388608  20.170   20.772   19.509   50.852    1.304    6.356
>>>  4194304  19.334   18.742   18.522   54.296    0.978   13.574
>>>  2097152  19.413   18.858   18.884   53.758    0.715   26.879
>>>  1048576  20.472   18.755   18.476   53.347    2.377   53.347
>>>   524288  19.120   20.104   18.404   53.378    1.925  106.756
>>>   262144  20.337   19.213   18.636   52.866    1.901  211.464
>>>   131072  19.199   18.312   19.970   53.510    1.900  428.083
>>>    65536  19.855   20.114   19.592   51.584    0.555  825.342
>>>    32768  20.586   18.724   20.340   51.592    2.204 1650.941
>>>    16384  21.119   19.834   19.594   50.792    1.651 3250.669
>>>
>>> 7) client: default, server: default max_sectors_kb, RA 2MB
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864  17.767   16.489   16.949   60.050    1.842    0.938
>>>  33554432  16.777   17.034   17.102   60.341    0.500    1.886
>>>  16777216  18.509   16.784   16.971   58.891    2.537    3.681
>>>  8388608  18.058   17.949   17.599   57.313    0.632    7.164
>>>  4194304  18.286   17.648   17.026   58.055    1.692   14.514
>>>  2097152  17.387   18.451   17.875   57.226    1.388   28.613
>>>  1048576  18.270   17.698   17.570   57.397    0.969   57.397
>>>   524288  16.708   17.900   17.233   59.306    1.668  118.611
>>>   262144  18.041   17.381   18.035   57.484    1.011  229.934
>>>   131072  17.994   17.777   18.146   56.981    0.481  455.844
>>>    65536  17.097   18.597   17.737   57.563    1.975  921.011
>>>    32768  17.167   17.035   19.693   57.254    3.721 1832.127
>>>    16384  17.144   16.664   17.623   59.762    1.367 3824.774
>>>
>>> 8) client: default, server: 64 max_sectors_kb, RA 2MB
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864  20.003   21.133   19.308   50.894    1.881    0.795
>>>  33554432  19.448   20.015   18.908   52.657    1.222    1.646
>>>  16777216  19.964   19.350   19.106   52.603    0.967    3.288
>>>  8388608  18.961   19.213   19.318   53.437    0.419    6.680
>>>  4194304  18.135   19.508   19.361   53.948    1.788   13.487
>>>  2097152  18.753   19.471   18.367   54.315    1.306   27.158
>>>  1048576  19.189   18.586   18.867   54.244    0.707   54.244
>>>   524288  18.985   19.199   18.840   53.874    0.417  107.749
>>>   262144  19.064   21.143   19.674   51.398    2.204  205.592
>>>   131072  18.691   18.664   19.116   54.406    0.594  435.245
>>>    65536  18.468   20.673   18.554   53.389    2.729  854.229
>>>    32768  20.401   21.156   19.552   50.323    1.623 1610.331
>>>    16384  19.532   20.028   20.466   51.196    0.977 3276.567
>>>
>>> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA
>>> 2MB
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864  16.458   16.649   17.346   60.919    1.364    0.952
>>>  33554432  16.479   16.744   17.069   61.096    0.878    1.909
>>>  16777216  17.128   16.585   17.112   60.456    0.910    3.778
>>>  8388608  17.322   16.780   16.885   60.262    0.824    7.533
>>>  4194304  17.530   16.725   16.756   60.250    1.299   15.063
>>>  2097152  16.580   17.875   16.619   60.221    2.076   30.110
>>>  1048576  17.550   17.406   17.075   59.049    0.681   59.049
>>>   524288  16.492   18.211   16.832   59.718    2.519  119.436
>>>   262144  17.241   17.115   17.365   59.397    0.352  237.588
>>>   131072  17.430   16.902   17.511   59.271    0.936  474.167
>>>    65536  16.726   16.894   17.246   60.404    0.768  966.461
>>>    32768  16.662   17.517   17.052   59.989    1.224 1919.658
>>>    16384  17.429   16.793   16.753   60.285    1.085 3858.268
>>>
>>> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA
>>> 2MB
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864  17.601   18.334   17.379   57.650    1.307    0.901
>>>  33554432  18.281   18.128   17.169   57.381    1.610    1.793
>>>  16777216  17.660   17.875   17.356   58.091    0.703    3.631
>>>  8388608  17.724   17.810   18.383   56.992    0.918    7.124
>>>  4194304  17.475   17.770   19.003   56.704    2.031   14.176
>>>  2097152  17.287   17.674   18.492   57.516    1.604   28.758
>>>  1048576  17.972   17.460   18.777   56.721    1.689   56.721
>>>   524288  18.680   18.952   19.445   53.837    0.890  107.673
>>>   262144  18.070   18.337   18.639   55.817    0.707  223.270
>>>   131072  16.990   16.651   16.862   60.832    0.507  486.657
>>>    65536  17.707   16.972   17.520   58.870    1.066  941.924
>>>    32768  17.767   17.208   17.205   58.887    0.885 1884.399
>>>    16384  18.258   17.252   18.035   57.407    1.407 3674.059
>>>
>>> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
>>> blocksize       R        R        R   R(avg,    R(std        R
>>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>  67108864  17.993   18.307   18.718   55.850    0.902    0.873
>>>  33554432  19.554   18.485   17.902   54.988    1.993    1.718
>>>  16777216  18.829   18.236   18.748   55.052    0.785    3.441
>>>  8388608  21.152   19.065   18.738   52.257    2.745    6.532
>>>  4194304  19.131   19.703   17.850   54.288    2.268   13.572
>>>  2097152  19.093   19.152   19.509   53.196    0.504   26.598
>>>  1048576  19.371   18.775   18.804   53.953    0.772   53.953
>>>   524288  20.003   17.911   18.602   54.470    2.476  108.940
>>>   262144  19.182   19.460   18.476   53.809    1.183  215.236
>>>   131072  19.403   19.192   18.907   53.429    0.567  427.435
>>>    65536  19.502   19.656   18.599   53.219    1.309  851.509
>>>    32768  18.746   18.747   18.250   55.119    0.701 1763.817
>>>    16384  20.977   19.437   18.840   51.951    2.319 3324.862
>> The results look inconsistent with what you had previously (89.7 MB/s).
>> How do you explain it?
> 
> I had more patches applied with that test: (scst_exec_req_fifo-2.6.29,
> put_page_callback-2.6.29) and I used a different dd command:
> 
> dd if=/dev/sdc of=/dev/zero bs=512K count=2000
> 
> But all that said, I can't reproduce speeds that high now. Must have
> made a mistake back then (maybe I forgot to clear the pagecache).

If you forgot to clear the cache, you would have had the wire throughput
(110 MB/s) or more.

>> I think, most likely, there was some confusion between the tested and
>> patched versions of the kernel or you forgot to apply the io_context patch.
>> Please recheck.
> 
> The tests above were definitely done right, I just rechecked the
> patches, and I do see an average increase of about 10MB/s over an
> unpatched kernel. But overall the performance is still pretty bad.

Have you rebuilt and reinstalled SCST after patching the kernel?

> Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-06 17:48                                                 ` Vladislav Bolkhovitin
@ 2009-07-07  6:49                                                   ` Ronald Moesbergen
       [not found]                                                     ` <4A5395FD.2040507@vlnb.net>
  0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-07  6:49 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: Wu Fengguang, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
	Bart Van Assche

2009/7/6 Vladislav Bolkhovitin <vst@vlnb.net>:
> Ronald Moesbergen, on 07/06/2009 06:37 PM wrote:
>>
>> 2009/7/6 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>
>>> (Restored the original list of recipients in this thread as I was asked.)
>>>
>>> Hi Ronald,
>>>
>>> Ronald Moesbergen, on 07/04/2009 07:19 PM wrote:
>>>>
>>>> 2009/7/3 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>>>
>>>>> Ronald Moesbergen, on 07/03/2009 01:14 PM wrote:
>>>>>>>>
>>>>>>>> OK, now I tend to agree on decreasing max_sectors_kb and increasing
>>>>>>>> read_ahead_kb. But before actually trying to push that idea I'd like
>>>>>>>> to
>>>>>>>> - do more benchmarks
>>>>>>>> - figure out why context readahead didn't help SCST performance
>>>>>>>>  (previous traces show that context readahead is submitting perfect
>>>>>>>>  large io requests, so I wonder if it's some io scheduler bug)
>>>>>>>
>>>>>>> Because, as we found out, without your
>>>>>>> http://lkml.org/lkml/2009/5/21/319
>>>>>>> patch read-ahead was nearly disabled, hence there were no difference
>>>>>>> which
>>>>>>> algorithm was used?
>>>>>>>
>>>>>>> Ronald, can you run the following tests, please? This time with 2
>>>>>>> hosts,
>>>>>>> initiator (client) and target (server) connected using 1 Gbps iSCSI.
>>>>>>> It
>>>>>>> would be the best if on the client vanilla 2.6.29 will be ran, but
>>>>>>> any
>>>>>>> other
>>>>>>> kernel will be fine as well, only specify which. Blockdev-perftest
>>>>>>> should
>>>>>>> be
>>>>>>> ran as before in buffered mode, i.e. with "-a" switch.
>>>>>>>
>>>>>>> 1. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>>> Fengguang's
>>>>>>> http://lkml.org/lkml/2009/5/21/319 patch with all default settings.
>>>>>>>
>>>>>>> 2. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>>> Fengguang's
>>>>>>> http://lkml.org/lkml/2009/5/21/319 patch with default RA size and
>>>>>>> 64KB
>>>>>>> max_sectors_kb.
>>>>>>>
>>>>>>> 3. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>>> Fengguang's
>>>>>>> http://lkml.org/lkml/2009/5/21/319 patch with 2MB RA size and default
>>>>>>> max_sectors_kb.
>>>>>>>
>>>>>>> 4. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>>> Fengguang's
>>>>>>> http://lkml.org/lkml/2009/5/21/319 patch with 2MB RA size and 64KB
>>>>>>> max_sectors_kb.
>>>>>>>
>>>>>>> 5. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>>> Fengguang's
>>>>>>> http://lkml.org/lkml/2009/5/21/319 patch and with context RA patch.
>>>>>>> RA
>>>>>>> size
>>>>>>> and max_sectors_kb are default. For your convenience I committed the
>>>>>>> backported context RA patches into the SCST SVN repository.
>>>>>>>
>>>>>>> 6. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>>> Fengguang's
>>>>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with
>>>>>>> default
>>>>>>> RA
>>>>>>> size and 64KB max_sectors_kb.
>>>>>>>
>>>>>>> 7. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>>> Fengguang's
>>>>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with 2MB RA
>>>>>>> size
>>>>>>> and default max_sectors_kb.
>>>>>>>
>>>>>>> 8. All defaults on the client, on the server vanilla 2.6.29 with
>>>>>>> Fengguang's
>>>>>>> http://lkml.org/lkml/2009/5/21/319 and context RA patches with 2MB RA
>>>>>>> size
>>>>>>> and 64KB max_sectors_kb.
>>>>>>>
>>>>>>> 9. On the client default RA size and 64KB max_sectors_kb. On the
>>>>>>> server
>>>>>>> vanilla 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319
>>>>>>> and
>>>>>>> context RA patches with 2MB RA size and 64KB max_sectors_kb.
>>>>>>>
>>>>>>> 10. On the client 2MB RA size and default max_sectors_kb. On the
>>>>>>> server
>>>>>>> vanilla 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319
>>>>>>> and
>>>>>>> context RA patches with 2MB RA size and 64KB max_sectors_kb.
>>>>>>>
>>>>>>> 11. On the client 2MB RA size and 64KB max_sectors_kb. On the server
>>>>>>> vanilla
>>>>>>> 2.6.29 with Fengguang's http://lkml.org/lkml/2009/5/21/319 and
>>>>>>> context
>>>>>>> RA
>>>>>>> patches with 2MB RA size and 64KB max_sectors_kb.
>>>>>>
>>>>>> Ok, done. Performance is pretty bad overall :(
>>>>>>
>>>>>> The kernels I used:
>>>>>> client kernel: 2.6.26-15lenny3 (debian)
>>>>>> server kernel: 2.6.29.5 with blk_dev_run patch
>>>>>>
>>>>>> And I adjusted the blockdev-perftest script to drop caches on both the
>>>>>> server (via ssh) and the client.
>>>>>>
>>>>>> The results:
>>>>>>
>>>> ... previous results ...
>>>>
>>>>> Those are on the server without io_context-2.6.29 and readahead-2.6.29
>>>>> patches applied and with CFQ scheduler, correct?
>>>>>
>>>>> Then we see how reorder of requests caused by many I/O threads
>>>>> submitting
>>>>> I/O in separate I/O contexts badly affect performance and no RA,
>>>>> especially
>>>>> with default 128KB RA size, can solve it. Less max_sectors_kb on the
>>>>> client
>>>>> => more requests it sends at once => more reorder on the server =>
>>>>> worse
>>>>> throughput. Although, Fengguang, in theory, context RA with 2MB RA size
>>>>> should considerably help it, no?
>>>>>
>>>>> Ronald, can you perform those tests again with both io_context-2.6.29
>>>>> and
>>>>> readahead-2.6.29 patches applied on the server, please?
>>>>
>>>> Hi Vlad,
>>>>
>>>> I have retested with the patches you requested (and got access to the
>>>> systems today :) ) The results are better, but still not great.
>>>>
>>>> client kernel: 2.6.26-15lenny3 (debian)
>>>> server kernel: 2.6.29.5 with io_context and readahead patch
>>>>
>>>> 5) client: default, server: default
>>>> blocksize       R        R        R   R(avg,    R(std        R
>>>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>>  67108864  18.303   19.867   18.481   54.299    1.961    0.848
>>>>  33554432  18.321   17.681   18.708   56.181    1.314    1.756
>>>>  16777216  17.816   17.406   19.257   56.494    2.410    3.531
>>>>  8388608  18.077   17.727   19.338   55.789    2.056    6.974
>>>>  4194304  17.918   16.601   18.287   58.276    2.454   14.569
>>>>  2097152  17.426   17.334   17.610   58.661    0.384   29.331
>>>>  1048576  19.358   18.764   17.253   55.607    2.734   55.607
>>>>  524288  17.951   18.163   17.440   57.379    0.983  114.757
>>>>  262144  18.196   17.724   17.520   57.499    0.907  229.995
>>>>  131072  18.342   18.259   17.551   56.751    1.131  454.010
>>>>   65536  17.733   18.572   17.134   57.548    1.893  920.766
>>>>   32768  19.081   19.321   17.364   55.213    2.673 1766.818
>>>>   16384  17.181   18.729   17.731   57.343    2.033 3669.932
>>>>
>>>> 6) client: default, server: 64 max_sectors_kb, RA default
>>>> blocksize       R        R        R   R(avg,    R(std        R
>>>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>>  67108864  21.790   20.062   19.534   50.153    2.304    0.784
>>>>  33554432  20.212   19.744   19.564   51.623    0.706    1.613
>>>>  16777216  20.404   19.329   19.738   51.680    1.148    3.230
>>>>  8388608  20.170   20.772   19.509   50.852    1.304    6.356
>>>>  4194304  19.334   18.742   18.522   54.296    0.978   13.574
>>>>  2097152  19.413   18.858   18.884   53.758    0.715   26.879
>>>>  1048576  20.472   18.755   18.476   53.347    2.377   53.347
>>>>  524288  19.120   20.104   18.404   53.378    1.925  106.756
>>>>  262144  20.337   19.213   18.636   52.866    1.901  211.464
>>>>  131072  19.199   18.312   19.970   53.510    1.900  428.083
>>>>   65536  19.855   20.114   19.592   51.584    0.555  825.342
>>>>   32768  20.586   18.724   20.340   51.592    2.204 1650.941
>>>>   16384  21.119   19.834   19.594   50.792    1.651 3250.669
>>>>
>>>> 7) client: default, server: default max_sectors_kb, RA 2MB
>>>> blocksize       R        R        R   R(avg,    R(std        R
>>>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>>  67108864  17.767   16.489   16.949   60.050    1.842    0.938
>>>>  33554432  16.777   17.034   17.102   60.341    0.500    1.886
>>>>  16777216  18.509   16.784   16.971   58.891    2.537    3.681
>>>>  8388608  18.058   17.949   17.599   57.313    0.632    7.164
>>>>  4194304  18.286   17.648   17.026   58.055    1.692   14.514
>>>>  2097152  17.387   18.451   17.875   57.226    1.388   28.613
>>>>  1048576  18.270   17.698   17.570   57.397    0.969   57.397
>>>>  524288  16.708   17.900   17.233   59.306    1.668  118.611
>>>>  262144  18.041   17.381   18.035   57.484    1.011  229.934
>>>>  131072  17.994   17.777   18.146   56.981    0.481  455.844
>>>>   65536  17.097   18.597   17.737   57.563    1.975  921.011
>>>>   32768  17.167   17.035   19.693   57.254    3.721 1832.127
>>>>   16384  17.144   16.664   17.623   59.762    1.367 3824.774
>>>>
>>>> 8) client: default, server: 64 max_sectors_kb, RA 2MB
>>>> blocksize       R        R        R   R(avg,    R(std        R
>>>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>>  67108864  20.003   21.133   19.308   50.894    1.881    0.795
>>>>  33554432  19.448   20.015   18.908   52.657    1.222    1.646
>>>>  16777216  19.964   19.350   19.106   52.603    0.967    3.288
>>>>  8388608  18.961   19.213   19.318   53.437    0.419    6.680
>>>>  4194304  18.135   19.508   19.361   53.948    1.788   13.487
>>>>  2097152  18.753   19.471   18.367   54.315    1.306   27.158
>>>>  1048576  19.189   18.586   18.867   54.244    0.707   54.244
>>>>  524288  18.985   19.199   18.840   53.874    0.417  107.749
>>>>  262144  19.064   21.143   19.674   51.398    2.204  205.592
>>>>  131072  18.691   18.664   19.116   54.406    0.594  435.245
>>>>   65536  18.468   20.673   18.554   53.389    2.729  854.229
>>>>   32768  20.401   21.156   19.552   50.323    1.623 1610.331
>>>>   16384  19.532   20.028   20.466   51.196    0.977 3276.567
>>>>
>>>> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA
>>>> 2MB
>>>> blocksize       R        R        R   R(avg,    R(std        R
>>>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>>  67108864  16.458   16.649   17.346   60.919    1.364    0.952
>>>>  33554432  16.479   16.744   17.069   61.096    0.878    1.909
>>>>  16777216  17.128   16.585   17.112   60.456    0.910    3.778
>>>>  8388608  17.322   16.780   16.885   60.262    0.824    7.533
>>>>  4194304  17.530   16.725   16.756   60.250    1.299   15.063
>>>>  2097152  16.580   17.875   16.619   60.221    2.076   30.110
>>>>  1048576  17.550   17.406   17.075   59.049    0.681   59.049
>>>>  524288  16.492   18.211   16.832   59.718    2.519  119.436
>>>>  262144  17.241   17.115   17.365   59.397    0.352  237.588
>>>>  131072  17.430   16.902   17.511   59.271    0.936  474.167
>>>>   65536  16.726   16.894   17.246   60.404    0.768  966.461
>>>>   32768  16.662   17.517   17.052   59.989    1.224 1919.658
>>>>   16384  17.429   16.793   16.753   60.285    1.085 3858.268
>>>>
>>>> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb,
>>>> RA
>>>> 2MB
>>>> blocksize       R        R        R   R(avg,    R(std        R
>>>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>>  67108864  17.601   18.334   17.379   57.650    1.307    0.901
>>>>  33554432  18.281   18.128   17.169   57.381    1.610    1.793
>>>>  16777216  17.660   17.875   17.356   58.091    0.703    3.631
>>>>  8388608  17.724   17.810   18.383   56.992    0.918    7.124
>>>>  4194304  17.475   17.770   19.003   56.704    2.031   14.176
>>>>  2097152  17.287   17.674   18.492   57.516    1.604   28.758
>>>>  1048576  17.972   17.460   18.777   56.721    1.689   56.721
>>>>  524288  18.680   18.952   19.445   53.837    0.890  107.673
>>>>  262144  18.070   18.337   18.639   55.817    0.707  223.270
>>>>  131072  16.990   16.651   16.862   60.832    0.507  486.657
>>>>   65536  17.707   16.972   17.520   58.870    1.066  941.924
>>>>   32768  17.767   17.208   17.205   58.887    0.885 1884.399
>>>>   16384  18.258   17.252   18.035   57.407    1.407 3674.059
>>>>
>>>> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
>>>> blocksize       R        R        R   R(avg,    R(std        R
>>>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>>>  67108864  17.993   18.307   18.718   55.850    0.902    0.873
>>>>  33554432  19.554   18.485   17.902   54.988    1.993    1.718
>>>>  16777216  18.829   18.236   18.748   55.052    0.785    3.441
>>>>  8388608  21.152   19.065   18.738   52.257    2.745    6.532
>>>>  4194304  19.131   19.703   17.850   54.288    2.268   13.572
>>>>  2097152  19.093   19.152   19.509   53.196    0.504   26.598
>>>>  1048576  19.371   18.775   18.804   53.953    0.772   53.953
>>>>  524288  20.003   17.911   18.602   54.470    2.476  108.940
>>>>  262144  19.182   19.460   18.476   53.809    1.183  215.236
>>>>  131072  19.403   19.192   18.907   53.429    0.567  427.435
>>>>   65536  19.502   19.656   18.599   53.219    1.309  851.509
>>>>   32768  18.746   18.747   18.250   55.119    0.701 1763.817
>>>>   16384  20.977   19.437   18.840   51.951    2.319 3324.862
>>>
>>> The results look inconsistently with what you had previously (89.7 MB/s).
>>> How can you explain it?
>>
>> I had more patches applied with that test: (scst_exec_req_fifo-2.6.29,
>> put_page_callback-2.6.29) and I used a different dd command:
>>
>> dd if=/dev/sdc of=/dev/zero bs=512K count=2000
>>
>> But all that said, I can't reproduce speeds that high now. Must have
>> made a mistake back then (maybe I forgot to clear the pagecache).
>
> If you forgot to clear the cache, you would had had the wire throughput (110
> MB/s) or more.

Maybe. Maybe just part of what I was transferring was in cache. I had
done some tests on the filesystem on that same block device too.

>>> I think, most likely, there was some confusion between the tested and
>>> patched versions of the kernel or you forgot to apply the io_context
>>> patch.
>>> Please recheck.
>>
>> The tests above were definitely done right, I just rechecked the
>> patches, and I do see an average increase of about 10MB/s over an
>> unpatched kernel. But overall the performance is still pretty bad.
>
> Have you rebuild and reinstall SCST after patching kernel?

Yes I have. And the warning about missing io_context patches wasn't
there during the compilation.

Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
       [not found]                                                       ` <a0272b440907080149j3eeeb9bat13f942520db059a8@mail.gmail.com>
@ 2009-07-08 12:40                                                         ` Vladislav Bolkhovitin
  2009-07-10  6:32                                                           ` Ronald Moesbergen
  2009-07-15 20:52                                                           ` Kurt Garloff
  0 siblings, 2 replies; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-08 12:40 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
	Bart Van Assche

Ronald Moesbergen, on 07/08/2009 12:49 PM wrote:
> 2009/7/7 Vladislav Bolkhovitin <vst@vlnb.net>:
>> Ronald Moesbergen, on 07/07/2009 10:49 AM wrote:
>>>>>> I think, most likely, there was some confusion between the tested and
>>>>>> patched versions of the kernel or you forgot to apply the io_context
>>>>>> patch.
>>>>>> Please recheck.
>>>>> The tests above were definitely done right, I just rechecked the
>>>>> patches, and I do see an average increase of about 10MB/s over an
>>>>> unpatched kernel. But overall the performance is still pretty bad.
>>>> Have you rebuild and reinstall SCST after patching kernel?
>>> Yes I have. And the warning about missing io_context patches wasn't
>>> there during the compilation.
>> Can you update to the latest trunk/, do one dd with any block size you
>> like >128K, and then send me the kernel logs from that boot together with
>> the transfer rate dd reported, please?
>>
> 
> I think I just reproduced the 'wrong' result:
> 
> dd if=/dev/sdc of=/dev/null bs=512K count=2000
> 2000+0 records in
> 2000+0 records out
> 1048576000 bytes (1.0 GB) copied, 12.1291 s, 86.5 MB/s
> 
> This happens when I do a 'dd' on the device with a mounted filesystem.
> The filesystem mount causes some of the blocks on the device to be
> cached and therefore the results are wrong. This was not the case in
> any of the blockdev-perftest runs I did (the filesystem was never
> mounted).

Why do you think the file system (which one, BTW?) has any additional 
caching if you did "echo 3 > /proc/sys/vm/drop_caches" before the tests? 
All block devices and file systems use the same cache facilities.

I've also long noticed that reading data from block devices is slower
than reading from files on the file systems mounted on those block devices.
Can anybody explain it?

Looks like this is strangeness #2 which we uncovered in our tests (the
first one was earlier in this thread: why the context RA doesn't work
as well as it should with cooperative I/O threads).

Can you rerun the same 11 tests over a file on the file system, please?

> Ronald.
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-08 12:40                                                         ` Vladislav Bolkhovitin
@ 2009-07-10  6:32                                                           ` Ronald Moesbergen
  2009-07-10  8:43                                                             ` Vladislav Bolkhovitin
  2009-07-15 20:52                                                           ` Kurt Garloff
  1 sibling, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-10  6:32 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
	Bart Van Assche

2009/7/8 Vladislav Bolkhovitin <vst@vlnb.net>:
> Ronald Moesbergen, on 07/08/2009 12:49 PM wrote:
>>
>> 2009/7/7 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>
>>> Ronald Moesbergen, on 07/07/2009 10:49 AM wrote:
>>>>>>>
>>>>>>> I think, most likely, there was some confusion between the tested and
>>>>>>> patched versions of the kernel or you forgot to apply the io_context
>>>>>>> patch.
>>>>>>> Please recheck.
>>>>>>
>>>>>> The tests above were definitely done right, I just rechecked the
>>>>>> patches, and I do see an average increase of about 10MB/s over an
>>>>>> unpatched kernel. But overall the performance is still pretty bad.
>>>>>
>>>>> Have you rebuild and reinstall SCST after patching kernel?
>>>>
>>>> Yes I have. And the warning about missing io_context patches wasn't
>>>> there during the compilation.
>>>
>>> Can you update to the latest trunk/ and send me the kernel logs from the
>>> kernel's boot after one dd with any block size you like >128K and the
>>> transfer rate the dd reported, please?
>>>
>>
>> I think I just reproduced the 'wrong' result:
>>
>> dd if=/dev/sdc of=/dev/null bs=512K count=2000
>> 2000+0 records in
>> 2000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 12.1291 s, 86.5 MB/s
>>
>> This happens when I do a 'dd' on the device with a mounted filesystem.
>> The filesystem mount causes some of the blocks on the device to be
>> cached and therefore the results are wrong. This was not the case in
>> all the blockdev-perftest run's I did (the filesystem was never
>> mounted).
>
> Why do you think the file system (which one, BTW?) has any additional
> caching if you did "echo 3 > /proc/sys/vm/drop_caches" before the tests? All
> block devices and file systems use the same cache facilities.

I didn't drop the caches because I just restarted both machines and
thought that would be enough. But because of the mounted filesystem
the results were invalid. (The filesystem is OCFS2, but that doesn't
matter).
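
For the record, a minimal sketch of making each run cache-cold on both
sides (the server hostname below is only a placeholder):

  # client: flush dirty data, then drop page/dentry/inode caches
  sync; echo 3 > /proc/sys/vm/drop_caches
  # server, via ssh, before every run
  ssh root@scst-server 'sync; echo 3 > /proc/sys/vm/drop_caches'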

> I've also long ago noticed that reading data from block devices is slower
> than from files from mounted on those block devices file systems. Can
> anybody explain it?
>
> Looks like this is strangeness #2 which we uncovered in our tests (the first
> one was earlier in this thread why the context RA doesn't work with
> cooperative I/O threads as good as it should).
>
> Can you rerun the same 11 tests over a file on the file system, please?

I'll see what I can do. Just to be sure: you want me to run
blockdev-perftest on a file on the OCFS2 filesystem which is mounted
on the client over iSCSI, right?
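
A minimal sketch of what that run would look like, assuming the exported
disk still shows up as /dev/sdc and that blockdev-perftest accepts a plain
file as its target (that last part is an assumption on my side):

  mount -t ocfs2 /dev/sdc /mnt/ocfs2           # OCFS2 volume over iSCSI
  ./blockdev-perftest -a /mnt/ocfs2/testfile   # buffered mode, as before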

Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-10  6:32                                                           ` Ronald Moesbergen
@ 2009-07-10  8:43                                                             ` Vladislav Bolkhovitin
  2009-07-10  9:27                                                               ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-10  8:43 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
	Bart Van Assche


Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>> I've also long ago noticed that reading data from block devices is slower
>> than from files from mounted on those block devices file systems. Can
>> anybody explain it?
>>
>> Looks like this is strangeness #2 which we uncovered in our tests (the first
>> one was earlier in this thread why the context RA doesn't work with
>> cooperative I/O threads as good as it should).
>>
>> Can you rerun the same 11 tests over a file on the file system, please?
> 
> I'll see what I can do. Just te be sure: you want me to run
> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
> on the client over iScsi, right?

Yes, please.

> Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-10  8:43                                                             ` Vladislav Bolkhovitin
@ 2009-07-10  9:27                                                               ` Vladislav Bolkhovitin
  2009-07-13 12:12                                                                 ` Ronald Moesbergen
  0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-10  9:27 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
	Bart Van Assche


Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>>> I've also long ago noticed that reading data from block devices is slower
>>> than from files from mounted on those block devices file systems. Can
>>> anybody explain it?
>>>
>>> Looks like this is strangeness #2 which we uncovered in our tests (the first
>>> one was earlier in this thread why the context RA doesn't work with
>>> cooperative I/O threads as good as it should).
>>>
>>> Can you rerun the same 11 tests over a file on the file system, please?
>> I'll see what I can do. Just te be sure: you want me to run
>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
>> on the client over iScsi, right?
> 
> Yes, please.

Forgot to mention that you should also configure your backend storage as a
big file on a file system (preferably XFS), not as a direct device such as
/dev/vg/db-master.
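
A minimal sketch of what I mean (the mount point and file name below are
only placeholders):

  mkfs.xfs /dev/vg/db-master                # put XFS on the volume
  mkdir -p /mnt/scst-backing
  mount /dev/vg/db-master /mnt/scst-backing
  # create a big file to export instead of the raw device (~20 GB here)
  dd if=/dev/zero of=/mnt/scst-backing/backing.img bs=1M count=20480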

Thanks,
Vlad


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-10  9:27                                                               ` Vladislav Bolkhovitin
@ 2009-07-13 12:12                                                                 ` Ronald Moesbergen
  2009-07-13 12:36                                                                   ` Wu Fengguang
  2009-07-14 18:52                                                                   ` Vladislav Bolkhovitin
  0 siblings, 2 replies; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-13 12:12 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
	Bart Van Assche

2009/7/10 Vladislav Bolkhovitin <vst@vlnb.net>:
>
> Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
>>
>> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>>>>
>>>> I've also long ago noticed that reading data from block devices is
>>>> slower
>>>> than from files from mounted on those block devices file systems. Can
>>>> anybody explain it?
>>>>
>>>> Looks like this is strangeness #2 which we uncovered in our tests (the
>>>> first
>>>> one was earlier in this thread why the context RA doesn't work with
>>>> cooperative I/O threads as good as it should).
>>>>
>>>> Can you rerun the same 11 tests over a file on the file system, please?
>>>
>>> I'll see what I can do. Just te be sure: you want me to run
>>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
>>> on the client over iScsi, right?
>>
>> Yes, please.
>
> Forgot to mention that you should also configure your backend storage as a
> big file on a file system (preferably, XFS) too, not as direct device, like
> /dev/vg/db-master.

Ok, here are the results:

client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead patch

The tests were done with XFS on both the target and the initiator. This
confirms your findings: using files instead of block devices is faster, but
only when the io_context patch is applied.
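
For reference, a minimal sketch of how the RA and max_sectors_kb values in
the runs below are set (sdc stands in for the device being tuned and is only
a placeholder; the same knobs exist on both the client and the server):

  # 2MB readahead; blockdev --setra takes 512-byte sectors, 4096 * 512B = 2MB
  blockdev --setra 4096 /dev/sdc
  # equivalent sysfs knob, in kilobytes
  echo 2048 > /sys/block/sdc/queue/read_ahead_kb
  # limit the maximum request size to 64KB
  echo 64 > /sys/block/sdc/queue/max_sectors_kb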

Without io_context patch:
1) client: default, server: default
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  18.327   18.327   17.740   56.491    0.872    0.883
 33554432  18.662   18.311   18.116   55.772    0.683    1.743
 16777216  18.900   18.421   18.312   55.229    0.754    3.452
  8388608  18.893   18.533   18.281   55.156    0.743    6.895
  4194304  18.512   18.097   18.400   55.850    0.536   13.963
  2097152  18.635   18.313   18.676   55.232    0.486   27.616
  1048576  18.441   18.264   18.245   55.907    0.267   55.907
   524288  17.773   18.669   18.459   55.980    1.184  111.960
   262144  18.580   18.758   17.483   56.091    1.767  224.365
   131072  17.224   18.333   18.765   56.626    2.067  453.006
    65536  18.082   19.223   18.238   55.348    1.483  885.567
    32768  17.719   18.293   18.198   56.680    0.795 1813.766
    16384  17.872   18.322   17.537   57.192    1.024 3660.273

2) client: default, server: 64 max_sectors_kb, RA default
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  18.738   18.435   18.400   55.283    0.451    0.864
 33554432  18.046   18.167   17.572   57.128    0.826    1.785
 16777216  18.504   18.203   18.377   55.771    0.376    3.486
  8388608  22.069   18.554   17.825   53.013    4.766    6.627
  4194304  19.211   18.136   18.083   55.465    1.529   13.866
  2097152  18.647   17.851   18.511   55.866    1.071   27.933
  1048576  19.084   18.177   18.194   55.425    1.249   55.425
   524288  18.999   18.553   18.380   54.934    0.763  109.868
   262144  18.867   18.273   18.063   55.668    1.020  222.673
   131072  17.846   18.966   18.193   55.885    1.412  447.081
    65536  18.195   18.616   18.482   55.564    0.530  889.023
    32768  17.882   18.841   17.707   56.481    1.525 1807.394
    16384  17.073   18.278   17.985   57.646    1.689 3689.369

3) client: default, server: default max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  18.658   17.830   19.258   55.162    1.750    0.862
 33554432  17.193   18.265   18.517   56.974    1.854    1.780
 16777216  17.531   17.681   18.776   56.955    1.720    3.560
  8388608  18.234   17.547   18.201   56.926    1.014    7.116
  4194304  18.057   17.923   17.901   57.015    0.218   14.254
  2097152  18.565   17.739   17.658   56.958    1.277   28.479
  1048576  18.393   17.433   17.314   57.851    1.550   57.851
   524288  18.939   17.835   18.972   55.152    1.600  110.304
   262144  18.562   19.005   18.069   55.240    1.141  220.959
   131072  19.574   17.562   18.251   55.576    2.476  444.611
    65536  19.117   18.019   17.886   55.882    1.647  894.115
    32768  18.237   17.415   17.482   57.842    1.200 1850.933
    16384  17.760   18.444   18.055   56.631    0.876 3624.391

4) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  18.368   17.495   18.524   56.520    1.434    0.883
 33554432  18.209   17.523   19.146   56.052    2.027    1.752
 16777216  18.765   18.053   18.550   55.497    0.903    3.469
  8388608  17.878   17.848   18.389   56.778    0.774    7.097
  4194304  18.058   17.683   18.567   56.589    1.129   14.147
  2097152  18.896   18.384   18.697   54.888    0.623   27.444
  1048576  18.505   17.769   17.804   56.826    1.055   56.826
   524288  18.319   17.689   17.941   56.955    0.816  113.910
   262144  19.227   17.770   18.212   55.704    1.821  222.815
   131072  18.738   18.227   17.869   56.044    1.090  448.354
    65536  19.319   18.525   18.084   54.969    1.494  879.504
    32768  18.321   17.672   17.870   57.047    0.856 1825.495
    16384  18.249   17.495   18.146   57.025    1.073 3649.582

With io_context patch:
5) client: default, server: default
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  12.393   11.925   12.627   83.196    1.989    1.300
 33554432  11.844   11.855   12.191   85.610    1.142    2.675
 16777216  12.729   12.602   12.068   82.187    1.913    5.137
  8388608  12.245   12.060   14.081   80.419    5.469   10.052
  4194304  13.224   11.866   12.110   82.763    3.833   20.691
  2097152  11.585   12.584   11.755   85.623    3.052   42.811
  1048576  12.166   12.144   12.321   83.867    0.539   83.867
   524288  12.019   12.148   12.160   84.568    0.448  169.137
   262144  12.014   12.378   12.074   84.259    1.095  337.036
   131072  11.840   12.068   11.849   85.921    0.756  687.369
    65536  12.098   11.803   12.312   84.857    1.470 1357.720
    32768  11.852   12.635   11.887   84.529    2.465 2704.931
    16384  12.443   13.110   11.881   82.197    3.299 5260.620

6) client: default, server: 64 max_sectors_kb, RA default
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.033   12.122   11.950   82.911    3.110    1.295
 33554432  12.386   13.357   12.082   81.364    3.429    2.543
 16777216  12.102   11.542   12.053   86.096    1.860    5.381
  8388608  12.240   11.740   11.789   85.917    1.601   10.740
  4194304  11.824   12.388   12.042   84.768    1.621   21.192
  2097152  11.962   12.283   11.973   84.832    1.036   42.416
  1048576  12.639   11.863   12.010   84.197    2.290   84.197
   524288  11.809   12.919   11.853   84.121    3.439  168.243
   262144  12.105   12.649   12.779   81.894    1.940  327.577
   131072  12.441   12.769   12.713   81.017    0.923  648.137
    65536  12.490   13.308   12.440   80.414    2.457 1286.630
    32768  13.235   11.917   12.300   82.184    3.576 2629.883
    16384  12.335   12.394   12.201   83.187    0.549 5323.990

7) client: default, server: default max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  12.017   12.334   12.151   84.168    0.897    1.315
 33554432  12.265   12.200   11.976   84.310    0.864    2.635
 16777216  12.356   11.972   12.292   83.903    1.165    5.244
  8388608  12.247   12.368   11.769   84.472    1.825   10.559
  4194304  11.888   11.974   12.144   85.325    0.754   21.331
  2097152  12.433   10.938   11.669   87.911    4.595   43.956
  1048576  11.748   12.271   12.498   84.180    2.196   84.180
   524288  11.726   11.681   12.322   86.031    2.075  172.062
   262144  12.593   12.263   11.939   83.530    1.817  334.119
   131072  11.874   12.265   12.441   84.012    1.648  672.093
    65536  12.119   11.848   12.037   85.330    0.809 1365.277
    32768  12.549   12.080   12.008   83.882    1.625 2684.238
    16384  12.369   12.087   12.589   82.949    1.385 5308.766

8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  12.664   11.793   11.963   84.428    2.575    1.319
 33554432  11.825   12.074   12.442   84.571    1.761    2.643
 16777216  11.997   11.952   10.905   88.311    3.958    5.519
  8388608  11.866   12.270   11.796   85.519    1.476   10.690
  4194304  11.754   12.095   12.539   84.483    2.230   21.121
  2097152  11.948   11.633   11.886   86.628    1.007   43.314
  1048576  12.029   12.519   11.701   84.811    2.345   84.811
   524288  11.928   12.011   12.049   85.363    0.361  170.726
   262144  12.559   11.827   11.729   85.140    2.566  340.558
   131072  12.015   12.356   11.587   85.494    2.253  683.952
    65536  11.741   12.113   11.931   85.861    1.093 1373.770
    32768  12.655   11.738   12.237   83.945    2.589 2686.246
    16384  11.928   12.423   11.875   84.834    1.711 5429.381

9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.570   13.491   14.299   74.326    1.927    1.161
 33554432  13.238   13.198   13.255   77.398    0.142    2.419
 16777216  13.851   13.199   13.463   75.857    1.497    4.741
  8388608  13.339   16.695   13.551   71.223    7.010    8.903
  4194304  13.689   13.173   14.258   74.787    2.415   18.697
  2097152  13.518   13.543   13.894   75.021    0.934   37.510
  1048576  14.119   14.030   13.820   73.202    0.659   73.202
   524288  13.747   14.781   13.820   72.621    2.369  145.243
   262144  14.168   13.652   14.165   73.189    1.284  292.757
   131072  14.112   13.868   14.213   72.817    0.753  582.535
    65536  14.604   13.762   13.725   73.045    2.071 1168.728
    32768  14.796   15.356   14.486   68.861    1.653 2203.564
    16384  13.079   13.525   13.427   76.757    1.111 4912.426

10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  20.372   18.077   17.262   55.411    3.800    0.866
 33554432  17.287   17.620   17.828   58.263    0.740    1.821
 16777216  16.802   18.154   17.315   58.831    1.865    3.677
  8388608  17.510   18.291   17.253   57.939    1.427    7.242
  4194304  17.059   17.706   17.352   58.958    0.897   14.740
  2097152  17.252   18.064   17.615   58.059    1.090   29.029
  1048576  17.082   17.373   17.688   58.927    0.838   58.927
   524288  17.129   17.271   17.583   59.103    0.644  118.206
   262144  17.411   17.695   18.048   57.808    0.848  231.231
   131072  17.937   17.704   18.681   56.581    1.285  452.649
    65536  17.927   17.465   17.907   57.646    0.698  922.338
    32768  18.494   17.820   17.719   56.875    1.073 1819.985
    16384  18.800   17.759   17.575   56.798    1.666 3635.058

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  20.045   21.881   20.018   49.680    2.037    0.776
 33554432  20.768   20.291   20.464   49.938    0.479    1.561
 16777216  21.563   20.714   20.429   49.017    1.116    3.064
  8388608  21.290   21.109   21.308   48.221    0.205    6.028
  4194304  22.240   20.662   21.088   48.054    1.479   12.013
  2097152  20.282   21.098   20.580   49.593    0.806   24.796
  1048576  20.367   19.929   20.252   50.741    0.469   50.741
   524288  20.885   21.203   20.684   48.945    0.498   97.890
   262144  19.982   21.375   20.798   49.463    1.373  197.853
   131072  20.744   21.590   19.698   49.593    1.866  396.740
    65536  21.586   20.953   21.055   48.314    0.627  773.024
    32768  21.228   20.307   21.049   49.104    0.950 1571.327
    16384  21.257   21.209   21.150   48.289    0.100 3090.498

Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-13 12:12                                                                 ` Ronald Moesbergen
@ 2009-07-13 12:36                                                                   ` Wu Fengguang
  2009-07-13 12:47                                                                     ` Ronald Moesbergen
  2009-07-14 18:52                                                                     ` Vladislav Bolkhovitin
  2009-07-14 18:52                                                                   ` Vladislav Bolkhovitin
  1 sibling, 2 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-07-13 12:36 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: Vladislav Bolkhovitin, linux-kernel, akpm, kosaki.motohiro,
	Alan.Brunelle, hifumi.hisashi, linux-fsdevel, jens.axboe,
	randy.dunlap, Bart Van Assche

On Mon, Jul 13, 2009 at 08:12:14PM +0800, Ronald Moesbergen wrote:
> 2009/7/10 Vladislav Bolkhovitin <vst@vlnb.net>:
> >
> > Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
> >>
> >> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
> >>>>
> >>>> I've also long ago noticed that reading data from block devices is
> >>>> slower
> >>>> than from files from mounted on those block devices file systems. Can
> >>>> anybody explain it?
> >>>>
> >>>> Looks like this is strangeness #2 which we uncovered in our tests (the
> >>>> first
> >>>> one was earlier in this thread why the context RA doesn't work with
> >>>> cooperative I/O threads as good as it should).
> >>>>
> >>>> Can you rerun the same 11 tests over a file on the file system, please?
> >>>
> >>> I'll see what I can do. Just te be sure: you want me to run
> >>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
> >>> on the client over iScsi, right?
> >>
> >> Yes, please.
> >
> > Forgot to mention that you should also configure your backend storage as a
> > big file on a file system (preferably, XFS) too, not as direct device, like
> > /dev/vg/db-master.
> 
> Ok, here are the results:
 
Ronald, thanks for the numbers!

> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead patch

Do you mean the context readahead patch?

> Test done with XFS on both the target and the initiator. This confirms
> your findings, using files instead of block devices is faster, but
> only when using the io_context patch.

It shows that the one that really matters is the io_context patch,
even when context readahead is running. I guess what happened
in the tests is:
- without readahead (or when the readahead algorithm fails to do
  proper sequential readaheads), the SCST processes will be submitting
  small IOs that are close to each other.  CFQ relies on the io_context
  patch to prevent unnecessary idling.
- with proper readahead, the SCST processes will also be submitting
  close readahead IOs. For example, one file's 100-102MB pages are
  read ahead by process A, while its 102-104MB pages may be
  read ahead by process B. In this case CFQ will also idle waiting
  for process A to submit the next IO, but in fact that IO is being
  submitted by process B. So the io_context patch is still necessary
  even when context readahead is working fine. I guess context
  readahead does have the added value of possibly enlarging the IO size
  (however, this benchmark seems not to be very sensitive to IO size).
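
One way to double-check this on the target would be a short blktrace capture
of the backing device while the client streams a read (the device name below
is only a placeholder):

  blktrace -d /dev/sdb -w 30 -o - | blkparse -i - > trace.txt
  # the Q/G/I/D/C events show the per-process request sizes and any idling
  # gaps; btt can summarise the same trace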

Thanks,
Fengguang

> Without io_context patch:
> 1) client: default, server: default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  18.327   18.327   17.740   56.491    0.872    0.883
>  33554432  18.662   18.311   18.116   55.772    0.683    1.743
>  16777216  18.900   18.421   18.312   55.229    0.754    3.452
>   8388608  18.893   18.533   18.281   55.156    0.743    6.895
>   4194304  18.512   18.097   18.400   55.850    0.536   13.963
>   2097152  18.635   18.313   18.676   55.232    0.486   27.616
>   1048576  18.441   18.264   18.245   55.907    0.267   55.907
>    524288  17.773   18.669   18.459   55.980    1.184  111.960
>    262144  18.580   18.758   17.483   56.091    1.767  224.365
>    131072  17.224   18.333   18.765   56.626    2.067  453.006
>     65536  18.082   19.223   18.238   55.348    1.483  885.567
>     32768  17.719   18.293   18.198   56.680    0.795 1813.766
>     16384  17.872   18.322   17.537   57.192    1.024 3660.273
> 
> 2) client: default, server: 64 max_sectors_kb, RA default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  18.738   18.435   18.400   55.283    0.451    0.864
>  33554432  18.046   18.167   17.572   57.128    0.826    1.785
>  16777216  18.504   18.203   18.377   55.771    0.376    3.486
>   8388608  22.069   18.554   17.825   53.013    4.766    6.627
>   4194304  19.211   18.136   18.083   55.465    1.529   13.866
>   2097152  18.647   17.851   18.511   55.866    1.071   27.933
>   1048576  19.084   18.177   18.194   55.425    1.249   55.425
>    524288  18.999   18.553   18.380   54.934    0.763  109.868
>    262144  18.867   18.273   18.063   55.668    1.020  222.673
>    131072  17.846   18.966   18.193   55.885    1.412  447.081
>     65536  18.195   18.616   18.482   55.564    0.530  889.023
>     32768  17.882   18.841   17.707   56.481    1.525 1807.394
>     16384  17.073   18.278   17.985   57.646    1.689 3689.369
> 
> 3) client: default, server: default max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  18.658   17.830   19.258   55.162    1.750    0.862
>  33554432  17.193   18.265   18.517   56.974    1.854    1.780
>  16777216  17.531   17.681   18.776   56.955    1.720    3.560
>   8388608  18.234   17.547   18.201   56.926    1.014    7.116
>   4194304  18.057   17.923   17.901   57.015    0.218   14.254
>   2097152  18.565   17.739   17.658   56.958    1.277   28.479
>   1048576  18.393   17.433   17.314   57.851    1.550   57.851
>    524288  18.939   17.835   18.972   55.152    1.600  110.304
>    262144  18.562   19.005   18.069   55.240    1.141  220.959
>    131072  19.574   17.562   18.251   55.576    2.476  444.611
>     65536  19.117   18.019   17.886   55.882    1.647  894.115
>     32768  18.237   17.415   17.482   57.842    1.200 1850.933
>     16384  17.760   18.444   18.055   56.631    0.876 3624.391
> 
> 4) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  18.368   17.495   18.524   56.520    1.434    0.883
>  33554432  18.209   17.523   19.146   56.052    2.027    1.752
>  16777216  18.765   18.053   18.550   55.497    0.903    3.469
>   8388608  17.878   17.848   18.389   56.778    0.774    7.097
>   4194304  18.058   17.683   18.567   56.589    1.129   14.147
>   2097152  18.896   18.384   18.697   54.888    0.623   27.444
>   1048576  18.505   17.769   17.804   56.826    1.055   56.826
>    524288  18.319   17.689   17.941   56.955    0.816  113.910
>    262144  19.227   17.770   18.212   55.704    1.821  222.815
>    131072  18.738   18.227   17.869   56.044    1.090  448.354
>     65536  19.319   18.525   18.084   54.969    1.494  879.504
>     32768  18.321   17.672   17.870   57.047    0.856 1825.495
>     16384  18.249   17.495   18.146   57.025    1.073 3649.582
> 
> With io_context patch:
> 5) client: default, server: default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  12.393   11.925   12.627   83.196    1.989    1.300
>  33554432  11.844   11.855   12.191   85.610    1.142    2.675
>  16777216  12.729   12.602   12.068   82.187    1.913    5.137
>   8388608  12.245   12.060   14.081   80.419    5.469   10.052
>   4194304  13.224   11.866   12.110   82.763    3.833   20.691
>   2097152  11.585   12.584   11.755   85.623    3.052   42.811
>   1048576  12.166   12.144   12.321   83.867    0.539   83.867
>    524288  12.019   12.148   12.160   84.568    0.448  169.137
>    262144  12.014   12.378   12.074   84.259    1.095  337.036
>    131072  11.840   12.068   11.849   85.921    0.756  687.369
>     65536  12.098   11.803   12.312   84.857    1.470 1357.720
>     32768  11.852   12.635   11.887   84.529    2.465 2704.931
>     16384  12.443   13.110   11.881   82.197    3.299 5260.620
> 
> 6) client: default, server: 64 max_sectors_kb, RA default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.033   12.122   11.950   82.911    3.110    1.295
>  33554432  12.386   13.357   12.082   81.364    3.429    2.543
>  16777216  12.102   11.542   12.053   86.096    1.860    5.381
>   8388608  12.240   11.740   11.789   85.917    1.601   10.740
>   4194304  11.824   12.388   12.042   84.768    1.621   21.192
>   2097152  11.962   12.283   11.973   84.832    1.036   42.416
>   1048576  12.639   11.863   12.010   84.197    2.290   84.197
>    524288  11.809   12.919   11.853   84.121    3.439  168.243
>    262144  12.105   12.649   12.779   81.894    1.940  327.577
>    131072  12.441   12.769   12.713   81.017    0.923  648.137
>     65536  12.490   13.308   12.440   80.414    2.457 1286.630
>     32768  13.235   11.917   12.300   82.184    3.576 2629.883
>     16384  12.335   12.394   12.201   83.187    0.549 5323.990
> 
> 7) client: default, server: default max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  12.017   12.334   12.151   84.168    0.897    1.315
>  33554432  12.265   12.200   11.976   84.310    0.864    2.635
>  16777216  12.356   11.972   12.292   83.903    1.165    5.244
>   8388608  12.247   12.368   11.769   84.472    1.825   10.559
>   4194304  11.888   11.974   12.144   85.325    0.754   21.331
>   2097152  12.433   10.938   11.669   87.911    4.595   43.956
>   1048576  11.748   12.271   12.498   84.180    2.196   84.180
>    524288  11.726   11.681   12.322   86.031    2.075  172.062
>    262144  12.593   12.263   11.939   83.530    1.817  334.119
>    131072  11.874   12.265   12.441   84.012    1.648  672.093
>     65536  12.119   11.848   12.037   85.330    0.809 1365.277
>     32768  12.549   12.080   12.008   83.882    1.625 2684.238
>     16384  12.369   12.087   12.589   82.949    1.385 5308.766
> 
> 8) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  12.664   11.793   11.963   84.428    2.575    1.319
>  33554432  11.825   12.074   12.442   84.571    1.761    2.643
>  16777216  11.997   11.952   10.905   88.311    3.958    5.519
>   8388608  11.866   12.270   11.796   85.519    1.476   10.690
>   4194304  11.754   12.095   12.539   84.483    2.230   21.121
>   2097152  11.948   11.633   11.886   86.628    1.007   43.314
>   1048576  12.029   12.519   11.701   84.811    2.345   84.811
>    524288  11.928   12.011   12.049   85.363    0.361  170.726
>    262144  12.559   11.827   11.729   85.140    2.566  340.558
>    131072  12.015   12.356   11.587   85.494    2.253  683.952
>     65536  11.741   12.113   11.931   85.861    1.093 1373.770
>     32768  12.655   11.738   12.237   83.945    2.589 2686.246
>     16384  11.928   12.423   11.875   84.834    1.711 5429.381
> 
> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.570   13.491   14.299   74.326    1.927    1.161
>  33554432  13.238   13.198   13.255   77.398    0.142    2.419
>  16777216  13.851   13.199   13.463   75.857    1.497    4.741
>   8388608  13.339   16.695   13.551   71.223    7.010    8.903
>   4194304  13.689   13.173   14.258   74.787    2.415   18.697
>   2097152  13.518   13.543   13.894   75.021    0.934   37.510
>   1048576  14.119   14.030   13.820   73.202    0.659   73.202
>    524288  13.747   14.781   13.820   72.621    2.369  145.243
>    262144  14.168   13.652   14.165   73.189    1.284  292.757
>    131072  14.112   13.868   14.213   72.817    0.753  582.535
>     65536  14.604   13.762   13.725   73.045    2.071 1168.728
>     32768  14.796   15.356   14.486   68.861    1.653 2203.564
>     16384  13.079   13.525   13.427   76.757    1.111 4912.426
> 
> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  20.372   18.077   17.262   55.411    3.800    0.866
>  33554432  17.287   17.620   17.828   58.263    0.740    1.821
>  16777216  16.802   18.154   17.315   58.831    1.865    3.677
>   8388608  17.510   18.291   17.253   57.939    1.427    7.242
>   4194304  17.059   17.706   17.352   58.958    0.897   14.740
>   2097152  17.252   18.064   17.615   58.059    1.090   29.029
>   1048576  17.082   17.373   17.688   58.927    0.838   58.927
>    524288  17.129   17.271   17.583   59.103    0.644  118.206
>    262144  17.411   17.695   18.048   57.808    0.848  231.231
>    131072  17.937   17.704   18.681   56.581    1.285  452.649
>     65536  17.927   17.465   17.907   57.646    0.698  922.338
>     32768  18.494   17.820   17.719   56.875    1.073 1819.985
>     16384  18.800   17.759   17.575   56.798    1.666 3635.058
> 
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  20.045   21.881   20.018   49.680    2.037    0.776
>  33554432  20.768   20.291   20.464   49.938    0.479    1.561
>  16777216  21.563   20.714   20.429   49.017    1.116    3.064
>   8388608  21.290   21.109   21.308   48.221    0.205    6.028
>   4194304  22.240   20.662   21.088   48.054    1.479   12.013
>   2097152  20.282   21.098   20.580   49.593    0.806   24.796
>   1048576  20.367   19.929   20.252   50.741    0.469   50.741
>    524288  20.885   21.203   20.684   48.945    0.498   97.890
>    262144  19.982   21.375   20.798   49.463    1.373  197.853
>    131072  20.744   21.590   19.698   49.593    1.866  396.740
>     65536  21.586   20.953   21.055   48.314    0.627  773.024
>     32768  21.228   20.307   21.049   49.104    0.950 1571.327
>     16384  21.257   21.209   21.150   48.289    0.100 3090.498
> 
> Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-13 12:36                                                                   ` Wu Fengguang
@ 2009-07-13 12:47                                                                     ` Ronald Moesbergen
  2009-07-13 12:52                                                                       ` Wu Fengguang
  2009-07-14 18:52                                                                     ` Vladislav Bolkhovitin
  1 sibling, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-13 12:47 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Vladislav Bolkhovitin, linux-kernel, akpm, kosaki.motohiro,
	Alan.Brunelle, hifumi.hisashi, linux-fsdevel, jens.axboe,
	randy.dunlap, Bart Van Assche

2009/7/13 Wu Fengguang <fengguang.wu@intel.com>:
> On Mon, Jul 13, 2009 at 08:12:14PM +0800, Ronald Moesbergen wrote:
>> 2009/7/10 Vladislav Bolkhovitin <vst@vlnb.net>:
>> >
>> > Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
>> >>
>> >> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>> >>>>
>> >>>> I've also long ago noticed that reading data from block devices is
>> >>>> slower
>> >>>> than from files from mounted on those block devices file systems. Can
>> >>>> anybody explain it?
>> >>>>
>> >>>> Looks like this is strangeness #2 which we uncovered in our tests (the
>> >>>> first
>> >>>> one was earlier in this thread why the context RA doesn't work with
>> >>>> cooperative I/O threads as good as it should).
>> >>>>
>> >>>> Can you rerun the same 11 tests over a file on the file system, please?
>> >>>
>> >>> I'll see what I can do. Just to be sure: you want me to run
>> >>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
>> >>> on the client over iScsi, right?
>> >>
>> >> Yes, please.
>> >
>> > Forgot to mention that you should also configure your backend storage as a
>> > big file on a file system (preferably, XFS) too, not as direct device, like
>> > /dev/vg/db-master.
>>
>> Ok, here are the results:
>
> Ronald, thanks for the numbers!

You're welcome.

>> client kernel: 2.6.26-15lenny3 (debian)
>> server kernel: 2.6.29.5 with readahead patch
>
> Do you mean the context readahead patch?

No, I meant the blk_run_backing_dev patch. The patch names are
confusing; I'll be sure to clarify them from now on.

Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-13 12:47                                                                     ` Ronald Moesbergen
@ 2009-07-13 12:52                                                                       ` Wu Fengguang
  0 siblings, 0 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-07-13 12:52 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: Vladislav Bolkhovitin, linux-kernel, akpm, kosaki.motohiro,
	Alan.Brunelle, hifumi.hisashi, linux-fsdevel, jens.axboe,
	randy.dunlap, Bart Van Assche

On Mon, Jul 13, 2009 at 08:47:31PM +0800, Ronald Moesbergen wrote:
> 2009/7/13 Wu Fengguang <fengguang.wu@intel.com>:
> > On Mon, Jul 13, 2009 at 08:12:14PM +0800, Ronald Moesbergen wrote:
> >> 2009/7/10 Vladislav Bolkhovitin <vst@vlnb.net>:
> >> >
> >> > Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
> >> >>
> >> >> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
> >> >>>>
> >> >>>> I've also long ago noticed that reading data from block devices is
> >> >>>> slower
> >> >>>> than from files from mounted on those block devices file systems. Can
> >> >>>> anybody explain it?
> >> >>>>
> >> >>>> Looks like this is strangeness #2 which we uncovered in our tests (the
> >> >>>> first
> >> >>>> one was earlier in this thread why the context RA doesn't work with
> >> >>>> cooperative I/O threads as good as it should).
> >> >>>>
> >> >>>> Can you rerun the same 11 tests over a file on the file system, please?
> >> >>>
> >> >>> I'll see what I can do. Just to be sure: you want me to run
> >> >>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
> >> >>> on the client over iScsi, right?
> >> >>
> >> >> Yes, please.
> >> >
> >> > Forgot to mention that you should also configure your backend storage as a
> >> > big file on a file system (preferably, XFS) too, not as direct device, like
> >> > /dev/vg/db-master.
> >>
> >> Ok, here are the results:
> >
> > Ronald, thanks for the numbers!
> 
> You're welcome.
> 
> >> client kernel: 2.6.26-15lenny3 (debian)
> >> server kernel: 2.6.29.5 with readahead patch
> >
> > Do you mean the context readahead patch?
> 
> No, I meant the blk_run_backing_dev patch. The patchnames are
> confusing, I'll be sure to clarify them from now on.

That's OK.  I did see that the previous benchmarks were not noticeably
helped by context readahead on CFQ, hehe.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-13 12:12                                                                 ` Ronald Moesbergen
  2009-07-13 12:36                                                                   ` Wu Fengguang
@ 2009-07-14 18:52                                                                   ` Vladislav Bolkhovitin
  2009-07-15  6:30                                                                     ` Vladislav Bolkhovitin
  1 sibling, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-14 18:52 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
	Bart Van Assche

Ronald Moesbergen, on 07/13/2009 04:12 PM wrote:
> 2009/7/10 Vladislav Bolkhovitin <vst@vlnb.net>:
>> Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
>>> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>>>>> I've also long ago noticed that reading data from block devices is
>>>>> slower
>>>>> than from files from mounted on those block devices file systems. Can
>>>>> anybody explain it?
>>>>>
>>>>> Looks like this is strangeness #2 which we uncovered in our tests (the
>>>>> first
>>>>> one was earlier in this thread why the context RA doesn't work with
>>>>> cooperative I/O threads as good as it should).
>>>>>
>>>>> Can you rerun the same 11 tests over a file on the file system, please?
>>>> I'll see what I can do. Just to be sure: you want me to run
>>>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
>>>> on the client over iScsi, right?
>>> Yes, please.
>> Forgot to mention that you should also configure your backend storage as a
>> big file on a file system (preferably, XFS) too, not as direct device, like
>> /dev/vg/db-master.
> 
> Ok, here are the results:
> 
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead patch
> 
> Test done with XFS on both the target and the initiator. This confirms
> your findings, using files instead of block devices is faster, but
> only when using the io_context patch.

Seems correct, except for case (2), which is still 10% faster.

> Without io_context patch:
> 1) client: default, server: default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  18.327   18.327   17.740   56.491    0.872    0.883
>  33554432  18.662   18.311   18.116   55.772    0.683    1.743
>  16777216  18.900   18.421   18.312   55.229    0.754    3.452
>   8388608  18.893   18.533   18.281   55.156    0.743    6.895
>   4194304  18.512   18.097   18.400   55.850    0.536   13.963
>   2097152  18.635   18.313   18.676   55.232    0.486   27.616
>   1048576  18.441   18.264   18.245   55.907    0.267   55.907
>    524288  17.773   18.669   18.459   55.980    1.184  111.960
>    262144  18.580   18.758   17.483   56.091    1.767  224.365
>    131072  17.224   18.333   18.765   56.626    2.067  453.006
>     65536  18.082   19.223   18.238   55.348    1.483  885.567
>     32768  17.719   18.293   18.198   56.680    0.795 1813.766
>     16384  17.872   18.322   17.537   57.192    1.024 3660.273
> 
> 2) client: default, server: 64 max_sectors_kb, RA default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  18.738   18.435   18.400   55.283    0.451    0.864
>  33554432  18.046   18.167   17.572   57.128    0.826    1.785
>  16777216  18.504   18.203   18.377   55.771    0.376    3.486
>   8388608  22.069   18.554   17.825   53.013    4.766    6.627
>   4194304  19.211   18.136   18.083   55.465    1.529   13.866
>   2097152  18.647   17.851   18.511   55.866    1.071   27.933
>   1048576  19.084   18.177   18.194   55.425    1.249   55.425
>    524288  18.999   18.553   18.380   54.934    0.763  109.868
>    262144  18.867   18.273   18.063   55.668    1.020  222.673
>    131072  17.846   18.966   18.193   55.885    1.412  447.081
>     65536  18.195   18.616   18.482   55.564    0.530  889.023
>     32768  17.882   18.841   17.707   56.481    1.525 1807.394
>     16384  17.073   18.278   17.985   57.646    1.689 3689.369
> 
> 3) client: default, server: default max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  18.658   17.830   19.258   55.162    1.750    0.862
>  33554432  17.193   18.265   18.517   56.974    1.854    1.780
>  16777216  17.531   17.681   18.776   56.955    1.720    3.560
>   8388608  18.234   17.547   18.201   56.926    1.014    7.116
>   4194304  18.057   17.923   17.901   57.015    0.218   14.254
>   2097152  18.565   17.739   17.658   56.958    1.277   28.479
>   1048576  18.393   17.433   17.314   57.851    1.550   57.851
>    524288  18.939   17.835   18.972   55.152    1.600  110.304
>    262144  18.562   19.005   18.069   55.240    1.141  220.959
>    131072  19.574   17.562   18.251   55.576    2.476  444.611
>     65536  19.117   18.019   17.886   55.882    1.647  894.115
>     32768  18.237   17.415   17.482   57.842    1.200 1850.933
>     16384  17.760   18.444   18.055   56.631    0.876 3624.391
> 
> 4) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  18.368   17.495   18.524   56.520    1.434    0.883
>  33554432  18.209   17.523   19.146   56.052    2.027    1.752
>  16777216  18.765   18.053   18.550   55.497    0.903    3.469
>   8388608  17.878   17.848   18.389   56.778    0.774    7.097
>   4194304  18.058   17.683   18.567   56.589    1.129   14.147
>   2097152  18.896   18.384   18.697   54.888    0.623   27.444
>   1048576  18.505   17.769   17.804   56.826    1.055   56.826
>    524288  18.319   17.689   17.941   56.955    0.816  113.910
>    262144  19.227   17.770   18.212   55.704    1.821  222.815
>    131072  18.738   18.227   17.869   56.044    1.090  448.354
>     65536  19.319   18.525   18.084   54.969    1.494  879.504
>     32768  18.321   17.672   17.870   57.047    0.856 1825.495
>     16384  18.249   17.495   18.146   57.025    1.073 3649.582
> 
> With io_context patch:
> 5) client: default, server: default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  12.393   11.925   12.627   83.196    1.989    1.300
>  33554432  11.844   11.855   12.191   85.610    1.142    2.675
>  16777216  12.729   12.602   12.068   82.187    1.913    5.137
>   8388608  12.245   12.060   14.081   80.419    5.469   10.052
>   4194304  13.224   11.866   12.110   82.763    3.833   20.691
>   2097152  11.585   12.584   11.755   85.623    3.052   42.811
>   1048576  12.166   12.144   12.321   83.867    0.539   83.867
>    524288  12.019   12.148   12.160   84.568    0.448  169.137
>    262144  12.014   12.378   12.074   84.259    1.095  337.036
>    131072  11.840   12.068   11.849   85.921    0.756  687.369
>     65536  12.098   11.803   12.312   84.857    1.470 1357.720
>     32768  11.852   12.635   11.887   84.529    2.465 2704.931
>     16384  12.443   13.110   11.881   82.197    3.299 5260.620
> 
> 6) client: default, server: 64 max_sectors_kb, RA default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.033   12.122   11.950   82.911    3.110    1.295
>  33554432  12.386   13.357   12.082   81.364    3.429    2.543
>  16777216  12.102   11.542   12.053   86.096    1.860    5.381
>   8388608  12.240   11.740   11.789   85.917    1.601   10.740
>   4194304  11.824   12.388   12.042   84.768    1.621   21.192
>   2097152  11.962   12.283   11.973   84.832    1.036   42.416
>   1048576  12.639   11.863   12.010   84.197    2.290   84.197
>    524288  11.809   12.919   11.853   84.121    3.439  168.243
>    262144  12.105   12.649   12.779   81.894    1.940  327.577
>    131072  12.441   12.769   12.713   81.017    0.923  648.137
>     65536  12.490   13.308   12.440   80.414    2.457 1286.630
>     32768  13.235   11.917   12.300   82.184    3.576 2629.883
>     16384  12.335   12.394   12.201   83.187    0.549 5323.990
> 
> 7) client: default, server: default max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  12.017   12.334   12.151   84.168    0.897    1.315
>  33554432  12.265   12.200   11.976   84.310    0.864    2.635
>  16777216  12.356   11.972   12.292   83.903    1.165    5.244
>   8388608  12.247   12.368   11.769   84.472    1.825   10.559
>   4194304  11.888   11.974   12.144   85.325    0.754   21.331
>   2097152  12.433   10.938   11.669   87.911    4.595   43.956
>   1048576  11.748   12.271   12.498   84.180    2.196   84.180
>    524288  11.726   11.681   12.322   86.031    2.075  172.062
>    262144  12.593   12.263   11.939   83.530    1.817  334.119
>    131072  11.874   12.265   12.441   84.012    1.648  672.093
>     65536  12.119   11.848   12.037   85.330    0.809 1365.277
>     32768  12.549   12.080   12.008   83.882    1.625 2684.238
>     16384  12.369   12.087   12.589   82.949    1.385 5308.766
> 
> 8) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  12.664   11.793   11.963   84.428    2.575    1.319
>  33554432  11.825   12.074   12.442   84.571    1.761    2.643
>  16777216  11.997   11.952   10.905   88.311    3.958    5.519
>   8388608  11.866   12.270   11.796   85.519    1.476   10.690
>   4194304  11.754   12.095   12.539   84.483    2.230   21.121
>   2097152  11.948   11.633   11.886   86.628    1.007   43.314
>   1048576  12.029   12.519   11.701   84.811    2.345   84.811
>    524288  11.928   12.011   12.049   85.363    0.361  170.726
>    262144  12.559   11.827   11.729   85.140    2.566  340.558
>    131072  12.015   12.356   11.587   85.494    2.253  683.952
>     65536  11.741   12.113   11.931   85.861    1.093 1373.770
>     32768  12.655   11.738   12.237   83.945    2.589 2686.246
>     16384  11.928   12.423   11.875   84.834    1.711 5429.381
> 
> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.570   13.491   14.299   74.326    1.927    1.161
>  33554432  13.238   13.198   13.255   77.398    0.142    2.419
>  16777216  13.851   13.199   13.463   75.857    1.497    4.741
>   8388608  13.339   16.695   13.551   71.223    7.010    8.903
>   4194304  13.689   13.173   14.258   74.787    2.415   18.697
>   2097152  13.518   13.543   13.894   75.021    0.934   37.510
>   1048576  14.119   14.030   13.820   73.202    0.659   73.202
>    524288  13.747   14.781   13.820   72.621    2.369  145.243
>    262144  14.168   13.652   14.165   73.189    1.284  292.757
>    131072  14.112   13.868   14.213   72.817    0.753  582.535
>     65536  14.604   13.762   13.725   73.045    2.071 1168.728
>     32768  14.796   15.356   14.486   68.861    1.653 2203.564
>     16384  13.079   13.525   13.427   76.757    1.111 4912.426
> 
> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  20.372   18.077   17.262   55.411    3.800    0.866
>  33554432  17.287   17.620   17.828   58.263    0.740    1.821
>  16777216  16.802   18.154   17.315   58.831    1.865    3.677
>   8388608  17.510   18.291   17.253   57.939    1.427    7.242
>   4194304  17.059   17.706   17.352   58.958    0.897   14.740
>   2097152  17.252   18.064   17.615   58.059    1.090   29.029
>   1048576  17.082   17.373   17.688   58.927    0.838   58.927
>    524288  17.129   17.271   17.583   59.103    0.644  118.206
>    262144  17.411   17.695   18.048   57.808    0.848  231.231
>    131072  17.937   17.704   18.681   56.581    1.285  452.649
>     65536  17.927   17.465   17.907   57.646    0.698  922.338
>     32768  18.494   17.820   17.719   56.875    1.073 1819.985
>     16384  18.800   17.759   17.575   56.798    1.666 3635.058
> 
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  20.045   21.881   20.018   49.680    2.037    0.776
>  33554432  20.768   20.291   20.464   49.938    0.479    1.561
>  16777216  21.563   20.714   20.429   49.017    1.116    3.064
>   8388608  21.290   21.109   21.308   48.221    0.205    6.028
>   4194304  22.240   20.662   21.088   48.054    1.479   12.013
>   2097152  20.282   21.098   20.580   49.593    0.806   24.796
>   1048576  20.367   19.929   20.252   50.741    0.469   50.741
>    524288  20.885   21.203   20.684   48.945    0.498   97.890
>    262144  19.982   21.375   20.798   49.463    1.373  197.853
>    131072  20.744   21.590   19.698   49.593    1.866  396.740
>     65536  21.586   20.953   21.055   48.314    0.627  773.024
>     32768  21.228   20.307   21.049   49.104    0.950 1571.327
>     16384  21.257   21.209   21.150   48.289    0.100 3090.498

The drop with 64 max_sectors_kb on the client is a consequence of how 
CFQ works. I can't find the exact code responsible for this, but from 
all signs, CFQ stops delaying requests once the number of outstanding 
requests exceeds some threshold, which is 2 or 3. With 64 max_sectors_kb 
and 5 SCST I/O threads this threshold is exceeded, so CFQ doesn't 
recover the order of requests, hence the performance drop. With the 
default 512 max_sectors_kb and 128K RA the server sees at most 2 
requests at a time.
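
For reference, here is a minimal sketch of how those two knobs map to 
sysfs (the server-side disk /dev/sdb is just a placeholder; a shell 
echo into the same files does the same thing):

/*
 * Minimal sketch: set the request-size and readahead knobs discussed
 * above for a hypothetical disk /dev/sdb, via the standard
 * /sys/block/<dev>/queue/max_sectors_kb and .../read_ahead_kb files.
 * Error handling is kept to a minimum.
 */
#include <stdio.h>

static int write_sysfs(const char *path, int kb)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%d\n", kb);
	return fclose(f);
}

int main(void)
{
	/* 64 KB requests and 2 MB readahead, one of the combinations above */
	write_sysfs("/sys/block/sdb/queue/max_sectors_kb", 64);
	write_sysfs("/sys/block/sdb/queue/read_ahead_kb", 2048);
	return 0;
}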

Ronald, can you perform the same tests with 1 and 2 SCST I/O threads, 
please?

You can limit the number of SCST I/O threads with the num_threads 
parameter of the scst_vdisk module.

Thanks,
Vlad

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-13 12:36                                                                   ` Wu Fengguang
  2009-07-13 12:47                                                                     ` Ronald Moesbergen
@ 2009-07-14 18:52                                                                     ` Vladislav Bolkhovitin
  2009-07-15  7:06                                                                       ` Wu Fengguang
  1 sibling, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-14 18:52 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Ronald Moesbergen, linux-kernel, akpm, kosaki.motohiro,
	Alan.Brunelle, hifumi.hisashi, linux-fsdevel, jens.axboe,
	randy.dunlap, Bart Van Assche


Wu Fengguang, on 07/13/2009 04:36 PM wrote:
>> Test done with XFS on both the target and the initiator. This confirms
>> your findings, using files instead of block devices is faster, but
>> only when using the io_context patch.
> 
> It shows that the one really matters is the io_context patch,
> even when context readahead is running. I guess what happened
> in the tests are:
> - without readahead (or readahead algorithm failed to do proper
>   sequential readaheads), the SCST processes will be submitting
>   small but close to each other IOs.  CFQ relies on the io_context
>   patch to prevent unnecessary idling.
> - with proper readahead, the SCST processes will also be submitting
>   close readahead IOs. For example, one file's 100-102MB pages is
>   readahead by process A, while its 102-104MB pages may be
>   readahead by process B. In this case CFQ will also idle waiting
>   for process A to submit the next IO, but in fact that IO is being
>   submitted by process B. So the io_context patch is still necessary
>   even when context readahead is working fine. I guess context
>   readahead do have the added value of possibly enlarging the IO size
>   (however this benchmark seems to not very sensitive to IO size).

Looks like the truth. Although with 2MB RA I expect CFQ to idle >10 
times less often, which should bring a bigger improvement than a few 
percent.

For how long does CFQ idle? For HZ/125 jiffies, i.e. 8 ms with HZ=250?

> Thanks,
> Fengguang


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-14 18:52                                                                   ` Vladislav Bolkhovitin
@ 2009-07-15  6:30                                                                     ` Vladislav Bolkhovitin
  2009-07-16  7:32                                                                       ` Ronald Moesbergen
  0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-15  6:30 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	hifumi.hisashi, linux-fsdevel, jens.axboe, randy.dunlap,
	Bart Van Assche

Vladislav Bolkhovitin, on 07/14/2009 10:52 PM wrote:
> Ronald Moesbergen, on 07/13/2009 04:12 PM wrote:
>> 2009/7/10 Vladislav Bolkhovitin <vst@vlnb.net>:
>>> Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
>>>> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>>>>>> I've also long ago noticed that reading data from block devices is
>>>>>> slower
>>>>>> than from files from mounted on those block devices file systems. Can
>>>>>> anybody explain it?
>>>>>>
>>>>>> Looks like this is strangeness #2 which we uncovered in our tests (the
>>>>>> first
>>>>>> one was earlier in this thread why the context RA doesn't work with
>>>>>> cooperative I/O threads as good as it should).
>>>>>>
>>>>>> Can you rerun the same 11 tests over a file on the file system, please?
>>>>> I'll see what I can do. Just to be sure: you want me to run
>>>>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
>>>>> on the client over iScsi, right?
>>>> Yes, please.
>>> Forgot to mention that you should also configure your backend storage as a
>>> big file on a file system (preferably, XFS) too, not as direct device, like
>>> /dev/vg/db-master.
>> Ok, here are the results:
>>
>> client kernel: 2.6.26-15lenny3 (debian)
>> server kernel: 2.6.29.5 with readahead patch
>>
>> Test done with XFS on both the target and the initiator. This confirms
>> your findings, using files instead of block devices is faster, but
>> only when using the io_context patch.
> 
> Seems, correct, except case (2), which is still 10% faster.
> 
>> Without io_context patch:
>> 1) client: default, server: default
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  18.327   18.327   17.740   56.491    0.872    0.883
>>  33554432  18.662   18.311   18.116   55.772    0.683    1.743
>>  16777216  18.900   18.421   18.312   55.229    0.754    3.452
>>   8388608  18.893   18.533   18.281   55.156    0.743    6.895
>>   4194304  18.512   18.097   18.400   55.850    0.536   13.963
>>   2097152  18.635   18.313   18.676   55.232    0.486   27.616
>>   1048576  18.441   18.264   18.245   55.907    0.267   55.907
>>    524288  17.773   18.669   18.459   55.980    1.184  111.960
>>    262144  18.580   18.758   17.483   56.091    1.767  224.365
>>    131072  17.224   18.333   18.765   56.626    2.067  453.006
>>     65536  18.082   19.223   18.238   55.348    1.483  885.567
>>     32768  17.719   18.293   18.198   56.680    0.795 1813.766
>>     16384  17.872   18.322   17.537   57.192    1.024 3660.273
>>
>> 2) client: default, server: 64 max_sectors_kb, RA default
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  18.738   18.435   18.400   55.283    0.451    0.864
>>  33554432  18.046   18.167   17.572   57.128    0.826    1.785
>>  16777216  18.504   18.203   18.377   55.771    0.376    3.486
>>   8388608  22.069   18.554   17.825   53.013    4.766    6.627
>>   4194304  19.211   18.136   18.083   55.465    1.529   13.866
>>   2097152  18.647   17.851   18.511   55.866    1.071   27.933
>>   1048576  19.084   18.177   18.194   55.425    1.249   55.425
>>    524288  18.999   18.553   18.380   54.934    0.763  109.868
>>    262144  18.867   18.273   18.063   55.668    1.020  222.673
>>    131072  17.846   18.966   18.193   55.885    1.412  447.081
>>     65536  18.195   18.616   18.482   55.564    0.530  889.023
>>     32768  17.882   18.841   17.707   56.481    1.525 1807.394
>>     16384  17.073   18.278   17.985   57.646    1.689 3689.369
>>
>> 3) client: default, server: default max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  18.658   17.830   19.258   55.162    1.750    0.862
>>  33554432  17.193   18.265   18.517   56.974    1.854    1.780
>>  16777216  17.531   17.681   18.776   56.955    1.720    3.560
>>   8388608  18.234   17.547   18.201   56.926    1.014    7.116
>>   4194304  18.057   17.923   17.901   57.015    0.218   14.254
>>   2097152  18.565   17.739   17.658   56.958    1.277   28.479
>>   1048576  18.393   17.433   17.314   57.851    1.550   57.851
>>    524288  18.939   17.835   18.972   55.152    1.600  110.304
>>    262144  18.562   19.005   18.069   55.240    1.141  220.959
>>    131072  19.574   17.562   18.251   55.576    2.476  444.611
>>     65536  19.117   18.019   17.886   55.882    1.647  894.115
>>     32768  18.237   17.415   17.482   57.842    1.200 1850.933
>>     16384  17.760   18.444   18.055   56.631    0.876 3624.391
>>
>> 4) client: default, server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  18.368   17.495   18.524   56.520    1.434    0.883
>>  33554432  18.209   17.523   19.146   56.052    2.027    1.752
>>  16777216  18.765   18.053   18.550   55.497    0.903    3.469
>>   8388608  17.878   17.848   18.389   56.778    0.774    7.097
>>   4194304  18.058   17.683   18.567   56.589    1.129   14.147
>>   2097152  18.896   18.384   18.697   54.888    0.623   27.444
>>   1048576  18.505   17.769   17.804   56.826    1.055   56.826
>>    524288  18.319   17.689   17.941   56.955    0.816  113.910
>>    262144  19.227   17.770   18.212   55.704    1.821  222.815
>>    131072  18.738   18.227   17.869   56.044    1.090  448.354
>>     65536  19.319   18.525   18.084   54.969    1.494  879.504
>>     32768  18.321   17.672   17.870   57.047    0.856 1825.495
>>     16384  18.249   17.495   18.146   57.025    1.073 3649.582
>>
>> With io_context patch:
>> 5) client: default, server: default
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  12.393   11.925   12.627   83.196    1.989    1.300
>>  33554432  11.844   11.855   12.191   85.610    1.142    2.675
>>  16777216  12.729   12.602   12.068   82.187    1.913    5.137
>>   8388608  12.245   12.060   14.081   80.419    5.469   10.052
>>   4194304  13.224   11.866   12.110   82.763    3.833   20.691
>>   2097152  11.585   12.584   11.755   85.623    3.052   42.811
>>   1048576  12.166   12.144   12.321   83.867    0.539   83.867
>>    524288  12.019   12.148   12.160   84.568    0.448  169.137
>>    262144  12.014   12.378   12.074   84.259    1.095  337.036
>>    131072  11.840   12.068   11.849   85.921    0.756  687.369
>>     65536  12.098   11.803   12.312   84.857    1.470 1357.720
>>     32768  11.852   12.635   11.887   84.529    2.465 2704.931
>>     16384  12.443   13.110   11.881   82.197    3.299 5260.620
>>
>> 6) client: default, server: 64 max_sectors_kb, RA default
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  13.033   12.122   11.950   82.911    3.110    1.295
>>  33554432  12.386   13.357   12.082   81.364    3.429    2.543
>>  16777216  12.102   11.542   12.053   86.096    1.860    5.381
>>   8388608  12.240   11.740   11.789   85.917    1.601   10.740
>>   4194304  11.824   12.388   12.042   84.768    1.621   21.192
>>   2097152  11.962   12.283   11.973   84.832    1.036   42.416
>>   1048576  12.639   11.863   12.010   84.197    2.290   84.197
>>    524288  11.809   12.919   11.853   84.121    3.439  168.243
>>    262144  12.105   12.649   12.779   81.894    1.940  327.577
>>    131072  12.441   12.769   12.713   81.017    0.923  648.137
>>     65536  12.490   13.308   12.440   80.414    2.457 1286.630
>>     32768  13.235   11.917   12.300   82.184    3.576 2629.883
>>     16384  12.335   12.394   12.201   83.187    0.549 5323.990
>>
>> 7) client: default, server: default max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  12.017   12.334   12.151   84.168    0.897    1.315
>>  33554432  12.265   12.200   11.976   84.310    0.864    2.635
>>  16777216  12.356   11.972   12.292   83.903    1.165    5.244
>>   8388608  12.247   12.368   11.769   84.472    1.825   10.559
>>   4194304  11.888   11.974   12.144   85.325    0.754   21.331
>>   2097152  12.433   10.938   11.669   87.911    4.595   43.956
>>   1048576  11.748   12.271   12.498   84.180    2.196   84.180
>>    524288  11.726   11.681   12.322   86.031    2.075  172.062
>>    262144  12.593   12.263   11.939   83.530    1.817  334.119
>>    131072  11.874   12.265   12.441   84.012    1.648  672.093
>>     65536  12.119   11.848   12.037   85.330    0.809 1365.277
>>     32768  12.549   12.080   12.008   83.882    1.625 2684.238
>>     16384  12.369   12.087   12.589   82.949    1.385 5308.766
>>
>> 8) client: default, server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  12.664   11.793   11.963   84.428    2.575    1.319
>>  33554432  11.825   12.074   12.442   84.571    1.761    2.643
>>  16777216  11.997   11.952   10.905   88.311    3.958    5.519
>>   8388608  11.866   12.270   11.796   85.519    1.476   10.690
>>   4194304  11.754   12.095   12.539   84.483    2.230   21.121
>>   2097152  11.948   11.633   11.886   86.628    1.007   43.314
>>   1048576  12.029   12.519   11.701   84.811    2.345   84.811
>>    524288  11.928   12.011   12.049   85.363    0.361  170.726
>>    262144  12.559   11.827   11.729   85.140    2.566  340.558
>>    131072  12.015   12.356   11.587   85.494    2.253  683.952
>>     65536  11.741   12.113   11.931   85.861    1.093 1373.770
>>     32768  12.655   11.738   12.237   83.945    2.589 2686.246
>>     16384  11.928   12.423   11.875   84.834    1.711 5429.381
>>
>> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  13.570   13.491   14.299   74.326    1.927    1.161
>>  33554432  13.238   13.198   13.255   77.398    0.142    2.419
>>  16777216  13.851   13.199   13.463   75.857    1.497    4.741
>>   8388608  13.339   16.695   13.551   71.223    7.010    8.903
>>   4194304  13.689   13.173   14.258   74.787    2.415   18.697
>>   2097152  13.518   13.543   13.894   75.021    0.934   37.510
>>   1048576  14.119   14.030   13.820   73.202    0.659   73.202
>>    524288  13.747   14.781   13.820   72.621    2.369  145.243
>>    262144  14.168   13.652   14.165   73.189    1.284  292.757
>>    131072  14.112   13.868   14.213   72.817    0.753  582.535
>>     65536  14.604   13.762   13.725   73.045    2.071 1168.728
>>     32768  14.796   15.356   14.486   68.861    1.653 2203.564
>>     16384  13.079   13.525   13.427   76.757    1.111 4912.426
>>
>> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  20.372   18.077   17.262   55.411    3.800    0.866
>>  33554432  17.287   17.620   17.828   58.263    0.740    1.821
>>  16777216  16.802   18.154   17.315   58.831    1.865    3.677
>>   8388608  17.510   18.291   17.253   57.939    1.427    7.242
>>   4194304  17.059   17.706   17.352   58.958    0.897   14.740
>>   2097152  17.252   18.064   17.615   58.059    1.090   29.029
>>   1048576  17.082   17.373   17.688   58.927    0.838   58.927
>>    524288  17.129   17.271   17.583   59.103    0.644  118.206
>>    262144  17.411   17.695   18.048   57.808    0.848  231.231
>>    131072  17.937   17.704   18.681   56.581    1.285  452.649
>>     65536  17.927   17.465   17.907   57.646    0.698  922.338
>>     32768  18.494   17.820   17.719   56.875    1.073 1819.985
>>     16384  18.800   17.759   17.575   56.798    1.666 3635.058
>>
>> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  20.045   21.881   20.018   49.680    2.037    0.776
>>  33554432  20.768   20.291   20.464   49.938    0.479    1.561
>>  16777216  21.563   20.714   20.429   49.017    1.116    3.064
>>   8388608  21.290   21.109   21.308   48.221    0.205    6.028
>>   4194304  22.240   20.662   21.088   48.054    1.479   12.013
>>   2097152  20.282   21.098   20.580   49.593    0.806   24.796
>>   1048576  20.367   19.929   20.252   50.741    0.469   50.741
>>    524288  20.885   21.203   20.684   48.945    0.498   97.890
>>    262144  19.982   21.375   20.798   49.463    1.373  197.853
>>    131072  20.744   21.590   19.698   49.593    1.866  396.740
>>     65536  21.586   20.953   21.055   48.314    0.627  773.024
>>     32768  21.228   20.307   21.049   49.104    0.950 1571.327
>>     16384  21.257   21.209   21.150   48.289    0.100 3090.498
> 
> The drop with 64 max_sectors_kb on the client is a consequence of how 
> CFQ is working. I can't find the exact code responsible for this, but 
> from all signs, CFQ stops delaying requests if amount of outstanding 
> requests exceeds some threshold, which is 2 or 3. With 64 max_sectors_kb 
> and 5 SCST I/O threads this threshold is exceeded, so CFQ doesn't 
> recover order of requests, hence the performance drop. With default 512 
> max_sectors_kb and 128K RA the server sees at max 2 requests at time.
> 
> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads, 
> please?

With the context-RA patch, please, in those and in future tests, since 
it should make RA for cooperative threads much better.

> You can limit amount of SCST I/O threads by num_threads parameter of 
> scst_vdisk module.
> 
> Thanks,
> Vlad
> 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-14 18:52                                                                     ` Vladislav Bolkhovitin
@ 2009-07-15  7:06                                                                       ` Wu Fengguang
  0 siblings, 0 replies; 65+ messages in thread
From: Wu Fengguang @ 2009-07-15  7:06 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: Ronald Moesbergen, linux-kernel, akpm, kosaki.motohiro,
	Alan.Brunelle, hifumi.hisashi, linux-fsdevel, jens.axboe,
	randy.dunlap, Bart Van Assche

On Wed, Jul 15, 2009 at 02:52:27AM +0800, Vladislav Bolkhovitin wrote:
> 
> Wu Fengguang, on 07/13/2009 04:36 PM wrote:
> >> Test done with XFS on both the target and the initiator. This confirms
> >> your findings, using files instead of block devices is faster, but
> >> only when using the io_context patch.
> > 
> > It shows that the one really matters is the io_context patch,
> > even when context readahead is running. I guess what happened
> > in the tests are:
> > - without readahead (or readahead algorithm failed to do proper
> >   sequential readaheads), the SCST processes will be submitting
> >   small but close to each other IOs.  CFQ relies on the io_context
> >   patch to prevent unnecessary idling.
> > - with proper readahead, the SCST processes will also be submitting
> >   close readahead IOs. For example, one file's 100-102MB pages is
> >   readahead by process A, while its 102-104MB pages may be
> >   readahead by process B. In this case CFQ will also idle waiting
> >   for process A to submit the next IO, but in fact that IO is being
> >   submitted by process B. So the io_context patch is still necessary
> >   even when context readahead is working fine. I guess context
> >   readahead do have the added value of possibly enlarging the IO size
> >   (however this benchmark seems to not very sensitive to IO size).
> 
> Looks like the truth. Although with 2MB RA I expect CFQ to do idling >10 
> times less, which should bring bigger improvement than few %%.
> 
> For how long CFQ idles? For HZ/125, i.e. 8 ms with HZ 250?

Yes, 8ms by default. Note that the 8ms idle timer is armed when the
last IO from the current process completes. So it would definitely be a
waste if the cooperative process submitted the next read/readahead
IO within this 8ms idle window (without cfq_coop.patch).
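
For reference, a minimal sketch to check that window on a given disk,
assuming it runs the cfq scheduler and therefore exposes
/sys/block/<dev>/queue/iosched/slice_idle (in milliseconds, 8 by
default as said above):

/*
 * Minimal sketch: print CFQ's idle window for the disk named on the
 * command line (defaults to sda). Assumes the cfq I/O scheduler is
 * active, so the slice_idle tunable exists under queue/iosched/.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "sda";
	char path[128];
	int ms;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/block/%s/queue/iosched/slice_idle", dev);
	f = fopen(path, "r");
	if (!f || fscanf(f, "%d", &ms) != 1) {
		perror(path);
		return 1;
	}
	printf("%s: slice_idle = %d ms\n", dev, ms);
	fclose(f);
	return 0;
}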

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-08 12:40                                                         ` Vladislav Bolkhovitin
  2009-07-10  6:32                                                           ` Ronald Moesbergen
@ 2009-07-15 20:52                                                           ` Kurt Garloff
  2009-07-16 10:38                                                             ` Vladislav Bolkhovitin
  1 sibling, 1 reply; 65+ messages in thread
From: Kurt Garloff @ 2009-07-15 20:52 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 657 bytes --]

Hi,

On Wed, Jul 08, 2009 at 04:40:08PM +0400, Vladislav Bolkhovitin wrote:
> I've also long ago noticed that reading data from block devices is slower 
> than from files from mounted on those block devices file systems. Can 
> anybody explain it?

Brainstorming:
- block size (reads on the block dev might be done with smaller size)
- readahead (do we use the same RA algo for block devs)
- page cache might be better optimized than buffer cache?

Just guesses from someone who has not looked into that area of the
kernel for a while, so take them with a grain of salt.

Cheers,
-- 
Kurt Garloff, VP OPS Partner Engineering -- Novell Inc.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-15  6:30                                                                     ` Vladislav Bolkhovitin
@ 2009-07-16  7:32                                                                       ` Ronald Moesbergen
  2009-07-16 10:36                                                                         ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-16  7:32 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche

2009/7/15 Vladislav Bolkhovitin <vst@vlnb.net>:
>> The drop with 64 max_sectors_kb on the client is a consequence of how CFQ
>> is working. I can't find the exact code responsible for this, but from all
>> signs, CFQ stops delaying requests if amount of outstanding requests exceeds
>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>> threads this threshold is exceeded, so CFQ doesn't recover order of
>> requests, hence the performance drop. With default 512 max_sectors_kb and
>> 128K RA the server sees at max 2 requests at time.
>>
>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>> please?

Ok. Should I still use the file-on-xfs testcase for this, or should I
go back to using a regular block device? The file-over-iscsi setup is
quite uncommon, I suppose; most people will export a block device over
iscsi, not a file.

> With context-RA patch, please, in those and future tests, since it should
> make RA for cooperative threads much better.
>
>> You can limit amount of SCST I/O threads by num_threads parameter of
>> scst_vdisk module.

Ok, I'll try that and include the blk_run_backing_dev,
readahead-context and io_context patches.

Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-16  7:32                                                                       ` Ronald Moesbergen
@ 2009-07-16 10:36                                                                         ` Vladislav Bolkhovitin
  2009-07-16 14:54                                                                           ` Ronald Moesbergen
  2009-07-17 14:15                                                                           ` Ronald Moesbergen
  0 siblings, 2 replies; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-16 10:36 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche


Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
> 2009/7/15 Vladislav Bolkhovitin <vst@vlnb.net>:
>>> The drop with 64 max_sectors_kb on the client is a consequence of how CFQ
>>> is working. I can't find the exact code responsible for this, but from all
>>> signs, CFQ stops delaying requests if amount of outstanding requests exceeds
>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>> requests, hence the performance drop. With default 512 max_sectors_kb and
>>> 128K RA the server sees at max 2 requests at time.
>>>
>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>> please?
> 
> Ok. Should I still use the file-on-xfs testcase for this, or should I
> go back to using a regular block device?

Yes, please

> The file-over-iscsi is quite
> uncommon I suppose, most people will export a block device over iscsi,
> not a file.

No, files are common. The main reason why people use direct block 
devices is a belief, not supported by anything, that compared with files 
they "have less overhead", so "should be faster". But it isn't true and 
can be easily checked.

>> With context-RA patch, please, in those and future tests, since it should
>> make RA for cooperative threads much better.
>>
>>> You can limit amount of SCST I/O threads by num_threads parameter of
>>> scst_vdisk module.
> 
> Ok, I'll try that and include the blk_run_backing_dev,
> readahead-context and io_context patches.
> 
> Ronald.
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-15 20:52                                                           ` Kurt Garloff
@ 2009-07-16 10:38                                                             ` Vladislav Bolkhovitin
  0 siblings, 0 replies; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-16 10:38 UTC (permalink / raw)
  To: Kurt Garloff, linux-kernel, linux-fsdevel


Kurt Garloff, on 07/16/2009 12:52 AM wrote:
> Hi,
> 
> On Wed, Jul 08, 2009 at 04:40:08PM +0400, Vladislav Bolkhovitin wrote:
>> I've also long ago noticed that reading data from block devices is slower 
>> than from files from mounted on those block devices file systems. Can 
>> anybody explain it?
> 
> Brainstorming:
> - block size (reads on the block dev might be done with smaller size)

As we already found out in this and other threads, a smaller "block 
size", i.e. the size of each request, often means better throughput, 
sometimes much better.
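
To make that concrete, here is a minimal sketch of the kind of 
measurement behind the "blocksize" column in the tables: read a file or 
device sequentially with a fixed request size and report the 
throughput. It is a simplification of what dd/blockdev-perftest does, 
not the actual test script (the default /dev/sdb path is just a 
placeholder):

/*
 * Minimal sketch: sequentially read the whole of a file or block device
 * with a fixed request size and report MB/s. Drop the page cache first
 * (echo 3 > /proc/sys/vm/drop_caches) for a cold-cache number.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/dev/sdb";
	size_t bs = argc > 2 ? (size_t)atol(argv[2]) : 65536;
	char *buf = malloc(bs);
	long long total = 0;
	struct timeval t0, t1;
	double sec;
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0 || !buf) {
		perror(path);
		return 1;
	}
	gettimeofday(&t0, NULL);
	while ((n = read(fd, buf, bs)) > 0)
		total += n;
	gettimeofday(&t1, NULL);
	sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
	printf("%s: %lld bytes in %.3f s, %.1f MB/s (bs=%zu)\n",
	       path, total, sec, total / 1048576.0 / sec, bs);
	close(fd);
	free(buf);
	return 0;
}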

> - readahead (do we use the same RA algo for block devs)
> - page cache might be better optimized than buffer cache?
> 
> Just guesses from someone that has not looked into that area of the
> kernel for a while, so take it with a grain of salt.
> 
> Cheers,

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-16 10:36                                                                         ` Vladislav Bolkhovitin
@ 2009-07-16 14:54                                                                           ` Ronald Moesbergen
  2009-07-16 16:03                                                                             ` Vladislav Bolkhovitin
  2009-07-17 14:15                                                                           ` Ronald Moesbergen
  1 sibling, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-16 14:54 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche

2009/7/16 Vladislav Bolkhovitin <vst@vlnb.net>:
>
> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>
>> 2009/7/15 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>>
>>>> The drop with 64 max_sectors_kb on the client is a consequence of how
>>>> CFQ
>>>> is working. I can't find the exact code responsible for this, but from
>>>> all
>>>> signs, CFQ stops delaying requests if amount of outstanding requests
>>>> exceeds
>>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>>> requests, hence the performance drop. With default 512 max_sectors_kb
>>>> and
>>>> 128K RA the server sees at max 2 requests at time.
>>>>
>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>> please?
>>
>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>> go back to using a regular block device?
>
> Yes, please

As in: Yes, go back to block device, or Yes use file-on-xfs?

>> The file-over-iscsi is quite
>> uncommon I suppose, most people will export a block device over iscsi,
>> not a file.
>
> No, files are common. The main reason why people use direct block devices is
> a not supported by anything believe that comparing with files they "have
> less overhead", so "should be faster". But it isn't true and can be easily
> checked.

Well, there are other advantages of using a block device: they are
generally more manageable, for instance you can use LVM for resizing
instead of strange dd magic to extend a file. When using a file you
have to extend the volume that holds the file first, and then the file
itself. And you don't lose disk space to filesystem metadata twice.
Also, I still don't get why reads/writes from a block device differ
in speed from reads/writes from a file on a filesystem. I for one
will not be using files exported over iscsi, but block devices
(LVM volumes).

Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-16 14:54                                                                           ` Ronald Moesbergen
@ 2009-07-16 16:03                                                                             ` Vladislav Bolkhovitin
  0 siblings, 0 replies; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-16 16:03 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche


Ronald Moesbergen, on 07/16/2009 06:54 PM wrote:
> 2009/7/16 Vladislav Bolkhovitin <vst@vlnb.net>:
>> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>> 2009/7/15 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>>> The drop with 64 max_sectors_kb on the client is a consequence of how
>>>>> CFQ
>>>>> is working. I can't find the exact code responsible for this, but from
>>>>> all
>>>>> signs, CFQ stops delaying requests if amount of outstanding requests
>>>>> exceeds
>>>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>>>> requests, hence the performance drop. With default 512 max_sectors_kb
>>>>> and
>>>>> 128K RA the server sees at max 2 requests at time.
>>>>>
>>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>>> please?
>>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>>> go back to using a regular block device?
>> Yes, please
> 
> As in: Yes, go back to block device, or Yes use file-on-xfs?

File-on-xfs :)

>>> The file-over-iscsi is quite
>>> uncommon I suppose, most people will export a block device over iscsi,
>>> not a file.
>> No, files are common. The main reason why people use direct block devices is
>> a not supported by anything believe that comparing with files they "have
>> less overhead", so "should be faster". But it isn't true and can be easily
>> checked.
> 
> Well, there are other advantages of using a block device: they are
> generally more manageble, for instance you can use LVM for resizing
> instead of strange dd magic to extend a file. When using a file you
> have to extend the volume that holds the file first, and then the file
> itself.

Files also have advantages. For instance, it's easier to back them up 
and move them between servers. On modern systems with fallocate() 
syscall support you don't have to do "strange dd magic" to resize files 
and can make them bigger nearly instantaneously. Also, with pretty 
simple modifications scst_vdisk can be improved to make a single 
virtual device from several files.
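
For example, a minimal sketch of the fallocate() approach (the 
backing-file path below is made up): it extends the file by 10 GB 
starting at its current end without writing any data, so on a 
filesystem that supports the syscall (XFS, ext4) it completes nearly 
instantaneously:

/*
 * Minimal sketch: grow a backing-store file by 10 GB with fallocate(),
 * starting at its current end, instead of appending with dd. Requires
 * a filesystem that implements the syscall (e.g. XFS, ext4).
 */
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/storage/backing_file";
	struct stat st;
	int fd = open(path, O_WRONLY);

	if (fd < 0 || fstat(fd, &st) < 0) {
		perror(path);
		return 1;
	}
	/* extend by 10 GB starting at the current end of the file */
	if (fallocate(fd, 0, st.st_size, 10LL << 30) < 0) {
		perror("fallocate");
		return 1;
	}
	close(fd);
	return 0;
}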

> And you don't lose disk space to filesystem metadata twice.

This is negligible (0.05% for XFS)

> Also, I still don't get why reads/writes from a blockdevice are
> different in speed than reads/writes from a file on a filesystem.

Me too, and I'd appreciate it if someone explained it. But I don't want 
to introduce one more variable into the task we are solving (how to get 
100+ MB/s from iSCSI on your system).

> I
> for one will not be using files exported over iscsi, but blockdevices
> (LVM volumes).

Are you sure?

> Ronald.
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-16 10:36                                                                         ` Vladislav Bolkhovitin
  2009-07-16 14:54                                                                           ` Ronald Moesbergen
@ 2009-07-17 14:15                                                                           ` Ronald Moesbergen
  2009-07-17 18:23                                                                             ` Vladislav Bolkhovitin
  1 sibling, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-17 14:15 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche

2009/7/16 Vladislav Bolkhovitin <vst@vlnb.net>:
>
> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>
>> 2009/7/15 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>>
>>>> The drop with 64 max_sectors_kb on the client is a consequence of how
>>>> CFQ
>>>> is working. I can't find the exact code responsible for this, but from
>>>> all
>>>> signs, CFQ stops delaying requests if amount of outstanding requests
>>>> exceeds
>>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>>> requests, hence the performance drop. With default 512 max_sectors_kb
>>>> and
>>>> 128K RA the server sees at max 2 requests at time.
>>>>
>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>> please?
>>
>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>> go back to using a regular block device?
>
> Yes, please
>
>> The file-over-iscsi is quite
>> uncommon I suppose, most people will export a block device over iscsi,
>> not a file.
>
> No, files are common. The main reason why people use direct block devices is
> a belief, not supported by anything, that compared with files they "have
> less overhead" and so "should be faster". But it isn't true and can easily
> be checked.
>
>>> With context-RA patch, please, in those and future tests, since it should
>>> make RA for cooperative threads much better.
>>>
>>>> You can limit amount of SCST I/O threads by num_threads parameter of
>>>> scst_vdisk module.
>>
>> Ok, I'll try that and include the blk_run_backing_dev,
>> readahead-context and io_context patches.

The results:

client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context

With one IO thread:

5) client: default, server: default
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  15.990   15.308   16.689   64.097    2.259    1.002
 33554432  15.981   16.064   16.221   63.651    0.392    1.989
 16777216  15.841   15.660   16.031   64.635    0.619    4.040

6) client: default, server: 64 max_sectors_kb, RA default
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  16.035   16.024   16.654   63.084    1.130    0.986
 33554432  15.924   15.975   16.359   63.668    0.762    1.990
 16777216  16.168   16.104   15.838   63.858    0.571    3.991

7) client: default, server: default max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  14.895   16.142   15.998   65.398    2.379    1.022
 33554432  16.753   16.169   16.067   62.729    1.146    1.960
 16777216  16.866   15.912   16.099   62.892    1.570    3.931

8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  15.923   15.716   16.741   63.545    1.715    0.993
 33554432  16.010   16.026   16.113   63.802    0.180    1.994
 16777216  16.644   16.239   16.143   62.672    0.827    3.917

9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  15.753   15.882   15.482   65.207    0.697    1.019
 33554432  15.670   16.268   15.669   64.548    1.134    2.017
 16777216  15.746   15.519   16.411   64.471    1.516    4.029

10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.639   14.360   13.654   73.795    1.758    1.153
 33554432  13.584   13.938   14.538   73.095    2.035    2.284
 16777216  13.617   13.510   13.803   75.060    0.665    4.691

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.428   13.541   14.144   74.760    1.690    1.168
 33554432  13.707   13.352   13.462   75.821    0.827    2.369
 16777216  14.380   13.504   13.675   73.975    1.991    4.623

With two threads:
5) client: default, server: default
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  12.453   12.173   13.014   81.677    2.254    1.276
 33554432  12.066   11.999   12.960   83.073    2.877    2.596
 16777216  13.719   11.969   12.569   80.554    4.500    5.035

6) client: default, server: 64 max_sectors_kb, RA default
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  12.886   12.201   12.147   82.564    2.198    1.290
 33554432  12.344   12.928   12.007   82.483    2.504    2.578
 16777216  12.380   11.951   13.119   82.151    3.141    5.134

7) client: default, server: default max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  12.824   13.485   13.534   77.148    1.913    1.205
 33554432  12.084   13.752   12.111   81.251    4.800    2.539
 16777216  12.658   13.035   11.196   83.640    5.612    5.227

8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  12.253   12.552   11.773   84.044    2.230    1.313
 33554432  13.177   12.456   11.604   82.723    4.316    2.585
 16777216  12.471   12.318   13.006   81.324    1.878    5.083

9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  14.409   13.311   14.278   73.238    2.624    1.144
 33554432  14.665   14.260   14.080   71.455    1.211    2.233
 16777216  14.179   14.810   14.640   70.438    1.303    4.402

10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.401   14.107   13.549   74.860    1.642    1.170
 33554432  14.575   13.221   14.428   72.894    3.236    2.278
 16777216  13.771   14.227   13.594   73.887    1.408    4.618

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  10.286   12.272   10.245   94.317    7.690    1.474
 33554432  10.241   10.415   13.374   91.624   10.670    2.863
 16777216  10.499   10.224   10.792   97.526    2.151    6.095

The last result comes close to 100MB/s!

Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-17 14:15                                                                           ` Ronald Moesbergen
@ 2009-07-17 18:23                                                                             ` Vladislav Bolkhovitin
  2009-07-20  7:20                                                                               ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-17 18:23 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche


Ronald Moesbergen, on 07/17/2009 06:15 PM wrote:
> 2009/7/16 Vladislav Bolkhovitin <vst@vlnb.net>:
>> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>> 2009/7/15 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>>> The drop with 64 max_sectors_kb on the client is a consequence of how
>>>>> CFQ
>>>>> is working. I can't find the exact code responsible for this, but from
>>>>> all
>>>>> signs, CFQ stops delaying requests if amount of outstanding requests
>>>>> exceeds
>>>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>>>> requests, hence the performance drop. With default 512 max_sectors_kb
>>>>> and
>>>>> 128K RA the server sees at max 2 requests at time.
>>>>>
>>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>>> please?
>>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>>> go back to using a regular block device?
>> Yes, please
>>
>>> The file-over-iscsi is quite
>>> uncommon I suppose, most people will export a block device over iscsi,
>>> not a file.
>> No, files are common. The main reason why people use direct block devices is
>> a belief, not supported by anything, that compared with files they "have
>> less overhead" and so "should be faster". But it isn't true and can easily
>> be checked.
>>
>>>> With context-RA patch, please, in those and future tests, since it should
>>>> make RA for cooperative threads much better.
>>>>
>>>>> You can limit amount of SCST I/O threads by num_threads parameter of
>>>>> scst_vdisk module.
>>> Ok, I'll try that and include the blk_run_backing_dev,
>>> readahead-context and io_context patches.
> 
> The results:
> 
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
> and io_context
> 
> With one IO thread:
> 
> 5) client: default, server: default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  15.990   15.308   16.689   64.097    2.259    1.002
>  33554432  15.981   16.064   16.221   63.651    0.392    1.989
>  16777216  15.841   15.660   16.031   64.635    0.619    4.040
> 
> 6) client: default, server: 64 max_sectors_kb, RA default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  16.035   16.024   16.654   63.084    1.130    0.986
>  33554432  15.924   15.975   16.359   63.668    0.762    1.990
>  16777216  16.168   16.104   15.838   63.858    0.571    3.991
> 
> 7) client: default, server: default max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  14.895   16.142   15.998   65.398    2.379    1.022
>  33554432  16.753   16.169   16.067   62.729    1.146    1.960
>  16777216  16.866   15.912   16.099   62.892    1.570    3.931
> 
> 8) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  15.923   15.716   16.741   63.545    1.715    0.993
>  33554432  16.010   16.026   16.113   63.802    0.180    1.994
>  16777216  16.644   16.239   16.143   62.672    0.827    3.917
> 
> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  15.753   15.882   15.482   65.207    0.697    1.019
>  33554432  15.670   16.268   15.669   64.548    1.134    2.017
>  16777216  15.746   15.519   16.411   64.471    1.516    4.029
> 
> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.639   14.360   13.654   73.795    1.758    1.153
>  33554432  13.584   13.938   14.538   73.095    2.035    2.284
>  16777216  13.617   13.510   13.803   75.060    0.665    4.691
> 
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.428   13.541   14.144   74.760    1.690    1.168
>  33554432  13.707   13.352   13.462   75.821    0.827    2.369
>  16777216  14.380   13.504   13.675   73.975    1.991    4.623
> 
> With two threads:
> 5) client: default, server: default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  12.453   12.173   13.014   81.677    2.254    1.276
>  33554432  12.066   11.999   12.960   83.073    2.877    2.596
>  16777216  13.719   11.969   12.569   80.554    4.500    5.035
> 
> 6) client: default, server: 64 max_sectors_kb, RA default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  12.886   12.201   12.147   82.564    2.198    1.290
>  33554432  12.344   12.928   12.007   82.483    2.504    2.578
>  16777216  12.380   11.951   13.119   82.151    3.141    5.134
> 
> 7) client: default, server: default max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  12.824   13.485   13.534   77.148    1.913    1.205
>  33554432  12.084   13.752   12.111   81.251    4.800    2.539
>  16777216  12.658   13.035   11.196   83.640    5.612    5.227
> 
> 8) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  12.253   12.552   11.773   84.044    2.230    1.313
>  33554432  13.177   12.456   11.604   82.723    4.316    2.585
>  16777216  12.471   12.318   13.006   81.324    1.878    5.083
> 
> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  14.409   13.311   14.278   73.238    2.624    1.144
>  33554432  14.665   14.260   14.080   71.455    1.211    2.233
>  16777216  14.179   14.810   14.640   70.438    1.303    4.402
> 
> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.401   14.107   13.549   74.860    1.642    1.170
>  33554432  14.575   13.221   14.428   72.894    3.236    2.278
>  16777216  13.771   14.227   13.594   73.887    1.408    4.618
> 
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  10.286   12.272   10.245   94.317    7.690    1.474
>  33554432  10.241   10.415   13.374   91.624   10.670    2.863
>  16777216  10.499   10.224   10.792   97.526    2.151    6.095
> 
> The last result comes close to 100MB/s!

Good! Although I expected maximum with a single thread.

Can you do the same set of tests with deadline scheduler on the server?

Thanks,
Vlad


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-17 18:23                                                                             ` Vladislav Bolkhovitin
@ 2009-07-20  7:20                                                                               ` Vladislav Bolkhovitin
  2009-07-22  8:44                                                                                 ` Ronald Moesbergen
  0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-20  7:20 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche


Vladislav Bolkhovitin, on 07/17/2009 10:23 PM wrote:
> Ronald Moesbergen, on 07/17/2009 06:15 PM wrote:
>> 2009/7/16 Vladislav Bolkhovitin <vst@vlnb.net>:
>>> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>>> 2009/7/15 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>>>> The drop with 64 max_sectors_kb on the client is a consequence of how
>>>>>> CFQ
>>>>>> is working. I can't find the exact code responsible for this, but from
>>>>>> all
>>>>>> signs, CFQ stops delaying requests if amount of outstanding requests
>>>>>> exceeds
>>>>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>>>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>>>>> requests, hence the performance drop. With default 512 max_sectors_kb
>>>>>> and
>>>>>> 128K RA the server sees at max 2 requests at time.
>>>>>>
>>>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>>>> please?
>>>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>>>> go back to using a regular block device?
>>> Yes, please
>>>
>>>> The file-over-iscsi is quite
>>>> uncommon I suppose, most people will export a block device over iscsi,
>>>> not a file.
>>> No, files are common. The main reason why people use direct block devices is
>>> a belief, not supported by anything, that compared with files they "have
>>> less overhead" and so "should be faster". But it isn't true and can easily
>>> be checked.
>>>
>>>>> With context-RA patch, please, in those and future tests, since it should
>>>>> make RA for cooperative threads much better.
>>>>>
>>>>>> You can limit amount of SCST I/O threads by num_threads parameter of
>>>>>> scst_vdisk module.
>>>> Ok, I'll try that and include the blk_run_backing_dev,
>>>> readahead-context and io_context patches.
>> The results:
>>
>> client kernel: 2.6.26-15lenny3 (debian)
>> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
>> and io_context
>>
>> With one IO thread:
>>
>> 5) client: default, server: default
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  15.990   15.308   16.689   64.097    2.259    1.002
>>  33554432  15.981   16.064   16.221   63.651    0.392    1.989
>>  16777216  15.841   15.660   16.031   64.635    0.619    4.040
>>
>> 6) client: default, server: 64 max_sectors_kb, RA default
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  16.035   16.024   16.654   63.084    1.130    0.986
>>  33554432  15.924   15.975   16.359   63.668    0.762    1.990
>>  16777216  16.168   16.104   15.838   63.858    0.571    3.991
>>
>> 7) client: default, server: default max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  14.895   16.142   15.998   65.398    2.379    1.022
>>  33554432  16.753   16.169   16.067   62.729    1.146    1.960
>>  16777216  16.866   15.912   16.099   62.892    1.570    3.931
>>
>> 8) client: default, server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  15.923   15.716   16.741   63.545    1.715    0.993
>>  33554432  16.010   16.026   16.113   63.802    0.180    1.994
>>  16777216  16.644   16.239   16.143   62.672    0.827    3.917
>>
>> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  15.753   15.882   15.482   65.207    0.697    1.019
>>  33554432  15.670   16.268   15.669   64.548    1.134    2.017
>>  16777216  15.746   15.519   16.411   64.471    1.516    4.029
>>
>> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  13.639   14.360   13.654   73.795    1.758    1.153
>>  33554432  13.584   13.938   14.538   73.095    2.035    2.284
>>  16777216  13.617   13.510   13.803   75.060    0.665    4.691
>>
>> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  13.428   13.541   14.144   74.760    1.690    1.168
>>  33554432  13.707   13.352   13.462   75.821    0.827    2.369
>>  16777216  14.380   13.504   13.675   73.975    1.991    4.623
>>
>> With two threads:
>> 5) client: default, server: default
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  12.453   12.173   13.014   81.677    2.254    1.276
>>  33554432  12.066   11.999   12.960   83.073    2.877    2.596
>>  16777216  13.719   11.969   12.569   80.554    4.500    5.035
>>
>> 6) client: default, server: 64 max_sectors_kb, RA default
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  12.886   12.201   12.147   82.564    2.198    1.290
>>  33554432  12.344   12.928   12.007   82.483    2.504    2.578
>>  16777216  12.380   11.951   13.119   82.151    3.141    5.134
>>
>> 7) client: default, server: default max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  12.824   13.485   13.534   77.148    1.913    1.205
>>  33554432  12.084   13.752   12.111   81.251    4.800    2.539
>>  16777216  12.658   13.035   11.196   83.640    5.612    5.227
>>
>> 8) client: default, server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  12.253   12.552   11.773   84.044    2.230    1.313
>>  33554432  13.177   12.456   11.604   82.723    4.316    2.585
>>  16777216  12.471   12.318   13.006   81.324    1.878    5.083
>>
>> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  14.409   13.311   14.278   73.238    2.624    1.144
>>  33554432  14.665   14.260   14.080   71.455    1.211    2.233
>>  16777216  14.179   14.810   14.640   70.438    1.303    4.402
>>
>> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  13.401   14.107   13.549   74.860    1.642    1.170
>>  33554432  14.575   13.221   14.428   72.894    3.236    2.278
>>  16777216  13.771   14.227   13.594   73.887    1.408    4.618
>>
>> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
>> blocksize       R        R        R   R(avg,    R(std        R
>>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  10.286   12.272   10.245   94.317    7.690    1.474
>>  33554432  10.241   10.415   13.374   91.624   10.670    2.863
>>  16777216  10.499   10.224   10.792   97.526    2.151    6.095
>>
>> The last result comes close to 100MB/s!
> 
> Good! Although I expected maximum with a single thread.
> 
> Can you do the same set of tests with deadline scheduler on the server?

Case of 5 I/O threads (default) will also be interesting. I.e., overall, 
cases of 1, 2 and 5 I/O threads with deadline scheduler on the server.

Thanks,
Vlad

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-20  7:20                                                                               ` Vladislav Bolkhovitin
@ 2009-07-22  8:44                                                                                 ` Ronald Moesbergen
  2009-07-27 13:11                                                                                   ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-22  8:44 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche

2009/7/20 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>
>>> The last result comes close to 100MB/s!
>>
>> Good! Although I expected maximum with a single thread.
>>
>> Can you do the same set of tests with deadline scheduler on the server?
>
> Case of 5 I/O threads (default) will also be interesting. I.e., overall,
> cases of 1, 2 and 5 I/O threads with deadline scheduler on the server.

Ok. The results:

CFQ seems to perform better in this case.

client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context
server scheduler: deadline

With one IO thread:
5) client: default, server: default
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  16.067   16.883   16.096   62.669    1.427    0.979
 33554432  16.034   16.564   16.050   63.161    0.948    1.974
 16777216  16.045   15.086   16.709   64.329    2.715    4.021

6) client: default, server: 64 max_sectors_kb, RA default
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  15.851   15.348   16.652   64.271    2.147    1.004
 33554432  16.182   16.104   16.170   63.397    0.135    1.981
 16777216  15.952   16.085   16.258   63.613    0.493    3.976

7) client: default, server: default max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  15.814   16.222   16.650   63.126    1.327    0.986
 33554432  16.113   15.962   16.340   63.456    0.610    1.983
 16777216  16.149   16.098   15.895   63.815    0.438    3.988

8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  16.032   17.163   15.864   62.695    2.161    0.980
 33554432  16.163   15.499   16.466   63.870    1.626    1.996
 16777216  16.067   16.133   16.710   62.829    1.099    3.927

9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  15.498   15.474   15.195   66.547    0.599    1.040
 33554432  15.729   15.636   15.758   65.192    0.214    2.037
 16777216  15.656   15.481   15.724   65.557    0.430    4.097

10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.480   14.125   13.648   74.497    1.466    1.164
 33554432  13.584   13.518   14.272   74.293    1.806    2.322
 16777216  13.511   13.585   13.552   75.576    0.170    4.723

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.356   13.079   13.488   76.960    0.991    1.203
 33554432  13.713   13.038   13.030   77.268    1.834    2.415
 16777216  13.895   13.032   13.128   76.758    2.178    4.797

With two threads:
5) client: default, server: default
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  12.661   12.773   13.654   78.681    2.622    1.229
 33554432  12.709   12.693   12.459   81.145    0.738    2.536
 16777216  12.657   14.055   13.237   77.038    3.292    4.815

6) client: default, server: 64 max_sectors_kb, RA default
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.300   12.877   13.705   77.078    1.964    1.204
 33554432  13.025   14.404   12.833   76.501    3.855    2.391
 16777216  13.172   13.220   12.997   77.995    0.570    4.875

7) client: default, server: default max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.365   13.168   12.835   78.053    1.308    1.220
 33554432  13.518   13.122   13.366   76.799    0.942    2.400
 16777216  13.177   13.146   13.839   76.534    1.797    4.783

8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  14.308   12.669   13.520   76.045    3.788    1.188
 33554432  12.586   12.897   13.221   79.405    1.596    2.481
 16777216  13.766   12.583   14.176   76.001    3.903    4.750

9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  14.454   12.537   15.058   73.509    5.893    1.149
 33554432  15.871   14.201   13.846   70.194    4.083    2.194
 16777216  14.721   13.346   14.434   72.410    3.104    4.526

10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.262   13.308   13.416   76.828    0.371    1.200
 33554432  13.915   13.182   13.065   76.551    2.114    2.392
 16777216  13.223   14.133   13.317   75.596    2.232    4.725

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  18.277   17.743   17.534   57.380    0.997    0.897
 33554432  18.018   17.728   17.343   57.879    0.907    1.809
 16777216  17.600   18.466   17.645   57.223    1.253    3.576

With five threads:
5) client: default, server: default
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  12.915   13.643   12.572   78.598    2.654    1.228
 33554432  12.716   12.970   13.283   78.858    1.403    2.464
 16777216  14.372   13.282   13.122   75.461    3.002    4.716

6) client: default, server: 64 max_sectors_kb, RA default
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.372   13.205   12.468   78.750    2.421    1.230
 33554432  13.489   13.352   12.883   77.363    1.533    2.418
 16777216  13.127   12.653   14.252   76.928    3.785    4.808

7) client: default, server: default max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.135   13.031   13.824   76.872    1.994    1.201
 33554432  13.079   13.590   13.730   76.076    1.600    2.377
 16777216  12.707   12.951   13.805   77.942    2.735    4.871

8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.030   12.947   13.538   77.772    1.524    1.215
 33554432  12.826   12.973   13.805   77.649    2.482    2.427
 16777216  12.751   13.007   12.986   79.295    0.718    4.956

9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.236   13.349   13.833   76.034    1.445    1.188
 33554432  13.481   14.259   13.582   74.389    1.836    2.325
 16777216  14.394   13.922   13.943   72.712    1.111    4.545

10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  18.245   18.690   17.342   56.654    1.779    0.885
 33554432  17.744   18.122   17.577   57.492    0.731    1.797
 16777216  18.280   18.564   17.846   56.186    0.914    3.512

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  15.241   16.894   15.853   64.131    2.705    1.002
 33554432  14.858   16.904   15.588   65.064    3.435    2.033
 16777216  16.777   15.939   15.034   64.465    2.893    4.029

Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-22  8:44                                                                                 ` Ronald Moesbergen
@ 2009-07-27 13:11                                                                                   ` Vladislav Bolkhovitin
  2009-07-28  9:51                                                                                     ` Ronald Moesbergen
  0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-27 13:11 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche

[-- Attachment #1: Type: text/plain, Size: 9408 bytes --]



Ronald Moesbergen, on 07/22/2009 12:44 PM wrote:
> 2009/7/20 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>> The last result comes close to 100MB/s!
>>> Good! Although I expected maximum with a single thread.
>>>
>>> Can you do the same set of tests with deadline scheduler on the server?
>> Case of 5 I/O threads (default) will also be interesting. I.e., overall,
>> cases of 1, 2 and 5 I/O threads with deadline scheduler on the server.
> 
> Ok. The results:
> 
> CFQ seems to perform better in this case.
> 
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
> and io_context
> server scheduler: deadline
> 
> With one IO thread:
> 5) client: default, server: default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  16.067   16.883   16.096   62.669    1.427    0.979
>  33554432  16.034   16.564   16.050   63.161    0.948    1.974
>  16777216  16.045   15.086   16.709   64.329    2.715    4.021
> 
> 6) client: default, server: 64 max_sectors_kb, RA default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  15.851   15.348   16.652   64.271    2.147    1.004
>  33554432  16.182   16.104   16.170   63.397    0.135    1.981
>  16777216  15.952   16.085   16.258   63.613    0.493    3.976
> 
> 7) client: default, server: default max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  15.814   16.222   16.650   63.126    1.327    0.986
>  33554432  16.113   15.962   16.340   63.456    0.610    1.983
>  16777216  16.149   16.098   15.895   63.815    0.438    3.988
> 
> 8) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  16.032   17.163   15.864   62.695    2.161    0.980
>  33554432  16.163   15.499   16.466   63.870    1.626    1.996
>  16777216  16.067   16.133   16.710   62.829    1.099    3.927
> 
> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  15.498   15.474   15.195   66.547    0.599    1.040
>  33554432  15.729   15.636   15.758   65.192    0.214    2.037
>  16777216  15.656   15.481   15.724   65.557    0.430    4.097
> 
> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.480   14.125   13.648   74.497    1.466    1.164
>  33554432  13.584   13.518   14.272   74.293    1.806    2.322
>  16777216  13.511   13.585   13.552   75.576    0.170    4.723
> 
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.356   13.079   13.488   76.960    0.991    1.203
>  33554432  13.713   13.038   13.030   77.268    1.834    2.415
>  16777216  13.895   13.032   13.128   76.758    2.178    4.797
> 
> With two threads:
> 5) client: default, server: default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  12.661   12.773   13.654   78.681    2.622    1.229
>  33554432  12.709   12.693   12.459   81.145    0.738    2.536
>  16777216  12.657   14.055   13.237   77.038    3.292    4.815
> 
> 6) client: default, server: 64 max_sectors_kb, RA default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.300   12.877   13.705   77.078    1.964    1.204
>  33554432  13.025   14.404   12.833   76.501    3.855    2.391
>  16777216  13.172   13.220   12.997   77.995    0.570    4.875
> 
> 7) client: default, server: default max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.365   13.168   12.835   78.053    1.308    1.220
>  33554432  13.518   13.122   13.366   76.799    0.942    2.400
>  16777216  13.177   13.146   13.839   76.534    1.797    4.783
> 
> 8) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  14.308   12.669   13.520   76.045    3.788    1.188
>  33554432  12.586   12.897   13.221   79.405    1.596    2.481
>  16777216  13.766   12.583   14.176   76.001    3.903    4.750
> 
> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  14.454   12.537   15.058   73.509    5.893    1.149
>  33554432  15.871   14.201   13.846   70.194    4.083    2.194
>  16777216  14.721   13.346   14.434   72.410    3.104    4.526
> 
> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.262   13.308   13.416   76.828    0.371    1.200
>  33554432  13.915   13.182   13.065   76.551    2.114    2.392
>  16777216  13.223   14.133   13.317   75.596    2.232    4.725
> 
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  18.277   17.743   17.534   57.380    0.997    0.897
>  33554432  18.018   17.728   17.343   57.879    0.907    1.809
>  16777216  17.600   18.466   17.645   57.223    1.253    3.576
> 
> With five threads:
> 5) client: default, server: default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  12.915   13.643   12.572   78.598    2.654    1.228
>  33554432  12.716   12.970   13.283   78.858    1.403    2.464
>  16777216  14.372   13.282   13.122   75.461    3.002    4.716
> 
> 6) client: default, server: 64 max_sectors_kb, RA default
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.372   13.205   12.468   78.750    2.421    1.230
>  33554432  13.489   13.352   12.883   77.363    1.533    2.418
>  16777216  13.127   12.653   14.252   76.928    3.785    4.808
> 
> 7) client: default, server: default max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.135   13.031   13.824   76.872    1.994    1.201
>  33554432  13.079   13.590   13.730   76.076    1.600    2.377
>  16777216  12.707   12.951   13.805   77.942    2.735    4.871
> 
> 8) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.030   12.947   13.538   77.772    1.524    1.215
>  33554432  12.826   12.973   13.805   77.649    2.482    2.427
>  16777216  12.751   13.007   12.986   79.295    0.718    4.956
> 
> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.236   13.349   13.833   76.034    1.445    1.188
>  33554432  13.481   14.259   13.582   74.389    1.836    2.325
>  16777216  14.394   13.922   13.943   72.712    1.111    4.545
> 
> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  18.245   18.690   17.342   56.654    1.779    0.885
>  33554432  17.744   18.122   17.577   57.492    0.731    1.797
>  16777216  18.280   18.564   17.846   56.186    0.914    3.512
> 
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  15.241   16.894   15.853   64.131    2.705    1.002
>  33554432  14.858   16.904   15.588   65.064    3.435    2.033
>  16777216  16.777   15.939   15.034   64.465    2.893    4.029

Hmm, it's really weird that the case of 2 threads is faster. There must 
be some command reordering somewhere in SCST that I'm missing, like a 
list_add() instead of a list_add_tail().
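
Just to illustrate what I mean (a simplified userspace sketch of the list 
helpers, not SCST or <linux/list.h> code): list_add() inserts at the head, so 
entries come back out in reverse arrival order, while list_add_tail() 
preserves FIFO order:

#include <stddef.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void __list_add(struct list_head *n, struct list_head *prev,
                       struct list_head *next)
{
        next->prev = n;
        n->next = next;
        n->prev = prev;
        prev->next = n;
}

/* insert right after the head: traversal sees newest entries first (LIFO) */
static void list_add(struct list_head *n, struct list_head *h)
{
        __list_add(n, h, h->next);
}

/* insert right before the head: traversal sees entries in arrival order (FIFO) */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
        __list_add(n, h->prev, h);
}

struct cmd { int sn; struct list_head entry; };

static void dump(const char *tag, struct list_head *h)
{
        struct list_head *p;

        printf("%s:", tag);
        for (p = h->next; p != h; p = p->next) {
                struct cmd *c = (struct cmd *)((char *)p -
                                offsetof(struct cmd, entry));
                printf(" %d", c->sn);
        }
        printf("\n");
}

int main(void)
{
        struct cmd a[3] = { { .sn = 1 }, { .sn = 2 }, { .sn = 3 } };
        struct cmd b[3] = { { .sn = 1 }, { .sn = 2 }, { .sn = 3 } };
        struct list_head fifo, lifo;
        int i;

        list_init(&fifo);
        list_init(&lifo);
        for (i = 0; i < 3; i++) {
                list_add_tail(&a[i].entry, &fifo); /* preserves submission order */
                list_add(&b[i].entry, &lifo);      /* reverses submission order */
        }
        dump("list_add_tail", &fifo);  /* prints: list_add_tail: 1 2 3 */
        dump("list_add     ", &lifo);  /* prints: list_add     : 3 2 1 */
        return 0;
}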

Can you apply the attached patch and repeat tests 5, 8 and 11 with 1 and 
2 threads, please? The patch enables forced command order protection, 
i.e. with it all commands will be executed in exactly the same order in 
which they were received.

Thanks,
Vlad

[-- Attachment #2: forced_order.diff --]
[-- Type: text/x-patch, Size: 747 bytes --]

Index: scst/src/scst_targ.c
===================================================================
--- scst/src/scst_targ.c	(revision 971)
+++ scst/src/scst_targ.c	(working copy)
@@ -3182,10 +3182,10 @@ static void scst_cmd_set_sn(struct scst_
 	switch (cmd->queue_type) {
 	case SCST_CMD_QUEUE_SIMPLE:
 	case SCST_CMD_QUEUE_UNTAGGED:
-#if 0 /* left for future performance investigations */
-		if (scst_cmd_is_expected_set(cmd)) {
+#if 1 /* left for future performance investigations */
+/*		if (scst_cmd_is_expected_set(cmd)) {
 			if ((cmd->expected_data_direction == SCST_DATA_READ) &&
-			    (atomic_read(&cmd->dev->write_cmd_count) == 0))
+			    (atomic_read(&cmd->dev->write_cmd_count) == 0))*/
 				goto ordered;
 		} else
 			goto ordered;

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-27 13:11                                                                                   ` Vladislav Bolkhovitin
@ 2009-07-28  9:51                                                                                     ` Ronald Moesbergen
  2009-07-28 19:07                                                                                       ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-28  9:51 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche

2009/7/27 Vladislav Bolkhovitin <vst@vlnb.net>:
>
> Hmm, it's really weird that the case of 2 threads is faster. There must be
> some command reordering somewhere in SCST that I'm missing, like a
> list_add() instead of a list_add_tail().
>
> Can you apply the attached patch and repeat tests 5, 8 and 11 with 1 and 2
> threads, please? The patch enables forced command order protection,
> i.e. with it all commands will be executed in exactly the same order in
> which they were received.

The patched source doesn't compile. I changed the code to this:

@ line 3184:

        case SCST_CMD_QUEUE_UNTAGGED:
#if 1 /* left for future performance investigations */
                goto ordered;
#endif

The results:

Overall performance seems lower.

client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context, forced_order

With one IO thread:
5) client: default, server: default (cfq)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  16.484   16.417   16.068   62.741    0.706    0.980
 33554432  15.684   16.348   16.011   63.961    1.083    1.999
 16777216  16.044   16.239   15.938   63.710    0.493    3.982

8) client: default, server: 64 max_sectors_kb, RA 2MB (cfq)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  16.127   15.784   16.210   63.847    0.740    0.998
 33554432  16.103   16.072   16.106   63.627    0.061    1.988
 16777216  16.637   16.058   16.154   62.902    0.970    3.931

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB (cfq)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.417   15.219   13.912   72.405    3.785    1.131
 33554432  13.868   13.789   14.110   73.558    0.718    2.299
 16777216  13.691   13.784   10.280   82.898   11.822    5.181

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA
2MB (deadline)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.604   13.532   13.978   74.733    1.055    1.168
 33554432  13.523   13.166   13.504   76.443    0.945    2.389
 16777216  13.434   13.409   13.632   75.902    0.557    4.744

With two threads:
5) client: default, server: default (cfq)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  16.206   16.001   15.908   63.851    0.493    0.998
 33554432  16.927   16.033   15.991   62.799    1.631    1.962
 16777216  16.566   15.968   16.212   63.035    0.950    3.940

8) client: default, server: 64 max_sectors_kb, RA 2MB (cfq)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  16.017   15.849   15.748   64.521    0.450    1.008
 33554432  16.652   15.542   16.259   63.454    1.823    1.983
 16777216  16.456   16.071   15.943   63.392    0.849    3.962

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB (cfq)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  14.109    9.985   13.548   83.572   13.478    1.306
 33554432  13.698   14.236   13.754   73.711    1.267    2.303
 16777216  13.610   12.090   14.136   77.458    5.244    4.841

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA
2MB (deadline)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  13.542   13.975   13.978   74.049    1.110    1.157
 33554432   9.921   13.272   13.321   85.746   12.349    2.680
 16777216  13.850   13.600   13.344   75.324    1.144    4.708

Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-28  9:51                                                                                     ` Ronald Moesbergen
@ 2009-07-28 19:07                                                                                       ` Vladislav Bolkhovitin
  2009-07-29 12:48                                                                                         ` Ronald Moesbergen
  0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-28 19:07 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche


Ronald Moesbergen, on 07/28/2009 01:51 PM wrote:
> 2009/7/27 Vladislav Bolkhovitin <vst@vlnb.net>:
>> Hmm, it's really weird that the case of 2 threads is faster. There must be
>> some command reordering somewhere in SCST that I'm missing, like a
>> list_add() instead of a list_add_tail().
>>
>> Can you apply the attached patch and repeat tests 5, 8 and 11 with 1 and 2
>> threads, please? The patch enables forced command order protection,
>> i.e. with it all commands will be executed in exactly the same order in
>> which they were received.
> 
> The patched source doesn't compile. I changed the code to this:
> 
> @ line 3184:
> 
>         case SCST_CMD_QUEUE_UNTAGGED:
> #if 1 /* left for future performance investigations */
>                 goto ordered;
> #endif
> 
> The results:
> 
> Overall performance seems lower.
> 
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
> and io_context, forced_order
> 
> With one IO thread:
> 5) client: default, server: default (cfq)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  16.484   16.417   16.068   62.741    0.706    0.980
>  33554432  15.684   16.348   16.011   63.961    1.083    1.999
>  16777216  16.044   16.239   15.938   63.710    0.493    3.982
> 
> 8) client: default, server: 64 max_sectors_kb, RA 2MB (cfq)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  16.127   15.784   16.210   63.847    0.740    0.998
>  33554432  16.103   16.072   16.106   63.627    0.061    1.988
>  16777216  16.637   16.058   16.154   62.902    0.970    3.931
> 
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB (cfq)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.417   15.219   13.912   72.405    3.785    1.131
>  33554432  13.868   13.789   14.110   73.558    0.718    2.299
>  16777216  13.691   13.784   10.280   82.898   11.822    5.181
> 
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA
> 2MB (deadline)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.604   13.532   13.978   74.733    1.055    1.168
>  33554432  13.523   13.166   13.504   76.443    0.945    2.389
>  16777216  13.434   13.409   13.632   75.902    0.557    4.744
> 
> With two threads:
> 5) client: default, server: default (cfq)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  16.206   16.001   15.908   63.851    0.493    0.998
>  33554432  16.927   16.033   15.991   62.799    1.631    1.962
>  16777216  16.566   15.968   16.212   63.035    0.950    3.940
> 
> 8) client: default, server: 64 max_sectors_kb, RA 2MB (cfq)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  16.017   15.849   15.748   64.521    0.450    1.008
>  33554432  16.652   15.542   16.259   63.454    1.823    1.983
>  16777216  16.456   16.071   15.943   63.392    0.849    3.962
> 
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB (cfq)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  14.109    9.985   13.548   83.572   13.478    1.306
>  33554432  13.698   14.236   13.754   73.711    1.267    2.303
>  16777216  13.610   12.090   14.136   77.458    5.244    4.841
> 
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA
> 2MB (deadline)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  13.542   13.975   13.978   74.049    1.110    1.157
>  33554432   9.921   13.272   13.321   85.746   12.349    2.680
>  16777216  13.850   13.600   13.344   75.324    1.144    4.708

Can you perform tests 5 and 8 with the deadline scheduler? I asked for deadline...

What I/O scheduler do you use on the initiator? Can you check if 
changing it to deadline or noop makes any difference?
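
For reference, checking and switching the elevator is just a read and a write 
of /sys/block/<dev>/queue/scheduler (normally a one-line shell echo). A 
minimal sketch in C, with the device name made up:

#include <stdio.h>

int main(void)
{
        /* "sdb" is a made-up example device */
        const char *path = "/sys/block/sdb/queue/scheduler";
        char line[256];
        FILE *f;

        /* the active scheduler is shown in brackets,
         * e.g. "noop anticipatory deadline [cfq]" */
        f = fopen(path, "r");
        if (!f || !fgets(line, sizeof(line), f)) {
                perror(path);
                return 1;
        }
        fclose(f);
        printf("current: %s", line);

        /* selecting another scheduler is a plain write of its name */
        f = fopen(path, "w");
        if (!f || fputs("deadline\n", f) == EOF) {
                perror(path);
                return 1;
        }
        if (fclose(f) != 0) {
                perror(path);
                return 1;
        }
        return 0;
}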

Thanks,
Vlad

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-28 19:07                                                                                       ` Vladislav Bolkhovitin
@ 2009-07-29 12:48                                                                                         ` Ronald Moesbergen
  2009-07-31 18:32                                                                                           ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-07-29 12:48 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche

2009/7/28 Vladislav Bolkhovitin <vst@vlnb.net>:
>
> Can you perform tests 5 and 8 with the deadline scheduler? I asked for deadline...
>
> What I/O scheduler do you use on the initiator? Can you check if changing it
> to deadline or noop makes any difference?
>

client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context, forced_order

With one IO thread:
5) client: default, server: default (server deadline, client cfq)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  15.739   15.339   16.511   64.613    1.959    1.010
 33554432  15.411   12.384   15.400   71.876    7.646    2.246
 16777216  16.564   15.569   16.279   63.498    1.667    3.969

5) client: default, server: default (server deadline, client deadline)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  17.578   20.051   18.010   55.395    3.111    0.866
 33554432  19.247   12.607   17.930   63.846   12.390    1.995
 16777216  14.587   19.631   18.032   59.718    7.650    3.732

8) client: default, server: 64 max_sectors_kb, RA 2MB (server
deadline, client deadline)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  17.418   19.520   22.050   52.564    5.043    0.821
 33554432  21.263   17.623   17.782   54.616    4.571    1.707
 16777216  17.896   18.335   19.407   55.278    1.864    3.455

8) client: default, server: 64 max_sectors_kb, RA 2MB (server
deadline, client cfq)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  16.639   15.216   16.035   64.233    2.365    1.004
 33554432  15.750   16.511   16.092   63.557    1.224    1.986
 16777216  16.390   15.866   15.331   64.604    1.763    4.038

11) client: 2MB RA, 64 max_sectors_kb, server: 64 max_sectors_kb, RA
2MB (server deadline, client deadline)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  14.117   13.610   13.558   74.435    1.347    1.163
 33554432  13.450   10.344   13.556   83.555   10.918    2.611
 16777216  13.408   13.319   13.239   76.867    0.398    4.804

With two threads:
5) client: default, server: default (server deadline, client cfq)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  15.723   16.535   16.189   63.438    1.312    0.991
 33554432  16.152   16.363   15.782   63.621    0.954    1.988
 16777216  15.174   16.084   16.682   64.178    2.516    4.011

5) client: default, server: default (server deadline, client deadline)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  18.087   18.082   17.639   57.099    0.674    0.892
 33554432  18.377   15.750   17.551   59.694    3.912    1.865
 16777216  18.490   15.553   18.778   58.585    5.143    3.662

8) client: default, server: 64 max_sectors_kb, RA 2MB (server
deadline, client deadline)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  18.140   19.114   17.442   56.244    2.103    0.879
 33554432  17.183   17.233   21.367   55.646    5.461    1.739
 16777216  19.813   17.965   18.132   55.053    2.393    3.441

8) client: default, server: 64 max_sectors_kb, RA 2MB (server
deadline, client cfq)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  15.753   16.085   16.522   63.548    1.239    0.993
 33554432  13.502   15.912   15.507   68.743    5.065    2.148
 16777216  16.584   16.171   15.959   63.077    1.003    3.942

11) client: 2MB RA, 64 max_sectors_kb, server: 64 max_sectors_kb, RA
2MB (server deadline, client deadline)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  14.051   13.427   13.498   75.001    1.510    1.172
 33554432  13.397   14.008   13.453   75.217    1.503    2.351
 16777216  13.277    9.942   14.318   83.882   13.712    5.243


Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-29 12:48                                                                                         ` Ronald Moesbergen
@ 2009-07-31 18:32                                                                                           ` Vladislav Bolkhovitin
  2009-08-03  9:15                                                                                             ` Ronald Moesbergen
  0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-07-31 18:32 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche


Ronald Moesbergen, on 07/29/2009 04:48 PM wrote:
> 2009/7/28 Vladislav Bolkhovitin <vst@vlnb.net>:
>> Can you perform tests 5 and 8 with the deadline scheduler? I asked for deadline.
>>
>> What I/O scheduler do you use on the initiator? Can you check if changing it
>> to deadline or noop makes any difference?
>>
> 
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
> and io_context, forced_order
> 
> With one IO thread:
> 5) client: default, server: default (server deadline, client cfq)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  15.739   15.339   16.511   64.613    1.959    1.010
>  33554432  15.411   12.384   15.400   71.876    7.646    2.246
>  16777216  16.564   15.569   16.279   63.498    1.667    3.969
> 
> 5) client: default, server: default (server deadline, client deadline)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  17.578   20.051   18.010   55.395    3.111    0.866
>  33554432  19.247   12.607   17.930   63.846   12.390    1.995
>  16777216  14.587   19.631   18.032   59.718    7.650    3.732
> 
> 8) client: default, server: 64 max_sectors_kb, RA 2MB (server
> deadline, client deadline)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  17.418   19.520   22.050   52.564    5.043    0.821
>  33554432  21.263   17.623   17.782   54.616    4.571    1.707
>  16777216  17.896   18.335   19.407   55.278    1.864    3.455
> 
> 8) client: default, server: 64 max_sectors_kb, RA 2MB (server
> deadline, client cfq)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  16.639   15.216   16.035   64.233    2.365    1.004
>  33554432  15.750   16.511   16.092   63.557    1.224    1.986
>  16777216  16.390   15.866   15.331   64.604    1.763    4.038
> 
> 11) client: 2MB RA, 64 max_sectors_kb, server: 64 max_sectors_kb, RA
> 2MB (server deadline, client deadline)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  14.117   13.610   13.558   74.435    1.347    1.163
>  33554432  13.450   10.344   13.556   83.555   10.918    2.611
>  16777216  13.408   13.319   13.239   76.867    0.398    4.804
> 
> With two threads:
> 5) client: default, server: default (server deadline, client cfq)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  15.723   16.535   16.189   63.438    1.312    0.991
>  33554432  16.152   16.363   15.782   63.621    0.954    1.988
>  16777216  15.174   16.084   16.682   64.178    2.516    4.011
> 
> 5) client: default, server: default (server deadline, client deadline)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  18.087   18.082   17.639   57.099    0.674    0.892
>  33554432  18.377   15.750   17.551   59.694    3.912    1.865
>  16777216  18.490   15.553   18.778   58.585    5.143    3.662
> 
> 8) client: default, server: 64 max_sectors_kb, RA 2MB (server
> deadline, client deadline)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  18.140   19.114   17.442   56.244    2.103    0.879
>  33554432  17.183   17.233   21.367   55.646    5.461    1.739
>  16777216  19.813   17.965   18.132   55.053    2.393    3.441
> 
> 8) client: default, server: 64 max_sectors_kb, RA 2MB (server
> deadline, client cfq)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  15.753   16.085   16.522   63.548    1.239    0.993
>  33554432  13.502   15.912   15.507   68.743    5.065    2.148
>  16777216  16.584   16.171   15.959   63.077    1.003    3.942
> 
> 11) client: 2MB RA, 64 max_sectors_kb, server: 64 max_sectors_kb, RA
> 2MB (server deadline, client deadline)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  14.051   13.427   13.498   75.001    1.510    1.172
>  33554432  13.397   14.008   13.453   75.217    1.503    2.351
>  16777216  13.277    9.942   14.318   83.882   13.712    5.243

OK, as I expected, on the SCST level everything is clear and the forced 
ordering change didn't change anything.

But still, a single read stream must be the fastest case for a single 
thread. Otherwise, there's something wrong somewhere in the I/O path: 
block layer, RA, I/O scheduler. Apparently, that is what we have here, 
and we should find the cause.

Can you check if noop on the target and/or initiator makes any 
difference? Case 5 with 1 and 2 threads will be sufficient.
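
(As an aside -- a sketch with /dev/sdX as a placeholder -- the scheduler
can be checked and switched at runtime; the scheduler file lists the
available schedulers with the active one in square brackets:)

#cat /sys/block/sdX/queue/scheduler
#echo noop > /sys/block/sdX/queue/scheduler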

Thanks,
Vlad


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-07-31 18:32                                                                                           ` Vladislav Bolkhovitin
@ 2009-08-03  9:15                                                                                             ` Ronald Moesbergen
  2009-08-03  9:20                                                                                               ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 65+ messages in thread
From: Ronald Moesbergen @ 2009-08-03  9:15 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche

2009/7/31 Vladislav Bolkhovitin <vst@vlnb.net>:
>
> OK, as I expected, on the SCST level everything is clear and the forced
> ordering change didn't change anything.
>
> But still, a single read stream must be the fastest from single thread.
> Otherwise, there's something wrong somewhere in the I/O path: block layer,
> RA, I/O scheduler. And, apparently, this is what we have and should find out
> the cause.
>
> Can you check if noop on the target and/or initiator makes any difference?
> Case 5 with 1 and 2 threads will be sufficient.

That doesn't seem to help:

client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context, forced_order

With one IO thread:
5) client: default, server: default (server noop, client noop)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  17.612   21.113   21.355   51.532    4.680    0.805
 33554432  18.329   18.523   19.049   54.969    0.891    1.718
 16777216  18.497   18.219   17.042   57.217    2.059    3.576

With two threads:
5) client: default, server: default (server noop, client noop)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  17.436   18.376   20.493   54.807    3.634    0.856
 33554432  17.466   16.980   18.261   58.337    1.740    1.823
 16777216  18.222   17.567   18.077   57.045    0.901    3.565

Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-08-03  9:15                                                                                             ` Ronald Moesbergen
@ 2009-08-03  9:20                                                                                               ` Vladislav Bolkhovitin
  2009-08-03 11:44                                                                                                 ` Ronald Moesbergen
  0 siblings, 1 reply; 65+ messages in thread
From: Vladislav Bolkhovitin @ 2009-08-03  9:20 UTC (permalink / raw)
  To: Ronald Moesbergen
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche

Ronald Moesbergen, on 08/03/2009 01:15 PM wrote:
> 2009/7/31 Vladislav Bolkhovitin <vst@vlnb.net>:
>> OK, as I expected, on the SCST level everything is clear and the forced
>> ordering change didn't change anything.
>>
>> But still, a single read stream must be the fastest from single thread.
>> Otherwise, there's something wrong somewhere in the I/O path: block layer,
>> RA, I/O scheduler. And, apparently, this is what we have and should find out
>> the cause.
>>
>> Can you check if noop on the target and/or initiator makes any difference?
>> Case 5 with 1 and 2 threads will be sufficient.
> 
> That doesn't seem to help:
> 
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
> and io_context, forced_order
> 
> With one IO thread:
> 5) client: default, server: default (server noop, client noop)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  17.612   21.113   21.355   51.532    4.680    0.805
>  33554432  18.329   18.523   19.049   54.969    0.891    1.718
>  16777216  18.497   18.219   17.042   57.217    2.059    3.576
> 
> With two threads:
> 5) client: default, server: default (server noop, client noop)
> blocksize       R        R        R   R(avg,    R(std        R
>   (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>  67108864  17.436   18.376   20.493   54.807    3.634    0.856
>  33554432  17.466   16.980   18.261   58.337    1.740    1.823
>  16777216  18.222   17.567   18.077   57.045    0.901    3.565

And with client cfq, server noop?

> Ronald.
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-08-03  9:20                                                                                               ` Vladislav Bolkhovitin
@ 2009-08-03 11:44                                                                                                 ` Ronald Moesbergen
  0 siblings, 0 replies; 65+ messages in thread
From: Ronald Moesbergen @ 2009-08-03 11:44 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: fengguang.wu, linux-kernel, akpm, kosaki.motohiro, Alan.Brunelle,
	linux-fsdevel, jens.axboe, randy.dunlap, Bart Van Assche

2009/8/3 Vladislav Bolkhovitin <vst@vlnb.net>:
> Ronald Moesbergen, on 08/03/2009 01:15 PM wrote:
>>
>> 2009/7/31 Vladislav Bolkhovitin <vst@vlnb.net>:
>>>
>>> OK, as I expected, on the SCST level everything is clear and the forced
>>> ordering change didn't change anything.
>>>
>>> But still, a single read stream must be the fastest from single thread.
>>> Otherwise, there's something wrong somewhere in the I/O path: block
>>> layer,
>>> RA, I/O scheduler. And, apparently, this is what we have and should find
>>> out
>>> the cause.
>>>
>>> Can you check if noop on the target and/or initiator makes any
>>> difference?
>>> Case 5 with 1 and 2 threads will be sufficient.
>>
>> That doesn't seem to help:
>>
>> client kernel: 2.6.26-15lenny3 (debian)
>> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
>> and io_context, forced_order
>>
>> With one IO thread:
>> 5) client: default, server: default (server noop, client noop)
>> blocksize       R        R        R   R(avg,    R(std        R
>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  17.612   21.113   21.355   51.532    4.680    0.805
>>  33554432  18.329   18.523   19.049   54.969    0.891    1.718
>>  16777216  18.497   18.219   17.042   57.217    2.059    3.576
>>
>> With two threads:
>> 5) client: default, server: default (server noop, client noop)
>> blocksize       R        R        R   R(avg,    R(std        R
>>  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
>>  67108864  17.436   18.376   20.493   54.807    3.634    0.856
>>  33554432  17.466   16.980   18.261   58.337    1.740    1.823
>>  16777216  18.222   17.567   18.077   57.045    0.901    3.565
>
> And with client cfq, server noop?

client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context, forced_order

With one IO thread:
5) client: default, server: default (server noop, client cfq)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  16.019   16.434   15.730   63.777    1.144    0.997
 33554432  16.020   16.624   15.936   63.258    1.183    1.977
 16777216  15.966   15.465   16.115   64.630    1.145    4.039

With two threads:
5) client: default, server: default (server noop, client cfq)
blocksize       R        R        R   R(avg,    R(std        R
  (bytes)     (s)      (s)      (s)    MB/s)   ,MB/s)   (IOPS)
 67108864  16.504   15.762   14.842   65.335    2.848    1.021
 33554432  16.080   16.627   15.766   63.406    1.386    1.981
 16777216  15.489   16.627   16.043   63.842    1.846    3.990

Ronald.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev
  2009-05-29  5:35 [RESEND] [PATCH] readahead:add blk_run_backing_dev Hisashi Hifumi
  2009-06-01  0:36 ` Andrew Morton
@ 2009-09-22 20:58 ` Andrew Morton
  1 sibling, 0 replies; 65+ messages in thread
From: Andrew Morton @ 2009-09-22 20:58 UTC (permalink / raw)
  To: Hisashi Hifumi; +Cc: linux-kernel, linux-fsdevel, linux-mm, Wu Fengguang

On Fri, 29 May 2009 14:35:55 +0900
Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> wrote:

> I added blk_run_backing_dev on page_cache_async_readahead
> so readahead I/O is unplugged to improve throughput,
> especially in RAID environments.

I still haven't sent this upstream.  It's unclear to me that we've
decided that it merits merging?



From: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>

I added blk_run_backing_dev on page_cache_async_readahead so readahead I/O
is unplugged to improve throughput, especially in RAID environments.

In the normal case, if page N becomes uptodate at time T(N), then T(N) <=
T(N+1) holds.  With RAID (and NFS to some degree), there is no such strict
ordering: the data arrival time depends on the runtime status of the
individual disks, which breaks that formula.  So in do_generic_file_read(),
just after submitting the async readahead IO request, the current page may
well already be uptodate, so the page won't be locked, and the block device
won't be implicitly unplugged:

                if (PageReadahead(page))
                        page_cache_async_readahead()
                if (!PageUptodate(page))
                        goto page_not_up_to_date;
                //...
page_not_up_to_date:
                lock_page_killable(page);

Therefore explicit unplugging can help.

Following is the test result with dd.

#dd if=testdir/testfile of=/dev/null bs=16384

-2.6.30-rc6
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s

-2.6.30-rc6-patched
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s

(7Disks RAID-0 Array)

-2.6.30-rc6
1054976+0 records in
1054976+0 records out
17284726784 bytes (17 GB) copied, 212.233 seconds, 81.4 MB/s

-2.6.30-rc6-patched
1054976+0 records out
17284726784 bytes (17 GB) copied, 198.878 seconds, 86.9 MB/s

(7Disks RAID-5 Array)
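
(When reproducing a sequential-read test like this, the page cache is
usually dropped between runs so that repeat runs are not served from
memory -- a sketch, assuming root:)

#sync
#echo 3 > /proc/sys/vm/drop_caches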

The patch was found to improve performance with the SCST scsi target
driver.  See
http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel

[akpm@linux-foundation.org: unbust comment layout]
[akpm@linux-foundation.org: "fix" CONFIG_BLOCK=n]
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tested-by: Ronald <intercommit@gmail.com>
Cc: Bart Van Assche <bart.vanassche@gmail.com>
Cc: Vladislav Bolkhovitin <vst@vlnb.net>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/readahead.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff -puN mm/readahead.c~readahead-add-blk_run_backing_dev mm/readahead.c
--- a/mm/readahead.c~readahead-add-blk_run_backing_dev
+++ a/mm/readahead.c
@@ -547,5 +547,17 @@ page_cache_async_readahead(struct addres
 
 	/* do read-ahead */
 	ondemand_readahead(mapping, ra, filp, true, offset, req_size);
+
+#ifdef CONFIG_BLOCK
+	/*
+	 * Normally the current page is !uptodate and lock_page() will be
+	 * immediately called to implicitly unplug the device. However this
+	 * is not always true for RAID configurations, where data arrives
+	 * not strictly in submission order. In this case we need to
+	 * explicitly kick off the IO.
+	 */
+	if (PageUptodate(page))
+		blk_run_backing_dev(mapping->backing_dev_info, NULL);
+#endif
 }
 EXPORT_SYMBOL_GPL(page_cache_async_readahead);
_



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RESEND][PATCH] readahead:add blk_run_backing_dev
@ 2009-05-22  0:09 Hisashi Hifumi
  0 siblings, 0 replies; 65+ messages in thread
From: Hisashi Hifumi @ 2009-05-22  0:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel

Hi Andrew.
The following patch improves sequential read performance and does not
harm other workloads.
Please merge my patch.
Comments?
Thanks.

#dd if=testdir/testfile of=/dev/null bs=16384
-2.6.30-rc6
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s

-2.6.30-rc6-patched
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>

diff -Nrup linux-2.6.30-rc6.org/mm/readahead.c linux-2.6.30-rc6.unplug/mm/readahead.c
--- linux-2.6.30-rc6.org/mm/readahead.c	2009-05-18 10:46:15.000000000 +0900
+++ linux-2.6.30-rc6.unplug/mm/readahead.c	2009-05-18 13:00:42.000000000 +0900
@@ -490,5 +490,7 @@ page_cache_async_readahead(struct addres
 
 	/* do read-ahead */
 	ondemand_readahead(mapping, ra, filp, true, offset, req_size);
+
+	blk_run_backing_dev(mapping->backing_dev_info, NULL);
 }
 EXPORT_SYMBOL_GPL(page_cache_async_readahead);


^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2009-09-22 20:58 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-29  5:35 [RESEND] [PATCH] readahead:add blk_run_backing_dev Hisashi Hifumi
2009-06-01  0:36 ` Andrew Morton
2009-06-01  1:04   ` Hisashi Hifumi
2009-06-05 15:15     ` Alan D. Brunelle
2009-06-06 14:36       ` KOSAKI Motohiro
2009-06-06 22:45         ` Wu Fengguang
2009-06-18 19:04           ` Andrew Morton
2009-06-20  3:55             ` Wu Fengguang
2009-06-20 12:29               ` Vladislav Bolkhovitin
2009-06-29  9:34                 ` Wu Fengguang
2009-06-29 10:26                   ` Ronald Moesbergen
2009-06-29 10:55                     ` Vladislav Bolkhovitin
2009-06-29 12:54                       ` Wu Fengguang
2009-06-29 12:58                         ` Bart Van Assche
2009-06-29 13:01                           ` Wu Fengguang
2009-06-29 13:04                         ` Vladislav Bolkhovitin
2009-06-29 13:13                           ` Wu Fengguang
2009-06-29 13:28                             ` Wu Fengguang
2009-06-29 14:43                               ` Ronald Moesbergen
2009-06-29 14:51                                 ` Wu Fengguang
2009-06-29 14:56                                   ` Ronald Moesbergen
2009-06-29 15:37                                   ` Vladislav Bolkhovitin
2009-06-29 14:00                           ` Ronald Moesbergen
2009-06-29 14:21                             ` Wu Fengguang
2009-06-29 15:01                               ` Wu Fengguang
2009-06-29 15:37                                 ` Vladislav Bolkhovitin
     [not found]                                   ` <20090630010414.GB31418@localhost>
     [not found]                                     ` <4A49EEF9.6010205@vlnb.net>
     [not found]                                       ` <a0272b440907030214l4016422bxbc98fd003bfe1b3d@mail.gmail.com>
     [not found]                                         ` <4A4DE3C1.5080307@vlnb.net>
     [not found]                                           ` <a0272b440907040819l5289483cp44b37d967440ef73@mail.gmail.com>
2009-07-06 11:12                                             ` Vladislav Bolkhovitin
2009-07-06 14:37                                               ` Ronald Moesbergen
2009-07-06 17:48                                                 ` Vladislav Bolkhovitin
2009-07-07  6:49                                                   ` Ronald Moesbergen
     [not found]                                                     ` <4A5395FD.2040507@vlnb.net>
     [not found]                                                       ` <a0272b440907080149j3eeeb9bat13f942520db059a8@mail.gmail.com>
2009-07-08 12:40                                                         ` Vladislav Bolkhovitin
2009-07-10  6:32                                                           ` Ronald Moesbergen
2009-07-10  8:43                                                             ` Vladislav Bolkhovitin
2009-07-10  9:27                                                               ` Vladislav Bolkhovitin
2009-07-13 12:12                                                                 ` Ronald Moesbergen
2009-07-13 12:36                                                                   ` Wu Fengguang
2009-07-13 12:47                                                                     ` Ronald Moesbergen
2009-07-13 12:52                                                                       ` Wu Fengguang
2009-07-14 18:52                                                                     ` Vladislav Bolkhovitin
2009-07-15  7:06                                                                       ` Wu Fengguang
2009-07-14 18:52                                                                   ` Vladislav Bolkhovitin
2009-07-15  6:30                                                                     ` Vladislav Bolkhovitin
2009-07-16  7:32                                                                       ` Ronald Moesbergen
2009-07-16 10:36                                                                         ` Vladislav Bolkhovitin
2009-07-16 14:54                                                                           ` Ronald Moesbergen
2009-07-16 16:03                                                                             ` Vladislav Bolkhovitin
2009-07-17 14:15                                                                           ` Ronald Moesbergen
2009-07-17 18:23                                                                             ` Vladislav Bolkhovitin
2009-07-20  7:20                                                                               ` Vladislav Bolkhovitin
2009-07-22  8:44                                                                                 ` Ronald Moesbergen
2009-07-27 13:11                                                                                   ` Vladislav Bolkhovitin
2009-07-28  9:51                                                                                     ` Ronald Moesbergen
2009-07-28 19:07                                                                                       ` Vladislav Bolkhovitin
2009-07-29 12:48                                                                                         ` Ronald Moesbergen
2009-07-31 18:32                                                                                           ` Vladislav Bolkhovitin
2009-08-03  9:15                                                                                             ` Ronald Moesbergen
2009-08-03  9:20                                                                                               ` Vladislav Bolkhovitin
2009-08-03 11:44                                                                                                 ` Ronald Moesbergen
2009-07-15 20:52                                                           ` Kurt Garloff
2009-07-16 10:38                                                             ` Vladislav Bolkhovitin
2009-06-30 10:22                             ` Vladislav Bolkhovitin
2009-06-29 10:55                   ` Vladislav Bolkhovitin
2009-06-29 13:00                     ` Wu Fengguang
2009-09-22 20:58 ` Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2009-05-22  0:09 [RESEND][PATCH] " Hisashi Hifumi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).