* 4.10 + 765d704db: no improvement in write rates with md/raid5 group_thread_cnt > 0
@ 2017-04-05 14:13 Nix
  2017-04-10 20:10 ` Shaohua Li
  0 siblings, 1 reply; 3+ messages in thread
From: Nix @ 2017-04-05 14:13 UTC (permalink / raw)
  To: linux-raid; +Cc: Shaohua Li

So you'd expect write rates on a RAID-5 array to be higher than write rates on a
single spinning-rust disk, right? Because, even with Shaohua's commit
765d704db1f583630d52 applied atop 4.10, I see little sign of it. Does this
commit depend upon something else to stop death by seeking with
group_thread_cnt > 0? It didn't look like it to me...

The results Shaohua showed in the original commit were very impressive, but for
the life of me I can't figure out how to get anything like them.


With group_thread_cnt 0, I max out at a bit higher than the 240MiB/s one disk in
this array can manage on its own, for obvious reasons: md_raid5 CPU saturation.
(This is with a 512KiB chunksize, stripe_cache_size of 512: yes, I know that's
small, it's just a random slice taken out of a much larger test series: the
array is a smallish non-degraded unjournalled four-element md RAID-5 initialized with
--assume-clean for benchmarking). Similar results are seen with ext4 and xfs.
Trimmed-down iozone -a output, so only one serial writer, but still:

                                                       stride
     kB  reclen    write  rewrite    read    reread      read
     64       4     6752    15647    26489    30145     26678
     64       8     6639    25236    45101    56289     43158
     64      16     6014     9799    67364    89009     60900
     64      32    35200    48781     7374   177207      7336
     64      64    32420    70551   109395   229470     97868
[...]
  32768      64    28181    30576   265403   178438    299889
  32768     128    41659    39989   319709   320689    330949
  32768     256    45402    44555   320689   357564    451256
  32768     512    42559    40556   177862   299744    466529
  32768    1024    68005    52814   415747   391507    706177
  32768    2048    91701   103918   520689   540128   1061339
  32768    4096   177716   169486   487277   514111    683463
  32768    8192   218923   233152   539853   616869    453021
  32768   16384   199068   198872   569353   619913    535240
[...]
 262144      64    25148    32423   385802   378681     27762
 262144     128    42510    41626   436994   380669     48004
 262144     256    43415    44004   436209   418971     76697
 262144     512    41408    40399   342862   401145    116781
 262144    1024    68870    59341   465737   507454    265154
 262144    2048   101994    91693   589277   582836    296474
 262144    4096   176852   166200   581922   649215    421253
 262144    8192   226696   221838   601174   633347    569766
 262144   16384   307843   297985   644679   659060    569302
 524288      64    25155    24527   392401   401908     21461
 524288     128    41422    41525   433156   464331     35360
 524288     256    42059    43742   443281   415799     70171
 524288     512    41253    39360   414306   428993     75387
 524288    1024    66081    61151   498880   517952    186959
 524288    2048   101272    90418   610467   623258    274331
 524288    4096   171489   173381   601689   576333    314290
 524288    8192   220943   215226   641713   607459    444827
 524288   16384   289055   296340   651010   671623    503633

Read rates are as high as I'd expect for a four-disk RAID-5 array, and the
sequential write output rates, while higher than what one disk can manage, are
thresholded here by the performance of the md I/O thread, as expected.
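
For anyone reproducing this, the setup described above can be sketched roughly
as follows; the device and file names are assumptions for illustration, not
taken from the original test:

```shell
# Four-element RAID-5 created purely for benchmarking: --assume-clean
# skips the initial resync, leaving bogus parity (fine for throughput
# tests, never for real data). Chunk size is in KiB, so 512 = 512KiB.
mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=512 \
      --assume-clean /dev/sd[bcde]1

# Match the stripe_cache_size used above (cache entries; one page per device each).
echo 512 > /sys/block/md0/md/stripe_cache_size

# Automatic-mode iozone: a single serial writer across all file/record sizes.
iozone -a -f /mnt/scratch/iozone.tmp
```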

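As a back-of-the-envelope check on the write ceiling: full-stripe RAID-5
sequential writes stream data to n-1 disks in parallel, so four ~240 MiB/s
members could in principle sustain roughly three disks' worth of data:

```shell
# Ideal full-stripe sequential write bandwidth for a 4-disk RAID-5:
# data goes to n-1 disks while the last disk's bandwidth absorbs parity.
n_disks=4
per_disk_mib_s=240
echo $(( (n_disks - 1) * per_disk_mib_s ))   # prints 720 (MiB/s)
```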
If I boost group_thread_cnt to, say, 2, I see:

     64       4     3677    14565    27936    36056     29629
     64       8     6670    21608    53422    69187     32045
     64      16     6682    26209    70329   103891     66662
     64      32    28624    40048     7312   154556      7345
     64      64    38327    43213    89127   260160     90540
[...]
  32768      64    14328    18580   265136   282946    308082
  32768     128    26310    24803   265762   323414    354685
  32768     256    29115    27073   238659   308974    345723
  32768     512    21572    21345   293312   314086    345365
  32768    1024    43978    38071   395715   345161    545821
  32768    2048    82898    70840   293151   470398    922082
  32768    4096   143350   124658   391980   659819    617984
  32768    8192   164297   227661   570423   645141    515009
  32768   16384   157701   171804   568484   451448    350715
[...]
 262144      64    17150    17693   391561   382751     28374
 262144     128    25385    26498   423685   410359     47148
 262144     256    29219    30244   392992   421748     80403
 262144     512    24303    24686   399453   371882    122861
 262144    1024    42296    42535   403020   508195    261339
 262144    2048    75740    63125   606979   589329    296124
 262144    4096   134646   137543   562749   590893    392938
 262144    8192   237800   239847   631752   620766    475791
 262144   16384   267889   304517   635674   612164    598521
 524288      64    17691    17776   403333   374628     21673
 524288     128    25575    25609   396568   439018     34526
 524288     256    29984    29990   412587   437099     71650
 524288     512    24971    25599   403074   431581     75471
 524288    1024    42545    43657   505740   519112    200811
 524288    2048    72519    75604   559987   589069    257654
 524288    4096   135122   140745   622450   499336    331273
 524288    8192   232848   231307   592729   604849    432296
 524288   16384   280105   271252   647725   664868    472363

Larger writes are clearly still thresholded.

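The knob being varied throughout this thread is the per-array md sysfs
attribute, something like this (the array name is an assumption):

```shell
# Spread stripe handling across 8 auxiliary worker threads instead of
# leaving everything to the single md_raid5 thread.
echo 8 > /sys/block/md0/md/group_thread_cnt
```
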
Boost the thread count more, here, to 8:

     64       4     7834    14388    30346    40300      6124
     64       8    17236    21282     6984    37644      6842
     64      16    21100    24720     7208   120277      7199
     64      32    29411    45553     7374   162411      7357
     64      64     3671    59588    78128   256923     82804
[...]
  32768      64    14261    17866   261303   289135    294245
  32768     128    25832    27639   298172   324766    342822
  32768     256    26477    27196   277318   339353    352967
  32768     512    17848    19875   339424   272225    387746
  32768    1024    36017    38945   482068   464194    110825
  32768    2048    64240    67976   551762   505772     76629
  32768    4096    71022   117680   578561   696507    752493
  32768    8192   161080   207790   564343   556796    546488
  32768   16384   172937   233103   521368   603562    418679
[...]
 262144      64    17170    17452   352337   351258     27824
 262144     128    25318    25522   418977   424859     47112
 262144     256    26405    27092   426170   419684     79047
 262144     512    20185    20271   398733   411974    135554
 262144    1024    39013    38238   497919   438150    180384
 262144    2048    71054    70921   586634   535676    258955
 262144    4096   113222   121554   616548   604177    293088
 262144    8192   184086   187845   551395   586126    496147
 262144   16384   286319   272419   645900   659103    589384
 524288      64    16980    16756   385746   381476     21462
 524288     128    24993    25482   428855   438250     34889
 524288     256    26517    26134   448088   395352     70225
 524288     512    19534    19484   418764   416630     76975
 524288    1024    37645    38370   514030   511638    177818
 524288    2048    68469    72200   602688   542627    251162
 524288    4096   115467   121220   598738   629120    289589
 524288    8192   185093   182044   621233   586919    437162
 524288   16384   250990   266257   620428   660663    494770

Still thresholded. Yes, this is only one serial writer, but nonetheless this
seems a bit odd.

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: 4.10 + 765d704db: no improvement in write rates with md/raid5 group_thread_cnt > 0
  2017-04-05 14:13 4.10 + 765d704db: no improvement in write rates with md/raid5 group_thread_cnt > 0 Nix
@ 2017-04-10 20:10 ` Shaohua Li
  2017-04-11 10:01   ` Nix
  0 siblings, 1 reply; 3+ messages in thread
From: Shaohua Li @ 2017-04-10 20:10 UTC (permalink / raw)
  To: Nix; +Cc: linux-raid, Shaohua Li

On Wed, Apr 05, 2017 at 03:13:48PM +0100, Nix wrote:
> So you'd expect write rates on a RAID-5 array to be higher than write rates on a
> single spinning-rust disk, right? Because, even with Shaohua's commit
> 765d704db1f583630d52 applied atop 4.10, I see little sign of it. Does this
> commit depend upon something else to stop death by seeking with
> group_thread_cnt > 0? It didn't look like it to me...
> 
> The results Shaohua showed in the original commit were very impressive, but for
> the life of me I can't figure out how to get anything like them.

That only works well with a large iodepth. For a single writer, we are still
far from the theoretical bandwidth. I actually wrote in the commit log:

"We are pretty close to the maximum bandwidth in the large iodepth
case. The performance gap of small iodepth sequential write
between software raid and theory value is still very big though, because
we don't have an efficient pipeline."
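
A high-iodepth sequential write can be generated with, e.g., fio; the device
name and sizes below are illustrative assumptions, not from the original
benchmark:

```shell
# 64 asynchronous writes in flight give the raid5 worker threads enough
# parallel stripe work; a single buffered serial writer does not.
fio --name=seqwrite --filename=/dev/md0 --rw=write --bs=1M \
    --ioengine=libaio --iodepth=64 --direct=1 --size=8G
```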

Thanks,
Shaohua
 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: 4.10 + 765d704db: no improvement in write rates with md/raid5 group_thread_cnt > 0
  2017-04-10 20:10 ` Shaohua Li
@ 2017-04-11 10:01   ` Nix
  0 siblings, 0 replies; 3+ messages in thread
From: Nix @ 2017-04-11 10:01 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid, Shaohua Li

On 10 Apr 2017, Shaohua Li stated:

> On Wed, Apr 05, 2017 at 03:13:48PM +0100, Nix wrote:
>> So you'd expect write rates on a RAID-5 array to be higher than write rates on a
>> single spinning-rust disk, right? Because, even with Shaohua's commit
>> 765d704db1f583630d52 applied atop 4.10, I see little sign of it. Does this
>> commit depend upon something else to stop death by seeking with
>> group_thread_cnt > 0? It didn't look like it to me...
>> 
>> The results Shaohua showed in the original commit were very impressive, but for
>> the life of me I can't figure out how to get anything like them.
>
> That only works well with large iodepth. For single write, we are still far
> from the BW in theory. I actually wrote in the commit log:
>
> "We are pretty close to the maximum bandwidth in the large iodepth
> iodepth case. The performance gap of small iodepth sequential write
> between software raid and theory value is still very big though, because
> we don't have an efficient pipeline."

Ah right, I missed the significance of that. So this helps only if you
have multiple simultaneous multithreaded/async I/Os in flight to the
same file? Damn, no help in any of my common use cases yet :( :( :(
except maybe massively-parallel compiles, but those are never
write-bound except when linking, and *that* is serial.

I guess I have to wait and hope for a better pipeline :)

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-04-11 10:01 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-05 14:13 4.10 + 765d704db: no improvement in write rates with md/raid5 group_thread_cnt > 0 Nix
2017-04-10 20:10 ` Shaohua Li
2017-04-11 10:01   ` Nix
