Re: [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
@ 2016-01-26  2:32 Chien Lee
  2016-01-26 22:12 ` NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: Chien Lee @ 2016-01-26  2:32 UTC (permalink / raw)
  To: linux-raid, neilb, shli, owner-linux-raid

Hello,

Recently we find a bug about this patch (commit No. is
ac8fa4196d205ac8fff3f8932bddbad4f16e4110 ).

We know that this patch committed after Linux kernel 4.1.x is intended
to allowing resync to go faster when there is competing IO. However,
we find the performance of random read on syncing Raid6 will come up
with a huge drop in this case. The following is our testing detail.

The OS what we choose in our test is CentOS Linux release 7.1.1503
(Core) and the kernel image will be replaced for testing. In our
testing result, the 4K random read performance on syncing raid6 in
Kernel 4.2.8 is much lower than in Kernel 3.19.8. In order to find out
the root cause, we try to rollback this patch in Kernel 4.2.8, and we
find the 4K random read performance on syncing Raid6 will be improved
and go back to as what it should be in Kernel 3.19.8.

Nevertheless, it seems that it will not affect some other read/write
patterns. In our testing result, the 1M sequential read/write, 4K
random write performance in Kernel 4.2.8 is performed almost the same
as in Kernel 3.19.8.

It seems that although this patch increases the resync speed, the
logic of !is_mddev_idle() cause the sync request wait too short and
reduce the chance for raid5d to handle the random read I/O.

Following is our test environment and some testing results:

OS: CentOS Linux release 7.1.1503 (Core)

CPU: Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz

Processor number: 8

Memory: 12GB

fio command:

1.      (for numjobs=64):

fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
--runtime=180 --size=50G --name=test-read --ioengine=libaio
--numjobs=64 --iodepth=1 --group_reporting

2.      (for numjobs=1):

fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
--runtime=180 --size=50G --name=test-read --ioengine=libaio
--numjobs=1 --iodepth=1 --group_reporting

Here are test results:

Part I. SSD (4 x 240GB Intel SSD create Raid6(syncing))

a.      4K Random Read, numjobs=64

                                             Average Throughput    Average IOPS

Kernel 3.19.8                                 715937KB/s              178984

Kernel 4.2.8                                   489874KB/s              122462

Kernel 4.2.8 Patch Rollback            717377KB/s              179344

b.      4K Random Read, numjobs=1

                                             Average Throughput    Average IOPS

Kernel 3.19.8                                 32203KB/s                8051

Kernel 4.2.8                                  2535.7KB/s                633

Kernel 4.2.8 Patch Rollback            31861KB/s                7965

Part II. HDD (4 x 1TB TOSHIBA HDD create Raid6(syncing))

a.      4K Random Read, numjobs=64

                                             Average Throughput    Average IOPS

Kernel 3.19.8                                2976.6KB/s               744

Kernel 4.2.8                                  2915.8KB/s               728

Kernel 4.2.8 Patch Rollback           2973.3KB/s               743

b.      4K Random Read, numjobs=1

                                             Average Throughput    Average IOPS

Kernel 3.19.8                                481844 B/s                 117

Kernel 4.2.8                                   24718 B/s                   5

Kernel 4.2.8 Patch Rollback           460090 B/s                 112

Thanks,

-- 

Chien Lee

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
  2016-01-26  2:32 [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO Chien Lee
@ 2016-01-26 22:12 ` NeilBrown
  2016-01-26 22:52   ` Shaohua Li
  2016-01-27  9:49   ` Chien Lee
  0 siblings, 2 replies; 12+ messages in thread
From: NeilBrown @ 2016-01-26 22:12 UTC (permalink / raw)
  To: Chien Lee, linux-raid, shli, owner-linux-raid

[-- Attachment #1: Type: text/plain, Size: 3998 bytes --]

On Tue, Jan 26 2016, Chien Lee wrote:

> Hello,
>
> Recently we find a bug about this patch (commit No. is
> ac8fa4196d205ac8fff3f8932bddbad4f16e4110 ).
>
> We know that this patch committed after Linux kernel 4.1.x is intended
> to allowing resync to go faster when there is competing IO. However,
> we find the performance of random read on syncing Raid6 will come up
> with a huge drop in this case. The following is our testing detail.
>
> The OS what we choose in our test is CentOS Linux release 7.1.1503
> (Core) and the kernel image will be replaced for testing. In our
> testing result, the 4K random read performance on syncing raid6 in
> Kernel 4.2.8 is much lower than in Kernel 3.19.8. In order to find out
> the root cause, we try to rollback this patch in Kernel 4.2.8, and we
> find the 4K random read performance on syncing Raid6 will be improved
> and go back to as what it should be in Kernel 3.19.8.
>
> Nevertheless, it seems that it will not affect some other read/write
> patterns. In our testing result, the 1M sequential read/write, 4K
> random write performance in Kernel 4.2.8 is performed almost the same
> as in Kernel 3.19.8.
>
> It seems that although this patch increases the resync speed, the
> logic of !is_mddev_idle() cause the sync request wait too short and
> reduce the chance for raid5d to handle the random read I/O.

This has been raised before.
Can you please try the patch at the end of 

  http://permalink.gmane.org/gmane.linux.raid/51002

and let me know if it makes any difference.  If it isn't sufficient I
will explore further.

Thanks,
NeilBrown


>
>
> Following is our test environment and some testing results:
>
>
> OS: CentOS Linux release 7.1.1503 (Core)
>
> CPU: Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz
>
> Processor number: 8
>
> Memory: 12GB
>
> fio command:
>
> 1.      (for numjobs=64):
>
> fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
> --runtime=180 --size=50G --name=test-read --ioengine=libaio
> --numjobs=64 --iodepth=1 --group_reporting
>
> 2.      (for numjobs=1):
>
> fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
> --runtime=180 --size=50G --name=test-read --ioengine=libaio
> --numjobs=1 --iodepth=1 --group_reporting
>
>
>
> Here are test results:
>
>
> Part I. SSD (4 x 240GB Intel SSD create Raid6(syncing))
>
>
> a.      4K Random Read, numjobs=64
>
>                                              Average Throughput    Average IOPS
>
> Kernel 3.19.8                                 715937KB/s              178984
>
> Kernel 4.2.8                                   489874KB/s              122462
>
> Kernel 4.2.8 Patch Rollback            717377KB/s              179344
>
>
>
> b.      4K Random Read, numjobs=1
>
>                                              Average Throughput    Average IOPS
>
> Kernel 3.19.8                                 32203KB/s                8051
>
> Kernel 4.2.8                                  2535.7KB/s                633
>
> Kernel 4.2.8 Patch Rollback            31861KB/s                7965
>
>
>
>
> Part II. HDD (4 x 1TB TOSHIBA HDD create Raid6(syncing))
>
>
> a.      4K Random Read, numjobs=64
>
>                                              Average Throughput    Average IOPS
>
> Kernel 3.19.8                                2976.6KB/s               744
>
> Kernel 4.2.8                                  2915.8KB/s               728
>
> Kernel 4.2.8 Patch Rollback           2973.3KB/s               743
>
>
>
> b.      4K Random Read, numjobs=1
>
>                                              Average Throughput    Average IOPS
>
> Kernel 3.19.8                                481844 B/s                 117
>
> Kernel 4.2.8                                   24718 B/s                   5
>
> Kernel 4.2.8 Patch Rollback           460090 B/s                 112
>
>
>
> Thanks,
>
> -- 
>
> Chien Lee

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
  2016-01-26 22:12 ` NeilBrown
@ 2016-01-26 22:52   ` Shaohua Li
  2016-01-26 23:08     ` NeilBrown
  2016-01-27  9:49   ` Chien Lee
  1 sibling, 1 reply; 12+ messages in thread
From: Shaohua Li @ 2016-01-26 22:52 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chien Lee, linux-raid, owner-linux-raid

On Wed, Jan 27, 2016 at 09:12:23AM +1100, Neil Brown wrote:
> On Tue, Jan 26 2016, Chien Lee wrote:
> 
> > Hello,
> >
> > Recently we find a bug about this patch (commit No. is
> > ac8fa4196d205ac8fff3f8932bddbad4f16e4110 ).
> >
> > We know that this patch committed after Linux kernel 4.1.x is intended
> > to allowing resync to go faster when there is competing IO. However,
> > we find the performance of random read on syncing Raid6 will come up
> > with a huge drop in this case. The following is our testing detail.
> >
> > The OS what we choose in our test is CentOS Linux release 7.1.1503
> > (Core) and the kernel image will be replaced for testing. In our
> > testing result, the 4K random read performance on syncing raid6 in
> > Kernel 4.2.8 is much lower than in Kernel 3.19.8. In order to find out
> > the root cause, we try to rollback this patch in Kernel 4.2.8, and we
> > find the 4K random read performance on syncing Raid6 will be improved
> > and go back to as what it should be in Kernel 3.19.8.
> >
> > Nevertheless, it seems that it will not affect some other read/write
> > patterns. In our testing result, the 1M sequential read/write, 4K
> > random write performance in Kernel 4.2.8 is performed almost the same
> > as in Kernel 3.19.8.
> >
> > It seems that although this patch increases the resync speed, the
> > logic of !is_mddev_idle() cause the sync request wait too short and
> > reduce the chance for raid5d to handle the random read I/O.
> 
> This has been raised before.
> Can you please try the patch at the end of 
> 
>   http://permalink.gmane.org/gmane.linux.raid/51002
> 
> and let me know if it makes any difference.  If it isn't sufficient I
> will explore further.

I'm curious why we don't calculate the wait time. Say the target resync speed
is speed_min. The wait time should be:

(currspeed * SYNC_MARK_STEP - speed_min * SYNC_MARK_STEP) / speed_min
= (currspeed / speed_min - 1) * SYNC_MARK_STEP

if SYNC_MARK_STEP is too big and sync speed has drift, we can make it smaller.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
  2016-01-26 22:52   ` Shaohua Li
@ 2016-01-26 23:08     ` NeilBrown
  2016-01-26 23:27       ` Shaohua Li
  0 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2016-01-26 23:08 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Chien Lee, linux-raid, owner-linux-raid

[-- Attachment #1: Type: text/plain, Size: 2684 bytes --]

On Wed, Jan 27 2016, Shaohua Li wrote:

> On Wed, Jan 27, 2016 at 09:12:23AM +1100, Neil Brown wrote:
>> On Tue, Jan 26 2016, Chien Lee wrote:
>> 
>> > Hello,
>> >
>> > Recently we find a bug about this patch (commit No. is
>> > ac8fa4196d205ac8fff3f8932bddbad4f16e4110 ).
>> >
>> > We know that this patch committed after Linux kernel 4.1.x is intended
>> > to allowing resync to go faster when there is competing IO. However,
>> > we find the performance of random read on syncing Raid6 will come up
>> > with a huge drop in this case. The following is our testing detail.
>> >
>> > The OS what we choose in our test is CentOS Linux release 7.1.1503
>> > (Core) and the kernel image will be replaced for testing. In our
>> > testing result, the 4K random read performance on syncing raid6 in
>> > Kernel 4.2.8 is much lower than in Kernel 3.19.8. In order to find out
>> > the root cause, we try to rollback this patch in Kernel 4.2.8, and we
>> > find the 4K random read performance on syncing Raid6 will be improved
>> > and go back to as what it should be in Kernel 3.19.8.
>> >
>> > Nevertheless, it seems that it will not affect some other read/write
>> > patterns. In our testing result, the 1M sequential read/write, 4K
>> > random write performance in Kernel 4.2.8 is performed almost the same
>> > as in Kernel 3.19.8.
>> >
>> > It seems that although this patch increases the resync speed, the
>> > logic of !is_mddev_idle() cause the sync request wait too short and
>> > reduce the chance for raid5d to handle the random read I/O.
>> 
>> This has been raised before.
>> Can you please try the patch at the end of 
>> 
>>   http://permalink.gmane.org/gmane.linux.raid/51002
>> 
>> and let me know if it makes any difference.  If it isn't sufficient I
>> will explore further.
>
> I'm curious why we don't calculate the wait time. Say the target resync speed
> is speed_min. The wait time should be:
>
> (currspeed * SYNC_MARK_STEP - speed_min * SYNC_MARK_STEP) / speed_min
> = (currspeed / speed_min - 1) * SYNC_MARK_STEP
>
> if SYNC_MARK_STEP is too big and sync speed has drift, we can make it smaller.

What do you hope this would achieve?

If I understand correctly, this might allow the thread to sleep for
longer instead of looping around every 500ms or so.  But we don't really
want to do that.  As soon as filesystem IO pauses, we want resync IO to
go back to full speed.

The "speed_min" isn't really a "target".  It is only a "target" for
those times when there is no filesystem IO.

I'd certainly be happy to discuss alternate approaches to this.  Details
are important through.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
  2016-01-26 23:08     ` NeilBrown
@ 2016-01-26 23:27       ` Shaohua Li
  2016-01-27  1:12         ` NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: Shaohua Li @ 2016-01-26 23:27 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chien Lee, linux-raid, owner-linux-raid

On Wed, Jan 27, 2016 at 10:08:45AM +1100, Neil Brown wrote:
> On Wed, Jan 27 2016, Shaohua Li wrote:
> 
> > On Wed, Jan 27, 2016 at 09:12:23AM +1100, Neil Brown wrote:
> >> On Tue, Jan 26 2016, Chien Lee wrote:
> >> 
> >> > Hello,
> >> >
> >> > Recently we find a bug about this patch (commit No. is
> >> > ac8fa4196d205ac8fff3f8932bddbad4f16e4110 ).
> >> >
> >> > We know that this patch committed after Linux kernel 4.1.x is intended
> >> > to allowing resync to go faster when there is competing IO. However,
> >> > we find the performance of random read on syncing Raid6 will come up
> >> > with a huge drop in this case. The following is our testing detail.
> >> >
> >> > The OS what we choose in our test is CentOS Linux release 7.1.1503
> >> > (Core) and the kernel image will be replaced for testing. In our
> >> > testing result, the 4K random read performance on syncing raid6 in
> >> > Kernel 4.2.8 is much lower than in Kernel 3.19.8. In order to find out
> >> > the root cause, we try to rollback this patch in Kernel 4.2.8, and we
> >> > find the 4K random read performance on syncing Raid6 will be improved
> >> > and go back to as what it should be in Kernel 3.19.8.
> >> >
> >> > Nevertheless, it seems that it will not affect some other read/write
> >> > patterns. In our testing result, the 1M sequential read/write, 4K
> >> > random write performance in Kernel 4.2.8 is performed almost the same
> >> > as in Kernel 3.19.8.
> >> >
> >> > It seems that although this patch increases the resync speed, the
> >> > logic of !is_mddev_idle() cause the sync request wait too short and
> >> > reduce the chance for raid5d to handle the random read I/O.
> >> 
> >> This has been raised before.
> >> Can you please try the patch at the end of 
> >> 
> >>   http://permalink.gmane.org/gmane.linux.raid/51002
> >> 
> >> and let me know if it makes any difference.  If it isn't sufficient I
> >> will explore further.
> >
> > I'm curious why we don't calculate the wait time. Say the target resync speed
> > is speed_min. The wait time should be:
> >
> > (currspeed * SYNC_MARK_STEP - speed_min * SYNC_MARK_STEP) / speed_min
> > = (currspeed / speed_min - 1) * SYNC_MARK_STEP
> >
> > if SYNC_MARK_STEP is too big and sync speed has drift, we can make it smaller.
> 
> What do you hope this would achieve?

The whole point is to throttle sync speed to specific speed. If we know the
target speed, for any given time interval, we can calculate the sync IO size.
 
> If I understand correctly, this might allow the thread to sleep for
> longer instead of looping around every 500ms or so.  But we don't really
> want to do that.  As soon as filesystem IO pauses, we want resync IO to
> go back to full speed.
> 
> The "speed_min" isn't really a "target".  It is only a "target" for
> those times when there is no filesystem IO.

Yep, target is a little bit hard to determine. I think we can do:
if (curspeed > min) {
	if (!is_mddev_idle())
		targetspeed = minspeed;
	if (curspeed > max)
		targetspeed = maxspeed;
	sleep(max((currspeed / targetspeed - 1), 0) * SYNC_MARK_STEP)
}

This way we don't throttle if there is no filesystem IO. would this work?

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
  2016-01-26 23:27       ` Shaohua Li
@ 2016-01-27  1:12         ` NeilBrown
  0 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2016-01-27  1:12 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Chien Lee, linux-raid, owner-linux-raid

[-- Attachment #1: Type: text/plain, Size: 4317 bytes --]

On Wed, Jan 27 2016, Shaohua Li wrote:

> On Wed, Jan 27, 2016 at 10:08:45AM +1100, Neil Brown wrote:
>> On Wed, Jan 27 2016, Shaohua Li wrote:
>> 
>> > On Wed, Jan 27, 2016 at 09:12:23AM +1100, Neil Brown wrote:
>> >> On Tue, Jan 26 2016, Chien Lee wrote:
>> >> 
>> >> > Hello,
>> >> >
>> >> > Recently we find a bug about this patch (commit No. is
>> >> > ac8fa4196d205ac8fff3f8932bddbad4f16e4110 ).
>> >> >
>> >> > We know that this patch committed after Linux kernel 4.1.x is intended
>> >> > to allowing resync to go faster when there is competing IO. However,
>> >> > we find the performance of random read on syncing Raid6 will come up
>> >> > with a huge drop in this case. The following is our testing detail.
>> >> >
>> >> > The OS what we choose in our test is CentOS Linux release 7.1.1503
>> >> > (Core) and the kernel image will be replaced for testing. In our
>> >> > testing result, the 4K random read performance on syncing raid6 in
>> >> > Kernel 4.2.8 is much lower than in Kernel 3.19.8. In order to find out
>> >> > the root cause, we try to rollback this patch in Kernel 4.2.8, and we
>> >> > find the 4K random read performance on syncing Raid6 will be improved
>> >> > and go back to as what it should be in Kernel 3.19.8.
>> >> >
>> >> > Nevertheless, it seems that it will not affect some other read/write
>> >> > patterns. In our testing result, the 1M sequential read/write, 4K
>> >> > random write performance in Kernel 4.2.8 is performed almost the same
>> >> > as in Kernel 3.19.8.
>> >> >
>> >> > It seems that although this patch increases the resync speed, the
>> >> > logic of !is_mddev_idle() cause the sync request wait too short and
>> >> > reduce the chance for raid5d to handle the random read I/O.
>> >> 
>> >> This has been raised before.
>> >> Can you please try the patch at the end of 
>> >> 
>> >>   http://permalink.gmane.org/gmane.linux.raid/51002
>> >> 
>> >> and let me know if it makes any difference.  If it isn't sufficient I
>> >> will explore further.
>> >
>> > I'm curious why we don't calculate the wait time. Say the target resync speed
>> > is speed_min. The wait time should be:
>> >
>> > (currspeed * SYNC_MARK_STEP - speed_min * SYNC_MARK_STEP) / speed_min
>> > = (currspeed / speed_min - 1) * SYNC_MARK_STEP
>> >
>> > if SYNC_MARK_STEP is too big and sync speed has drift, we can make it smaller.
>> 
>> What do you hope this would achieve?
>
> The whole point is to throttle sync speed to specific speed. If we know the
> target speed, for any given time interval, we can calculate the sync
> IO size.

Actually, no.  The main point is to not interfere with filesystem IO too
much.  Limiting to a low target is a fairly poor way to do that (but is
all we have) and as there is such a wide range of device speeds it is no
longer possible to choose a sensible default.

>  
>> If I understand correctly, this might allow the thread to sleep for
>> longer instead of looping around every 500ms or so.  But we don't really
>> want to do that.  As soon as filesystem IO pauses, we want resync IO to
>> go back to full speed.
>> 
>> The "speed_min" isn't really a "target".  It is only a "target" for
>> those times when there is no filesystem IO.
>
> Yep, target is a little bit hard to determine. I think we can do:
> if (curspeed > min) {
> 	if (!is_mddev_idle())
> 		targetspeed = minspeed;
> 	if (curspeed > max)
> 		targetspeed = maxspeed;
> 	sleep(max((currspeed / targetspeed - 1), 0) * SYNC_MARK_STEP)
> }
>
> This way we don't throttle if there is no filesystem IO. would this
> work?

But if there is filesystem IO, then we throttle for at least 3 seconds
(SYNC_MARK_STEP is 3*HZ).  The filesystem could go idle in 1 second and
then we would spend 2 seconds pointlessly doing nothing.  We could make
SYNC_MARK_STEP smaller, but if recent speed had been high we could still
throttle for a lot longer than needed.

Before the patch that caused the regression we would throttle for 500ms,
so if the filesystem went idle we would waste at most 500ms.
After the patch, we throttle until pending requests have completed.
This causes the delay to scale with the speed of the device, but doesn't
seem to be enough of a delay in some cases.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
  2016-01-26 22:12 ` NeilBrown
  2016-01-26 22:52   ` Shaohua Li
@ 2016-01-27  9:49   ` Chien Lee
  2016-01-28  3:10     ` NeilBrown
  1 sibling, 1 reply; 12+ messages in thread
From: Chien Lee @ 2016-01-27  9:49 UTC (permalink / raw)
  To: NeilBrown, linux-raid, shli, owner-linux-raid

2016-01-27 6:12 GMT+08:00 NeilBrown <neilb@suse.com>:
> On Tue, Jan 26 2016, Chien Lee wrote:
>
>> Hello,
>>
>> Recently we find a bug about this patch (commit No. is
>> ac8fa4196d205ac8fff3f8932bddbad4f16e4110 ).
>>
>> We know that this patch committed after Linux kernel 4.1.x is intended
>> to allowing resync to go faster when there is competing IO. However,
>> we find the performance of random read on syncing Raid6 will come up
>> with a huge drop in this case. The following is our testing detail.
>>
>> The OS what we choose in our test is CentOS Linux release 7.1.1503
>> (Core) and the kernel image will be replaced for testing. In our
>> testing result, the 4K random read performance on syncing raid6 in
>> Kernel 4.2.8 is much lower than in Kernel 3.19.8. In order to find out
>> the root cause, we try to rollback this patch in Kernel 4.2.8, and we
>> find the 4K random read performance on syncing Raid6 will be improved
>> and go back to as what it should be in Kernel 3.19.8.
>>
>> Nevertheless, it seems that it will not affect some other read/write
>> patterns. In our testing result, the 1M sequential read/write, 4K
>> random write performance in Kernel 4.2.8 is performed almost the same
>> as in Kernel 3.19.8.
>>
>> It seems that although this patch increases the resync speed, the
>> logic of !is_mddev_idle() cause the sync request wait too short and
>> reduce the chance for raid5d to handle the random read I/O.
>
> This has been raised before.
> Can you please try the patch at the end of
>
>   http://permalink.gmane.org/gmane.linux.raid/51002
>
> and let me know if it makes any difference.  If it isn't sufficient I
> will explore further.
>
> Thanks,
> NeilBrown


Hello Neil,

I try the patch (http://permalink.gmane.org/gmane.linux.raid/51002) in
Kernel 4.2.8. Here are the test results:


Part I. SSD (4 x 240GB Intel SSD create Raid6(syncing))

a.  4K Random Read, numjobs=64

                                   Average Throughput    Average IOPS

Kernel 4.2.8 Patch             601249KB/s              150312


b.  4K Random Read, numjobs=1

                                   Average Throughput    Average IOPS

Kernel 4.2.8 Patch             1166.4KB/s                  291



Part II. HDD (4 x 1TB TOSHIBA HDD create Raid6(syncing))

a.  4K Random Read, numjobs=64

                                   Average Throughput    Average IOPS

Kernel 4.2.8 Patch              2946.4KB/s                 736


b.  4K Random Read, numjobs=1

                                   Average Throughput    Average IOPS

Kernel 4.2.8 Patch              119199 B/s                   28


Although the performance that compare to the original Kernel 4.2.8
test results is increased, the patch
(http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ac8fa4196d205ac8fff3f8932bddbad4f16e4110)
rollback still has the best performance. I also observe the sync speed
at numjobs=64 almost drop to the sync_speed_min, but sync speed at
numjobs=1 almost keep in the original speed.

From my test results, I think this patch isn't sufficient that maybe
Neil can explore further and give me some advice.


Thanks,
Chien Lee


>>
>>
>> Following is our test environment and some testing results:
>>
>>
>> OS: CentOS Linux release 7.1.1503 (Core)
>>
>> CPU: Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz
>>
>> Processor number: 8
>>
>> Memory: 12GB
>>
>> fio command:
>>
>> 1.      (for numjobs=64):
>>
>> fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
>> --runtime=180 --size=50G --name=test-read --ioengine=libaio
>> --numjobs=64 --iodepth=1 --group_reporting
>>
>> 2.      (for numjobs=1):
>>
>> fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
>> --runtime=180 --size=50G --name=test-read --ioengine=libaio
>> --numjobs=1 --iodepth=1 --group_reporting
>>
>>
>>
>> Here are test results:
>>
>>
>> Part I. SSD (4 x 240GB Intel SSD create Raid6(syncing))
>>
>>
>> a.      4K Random Read, numjobs=64
>>
>>                                              Average Throughput    Average IOPS
>>
>> Kernel 3.19.8                                 715937KB/s              178984
>>
>> Kernel 4.2.8                                   489874KB/s              122462
>>
>> Kernel 4.2.8 Patch Rollback            717377KB/s              179344
>>
>>
>>
>> b.      4K Random Read, numjobs=1
>>
>>                                              Average Throughput    Average IOPS
>>
>> Kernel 3.19.8                                 32203KB/s                8051
>>
>> Kernel 4.2.8                                  2535.7KB/s                633
>>
>> Kernel 4.2.8 Patch Rollback            31861KB/s                7965
>>
>>
>>
>>
>> Part II. HDD (4 x 1TB TOSHIBA HDD create Raid6(syncing))
>>
>>
>> a.      4K Random Read, numjobs=64
>>
>>                                              Average Throughput    Average IOPS
>>
>> Kernel 3.19.8                                2976.6KB/s               744
>>
>> Kernel 4.2.8                                  2915.8KB/s               728
>>
>> Kernel 4.2.8 Patch Rollback           2973.3KB/s               743
>>
>>
>>
>> b.      4K Random Read, numjobs=1
>>
>>                                              Average Throughput    Average IOPS
>>
>> Kernel 3.19.8                                481844 B/s                 117
>>
>> Kernel 4.2.8                                   24718 B/s                   5
>>
>> Kernel 4.2.8 Patch Rollback           460090 B/s                 112
>>
>>
>>
>> Thanks,
>>
>> --
>>
>> Chien Lee

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
  2016-01-27  9:49   ` Chien Lee
@ 2016-01-28  3:10     ` NeilBrown
  2016-01-28  4:42       ` Chien Lee
                         ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: NeilBrown @ 2016-01-28  3:10 UTC (permalink / raw)
  To: Chien Lee, linux-raid, shli, owner-linux-raid

[-- Attachment #1: Type: text/plain, Size: 6494 bytes --]

On Wed, Jan 27 2016, Chien Lee wrote:

> 2016-01-27 6:12 GMT+08:00 NeilBrown <neilb@suse.com>:
>> On Tue, Jan 26 2016, Chien Lee wrote:
>>
>>> Hello,
>>>
>>> Recently we find a bug about this patch (commit No. is
>>> ac8fa4196d205ac8fff3f8932bddbad4f16e4110 ).
>>>
>>> We know that this patch committed after Linux kernel 4.1.x is intended
>>> to allowing resync to go faster when there is competing IO. However,
>>> we find the performance of random read on syncing Raid6 will come up
>>> with a huge drop in this case. The following is our testing detail.
>>>
>>> The OS what we choose in our test is CentOS Linux release 7.1.1503
>>> (Core) and the kernel image will be replaced for testing. In our
>>> testing result, the 4K random read performance on syncing raid6 in
>>> Kernel 4.2.8 is much lower than in Kernel 3.19.8. In order to find out
>>> the root cause, we try to rollback this patch in Kernel 4.2.8, and we
>>> find the 4K random read performance on syncing Raid6 will be improved
>>> and go back to as what it should be in Kernel 3.19.8.
>>>
>>> Nevertheless, it seems that it will not affect some other read/write
>>> patterns. In our testing result, the 1M sequential read/write, 4K
>>> random write performance in Kernel 4.2.8 is performed almost the same
>>> as in Kernel 3.19.8.
>>>
>>> It seems that although this patch increases the resync speed, the
>>> logic of !is_mddev_idle() cause the sync request wait too short and
>>> reduce the chance for raid5d to handle the random read I/O.
>>
>> This has been raised before.
>> Can you please try the patch at the end of
>>
>>   http://permalink.gmane.org/gmane.linux.raid/51002
>>
>> and let me know if it makes any difference.  If it isn't sufficient I
>> will explore further.
>>
>> Thanks,
>> NeilBrown
>
>
> Hello Neil,
>
> I try the patch (http://permalink.gmane.org/gmane.linux.raid/51002) in
> Kernel 4.2.8. Here are the test results:
>
>
> Part I. SSD (4 x 240GB Intel SSD create Raid6(syncing))
>
> a.  4K Random Read, numjobs=64
>
>                                    Average Throughput    Average IOPS
>
> Kernel 4.2.8 Patch             601249KB/s              150312
>
>
> b.  4K Random Read, numjobs=1
>
>                                    Average Throughput    Average IOPS
>
> Kernel 4.2.8 Patch             1166.4KB/s                  291
>
>
>
> Part II. HDD (4 x 1TB TOSHIBA HDD create Raid6(syncing))
>
> a.  4K Random Read, numjobs=64
>
>                                    Average Throughput    Average IOPS
>
> Kernel 4.2.8 Patch              2946.4KB/s                 736
>
>
> b.  4K Random Read, numjobs=1
>
>                                    Average Throughput    Average IOPS
>
> Kernel 4.2.8 Patch              119199 B/s                   28
>
>
> Although the performance that compare to the original Kernel 4.2.8
> test results is increased, the patch
> (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ac8fa4196d205ac8fff3f8932bddbad4f16e4110)
> rollback still has the best performance. I also observe the sync speed
> at numjobs=64 almost drop to the sync_speed_min, but sync speed at
> numjobs=1 almost keep in the original speed.
>
>>From my test results, I think this patch isn't sufficient that maybe
> Neil can explore further and give me some advice.
>
>
> Thanks,
> Chien Lee
>
>
>>>
>>>
>>> Following is our test environment and some testing results:
>>>
>>>
>>> OS: CentOS Linux release 7.1.1503 (Core)
>>>
>>> CPU: Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz
>>>
>>> Processor number: 8
>>>
>>> Memory: 12GB
>>>
>>> fio command:
>>>
>>> 1.      (for numjobs=64):
>>>
>>> fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
>>> --runtime=180 --size=50G --name=test-read --ioengine=libaio
>>> --numjobs=64 --iodepth=1 --group_reporting
>>>
>>> 2.      (for numjobs=1):
>>>
>>> fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
>>> --runtime=180 --size=50G --name=test-read --ioengine=libaio
>>> --numjobs=1 --iodepth=1 --group_reporting
>>>
>>>
>>>
>>> Here are test results:
>>>
>>>
>>> Part I. SSD (4 x 240GB Intel SSD create Raid6(syncing))
>>>
>>>
>>> a.      4K Random Read, numjobs=64
>>>
>>>                                              Average Throughput    Average IOPS
>>>
>>> Kernel 3.19.8                                 715937KB/s              178984
>>>
>>> Kernel 4.2.8                                   489874KB/s              122462
>>>
>>> Kernel 4.2.8 Patch Rollback            717377KB/s              179344
>>>
>>>
>>>
>>> b.      4K Random Read, numjobs=1
>>>
>>>                                              Average Throughput    Average IOPS
>>>
>>> Kernel 3.19.8                                 32203KB/s                8051
>>>
>>> Kernel 4.2.8                                  2535.7KB/s                633
>>>
>>> Kernel 4.2.8 Patch Rollback            31861KB/s                7965
>>>
>>>
>>>
>>>
>>> Part II. HDD (4 x 1TB TOSHIBA HDD create Raid6(syncing))
>>>
>>>
>>> a.      4K Random Read, numjobs=64
>>>
>>>                                              Average Throughput    Average IOPS
>>>
>>> Kernel 3.19.8                                2976.6KB/s               744
>>>
>>> Kernel 4.2.8                                  2915.8KB/s               728
>>>
>>> Kernel 4.2.8 Patch Rollback           2973.3KB/s               743
>>>
>>>
>>>
>>> b.      4K Random Read, numjobs=1
>>>
>>>                                              Average Throughput    Average IOPS
>>>
>>> Kernel 3.19.8                                481844 B/s                 117
>>>
>>> Kernel 4.2.8                                   24718 B/s                   5
>>>
>>> Kernel 4.2.8 Patch Rollback           460090 B/s                 112
>>>
>>>
>>>
>>> Thanks,
>>>
>>> --
>>>
>>> Chien Lee

Thanks for testing.

I'd like to suggest that these results are fairly reasonable for the
numjobs=64 case.  Certainly read-speed is reduced by presumably resync
speed is increased.
The numbers for numjob=1 are appalling though.  That would generally
affect any synchronous load.  As the synchronous load doesn't interfere
much with the resync load, the delays that are inserted won't be very
long.

I feel there must be an answer here -  I just cannot find it.
I'd like to be able to dynamically estimate the bandwidth of the array
and use (say) 10% of that, but I cannot think of a way to do that at all
reliably.

I'll ponder it a bit longer.  We may need to ultimately revert that
patch, but not yet.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
  2016-01-28  3:10     ` NeilBrown
@ 2016-01-28  4:42       ` Chien Lee
  2016-01-28  9:58       ` Joshua Kinard
  2016-01-28 20:56       ` Shaohua Li
  2 siblings, 0 replies; 12+ messages in thread
From: Chien Lee @ 2016-01-28  4:42 UTC (permalink / raw)
  To: NeilBrown, linux-raid, shli, owner-linux-raid

>
> Thanks for testing.
>
> I'd like to suggest that these results are fairly reasonable for the
> numjobs=64 case.  Certainly read-speed is reduced by presumably resync
> speed is increased.
> The numbers for numjob=1 are appalling though.  That would generally
> affect any synchronous load.  As the synchronous load doesn't interfere
> much with the resync load, the delays that are inserted won't be very
> long.
>
> I feel there must be an answer here -  I just cannot find it.
> I'd like to be able to dynamically estimate the bandwidth of the array
> and use (say) 10% of that, but I cannot think of a way to do that at all
> reliably.
>
> I'll ponder it a bit longer.  We may need to ultimately revert that
> patch, but not yet.
>
> Thanks,
> NeilBrown

Hello Neil,

This issue about competing between sync IO and non-sync IO is a
trade-off question
and need to consider deeply and widely. Thanks for your advice.

If any progress about this, please let me know first. I'm happy that I
can provide
my test results for community.

Thanks again,
Chien Lee

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
  2016-01-28  3:10     ` NeilBrown
  2016-01-28  4:42       ` Chien Lee
@ 2016-01-28  9:58       ` Joshua Kinard
  2016-01-28 20:56       ` Shaohua Li
  2 siblings, 0 replies; 12+ messages in thread
From: Joshua Kinard @ 2016-01-28  9:58 UTC (permalink / raw)
  To: NeilBrown, Chien Lee, linux-raid, shli, owner-linux-raid

On 01/27/2016 22:10, NeilBrown wrote:
> On Wed, Jan 27 2016, Chien Lee wrote:
> 
>> 2016-01-27 6:12 GMT+08:00 NeilBrown <neilb@suse.com>:
>>> On Tue, Jan 26 2016, Chien Lee wrote:
>>>
>>>> Hello,
>>>>
>>>> Recently we find a bug about this patch (commit No. is
>>>> ac8fa4196d205ac8fff3f8932bddbad4f16e4110 ).
>>>>
>>>> We know that this patch committed after Linux kernel 4.1.x is intended
>>>> to allowing resync to go faster when there is competing IO. However,
>>>> we find the performance of random read on syncing Raid6 will come up
>>>> with a huge drop in this case. The following is our testing detail.
>>>>
>>>> The OS what we choose in our test is CentOS Linux release 7.1.1503
>>>> (Core) and the kernel image will be replaced for testing. In our
>>>> testing result, the 4K random read performance on syncing raid6 in
>>>> Kernel 4.2.8 is much lower than in Kernel 3.19.8. In order to find out
>>>> the root cause, we try to rollback this patch in Kernel 4.2.8, and we
>>>> find the 4K random read performance on syncing Raid6 will be improved
>>>> and go back to as what it should be in Kernel 3.19.8.
>>>>
>>>> Nevertheless, it seems that it will not affect some other read/write
>>>> patterns. In our testing result, the 1M sequential read/write, 4K
>>>> random write performance in Kernel 4.2.8 is performed almost the same
>>>> as in Kernel 3.19.8.
>>>>
>>>> It seems that although this patch increases the resync speed, the
>>>> logic of !is_mddev_idle() cause the sync request wait too short and
>>>> reduce the chance for raid5d to handle the random read I/O.
>>>
>>> This has been raised before.
>>> Can you please try the patch at the end of
>>>
>>>   http://permalink.gmane.org/gmane.linux.raid/51002
>>>
>>> and let me know if it makes any difference.  If it isn't sufficient I
>>> will explore further.
>>>
>>> Thanks,
>>> NeilBrown
>>
>>
>> Hello Neil,
>>
>> I try the patch (http://permalink.gmane.org/gmane.linux.raid/51002) in
>> Kernel 4.2.8. Here are the test results:
>>
>>
>> Part I. SSD (4 x 240GB Intel SSD create Raid6(syncing))
>>
>> a.  4K Random Read, numjobs=64
>>
>>                                    Average Throughput    Average IOPS
>>
>> Kernel 4.2.8 Patch             601249KB/s              150312
>>
>>
>> b.  4K Random Read, numjobs=1
>>
>>                                    Average Throughput    Average IOPS
>>
>> Kernel 4.2.8 Patch             1166.4KB/s                  291
>>
>>
>>
>> Part II. HDD (4 x 1TB TOSHIBA HDD create Raid6(syncing))
>>
>> a.  4K Random Read, numjobs=64
>>
>>                                    Average Throughput    Average IOPS
>>
>> Kernel 4.2.8 Patch              2946.4KB/s                 736
>>
>>
>> b.  4K Random Read, numjobs=1
>>
>>                                    Average Throughput    Average IOPS
>>
>> Kernel 4.2.8 Patch              119199 B/s                   28
>>
>>
>> Although the performance that compare to the original Kernel 4.2.8
>> test results is increased, the patch
>> (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ac8fa4196d205ac8fff3f8932bddbad4f16e4110)
>> rollback still has the best performance. I also observe the sync speed
>> at numjobs=64 almost drop to the sync_speed_min, but sync speed at
>> numjobs=1 almost keep in the original speed.
>>
>> >From my test results, I think this patch isn't sufficient that maybe
>> Neil can explore further and give me some advice.
>>
>>
>> Thanks,
>> Chien Lee
>>
>>
>>>>
>>>>
>>>> Following is our test environment and some testing results:
>>>>
>>>>
>>>> OS: CentOS Linux release 7.1.1503 (Core)
>>>>
>>>> CPU: Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz
>>>>
>>>> Processor number: 8
>>>>
>>>> Memory: 12GB
>>>>
>>>> fio command:
>>>>
>>>> 1.      (for numjobs=64):
>>>>
>>>> fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
>>>> --runtime=180 --size=50G --name=test-read --ioengine=libaio
>>>> --numjobs=64 --iodepth=1 --group_reporting
>>>>
>>>> 2.      (for numjobs=1):
>>>>
>>>> fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
>>>> --runtime=180 --size=50G --name=test-read --ioengine=libaio
>>>> --numjobs=1 --iodepth=1 --group_reporting
>>>>
>>>>
>>>>
>>>> Here are test results:
>>>>
>>>>
>>>> Part I. SSD (4 x 240GB Intel SSD create Raid6(syncing))
>>>>
>>>>
>>>> a.      4K Random Read, numjobs=64
>>>>
>>>>                                              Average Throughput    Average IOPS
>>>>
>>>> Kernel 3.19.8                                 715937KB/s              178984
>>>>
>>>> Kernel 4.2.8                                   489874KB/s              122462
>>>>
>>>> Kernel 4.2.8 Patch Rollback            717377KB/s              179344
>>>>
>>>>
>>>>
>>>> b.      4K Random Read, numjobs=1
>>>>
>>>>                                              Average Throughput    Average IOPS
>>>>
>>>> Kernel 3.19.8                                 32203KB/s                8051
>>>>
>>>> Kernel 4.2.8                                  2535.7KB/s                633
>>>>
>>>> Kernel 4.2.8 Patch Rollback            31861KB/s                7965
>>>>
>>>>
>>>>
>>>>
>>>> Part II. HDD (4 x 1TB TOSHIBA HDD create Raid6(syncing))
>>>>
>>>>
>>>> a.      4K Random Read, numjobs=64
>>>>
>>>>                                              Average Throughput    Average IOPS
>>>>
>>>> Kernel 3.19.8                                2976.6KB/s               744
>>>>
>>>> Kernel 4.2.8                                  2915.8KB/s               728
>>>>
>>>> Kernel 4.2.8 Patch Rollback           2973.3KB/s               743
>>>>
>>>>
>>>>
>>>> b.      4K Random Read, numjobs=1
>>>>
>>>>                                              Average Throughput    Average IOPS
>>>>
>>>> Kernel 3.19.8                                481844 B/s                 117
>>>>
>>>> Kernel 4.2.8                                   24718 B/s                   5
>>>>
>>>> Kernel 4.2.8 Patch Rollback           460090 B/s                 112
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>>
>>>> Chien Lee
> 
> Thanks for testing.
> 
> I'd like to suggest that these results are fairly reasonable for the
> numjobs=64 case.  Certainly read-speed is reduced by presumably resync
> speed is increased.
> The numbers for numjob=1 are appalling though.  That would generally
> affect any synchronous load.  As the synchronous load doesn't interfere
> much with the resync load, the delays that are inserted won't be very
> long.
> 
> I feel there must be an answer here -  I just cannot find it.
> I'd like to be able to dynamically estimate the bandwidth of the array
> and use (say) 10% of that, but I cannot think of a way to do that at all
> reliably.
> 
> I'll ponder it a bit longer.  We may need to ultimately revert that
> patch, but not yet.
> 
> Thanks,
> NeilBrown
> 

So I was one of the original reporters who noticed the problem on some old SGI
hardware that uses a QL1040B chipset.  Per hdparm -tT, the upper-end of the
speed to an MD device on this machine (an SGI Octane) is ~18.5MB/s.

I've been testing other kernel changes on this system, and finally managed to
scramble one of the disks enough that MD kicked off a resync on my largest
partition when booting and slowed the userland bringup down.  But, I also
recently enabled the bitmaps feature, and while it took about ~20mins to boot
to runlevel 3, by the time it got there, the resync had completed.  Usually, if
MD forces a resync, it'd resync that entire partition, which usually took 2+ hours.

So, a win for bitmaps, but the resync issue does need to be dealt with at some
point.  I suspect I noticed it first because this isn't exactly fast hardware
for this day and age (dual 600MHz CPUs), and the modified resync algorithm is
more aggressive in grabbing resources to complete its job (which I don't blame
it, you're skating on thin ice during the small resync window).

As far as a solution, can MD, when it needs to resync, run a test similar to
hdparm to check the speed to one of the member disks and use that value as a
basis to calculate the I/O it needs?  I.e., if it can determine that the upper
bound is ~18.5MB/s, it can then work out how much to use when the system is
idle and when it's not idle?

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
  2016-01-28  3:10     ` NeilBrown
  2016-01-28  4:42       ` Chien Lee
  2016-01-28  9:58       ` Joshua Kinard
@ 2016-01-28 20:56       ` Shaohua Li
  2 siblings, 0 replies; 12+ messages in thread
From: Shaohua Li @ 2016-01-28 20:56 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chien Lee, linux-raid, owner-linux-raid

On Thu, Jan 28, 2016 at 02:10:38PM +1100, Neil Brown wrote:
> On Wed, Jan 27 2016, Chien Lee wrote:
> 
> > 2016-01-27 6:12 GMT+08:00 NeilBrown <neilb@suse.com>:
> >> On Tue, Jan 26 2016, Chien Lee wrote:
> >>
> >>> Hello,
> >>>
> >>> Recently we find a bug about this patch (commit No. is
> >>> ac8fa4196d205ac8fff3f8932bddbad4f16e4110 ).
> >>>
> >>> We know that this patch committed after Linux kernel 4.1.x is intended
> >>> to allowing resync to go faster when there is competing IO. However,
> >>> we find the performance of random read on syncing Raid6 will come up
> >>> with a huge drop in this case. The following is our testing detail.
> >>>
> >>> The OS what we choose in our test is CentOS Linux release 7.1.1503
> >>> (Core) and the kernel image will be replaced for testing. In our
> >>> testing result, the 4K random read performance on syncing raid6 in
> >>> Kernel 4.2.8 is much lower than in Kernel 3.19.8. In order to find out
> >>> the root cause, we try to rollback this patch in Kernel 4.2.8, and we
> >>> find the 4K random read performance on syncing Raid6 will be improved
> >>> and go back to as what it should be in Kernel 3.19.8.
> >>>
> >>> Nevertheless, it seems that it will not affect some other read/write
> >>> patterns. In our testing result, the 1M sequential read/write, 4K
> >>> random write performance in Kernel 4.2.8 is performed almost the same
> >>> as in Kernel 3.19.8.
> >>>
> >>> It seems that although this patch increases the resync speed, the
> >>> logic of !is_mddev_idle() cause the sync request wait too short and
> >>> reduce the chance for raid5d to handle the random read I/O.
> >>
> >> This has been raised before.
> >> Can you please try the patch at the end of
> >>
> >>   http://permalink.gmane.org/gmane.linux.raid/51002
> >>
> >> and let me know if it makes any difference.  If it isn't sufficient I
> >> will explore further.
> >>
> >> Thanks,
> >> NeilBrown
> >
> >
> > Hello Neil,
> >
> > I try the patch (http://permalink.gmane.org/gmane.linux.raid/51002) in
> > Kernel 4.2.8. Here are the test results:
> >
> >
> > Part I. SSD (4 x 240GB Intel SSD create Raid6(syncing))
> >
> > a.  4K Random Read, numjobs=64
> >
> >                                    Average Throughput    Average IOPS
> >
> > Kernel 4.2.8 Patch             601249KB/s              150312
> >
> >
> > b.  4K Random Read, numjobs=1
> >
> >                                    Average Throughput    Average IOPS
> >
> > Kernel 4.2.8 Patch             1166.4KB/s                  291
> >
> >
> >
> > Part II. HDD (4 x 1TB TOSHIBA HDD create Raid6(syncing))
> >
> > a.  4K Random Read, numjobs=64
> >
> >                                    Average Throughput    Average IOPS
> >
> > Kernel 4.2.8 Patch              2946.4KB/s                 736
> >
> >
> > b.  4K Random Read, numjobs=1
> >
> >                                    Average Throughput    Average IOPS
> >
> > Kernel 4.2.8 Patch              119199 B/s                   28
> >
> >
> > Although the performance that compare to the original Kernel 4.2.8
> > test results is increased, the patch
> > (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ac8fa4196d205ac8fff3f8932bddbad4f16e4110)
> > rollback still has the best performance. I also observe the sync speed
> > at numjobs=64 almost drop to the sync_speed_min, but sync speed at
> > numjobs=1 almost keep in the original speed.
> >
> >>From my test results, I think this patch isn't sufficient that maybe
> > Neil can explore further and give me some advice.
> >
> >
> > Thanks,
> > Chien Lee
> >
> >
> >>>
> >>>
> >>> Following is our test environment and some testing results:
> >>>
> >>>
> >>> OS: CentOS Linux release 7.1.1503 (Core)
> >>>
> >>> CPU: Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz
> >>>
> >>> Processor number: 8
> >>>
> >>> Memory: 12GB
> >>>
> >>> fio command:
> >>>
> >>> 1.      (for numjobs=64):
> >>>
> >>> fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
> >>> --runtime=180 --size=50G --name=test-read --ioengine=libaio
> >>> --numjobs=64 --iodepth=1 --group_reporting
> >>>
> >>> 2.      (for numjobs=1):
> >>>
> >>> fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
> >>> --runtime=180 --size=50G --name=test-read --ioengine=libaio
> >>> --numjobs=1 --iodepth=1 --group_reporting
> >>>
> >>>
> >>>
> >>> Here are test results:
> >>>
> >>>
> >>> Part I. SSD (4 x 240GB Intel SSD create Raid6(syncing))
> >>>
> >>>
> >>> a.      4K Random Read, numjobs=64
> >>>
> >>>                                              Average Throughput    Average IOPS
> >>>
> >>> Kernel 3.19.8                                 715937KB/s              178984
> >>>
> >>> Kernel 4.2.8                                   489874KB/s              122462
> >>>
> >>> Kernel 4.2.8 Patch Rollback            717377KB/s              179344
> >>>
> >>>
> >>>
> >>> b.      4K Random Read, numjobs=1
> >>>
> >>>                                              Average Throughput    Average IOPS
> >>>
> >>> Kernel 3.19.8                                 32203KB/s                8051
> >>>
> >>> Kernel 4.2.8                                  2535.7KB/s                633
> >>>
> >>> Kernel 4.2.8 Patch Rollback            31861KB/s                7965
> >>>
> >>>
> >>>
> >>>
> >>> Part II. HDD (4 x 1TB TOSHIBA HDD create Raid6(syncing))
> >>>
> >>>
> >>> a.      4K Random Read, numjobs=64
> >>>
> >>>                                              Average Throughput    Average IOPS
> >>>
> >>> Kernel 3.19.8                                2976.6KB/s               744
> >>>
> >>> Kernel 4.2.8                                  2915.8KB/s               728
> >>>
> >>> Kernel 4.2.8 Patch Rollback           2973.3KB/s               743
> >>>
> >>>
> >>>
> >>> b.      4K Random Read, numjobs=1
> >>>
> >>>                                              Average Throughput    Average IOPS
> >>>
> >>> Kernel 3.19.8                                481844 B/s                 117
> >>>
> >>> Kernel 4.2.8                                   24718 B/s                   5
> >>>
> >>> Kernel 4.2.8 Patch Rollback           460090 B/s                 112
> >>>
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> --
> >>>
> >>> Chien Lee
> 
> Thanks for testing.
> 
> I'd like to suggest that these results are fairly reasonable for the
> numjobs=64 case.  Certainly read-speed is reduced by presumably resync
> speed is increased.
> The numbers for numjob=1 are appalling though.  That would generally
> affect any synchronous load.  As the synchronous load doesn't interfere
> much with the resync load, the delays that are inserted won't be very
> long.
> 
> I feel there must be an answer here -  I just cannot find it.
> I'd like to be able to dynamically estimate the bandwidth of the array
> and use (say) 10% of that, but I cannot think of a way to do that at all
> reliably.

Had a hack, something like this?

diff --git a/drivers/md/md.c b/drivers/md/md.c
index e55e6cf..7fee8e6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8060,12 +8060,34 @@ void md_do_sync(struct md_thread *thread)
 				goto repeat;
 			}
 			if (!is_mddev_idle(mddev, 0)) {
+				unsigned long start = jiffies;
+				int recov = atomic_read(&mddev->recovery_active);
+				int last_sect, new_sect;
+				int sleep_time = 0;
+
+				last_sect = (int)part_stat_read(&mddev->gendisk->part0, sectors[0]) +
+					(int)part_stat_read(&mddev->gendisk->part0, sectors[1]);
+
 				/*
 				 * Give other IO more of a chance.
 				 * The faster the devices, the less we wait.
 				 */
 				wait_event(mddev->recovery_wait,
 					   !atomic_read(&mddev->recovery_active));
+
+				new_sect = (int)part_stat_read(&mddev->gendisk->part0, sectors[0]) +
+					(int)part_stat_read(&mddev->gendisk->part0, sectors[1]);
+
+				if (recov * 10 > new_sect - last_sect)
+					sleep_time = 9 * (jiffies - start) /
+						((new_sect - last_sect) /
+						 (recov + 1) + 1);
+
+				sleep_time = jiffies_to_msecs(sleep_time);
+				if (sleep_time > 500)
+					sleep_time = 500;
+
+				msleep(sleep_time);
 			}
 		}
 	}

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
@ 2015-02-19  6:04 NeilBrown
  0 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2015-02-19  6:04 UTC (permalink / raw)
  To: linux RAID

[-- Attachment #1: Type: text/plain, Size: 3303 bytes --]

Hi all,
 as you probably know, when md is doing resync and notices other IO it
 throttles the resync to a configured "minimum", which defaults to
 1MB/sec/device.

 On a lot of modern devices, that is extremely slow.

 I don't want to change the default (not all drives are the same) so I
 wanted to come up with something that it a little bit dynamic.

 After a bit of pondering and a bit of trial and error, I have the following.
 It sometimes does what I want.  I don't think it is ever really bad.

 I'd appreciate it if people could test it on different hardware, different
 configs, different loads.

 What I have been doing is running
  while :; do cat /sys/block/md0/md/sync_speed; sleep 5; 
  done > /root/some-file

 while a resync is happening and a load is being imposed.

 I do this with the old kernel and with this patch applied, then use
 gnuplot to look at the sync_speed graphs.

 I'd like to see that the new code is never slower than the old, and rarely more
 than 20% of the available throughput when there is significant load.

 Any test results or other observations most welcome,

Thanks,
NeilBrown

When md notices non-sync IO happening while it is trying
to resync (or reshape or recover) it slows down to the
set minimum.

The default minimum might have made sense many years ago
but the drives have become faster.  Changing the default
to match the times isn't really a long term solution.

This patch changes the code so that instead of waiting until the speed
has dropped to the target, it just waits until pending requests
have completed, and then waits about as long again.
This means that the delay inserted is a function of the speed
of the devices.

Test show that:
 - for some loads, the resync speed is unchanged.  For those loads
   increasing the minimum doesn't change the speed either.
   So this is a good result.  To increase resync speed under such
   loads we would probably need to increase the resync window
   size.

 - for other loads, resync speed does increase to a reasonable
   fraction (e.g. 20%) of maximum possible, and throughput of
   the load only drops a little bit (e.g. 10%)

 - for other loads, throughput of the non-sync load drops quite a bit
   more.  These seem to be latency-sensitive loads.

So it isn't a perfect solution, but it is mostly an improvement.

Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 94741ee6ae69..ce6624b3cc1b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7669,11 +7669,20 @@ void md_do_sync(struct md_thread *thread)
 			/((jiffies-mddev->resync_mark)/HZ +1) +1;

 		if (currspeed > speed_min(mddev)) {
-			if ((currspeed > speed_max(mddev)) ||
-					!is_mddev_idle(mddev, 0)) {
+			if (currspeed > speed_max(mddev)) {
 				msleep(500);
 				goto repeat;
 			}
+			if (!is_mddev_idle(mddev, 0)) {
+				/*
+				 * Give other IO more of a chance.
+				 * The faster the devices, the less we wait.
+				 */
+				unsigned long start = jiffies;
+				wait_event(mddev->recovery_wait,
+					   !atomic_read(&mddev->recovery_active));
+				schedule_timeout_uninterruptible(jiffies-start);
+			}
 		}
 	}
 	printk(KERN_INFO "md: %s: %s %s.\n",mdname(mddev), desc,

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-01-28 20:56 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-26  2:32 [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO Chien Lee
2016-01-26 22:12 ` NeilBrown
2016-01-26 22:52   ` Shaohua Li
2016-01-26 23:08     ` NeilBrown
2016-01-26 23:27       ` Shaohua Li
2016-01-27  1:12         ` NeilBrown
2016-01-27  9:49   ` Chien Lee
2016-01-28  3:10     ` NeilBrown
2016-01-28  4:42       ` Chien Lee
2016-01-28  9:58       ` Joshua Kinard
2016-01-28 20:56       ` Shaohua Li
  -- strict thread matches above, loose matches on Subject: below --
2015-02-19  6:04 NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.