* Re: [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
@ 2016-01-26  2:32 Chien Lee
  2016-01-26 22:12 ` NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: Chien Lee @ 2016-01-26  2:32 UTC (permalink / raw)
  To: linux-raid, neilb, shli, owner-linux-raid

Hello,

Recently we found a bug related to this patch (commit
ac8fa4196d205ac8fff3f8932bddbad4f16e4110).

We understand that this patch, which went into the kernel after Linux
4.1.x, is intended to allow resync to go faster when there is competing
IO. However, we find that random read performance on a syncing RAID6
drops dramatically once it is applied. The details of our testing are
below.

The OS we chose for our tests is CentOS Linux release 7.1.1503
(Core), with the kernel image replaced for each test run. In our
results, 4K random read performance on a syncing RAID6 under kernel
4.2.8 is much lower than under kernel 3.19.8. To find the root cause,
we rolled this patch back in kernel 4.2.8, and the 4K random read
performance on the syncing RAID6 improved, returning to the level seen
under kernel 3.19.8.

Nevertheless, other read/write patterns do not seem to be affected: in
our results, 1M sequential read/write and 4K random write performance
under kernel 4.2.8 is almost the same as under kernel 3.19.8.

It seems that although this patch increases the resync speed, the new
!is_mddev_idle() logic makes the sync requests wait for too short a
time, which reduces the chance for raid5d to handle the random read
I/O.
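
A simple way to watch this while the fio job runs (borrowing the
sync_speed monitoring idea from Neil's original mail later in this
thread; md2 is our array and the log file name is arbitrary):

  # sample the resync rate and rebuild state every 5 seconds
  while :; do
      date
      cat /sys/block/md2/md/sync_speed    # current resync rate reported by md
      grep -A 2 md2 /proc/mdstat          # rebuild progress and array state
      sleep 5
  done > /root/md2-sync.log

With the patch applied we expect sync_speed to stay high during the 4K
random read run; with it rolled back, the resync should drop back
toward the configured minimum.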


Our test environment and some of the test results follow:


OS: CentOS Linux release 7.1.1503 (Core)

CPU: Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz

Processor number: 8

Memory: 12GB

fio command:

1.      (for numjobs=64):

fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
--runtime=180 --size=50G --name=test-read --ioengine=libaio
--numjobs=64 --iodepth=1 --group_reporting

2.      (for numjobs=1):

fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
--runtime=180 --size=50G --name=test-read --ioengine=libaio
--numjobs=1 --iodepth=1 --group_reporting
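
For reference, each array under test is a 4-disk RAID6 exposed as
/dev/md2 (see Parts I and II below); such an array can be created along
the following lines (member device names here are illustrative):

  mdadm --create /dev/md2 --level=6 --raid-devices=4 /dev/sd[bcde]
  cat /proc/mdstat    # check that the initial sync is still running before starting fio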



Here are the test results:


Part I. SSD (syncing RAID6 built from 4 x 240GB Intel SSDs)


a.      4K Random Read, numjobs=64

                               Average Throughput    Average IOPS
Kernel 3.19.8                  715937KB/s            178984
Kernel 4.2.8                   489874KB/s            122462
Kernel 4.2.8 Patch Rollback    717377KB/s            179344



b.      4K Random Read, numjobs=1

                               Average Throughput    Average IOPS
Kernel 3.19.8                  32203KB/s             8051
Kernel 4.2.8                   2535.7KB/s            633
Kernel 4.2.8 Patch Rollback    31861KB/s             7965




Part II. HDD (syncing RAID6 built from 4 x 1TB TOSHIBA HDDs)


a.      4K Random Read, numjobs=64

                               Average Throughput    Average IOPS
Kernel 3.19.8                  2976.6KB/s            744
Kernel 4.2.8                   2915.8KB/s            728
Kernel 4.2.8 Patch Rollback    2973.3KB/s            743



b.      4K Random Read, numjobs=1

                               Average Throughput    Average IOPS
Kernel 3.19.8                  481844 B/s            117
Kernel 4.2.8                   24718 B/s             5
Kernel 4.2.8 Patch Rollback    460090 B/s            112



Thanks,

-- 

Chien Lee

* [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
@ 2015-02-19  6:04 NeilBrown
  0 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2015-02-19  6:04 UTC (permalink / raw)
  To: linux RAID



Hi all,
 as you probably know, when md is doing resync and notices other IO it
 throttles the resync to a configured "minimum", which defaults to
 1MB/sec/device.

 On a lot of modern devices, that is extremely slow.

 I don't want to change the default (not all drives are the same) so I
 wanted to come up with something that is a little bit dynamic.
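
 For reference, the existing per-array knobs for this are the usual sysfs
 files; roughly, for an array called md0 (values in KB/sec):

   cat /sys/block/md0/md/sync_speed_min             # resync speed floor for this array
   cat /sys/block/md0/md/sync_speed_max             # resync speed ceiling for this array
   echo 50000 > /sys/block/md0/md/sync_speed_min    # override the floor for this array only
   echo system > /sys/block/md0/md/sync_speed_min   # revert to the system-wide default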

 After a bit of pondering and a bit of trial and error, I have the following.
 It sometimes does what I want.  I don't think it is ever really bad.

 I'd appreciate it if people could test it on different hardware, different
 configs, different loads.

 What I have been doing is running
  while :; do cat /sys/block/md0/md/sync_speed; sleep 5; 
  done > /root/some-file

 while a resync is happening and a load is being imposed.

 I do this with the old kernel and with this patch applied, then use
 gnuplot to look at the sync_speed graphs.
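
 A quick way to eyeball the samples is a one-liner along these lines (file
 name as in the loop above; styling is arbitrary):

   gnuplot -persist -e 'plot "/root/some-file" with lines title "sync_speed"'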

 I'd like to see that the new code is never slower than the old, and that the
 resync rarely takes more than 20% of the available throughput when there is
 significant load.

 Any test results or other observations most welcome,

Thanks,
NeilBrown



When md notices non-sync IO happening while it is trying
to resync (or reshape or recover) it slows down to the
set minimum.

The default minimum might have made sense many years ago,
but drives have become faster.  Changing the default
to match the times isn't really a long-term solution.

This patch changes the code so that instead of waiting until the speed
has dropped to the target, it just waits until pending requests
have completed, and then waits about as long again.
This means that the delay inserted is a function of the speed
of the devices.

Tests show that:
 - for some loads, the resync speed is unchanged.  For those loads
   increasing the minimum doesn't change the speed either.
   So this is a good result.  To increase resync speed under such
   loads we would probably need to increase the resync window
   size.

 - for other loads, resync speed does increase to a reasonable
   fraction (e.g. 20%) of maximum possible, and throughput of
   the load only drops a little bit (e.g. 10%)

 - for other loads, throughput of the non-sync load drops quite a bit
   more.  These seem to be latency-sensitive loads.

So it isn't a perfect solution, but it is mostly an improvement.

Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 94741ee6ae69..ce6624b3cc1b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7669,11 +7669,20 @@ void md_do_sync(struct md_thread *thread)
 			/((jiffies-mddev->resync_mark)/HZ +1) +1;
 
 		if (currspeed > speed_min(mddev)) {
-			if ((currspeed > speed_max(mddev)) ||
-					!is_mddev_idle(mddev, 0)) {
+			if (currspeed > speed_max(mddev)) {
 				msleep(500);
 				goto repeat;
 			}
+			if (!is_mddev_idle(mddev, 0)) {
+				/*
+				 * Give other IO more of a chance.
+				 * The faster the devices, the less we wait.
+				 */
+				unsigned long start = jiffies;
+				wait_event(mddev->recovery_wait,
+					   !atomic_read(&mddev->recovery_active));
+				schedule_timeout_uninterruptible(jiffies-start);
+			}
 		}
 	}
 	printk(KERN_INFO "md: %s: %s %s.\n",mdname(mddev), desc,


Thread overview: 12+ messages
2016-01-26  2:32 [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO Chien Lee
2016-01-26 22:12 ` NeilBrown
2016-01-26 22:52   ` Shaohua Li
2016-01-26 23:08     ` NeilBrown
2016-01-26 23:27       ` Shaohua Li
2016-01-27  1:12         ` NeilBrown
2016-01-27  9:49   ` Chien Lee
2016-01-28  3:10     ` NeilBrown
2016-01-28  4:42       ` Chien Lee
2016-01-28  9:58       ` Joshua Kinard
2016-01-28 20:56       ` Shaohua Li
  -- strict thread matches above, loose matches on Subject: below --
2015-02-19  6:04 NeilBrown
