From: Shaohua Li <shli@kernel.org>
To: NeilBrown <neilb@suse.com>
Cc: Chien Lee <chienlee@qnap.com>,
	linux-raid@vger.kernel.org, owner-linux-raid@vger.kernel.org
Subject: Re: [PATCH/RFC/RFT] md: allow resync to go faster when there is competing IO.
Date: Thu, 28 Jan 2016 12:56:48 -0800
Message-ID: <20160128205648.GA22191@kernel.org>
In-Reply-To: <87wpqu1jrl.fsf@notabene.neil.brown.name>

On Thu, Jan 28, 2016 at 02:10:38PM +1100, Neil Brown wrote:
> On Wed, Jan 27 2016, Chien Lee wrote:
> 
> > 2016-01-27 6:12 GMT+08:00 NeilBrown <neilb@suse.com>:
> >> On Tue, Jan 26 2016, Chien Lee wrote:
> >>
> >>> Hello,
> >>>
> >>> Recently we found a bug related to this patch (commit
> >>> ac8fa4196d205ac8fff3f8932bddbad4f16e4110).
> >>>
> >>> We know that this patch, which went in after Linux kernel 4.1.x, is
> >>> intended to allow resync to go faster when there is competing IO.
> >>> However, we find that random read performance on a syncing RAID6
> >>> drops dramatically in this case. The details of our testing follow.
> >>>
> >>> The OS we chose for our test is CentOS Linux release 7.1.1503 (Core),
> >>> with only the kernel image swapped out for each test. In our results,
> >>> 4K random read performance on a syncing RAID6 under kernel 4.2.8 is
> >>> much lower than under kernel 3.19.8. To find the root cause, we rolled
> >>> this patch back in kernel 4.2.8, and the 4K random read performance on
> >>> the syncing RAID6 improved, returning to the level of kernel 3.19.8.
> >>>
> >>> Nevertheless, it does not seem to affect some other read/write
> >>> patterns: in our testing, 1M sequential read/write and 4K random
> >>> write performance in kernel 4.2.8 are almost the same as in kernel
> >>> 3.19.8.
> >>>
> >>> It seems that although this patch increases the resync speed, the
> >>> !is_mddev_idle() logic makes the sync requests wait too briefly,
> >>> reducing the chance for raid5d to handle the random read I/O.
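
(For context: the change under discussion is roughly the following. This
is a reconstructed sketch of the throttling check near the end of
md_do_sync() before and after commit ac8fa4196d20, paraphrased for the
thread rather than quoted verbatim from either tree.)

	/* Before the commit: any competing IO triggers a fixed back-off. */
	if (currspeed > speed_min(mddev)) {
		if ((currspeed > speed_max(mddev)) ||
		    !is_mddev_idle(mddev, 0)) {
			msleep(500);	/* 500ms pause for every resync batch */
			goto repeat;
		}
	}

	/* After the commit: when there is competing IO, only wait for the
	 * in-flight resync requests to drain; on fast devices that wait
	 * can be extremely short, so resync barely slows down. */
	if (currspeed > speed_min(mddev)) {
		if (currspeed > speed_max(mddev)) {
			msleep(500);
			goto repeat;
		}
		if (!is_mddev_idle(mddev, 0)) {
			wait_event(mddev->recovery_wait,
				   !atomic_read(&mddev->recovery_active));
		}
	}
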
> >>
> >> This has been raised before.
> >> Can you please try the patch at the end of
> >>
> >>   http://permalink.gmane.org/gmane.linux.raid/51002
> >>
> >> and let me know if it makes any difference.  If it isn't sufficient I
> >> will explore further.
> >>
> >> Thanks,
> >> NeilBrown
> >
> >
> > Hello Neil,
> >
> > I tried the patch (http://permalink.gmane.org/gmane.linux.raid/51002) in
> > kernel 4.2.8. Here are the test results:
> >
> >
> > Part I. SSD (4 x 240GB Intel SSDs in a syncing RAID6)
> >
> > a.  4K Random Read, numjobs=64
> >
> >                            Average Throughput    Average IOPS
> > Kernel 4.2.8 + patch       601249 KB/s           150312
> >
> > b.  4K Random Read, numjobs=1
> >
> >                            Average Throughput    Average IOPS
> > Kernel 4.2.8 + patch       1166.4 KB/s           291
> >
> >
> > Part II. HDD (4 x 1TB TOSHIBA HDDs in a syncing RAID6)
> >
> > a.  4K Random Read, numjobs=64
> >
> >                            Average Throughput    Average IOPS
> > Kernel 4.2.8 + patch       2946.4 KB/s           736
> >
> > b.  4K Random Read, numjobs=1
> >
> >                            Average Throughput    Average IOPS
> > Kernel 4.2.8 + patch       119199 B/s            28
> >
> >
> > Although performance is higher than with the original kernel 4.2.8,
> > rolling back the patch
> > (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ac8fa4196d205ac8fff3f8932bddbad4f16e4110)
> > still gives the best performance. I also observe that the sync speed at
> > numjobs=64 drops almost to sync_speed_min, while the sync speed at
> > numjobs=1 stays close to its original level.
> >
> > From my test results, I think this patch isn't sufficient; maybe Neil
> > can explore further and give me some advice.
> >
> >
> > Thanks,
> > Chien Lee
> >
> >
> >>>
> >>>
> >>> Following is our test environment and some testing results:
> >>>
> >>>
> >>> OS: CentOS Linux release 7.1.1503 (Core)
> >>>
> >>> CPU: Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz
> >>>
> >>> Processor number: 8
> >>>
> >>> Memory: 12GB
> >>>
> >>> fio command:
> >>>
> >>> 1.      (for numjobs=64):
> >>>
> >>> fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
> >>> --runtime=180 --size=50G --name=test-read --ioengine=libaio
> >>> --numjobs=64 --iodepth=1 --group_reporting
> >>>
> >>> 2.      (for numjobs=1):
> >>>
> >>> fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
> >>> --runtime=180 --size=50G --name=test-read --ioengine=libaio
> >>> --numjobs=1 --iodepth=1 --group_reporting
> >>>
> >>>
> >>>
> >>> Here are the test results:
> >>>
> >>>
> >>> Part I. SSD (4 x 240GB Intel SSDs in a syncing RAID6)
> >>>
> >>> a.      4K Random Read, numjobs=64
> >>>
> >>>                                    Average Throughput    Average IOPS
> >>> Kernel 3.19.8                      715937 KB/s           178984
> >>> Kernel 4.2.8                       489874 KB/s           122462
> >>> Kernel 4.2.8 patch rollback        717377 KB/s           179344
> >>>
> >>> b.      4K Random Read, numjobs=1
> >>>
> >>>                                    Average Throughput    Average IOPS
> >>> Kernel 3.19.8                      32203 KB/s            8051
> >>> Kernel 4.2.8                       2535.7 KB/s           633
> >>> Kernel 4.2.8 patch rollback        31861 KB/s            7965
> >>>
> >>>
> >>> Part II. HDD (4 x 1TB TOSHIBA HDDs in a syncing RAID6)
> >>>
> >>> a.      4K Random Read, numjobs=64
> >>>
> >>>                                    Average Throughput    Average IOPS
> >>> Kernel 3.19.8                      2976.6 KB/s           744
> >>> Kernel 4.2.8                       2915.8 KB/s           728
> >>> Kernel 4.2.8 patch rollback        2973.3 KB/s           743
> >>>
> >>> b.      4K Random Read, numjobs=1
> >>>
> >>>                                    Average Throughput    Average IOPS
> >>> Kernel 3.19.8                      481844 B/s            117
> >>> Kernel 4.2.8                       24718 B/s             5
> >>> Kernel 4.2.8 patch rollback        460090 B/s            112
> >>>
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> --
> >>>
> >>> Chien Lee
> 
> Thanks for testing.
> 
> I'd like to suggest that these results are fairly reasonable for the
> numjobs=64 case.  Certainly read speed is reduced, but presumably resync
> speed is increased.
> The numbers for numjobs=1 are appalling though.  That would generally
> affect any synchronous load: because a synchronous load doesn't interfere
> much with the resync load, the delays that are inserted won't be very
> long, so the resync barely backs off.
> 
> I feel there must be an answer here -  I just cannot find it.
> I'd like to be able to dynamically estimate the bandwidth of the array
> and use (say) 10% of that, but I cannot think of a way to do that at all
> reliably.

Had a hack, something like this?

diff --git a/drivers/md/md.c b/drivers/md/md.c
index e55e6cf..7fee8e6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8060,12 +8060,34 @@ void md_do_sync(struct md_thread *thread)
 				goto repeat;
 			}
 			if (!is_mddev_idle(mddev, 0)) {
+				unsigned long start = jiffies;
+				int recov = atomic_read(&mddev->recovery_active);
+				int last_sect, new_sect;
+				int sleep_time = 0;
+
+				last_sect = (int)part_stat_read(&mddev->gendisk->part0, sectors[0]) +
+					(int)part_stat_read(&mddev->gendisk->part0, sectors[1]);
+
 				/*
 				 * Give other IO more of a chance.
 				 * The faster the devices, the less we wait.
 				 */
 				wait_event(mddev->recovery_wait,
 					   !atomic_read(&mddev->recovery_active));
+
+				new_sect = (int)part_stat_read(&mddev->gendisk->part0, sectors[0]) +
+					(int)part_stat_read(&mddev->gendisk->part0, sectors[1]);
+
+				if (recov * 10 > new_sect - last_sect)
+					sleep_time = 9 * (jiffies - start) /
+						((new_sect - last_sect) /
+						 (recov + 1) + 1);
+
+				sleep_time = jiffies_to_msecs(sleep_time);
+				if (sleep_time > 500)
+					sleep_time = 500;
+
+				msleep(sleep_time);
 			}
 		}
 	}

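The idea, roughly: across the wait_event() we compare how much non-resync
IO completed on the array (new_sect - last_sect) with how much resync IO
was in flight (recov).  If resync made up more than about a tenth of that
traffic, sleep for up to nine times as long as the wait itself took,
scaled down by the other-IO-to-resync ratio and capped at 500ms, so that
resync is held to very roughly 10% of the observed bandwidth.  A made-up
example, assuming HZ=1000 so one jiffy is 1ms: if the wait took 10
jiffies, recov was 400 sectors and only 1000 sectors of other IO completed
in that time, then

	recov * 10 = 4000 > 1000, so we throttle;
	sleep_time = 9 * 10 / (1000 / 401 + 1) = 90 / 3 = 30 jiffies ~= 30ms

whereas under heavy competing IO (say 100000 sectors completed during the
wait) the 'recov * 10 > new_sect - last_sect' test fails and no extra
sleep is added.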