All of lore.kernel.org
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org, axboe@kernel.dk
Subject: Re: [patch 3/3 v3] raid1: prevent merging too large request
Date: Wed, 4 Jul 2012 15:59:07 +1000	[thread overview]
Message-ID: <20120704155907.560607d4@notabene.brown> (raw)
In-Reply-To: <20120702011037.466764171@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 4333 bytes --]

On Mon, 02 Jul 2012 09:08:43 +0800 Shaohua Li <shli@kernel.org> wrote:

> For SSD, if request size exceeds specific value (optimal io size), request size
> isn't important for bandwidth. In such condition, if making request size bigger
> will cause some disks idle, the total throughput will actually drop. A good
> example is doing a readahead in a two-disk raid1 setup.
> 
> So when we should split big request? We absolutly don't want to split big
> request to very small requests. Even in SSD, big request transfer is more
> efficient. Below patch only consider request with size above optimal io size.
> 
> If all disks are busy, is it worthy to do split? Say optimal io size is 16k,
> two requests 32k and two disks. We can let each disk run one 32k request, or
> split the requests to 4 16k requests and each disk runs two. It's hard to say
> which case is better, depending on hardware.
> 
> So only consider case where there are idle disks. For readahead, split is
> always better in this case. And in my test, below patch can improve > 30%
> thoughput. Hmm, not 100%, because disk isn't 100% busy.
> 
> Such case can happen not just in readahead, for example, in directio. But I
> suppose directio usually will have bigger IO depth and make all disks busy, so
> I ignored it.
> 
> Note: if the raid uses any hard disk, we don't prevent merging. That will make
> performace worse.
> 
> Signed-off-by: Shaohua Li <shli@fusionio.com>
> ---
>  drivers/md/raid1.c |   46 ++++++++++++++++++++++++++++++++++++++--------
>  drivers/md/raid1.h |    1 +
>  2 files changed, 39 insertions(+), 8 deletions(-)
> 
> Index: linux/drivers/md/raid1.c
> ===================================================================
> --- linux.orig/drivers/md/raid1.c	2012-06-29 15:17:09.160691540 +0800
> +++ linux/drivers/md/raid1.c	2012-06-29 15:57:54.993943576 +0800
> @@ -489,6 +489,7 @@ static int read_balance(struct r1conf *c
>  	unsigned int min_pending;
>  	struct md_rdev *rdev;
>  	int choose_first;
> +	int choose_idle;
>  
>  	rcu_read_lock();
>  	/*
> @@ -502,6 +503,7 @@ static int read_balance(struct r1conf *c
>  	best_dist = MaxSector;
>  	min_pending = -1;
>  	best_good_sectors = 0;
> +	choose_idle = 0;
>  
>  	if (conf->mddev->recovery_cp < MaxSector &&
>  	    (this_sector + sectors >= conf->next_resync))
> @@ -580,16 +582,35 @@ static int read_balance(struct r1conf *c
>  		nonrot = blk_queue_nonrot(bdev_get_queue(rdev->bdev));
>  		pending = atomic_read(&rdev->nr_pending);
>  		dist = abs(this_sector - conf->mirrors[disk].head_position);
> -		if (choose_first
> -		    /* Don't change to another disk for sequential reads */
> -		    || conf->mirrors[disk].next_seq_sect == this_sector
> -		    || dist == 0
> -		    /* If device is idle, use it */
> -		    || pending == 0) {
> -			best_disk = disk;
> -			break;
> +		if (choose_first)
> +			goto done;
> +		/* Don't change to another disk for sequential reads */
> +		if (conf->mirrors[disk].next_seq_sect == this_sector
> +		    || dist == 0) {
> +			int opt_iosize = bdev_io_opt(rdev->bdev) >> 9;
> +			struct mirror_info *mirror = &conf->mirrors[disk];
> +
> +			/*
> +			 * If buffered sequential IO size exceeds optimal
> +			 * iosize and there is idle disk, choose idle disk
> +			 */
> +			if (nonrot && opt_iosize > 0 && choose_idle == 0 &&
> +			    mirror->seq_start != MaxSector &&
> +			    mirror->next_seq_sect > opt_iosize &&
> +			    mirror->next_seq_sect - opt_iosize >=
> +			    mirror->seq_start) {
> +				choose_idle = 1;
> +				best_disk = disk;
> +				continue;
> +			}
> +			goto done;
>  		}
> +		/* If device is idle, use it */
> +		if (pending == 0 && (!choose_idle || nonrot))
> +			goto done;

That doesn't make sense to me.
Ignoring the nonrot bit, it says:
   if (this device is idle, and we haven't decided to choose an idle device)
        then choose this device

which is wrong.

Also, what about the case where an idle device is found before we we find a
device that the sequentially-previous read was from.  Will that still do the
right thing?  It isn't clear.


And I don't really like the "goto done".  Just have
        best_disk = disk;
        break;

it isn't that much more code.


Thanks,
NeilBrown


>  

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

  reply	other threads:[~2012-07-04  5:59 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-02  1:08 [patch 0/3 v3] Optimize raid1 read balance for SSD Shaohua Li
2012-07-02  1:08 ` [patch 1/3 v3] raid1: make sequential read detection per disk based Shaohua Li
2012-07-04  5:38   ` NeilBrown
2012-07-02  1:08 ` [patch 2/3 v3] raid1: read balance chooses idlest disk for SSD Shaohua Li
2012-07-02  2:13   ` Roberto Spadim
2012-07-02  3:02     ` Shaohua Li
2012-07-02  3:57       ` Roberto Spadim
2012-07-02  4:33         ` Roberto Spadim
2012-07-02  4:31       ` Roberto Spadim
2012-07-02  4:36         ` Roberto Spadim
2012-07-04  5:45   ` NeilBrown
2012-07-02  1:08 ` [patch 3/3 v3] raid1: prevent merging too large request Shaohua Li
2012-07-04  5:59   ` NeilBrown [this message]
2012-07-04  8:01     ` Shaohua Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120704155907.560607d4@notabene.brown \
    --to=neilb@suse.de \
    --cc=axboe@kernel.dk \
    --cc=linux-raid@vger.kernel.org \
    --cc=shli@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.