From: Yu Kuai <yukuai1@huaweicloud.com>
To: Paul Menzel <pmenzel@molgen.mpg.de>, Yu Kuai <yukuai1@huaweicloud.com>
Cc: xni@redhat.com, paul.e.luse@linux.intel.com, song@kernel.org,
	neilb@suse.com, shli@fb.com, linux-raid@vger.kernel.org,
	linux-kernel@vger.kernel.org, yi.zhang@huawei.com,
	yangerkun@huawei.com, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [PATCH md-6.9 v4 03/11] md/raid1: record nonrot rdevs while adding/removing rdevs to conf
Date: Fri, 1 Mar 2024 09:59:50 +0800
Message-ID: <a5d762b9-f8c9-d690-6616-028f507ba7f6@huaweicloud.com>
In-Reply-To: <7b030433-518e-4fe7-976c-3ffb5f7f1a85@molgen.mpg.de>

Hi,

On 2024/03/01 0:37, Paul Menzel wrote:
> Dear Yu,
> 
> 
> Thank you for your patch.
> 
> 
> On 29.02.24 at 10:57, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> For raid1, each read will iterate all the rdevs from conf and check if
>> any rdev is non-rotational, then choose rdev with minimal IO inflight
>> if so, or rdev with closest distance otherwise.
>>
>> Disk nonrot info can be changed through sysfs entry:
>>
>> /sys/block/[disk_name]/queue/rotational
>>
>> However, consider that this should only be used for testing, and user
>> really shouldn't do this in real life. Record the number of non-rotational
>> disks in conf, to avoid checking each rdev in IO fast path and simplify
> 
> The comma is not needed.
> 
>> read_balance() a little bit.
> 
> Just to make sure, I understood correctly. Changing 
> `/sys/block/[disk_name]/queue/rotational` will now not be considered 
> anymore, right?

Yes, and I think this will cause performance to be worse in real life.
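
To illustrate, here is a rough userspace sketch of the caching behaviour, not
the kernel code itself. The names fake_rdev, fake_conf, fake_add() and
fake_remove() are made up for this example; they only model the Nonrot flag
and conf->nonrot_disks from the patch. The attribute is sampled once when the
device is added, so flipping it afterwards is not seen by the cached state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for struct md_rdev and struct r1conf. */
struct fake_rdev {
	bool queue_nonrot;	/* models the live queue/rotational attribute */
	bool nonrot_flag;	/* models the Nonrot bit in rdev->flags */
};

struct fake_conf {
	int nonrot_disks;	/* models conf->nonrot_disks */
};

/* Sample the attribute once, at the time the device joins the array. */
static void fake_add(struct fake_conf *conf, struct fake_rdev *rdev)
{
	if (rdev->queue_nonrot) {
		rdev->nonrot_flag = true;
		conf->nonrot_disks++;
	}
}

/* Undo the accounting from the cached flag, not from the live attribute. */
static void fake_remove(struct fake_conf *conf, struct fake_rdev *rdev)
{
	if (rdev->nonrot_flag) {
		assert(conf->nonrot_disks > 0);
		rdev->nonrot_flag = false;
		conf->nonrot_disks--;
	}
}
```

Because removal is driven by the cached flag, the counter stays balanced even
if the attribute was changed while the device was part of the array.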
> 
> For the summary, maybe you could also say “cache”. Maybe:
> 
> Cache attribute rotational while adding/removing rdevs to conf
> 
>> Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com>
>> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/md.h    |  1 +
>>   drivers/md/raid1.c | 17 ++++++++++-------
>>   drivers/md/raid1.h |  1 +
>>   3 files changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index a49ab04ab707..b2076a165c10 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -207,6 +207,7 @@ enum flag_bits {
>>                    * check if there is collision between raid1
>>                    * serial bios.
>>                    */
>> +    Nonrot,            /* non-rotational device (SSD) */
>>   };
>>   static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors,
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 6ec9998f6257..de6ea87d4d24 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -599,7 +599,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>       int sectors;
>>       int best_good_sectors;
>>       int best_disk, best_dist_disk, best_pending_disk;
>> -    int has_nonrot_disk;
>>       int disk;
>>       sector_t best_dist;
>>       unsigned int min_pending;
>> @@ -620,7 +619,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>       best_pending_disk = -1;
>>       min_pending = UINT_MAX;
>>       best_good_sectors = 0;
>> -    has_nonrot_disk = 0;
>>       choose_next_idle = 0;
>>       clear_bit(R1BIO_FailFast, &r1_bio->state);
>> @@ -637,7 +635,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>           sector_t first_bad;
>>           int bad_sectors;
>>           unsigned int pending;
>> -        bool nonrot;
>>           rdev = conf->mirrors[disk].rdev;
>>           if (r1_bio->bios[disk] == IO_BLOCKED
>> @@ -703,8 +700,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>               /* At least two disks to choose from so failfast is OK */
>>               set_bit(R1BIO_FailFast, &r1_bio->state);
>> -        nonrot = bdev_nonrot(rdev->bdev);
>> -        has_nonrot_disk |= nonrot;
>>           pending = atomic_read(&rdev->nr_pending);
>>           dist = abs(this_sector - conf->mirrors[disk].head_position);
>>           if (choose_first) {
>> @@ -731,7 +726,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>                * small, but not a big deal since when the second disk
>>                * starts IO, the first disk is likely still busy.
>>                */
>> -            if (nonrot && opt_iosize > 0 &&
>> +            if (test_bit(Nonrot, &rdev->flags) && opt_iosize > 0 &&
>>                   mirror->seq_start != MaxSector &&
>>                   mirror->next_seq_sect > opt_iosize &&
>>                   mirror->next_seq_sect - opt_iosize >=
>> @@ -763,7 +758,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>        * mixed ratation/non-rotational disks depending on workload.
>>        */
>>       if (best_disk == -1) {
>> -        if (has_nonrot_disk || min_pending == 0)
>> +        if (READ_ONCE(conf->nonrot_disks) || min_pending == 0)
>>               best_disk = best_pending_disk;
>>           else
>>               best_disk = best_dist_disk;
>> @@ -1768,6 +1763,11 @@ static bool raid1_add_conf(struct r1conf *conf, struct md_rdev *rdev, int disk,
>>       if (info->rdev)
>>           return false;
>> +    if (bdev_nonrot(rdev->bdev)) {
>> +        set_bit(Nonrot, &rdev->flags);
>> +        WRITE_ONCE(conf->nonrot_disks, conf->nonrot_disks + 1);
>> +    }
>> +
>>       rdev->raid_disk = disk;
>>       info->head_position = 0;
>>       info->seq_start = MaxSector;
>> @@ -1791,6 +1791,9 @@ static bool raid1_remove_conf(struct r1conf *conf, int disk)
>>           rdev->mddev->degraded < conf->raid_disks)
>>           return false;
>> +    if (test_and_clear_bit(Nonrot, &rdev->flags))
>> +        WRITE_ONCE(conf->nonrot_disks, conf->nonrot_disks - 1);
>> +
>>       WRITE_ONCE(info->rdev, NULL);
>>       return true;
>>   }
>> diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
>> index 14d4211a123a..5300cbaa58a4 100644
>> --- a/drivers/md/raid1.h
>> +++ b/drivers/md/raid1.h
>> @@ -71,6 +71,7 @@ struct r1conf {
>>                            * allow for replacements.
>>                            */
>>       int            raid_disks;
>> +    int            nonrot_disks;
>>       spinlock_t        device_lock;
> 
> As you mentioned the “fast path” in the commit message: if I remember
> correctly, this does not improve the performance in benchmarks, right?

Yes, this just saves a few memory load instructions, which is too little to
affect performance benchmarks. The main idea here is to make read_balance()
cleaner.
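
As a simplified sketch of what the final choice in read_balance() reduces to
once the count is cached: the helper name pick_fallback_disk() is made up for
this example, and it takes plain integers instead of the real conf and rdev
structures. It only mirrors the condition in the last read_balance() hunk.

```c
#include <assert.h>

/*
 * Hypothetical helper modelling the fallback choice in read_balance():
 * if the array holds any non-rotational disk, or some disk is idle,
 * prefer the disk with the fewest pending IOs; otherwise prefer the
 * disk whose head is closest to the requested sector.
 */
static int pick_fallback_disk(int nonrot_disks, unsigned int min_pending,
			      int best_pending_disk, int best_dist_disk)
{
	assert(nonrot_disks >= 0);

	if (nonrot_disks || min_pending == 0)
		return best_pending_disk;

	return best_dist_disk;
}
```

With the counter kept up to date on add and remove, the per-read loop no
longer needs its own has_nonrot_disk bookkeeping.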

Thanks,
Kuai

> 
> 
> Kind regards,
> 
> Paul
> .
> 


Thread overview: 15+ messages
2024-02-29  9:57 [PATCH md-6.9 v4 00/11] md/raid1: refactor read_balance() and some minor fix Yu Kuai
2024-02-29  9:57 ` [PATCH md-6.9 v4 01/11] md: add a new helper rdev_has_badblock() Yu Kuai
2024-02-29  9:57 ` [PATCH md-6.9 v4 02/11] md/raid1: factor out helpers to add rdev to conf Yu Kuai
2024-02-29  9:57 ` [PATCH md-6.9 v4 03/11] md/raid1: record nonrot rdevs while adding/removing rdevs " Yu Kuai
2024-02-29 16:37   ` Paul Menzel
2024-03-01  1:59     ` Yu Kuai [this message]
2024-02-29  9:57 ` [PATCH md-6.9 v4 04/11] md/raid1: fix choose next idle in read_balance() Yu Kuai
2024-02-29  9:57 ` [PATCH md-6.9 v4 05/11] md/raid1-10: add a helper raid1_check_read_range() Yu Kuai
2024-02-29  9:57 ` [PATCH md-6.9 v4 06/11] md/raid1-10: factor out a new helper raid1_should_read_first() Yu Kuai
2024-02-29  9:57 ` [PATCH md-6.9 v4 07/11] md/raid1: factor out read_first_rdev() from read_balance() Yu Kuai
2024-02-29  9:57 ` [PATCH md-6.9 v4 08/11] md/raid1: factor out choose_slow_rdev() " Yu Kuai
2024-02-29  9:57 ` [PATCH md-6.9 v4 09/11] md/raid1: factor out choose_bb_rdev() " Yu Kuai
2024-02-29  9:57 ` [PATCH md-6.9 v4 10/11] md/raid1: factor out the code to manage sequential IO Yu Kuai
2024-02-29  9:57 ` [PATCH md-6.9 v4 11/11] md/raid1: factor out helpers to choose the best rdev from read_balance() Yu Kuai
2024-03-01  7:16 ` [PATCH md-6.9 v4 00/11] md/raid1: refactor read_balance() and some minor fix Song Liu
