On Wed, May 17 2017, Shaohua Li wrote:

> On Tue, May 16, 2017 at 10:46:13PM +0100, Nix wrote:
>> On 16 May 2017, NeilBrown spake thusly:
>> 
>> > Actually, I have another caveat.  I don't think we want these messages
>> > during initial resync, or any resync.  Only during a 'check' or
>> > 'repair'.
>> > So add a check for MD_RECOVERY_REQUESTED or maybe for
>> >   sh->sectors >= conf->mddev->recovery_cp
>> 
>> I completely agree, but it's already inside MD_RECOVERY_CHECK:
>> 
>> if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) {
>>         /* don't try to repair!! */
>>         set_bit(STRIPE_INSYNC, &sh->state);
>>         pr_warn_ratelimited("%s: mismatch sector in range "
>>                             "%llu-%llu\n", mdname(conf->mddev),
>>                             (unsigned long long) sh->sector,
>>                             (unsigned long long) sh->sector +
>>                             STRIPE_SECTORS);
>> } else {
>> 
>> Doesn't that already mean that someone has explicitly triggered a check
>> action?
>
>
> Hi,
> So the idea is: run 'check' and report mismatch, userspace (raid6check for
> example) uses the reported info to fix the mismatch. The pr_warn_ratelimited
> isn't a good way to communicate the info to userspace. I'm wondering why we
> don't just run raid6check solely, it can do the job like what kernel does and
> we avoid the crappy pr_warn_ratelimited.
>

raid6check is *much* slower than doing it in the kernel, as the
interlocking to avoid checking a stripe that is being written are
clumsy.... and async IO is harder in user space.

I think the warnings are useful as warnings quite apart from the
possibility of raid6check using them.
If we really wanted a seamless "fix the raid6 thing" (which I don't
think we do), we'd probably make the list of inconsistencies appear in a
sysfs file.  That would be less 'crappy'.  But as I say, I don't think
we really want to do that.

NeilBrown