From: "Guilherme G. Piccoli" <kernel@gpiccoli.net>
To: Song Liu <liu.song.a23@gmail.com>
Cc: axboe@kernel.dk, linux-block@vger.kernel.org,
	kernel@gpiccoli.net,
	"Guilherme G. Piccoli" <gpiccoli@canonical.com>,
	NeilBrown <neilb@suse.com>,
	linux-raid <linux-raid@vger.kernel.org>,
	dm-devel@redhat.com,
	Linux-Fsdevel <linux-fsdevel@vger.kernel.org>,
	Jay Vosburgh <jay.vosburgh@canonical.com>,
	gavin.guo@canonical.com
Subject: Re: [RFC] [PATCH V2 0/1] Introduce emergency raid0 stop for mounted arrays
Date: Wed, 1 May 2019 15:00:27 -0300	[thread overview]
Message-ID: <2823f928-d0b6-9049-73ab-b2ce0ef5da83@gpiccoli.net> (raw)
In-Reply-To: <CAPhsuW65EW8JgjE8zknPQPXYcmDhX9LEhTKGb0KHywqKuZkUcA@mail.gmail.com>

> On 5/1/19 12:33 PM, Song Liu wrote:
>> [...]
>> Indeed, fsync returns -1 in this case.
>> Interestingly, when I do a "dd if=<some_file> of=<raid0_mount>" and try
>> to "sync -f <some_file>" and "sync", it succeeds and the file is
>> written, although corrupted.
> 
> I guess this is some issue with sync command, but I haven't got time
> to look into it. How about running dd with oflag=sync or oflag=direct?
> 

Hi Song, it could be some problem with the sync command; using either 
'oflag=direct' or 'oflag=sync' makes the dd command fail instantly when a 
member is removed.
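For what it's worth, the difference is also easy to see outside dd with a 
few lines of C: a buffered write() can still appear to succeed after a 
member is pulled, while an O_SYNC write (or a subsequent fsync()) reports 
the error right away. This is only a minimal sketch; the mount point and 
file name below are placeholders, not paths from the test above.

/* Minimal sketch: synchronous vs. buffered error reporting on the
 * mounted raid0.  The path is a hypothetical mount point. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/mnt/raid0/testfile";	/* placeholder */
	char buf[4096] = { 0 };

	/* O_SYNC makes each write() wait for the device, like dd oflag=sync;
	 * without it the data may land in the page cache and "succeed"
	 * even though the array just lost a member. */
	int fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (write(fd, buf, sizeof(buf)) < 0)
		printf("write failed immediately: %s\n", strerror(errno));
	else if (fsync(fd) < 0)		/* the -1 mentioned above shows up here */
		printf("fsync failed: %s\n", strerror(errno));
	else
		printf("write + fsync succeeded\n");

	close(fd);
	return 0;
}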


>> Do you think this behavior is correct? In other devices, like a pure
>> SCSI disk or NVMe, the 'dd' write fails.
>> Also, what about the status of the raid0 array in mdadm - it shows as
>> "clean" even after the member is removed, should we change that?
> 
> I guess this is because the kernel hasn't detected the array is gone? In
> that case, I think reducing the latency would be useful for some use
> cases.
> 

Exactly! This is the main concern here: mdadm cannot stop the array 
since it's mounted, and there's no filesystem API to quickly shut down 
the filesystem, so it stays "alive" for too long after the failure.

For instance, if we have a raid0 with 2 members and remove the 1st, it 
fails much more quickly than if we remove the 2nd; the filesystem 
"realizes" the device is flawed quickly when the 1st member is removed, 
and goes to RO mode. Notably, xfs seems even faster than ext4 in noticing 
the failure.

Do you have any suggestion on how we could reduce this latency? And how 
about the status exhibited by mdadm: should it move from 'clean' to 
something more meaningful in the failure case?
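On the mdadm status question: one way to watch how long the array keeps 
advertising 'clean' after a member is pulled is to poll the md sysfs 
state directly. A rough sketch below; "md0" is only an example device 
name, adjust for the array under test.

/* Rough sketch: poll /sys/block/<md>/md/array_state and report changes,
 * to measure the latency between pulling a member and the state moving
 * away from "clean".  "md0" is an example device name. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/block/md0/md/array_state";
	char prev[64] = "", cur[64];

	for (;;) {
		FILE *f = fopen(path, "r");
		if (!f || !fgets(cur, sizeof(cur), f)) {
			if (f)
				fclose(f);
			printf("array_state no longer readable\n");
			return 0;
		}
		fclose(f);
		cur[strcspn(cur, "\n")] = '\0';

		if (strcmp(cur, prev) != 0) {
			printf("array_state: %s\n", cur);
			strcpy(prev, cur);
		}
		usleep(100 * 1000);	/* 100 ms poll interval */
	}
}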

Thanks again,


Guilherme

> Thanks,
> Song
> 
>>
>>
>>> Also, could you please highlight changes from V1 (if more than
>>> just rebase)?
>>
>> No changes other than rebase. Worth mentioning here that a kernel bot
>> (and Julia Lawall) found an issue in my patch; I forgot a
>> "mutex_lock(&mddev->open_mutex);" in line 6053, which caused the first
>> caveat (hung mdadm and persistent device in /dev). Thanks for pointing
>> out this silly mistake of mine! In case this patch gets some traction,
>> I'll re-submit with that fixed.
>>
>> Cheers,
>>
>>
>> Guilherme
>>
>> [0] https://marc.info/?l=linux-block&m=155666385707413
>>
>>>
>>> Thanks,
>>> Song
>>>
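As an aside on the V1 issue quoted above (the forgotten 
mutex_lock(&mddev->open_mutex)): the fix boils down to restoring the 
lock/unlock pairing in the new stop path. The helper below is purely 
illustrative, assuming a function along these lines; the name and body 
are not the actual patch.

/* Illustrative sketch only -- not the V2 code.  struct mddev and its
 * open_mutex come from drivers/md/md.h; the point is simply that any
 * path which later calls mutex_unlock(&mddev->open_mutex) must take the
 * lock first, otherwise mdadm can hang on the unbalanced mutex. */
#include <linux/mutex.h>
#include "md.h"		/* struct mddev, ->open_mutex */

static void raid0_emergency_stop_sketch(struct mddev *mddev)
{
	mutex_lock(&mddev->open_mutex);		/* the call the bot flagged as missing */

	/* ... tear the raid0 array down here ... */

	mutex_unlock(&mddev->open_mutex);	/* balanced with the lock above */
}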

Thread overview:

2019-04-18 22:04 [RFC] [PATCH V2 0/1] Introduce emergency raid0 stop for mounted arrays Guilherme G. Piccoli
2019-04-18 22:04 ` [RFC] [PATCH V2 1/1] md/raid0: Introduce emergency stop for raid0 arrays Guilherme G. Piccoli
2019-04-19 17:08 ` [RFC] [PATCH V2 0/1] Introduce emergency raid0 stop for mounted arrays Song Liu
2019-04-30 22:41   ` Guilherme G. Piccoli
2019-05-01 15:33     ` Song Liu
2019-05-01 18:00       ` Guilherme G. Piccoli [this message]