All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
To: Song Liu <liu.song.a23@gmail.com>
Cc: axboe@kernel.dk, linux-block@vger.kernel.org,
	kernel@gpiccoli.net, NeilBrown <neilb@suse.com>,
	linux-raid <linux-raid@vger.kernel.org>,
	dm-devel@redhat.com,
	Linux-Fsdevel <linux-fsdevel@vger.kernel.org>,
	jay.vosburgh@canonical.com, gavin.guo@canonical.com
Subject: Re: [RFC] [PATCH V2 0/1] Introduce emergency raid0 stop for mounted arrays
Date: Tue, 30 Apr 2019 19:41:11 -0300	[thread overview]
Message-ID: <b39b96ea-2540-a407-2232-1af91e3e6658@canonical.com> (raw)
In-Reply-To: <CAPhsuW4k5zz2pJBPL60VzjTcj6NTnhBh-RjvWASLcOxAk+yDEw@mail.gmail.com>

> On 19/04/2019 14:08, Song Liu wrote:
> [...]
> I read through the discussion in V1, and I would agree with Neil that
> current behavior is reasonable.
> 
> For the following example:
> 
> fd = open("file", "w");
> write(fd, buf, size);
> ret = fsync(fd);
> 
> If "size" is big enough, the write is not expected to be atomic for
> md or other drives. If we remove the underlining block device
> after write() and before fsync(), the file could get corrupted. This
> is the same for md or NVMe/SCSI drives.
> 
> The application need to check "ret" from fsync(), the data is safe
> only when fsync() returns 0.
> 
> Does this make sense?
> 

Hi Song, thanks for your quick response, and sorry for my delay.
I've noticed after v4.18 kernel started to crash when we remove one
raid0 member while writing, so I was investigating this
before perform your test (in fact, found 2 issues [0]), hence my delay.

Your test does make sense; in fact I've tested your scenario with the
following code (with the patches from [0]):
https://pastebin.ubuntu.com/p/cyqpDqpM7x/

Indeed, fsync returns -1 in this case.
Interestingly, when I do a "dd if=<some_file> of=<raid0_mount>" and try
to "sync -f <some_file>" and "sync", it succeeds and the file is
written, although corrupted.

Do you think this behavior is correct? In other devices, like a pure
SCSI disk or NVMe, the 'dd' write fails.
Also, what about the status of the raid0 array in mdadm - it shows as
"clean" even after the member is removed, should we change that?


> Also, could you please highlight changes from V1 (if more than
> just rebase)?

No changes other than rebase. Worth mentioning here that a kernel bot
(and Julia Lawall) found an issue in my patch; I forgot a
"mutex_lock(&mddev->open_mutex);" in line 6053, which caused the first
caveat (hung mdadm and persistent device in /dev). Thanks for pointing
this silly mistake from me! in case this patch gets some traction, I'll
re-submit with that fixed.

Cheers,


Guilherme

[0] https://marc.info/?l=linux-block&m=155666385707413

> 
> Thanks,
> Song
> 

WARNING: multiple messages have this Message-ID (diff)
From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
To: Song Liu <liu.song.a23@gmail.com>
Cc: axboe@kernel.dk, linux-raid <linux-raid@vger.kernel.org>,
	jay.vosburgh@canonical.com, kernel@gpiccoli.net,
	NeilBrown <neilb@suse.com>,
	dm-devel@redhat.com,
	Linux-Fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-block@vger.kernel.org, gavin.guo@canonical.com
Subject: Re: [RFC] [PATCH V2 0/1] Introduce emergency raid0 stop for mounted arrays
Date: Tue, 30 Apr 2019 19:41:11 -0300	[thread overview]
Message-ID: <b39b96ea-2540-a407-2232-1af91e3e6658@canonical.com> (raw)
In-Reply-To: <CAPhsuW4k5zz2pJBPL60VzjTcj6NTnhBh-RjvWASLcOxAk+yDEw@mail.gmail.com>

> On 19/04/2019 14:08, Song Liu wrote:
> [...]
> I read through the discussion in V1, and I would agree with Neil that
> current behavior is reasonable.
> 
> For the following example:
> 
> fd = open("file", "w");
> write(fd, buf, size);
> ret = fsync(fd);
> 
> If "size" is big enough, the write is not expected to be atomic for
> md or other drives. If we remove the underlining block device
> after write() and before fsync(), the file could get corrupted. This
> is the same for md or NVMe/SCSI drives.
> 
> The application need to check "ret" from fsync(), the data is safe
> only when fsync() returns 0.
> 
> Does this make sense?
> 

Hi Song, thanks for your quick response, and sorry for my delay.
I've noticed after v4.18 kernel started to crash when we remove one
raid0 member while writing, so I was investigating this
before perform your test (in fact, found 2 issues [0]), hence my delay.

Your test does make sense; in fact I've tested your scenario with the
following code (with the patches from [0]):
https://pastebin.ubuntu.com/p/cyqpDqpM7x/

Indeed, fsync returns -1 in this case.
Interestingly, when I do a "dd if=<some_file> of=<raid0_mount>" and try
to "sync -f <some_file>" and "sync", it succeeds and the file is
written, although corrupted.

Do you think this behavior is correct? In other devices, like a pure
SCSI disk or NVMe, the 'dd' write fails.
Also, what about the status of the raid0 array in mdadm - it shows as
"clean" even after the member is removed, should we change that?


> Also, could you please highlight changes from V1 (if more than
> just rebase)?

No changes other than rebase. Worth mentioning here that a kernel bot
(and Julia Lawall) found an issue in my patch; I forgot a
"mutex_lock(&mddev->open_mutex);" in line 6053, which caused the first
caveat (hung mdadm and persistent device in /dev). Thanks for pointing
this silly mistake from me! in case this patch gets some traction, I'll
re-submit with that fixed.

Cheers,


Guilherme

[0] https://marc.info/?l=linux-block&m=155666385707413

> 
> Thanks,
> Song
> 

  reply	other threads:[~2019-04-30 22:41 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-18 22:04 [RFC] [PATCH V2 0/1] Introduce emergency raid0 stop for mounted arrays Guilherme G. Piccoli
2019-04-18 22:04 ` Guilherme G. Piccoli
2019-04-18 22:04 ` [RFC] [PATCH V2 1/1] md/raid0: Introduce emergency stop for raid0 arrays Guilherme G. Piccoli
2019-04-18 22:04   ` Guilherme G. Piccoli
2019-04-19 17:08 ` [RFC] [PATCH V2 0/1] Introduce emergency raid0 stop for mounted arrays Song Liu
2019-04-19 17:08   ` Song Liu
2019-04-30 22:41   ` Guilherme G. Piccoli [this message]
2019-04-30 22:41     ` Guilherme G. Piccoli
2019-05-01 15:33     ` Song Liu
2019-05-01 15:33       ` Song Liu
2019-05-01 18:00       ` Guilherme G. Piccoli
2019-05-01 18:00         ` Guilherme G. Piccoli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b39b96ea-2540-a407-2232-1af91e3e6658@canonical.com \
    --to=gpiccoli@canonical.com \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=gavin.guo@canonical.com \
    --cc=jay.vosburgh@canonical.com \
    --cc=kernel@gpiccoli.net \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=liu.song.a23@gmail.com \
    --cc=neilb@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.