linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Anand Jain <anand.jain@oracle.com>
To: dsterba@suse.cz, Qu Wenruo <quwenruo.btrfs@gmx.com>,
	Nikolay Borisov <nborisov@suse.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 4/5] btrfs: remove identified alien device in open_fs_devices
Date: Tue, 8 Oct 2019 11:26:31 +0800	[thread overview]
Message-ID: <75fd55f3-3c42-8daa-a3e7-26dadce6228f@oracle.com> (raw)
In-Reply-To: <20191007170306.GI2751@twin.jikos.cz>

On 10/8/19 1:03 AM, David Sterba wrote:
> On Mon, Oct 07, 2019 at 09:37:49PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/10/7 下午9:30, Nikolay Borisov wrote:
>>>
>>>
>>> On 7.10.19 г. 12:45 ч., Anand Jain wrote:
>>>> Following test case explains it all, even though the degraded mount is
>>>> successful the btrfs-progs fails to report the missing device.
>>>>
>>>>   mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdd && \
>>>>   wipefs -a /dev/sdd && mount -o degraded /dev/sdc /btrfs && \
>>>>   btrfs fi show -m /btrfs
>>>>
>>>>   Label: none  uuid: 2b3b8d92-572b-4d37-b4ee-046d3a538495
>>>> 	Total devices 2 FS bytes used 128.00KiB
>>>> 	devid    1 size 1.09TiB used 2.01GiB path /dev/sdc
>>>> 	devid    2 size 1.09TiB used 2.01GiB path /dev/sdd
>>>>
>>>> This is because btrfs-progs does it fundamentally wrong way that
>>>> it deduces the missing device status in the user land instead of
>>>> refuting from the kernel.
>>>>
>>>> At the same time in the kernel when we know that there is device
>>>> with non-btrfs magic, then remove that device from the list so
>>>> that btrfs-progs or someother userland utility won't be confused.
>>>>
>>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>>> ---
>>>>   fs/btrfs/disk-io.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 326d5281ad93..e05856432456 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -3417,7 +3417,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
>>>>   	if (btrfs_super_bytenr(super) != bytenr ||
>>>>   		    btrfs_super_magic(super) != BTRFS_MAGIC) {
>>>>   		brelse(bh);
>>>> -		return -EINVAL;
>>>> +		return -EUCLEAN;
>>>
>>> This is really non-obvious and you are propagating the special-meaning
>>> of EUCLEAN waaaaaaaay beyond btrfs_open_one_device. In fact what this
>>> patch does is make the following call chain return EUCLAN:
>>>
>>> btrfs_open_one_device <-- finally removing the device in this function
>>>   btrfs_get_bdev_and_sb <-- propagating it to here
>>>    btrfs_read_dev_super
>>>      btrfs_read_dev_one_super <-- you return the EUCLEAN
>>>
>>>
>>> And your commit log doesn't mention anything about that. EUCLEAN
>>> warrants a comment in this case since it changes behavior in
>>> higher-level layers.

  Ok. Which means the commit log needs to be fixed.

>>
>> And, for most case, EUCLEAN should have a proper kernel message for
>> what's going wrong, the value we hit and the value we expect.

  If its a kernel message for EUCLEAN in the context of mounted state,
  I would say yes, as it indicates a corruption.
  But here we are still in the unmounted context and moving towards
  mounted context. Having an alien device in the fs_devices is not
  something logical bug nor a corruption, it just about a disk whose
  disk superblock got overwritten by the user operation. And its not
  a good idea to log the kernel messages arising from the user
  operation especially in the unmounted context. Otherwise we shall
  endup cluttering the kernel messages. Moreover if there is an alien
  device, the user must use -o degraded and we do have the missing
  kernel messages, and this will suffice to explain the state about the
  fsid being mounted. And the alien fsid, its a stale we don't worry
  about it. So no kernel messages.

>> And for EUCLEAN, it's more like graceful error out for impossible thing.
>> This is definitely not the case, and I believe the original EINVAL makes
>> more sense than EUCLEAN.
> 
> I agree, EUCLEAN is wrong here.
>

   I am ok with any other suitable. It does not matter. And the other
   choice I have is ESTALE.

   But EINVAL is no go because we still want to keep the messed up device
   unless there is a confirmation that its alien.

   In the same function we use EINVAL if the device/partition size
   changed to smaller size after we have scanned the device. Which
   means we still keep the device in the list. Its the user choice,
   no need to meddle with it.

    bytenr = btrfs_sb_offset(copy_num);
    if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
           return -EINVAL;

   But, EUCLEAN /* structure needs cleaning */ is errno which exactly
   says whats needed here. Because an alien device is a kind of
   corruption but it can easily happen due to unplanned device operations
   in the userland. But as it happened before we assembled the volume so
   its safe to clean and is not a road block when there is redundancy
   with degraded option.

Thanks, Anand


  reply	other threads:[~2019-10-08  3:26 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-07  9:45 [PATCH v3 0/5] btrfs: fix issues due to alien device Anand Jain
2019-10-07  9:45 ` [PATCH 1/5] btrfs: drop useless goto in open_fs_devices Anand Jain
2020-01-16 15:52   ` Josef Bacik
2019-10-07  9:45 ` [PATCH 2/5] btrfs: include non-missing as a qualifier for the latest_bdev Anand Jain
2020-01-16 15:52   ` Josef Bacik
2020-01-17  3:01     ` Anand Jain
2019-10-07  9:45 ` [PATCH v3 3/5] btrfs: remove identified alien btrfs device in open_fs_devices Anand Jain
2020-01-16 15:56   ` Josef Bacik
2020-01-17  9:10     ` Anand Jain
2020-01-17 14:23       ` Josef Bacik
2019-10-07  9:45 ` [PATCH 4/5] btrfs: remove identified alien " Anand Jain
2019-10-07 13:30   ` Nikolay Borisov
2019-10-07 13:37     ` Qu Wenruo
2019-10-07 17:03       ` David Sterba
2019-10-08  3:26         ` Anand Jain [this message]
2020-01-15  8:56           ` Anand Jain
2019-10-07  9:45 ` [PATCH 5/5] btrfs: free alien device due to device add Anand Jain
2020-01-16 16:00   ` Josef Bacik
2019-10-07 17:36 ` [PATCH v3 0/5] btrfs: fix issues due to alien device David Sterba
2019-10-08  6:11   ` Anand Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=75fd55f3-3c42-8daa-a3e7-26dadce6228f@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=nborisov@suse.com \
    --cc=quwenruo.btrfs@gmx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).