All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anand Jain <anand.jain@oracle.com>
To: Su Yue <Damenly_Su@gmx.com>
Cc: Nikolay Borisov <nborisov@suse.com>,
	damenly.su@gmail.com, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan (reformatted)
Date: Fri, 13 Dec 2019 13:36:45 +0800	[thread overview]
Message-ID: <d0da81b4-801d-cb90-6e15-66905f190930@oracle.com> (raw)
In-Reply-To: <07e99b04-ec0c-f027-079e-b0d3c1e54970@gmx.com>



  metadata_uuid code is too confusing, a lot of if and if-nots
  it should be have been better.

  more below.

On 13/12/19 10:46 AM, Su Yue wrote:
> On 2019/12/12 10:15 PM, Nikolay Borisov wrote:
>>
>>
>> On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
>>
>> <snip>
>>
>>> Acutally, there are two devices in the fs. Device 2 with
>>> FSID_CHANGING_V2 allocated a fs_devices. But, device 1 found the
>>> fs_devices but failed to be added into since fs_devices->opened (
>>
>> It's not clear why device 1 wasn't able to be added to the fs_devices
>> allocated by dev 2. Please elaborate?
>>
>>
> Sure, of course.
> 
> For example.
> 
> $cat test.sh
> ====================================================================
> img1="/tmp/test1.img"
> img2="/tmp/test2.img"
> 
> [ -f "$img1" ] || fallocate -l 300M "$img1"
> [ -f "$img2" ] || fallocate -l 300M "$img2"
> 
> mkfs.btrfs -f $img1 $img2 2>&1 >/dev/null|| exit 1
> losetup -D
> 
> dmesg -C
> rmmod btrfs
> modprobe btrfs
> 
> loop1=$(losetup --find --show "$img1")
> loop2=$(losetup --find --show "$img2")

  Can you explicitly show what devices should be scanned to make the
  device mount (below) successful. Fist you can cleanup the
  device list using

    btrfs device --forget

> mount $loop1 /mnt || exit 1
> umount /mnt
> ====================================================================
> 
> $dmesg
> ====================================================================
> [  395.205221] BTRFS: device fsid 5090db22-5e48-4767-8fb7-d037c619c1ee
> devid 1 transid 5 /dev/loop0 scanned by systemd-udevd (13620)
> [  395.210773] !!!!!!!!fs_device opened
> [  395.213875] BTRFS info (device loop0): disk space caching is enabled
> [  395.214994] BTRFS info (device loop0): has skinny extents
> [  395.215891] BTRFS info (device loop0): flagging fs with big metadata
> feature
> [  395.222639] BTRFS error (device loop0): devid 2 uuid
> adcc8454-695f-4e1d-bde8-94041b7bf761 is missing
> [  395.224147] BTRFS error (device loop0): failed to read the system
> array: -2
> [  395.246163] !!!!!!!!fs_device opened
> [  395.338219] BTRFS error (device loop0): open_ctree failed
> =====================================================================
> 
> The line "!!!!!!!!fs_device opened" is handy added by me in debug purpose.
> 
> =====================================================================
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -794,6 +794,7 @@ static noinline struct btrfs_device
> *device_list_add(const char *path,
> 
>          if (!device) {
>                  if (fs_devices->opened) {
> +                       pr_info("!!!!!!!!fs_device opened\n");
>                          mutex_unlock(&fs_devices->device_list_mutex);
>                          return ERR_PTR(-EBUSY);
>                  }
> =====================================================================
> 
> To make it more clear. The following is in metadata_uuid situation.
> Device 1 is without FSID_CHANGING_V2 but has IMCOMPAT_METADATA_UUID.
> (newer transid).
> 
> Device 2 is with FSID_CHANGING_V2 and IMCOMPAT_METADATA_UUID.(Older
> transid).

How were you able to set both BTRFS_SUPER_FLAG_CHANGING_FSID_V2
and BTRFS_FEATURE_INCOMPAT_METADATA_UUID on only devid 2 ?


> The workflow in misc-tests/034 is
> 
> loop1=$(losetup --find --show "$device2")
> loop2=$(losetup --find --show "$device1")
> 
> mount $loop1 /mnt ---> fails here
> 
> Assume the fs_devices was allocated by systemd-udevd through
> btrfs_control_ioctl() path after finish of scanning of device2.
> 
> Then:
> 

In the two threads which are in race (below), the mount thread can't be 
successful unless -o degraded is used, if it does it means the devid 1 
is already scanned and for that btrfs_device to be in the
btrfs_fs_devices list the fsid has to match (does not matter metadata_uuid).

> Thread *mounting device2*            Thread *scanning device1*
> 
> 
> btrfs_mount_root                     btrfs_control_ioctl
> 
>    mutex_lock(&uuid_mutex);
> 
>      btrfs_read_disk_super
>      btrfs_scan_one_device
>      --> there is only device2
>      in the fs_devices
> 
>      btrfs_open_devices
>        fs_devices->opened = 1
>        fs_devices->latest_bdev = device2
> 
>      mutex_unlock(&uuid_mutex);
> 
>                                        mutex_lock(&uuid_mutex);
>                                        btrfs_scan_one_device
>                                          btrfs_read_disk_super
> 
>                                          device_list_add
>                                            found fs_devices
>                                              device = btrfs_find_device
> 
>                                              rewrite fs_deivces->fsid if
>                                              scanned device1 is newer
>                                               --> Change fs_devices->fsi
>                                                    d to device1->fsid
> 
>                                            if (!device)
>                                               if(fs_devices->opened)
>                           return -EBUSY
>                                               --> the device1 adding
>                                                   aborts since
>                                                   fs_devices was opened
>                                        mutex_unlock(&uuid_mutex);
>    btrfs_fill_super
>      open_ctree
>         btrfs_read_dev_super(
>         fs_devices->latest_bdev)
>         --> the latest_bdev is device2
> 
>         assert fs_devices->fsid equals
>         device2's fsid.
>         --> fs_device->fsid was rewritten by
>             the scanning thread
> 
> The result is fs_device->fsid is from device1 but super->fsid is from
> the lastest device2.
> 

  Oops that's not good. However still not able to image various devices
  and its fsid to achieve that condition. Is it possible to write a test
  case? It would help.

Thanks, Anand

>>> the thread is doing mount device 1). But device 1's fsid was copied
>>> to fs_devices->fsid then the assertion failed.
>>
>>
>> dev 1 fsid should be copied iff its transid is newer.
>>
> 
> Even it was failed to be added into the fs_devices?
> 
>>>
>>> The solution is that only if a new device was added into a existing
>>> fs_device, then the fs_devices->fsid is allowed to be rewritten.
>>
>> fs_devices->fsid must be re-written by any device which is _newer_ w.r.t
>> to the transid.
>>
> 
> Then the assertion failed in above scenario. Just do not update the
> fs_devices->fsid, let later btrfs_read_sys_array() report the device
> missing then reject to mount.
> 
> Thanks
> 
>>>
>>> Fixes: 7a62d0f07377 ("btrfs: Handle one more split-brain scenario 
>>> during fsid change")
>>> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
>>> ---
>>>   fs/btrfs/volumes.c | 36 +++++++++++++++++++++---------------
>>>   1 file changed, 21 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index d8e5560db285..9efa4123c335 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -732,6 +732,9 @@ static noinline struct btrfs_device 
>>> *device_list_add(const char *path,
>>>           BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
>>>       bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
>>>                       BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
>>> +    bool fs_devices_found = false;
>>> +
>>> +    *new_device_added = false;
>>>
>>>       if (fsid_change_in_progress) {
>>>           if (!has_metadata_uuid) {
>>> @@ -772,24 +775,11 @@ static noinline struct btrfs_device 
>>> *device_list_add(const char *path,
>>>
>>>           device = NULL;
>>>       } else {
>>> +        fs_devices_found = true;
>>> +
>>>           mutex_lock(&fs_devices->device_list_mutex);
>>>           device = btrfs_find_device(fs_devices, devid,
>>>                   disk_super->dev_item.uuid, NULL, false);
>>> -
>>> -        /*
>>> -         * If this disk has been pulled into an fs devices created by
>>> -         * a device which had the CHANGING_FSID_V2 flag then replace 
>>> the
>>> -         * metadata_uuid/fsid values of the fs_devices.
>>> -         */
>>> -        if (has_metadata_uuid && fs_devices->fsid_change &&
>>> -            found_transid > fs_devices->latest_generation) {
>>> -            memcpy(fs_devices->fsid, disk_super->fsid,
>>> -                    BTRFS_FSID_SIZE);
>>> -            memcpy(fs_devices->metadata_uuid,
>>> -                    disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>> -
>>> -            fs_devices->fsid_change = false;
>>> -        }
>>>       }
>>>
>>>       if (!device) {
>>> @@ -912,6 +902,22 @@ static noinline struct btrfs_device 
>>> *device_list_add(const char *path,
>>>           }
>>>       }
>>>
>>> +    /*
>>> +     * If the new added disk has been pulled into an fs devices 
>>> created by
>>> +     * a device which had the CHANGING_FSID_V2 flag then replace the
>>> +     * metadata_uuid/fsid values of the fs_devices.
>>> +     */
>>> +    if (*new_device_added && fs_devices_found &&
>>> +        has_metadata_uuid && fs_devices->fsid_change &&
>>> +        found_transid > fs_devices->latest_generation) {
>>> +        memcpy(fs_devices->fsid, disk_super->fsid,
>>> +               BTRFS_FSID_SIZE);
>>> +        memcpy(fs_devices->metadata_uuid,
>>> +               disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>> +
>>> +        fs_devices->fsid_change = false;
>>> +    }
>>> +
>>>       /*
>>>        * Unmount does not free the btrfs_device struct but would zero
>>>        * generation along with most of the other members. So just update
>>>
> 


  reply	other threads:[~2019-12-13  5:37 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-12 11:01 [PATCH 0/6] btrfs: metadata uuid fixes and enhancements damenly.su
2019-12-12 11:01 ` [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan damenly.su
2019-12-12 14:15   ` Nikolay Borisov
2019-12-13  2:30     ` Su Yue
2019-12-13  2:46     ` [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan (reformatted) Su Yue
2019-12-13  5:36       ` Anand Jain [this message]
2019-12-13  7:15         ` Su Yue
2019-12-13  8:51           ` Anand Jain
2019-12-13 10:10             ` Su Yue
2019-12-12 11:01 ` [PATCH 2/6] btrfs: metadata_uuid: move split-brain handling from fs_id() to new function damenly.su
2019-12-12 13:05   ` Nikolay Borisov
2019-12-12 13:32     ` Su Yue
2019-12-12 11:01 ` [PATCH 3/6] btrfs: split-brain case for scanned changing device with INCOMPAT_METADATA_UUID damenly.su
2019-12-12 13:24   ` Su Yue
2019-12-12 13:34   ` Nikolay Borisov
2019-12-12 14:19     ` Su Yue
2019-12-12 11:01 ` [PATCH 4/6] btrfs: split-brain case for scanned changed device without INCOMPAT_METADATA_UUID damenly.su
2019-12-12 11:01 ` [PATCH 5/6] btrfs: copy fsid and metadata_uuid for pulled disk " damenly.su
2020-01-06 15:12   ` Nikolay Borisov
2020-01-07  1:31     ` Su Yue
2020-01-07  7:18       ` Nikolay Borisov
2020-01-07  7:34         ` Su Yue
2019-12-12 11:01 ` [PATCH 6/6] btrfs: metadata_uuid: move partly logic into find_fsid_inprogress() damenly.su
2019-12-12 13:37   ` Nikolay Borisov
2019-12-13  8:03 ` [PATCH 0/6] btrfs: metadata uuid fixes and enhancements Nikolay Borisov
2019-12-16  0:49   ` Su Yue

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d0da81b4-801d-cb90-6e15-66905f190930@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=Damenly_Su@gmx.com \
    --cc=damenly.su@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=nborisov@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.