All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anand Jain <anand.jain@oracle.com>
To: Wang Yugui <wangyugui@e16-tech.com>
Cc: linux-btrfs@vger.kernel.org, Sherry Yang <sherry.yang@oracle.com>,
	kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH] btrfs: fix mkfs/mount/check failures due to race with systemd-udevd scan
Date: Thu, 23 Mar 2023 21:14:21 +0800	[thread overview]
Message-ID: <cbac9a8b-7db4-dc54-1f1d-4dc48e5dfcc9@oracle.com> (raw)
In-Reply-To: <20230323195710.433E.409509F4@e16-tech.com>



On 23/03/2023 19:57, Wang Yugui wrote:
> Hi,
> 
>> During the device scan initiated by systemd-udevd, other user space
>> EXCL operations such as mkfs, mount, or check may get blocked and result
>> in a "Device or resource busy" error. This is because the device
>> scan process opens the device with the EXCL flag in the kernel.
>>
>> Two reports were received:
>>
>>   . One with the btrfs/179 testcase, where the fsck command failed with
>>     the -EBUSY error; and
>>
>>   . Another with the LTP pwritev03 testcase, where mkfs.vfs failed with
>>     the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem
>>     on the device.
>>
>> In both cases, fsck and mkfs (respectively) were racing with a
>> systemd-udevd device scan, and systemd-udevd won, resulting in the
>> -EBUSY error for fsck and mkfs.
>>
>> Reproducing the problem has been difficult because there is a very
>> small timeframe during which these userspace threads can race to
>> acquire the exclusive device open. Even on the system where the problem
>> was observed, the problem occurances were anywhere between 10 to 400
>> iterations and chances of reproducing lessen with debug printk()s.
>>
>> However, an exclusive device open is unnecessary for the scan process,
>> as there are no write operations on the device during scan. Furthermore,
>> during the mount process, the superblock is re-read in the below
>> function stack.
>>
>>    btrfs_mount_root
>>     btrfs_open_devices
>>      open_fs_devices
>>       btrfs_open_one_device
>>         btrfs_get_bdev_and_sb
>>
>> So, to fix this issue, this patch removes the FMODE_EXCL flag from the scan
>> operation, and adds a comment.
>>
>> Reported-by: Sherry Yang <sherry.yang@oracle.com>
>> Reported-by: kernel test robot <oliver.sang@intel.com>
>> Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>
>>   This patch should be cc-ed to stable-5.15.y and stable-6.1.y. As for
>>   stable-5.10.y and stable-5.4.y, a conflict fix is necessary, which I
>>   will send separately.
>>
>>   fs/btrfs/volumes.c | 11 ++++++++++-
>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 93bc45001e68..cc1871767c8c 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1366,8 +1366,17 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
>>   	 * So, we need to add a special mount option to scan for
>>   	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>>   	 */
>> -	flags |= FMODE_EXCL;
>>   
>> +	/*
>> +	 * Avoid using flag |= FMODE_EXCL here, as the systemd-udev may
>> +	 * initiate the device scan which may race with the user's mount
>> +	 * or mkfs command, resulting in failure.
> 

> for  FMODE_READ | FMODE_EXCL, we need some sleep/retry,
> for  FMODE_WRITE | FMODE_EXCL, we should fail immediately?

  Sorry I don't understand the context here what represents the we here?

  In the LTP testcase the two sides are
   mkfs.<vfs|btrfs> side (FMODE_WRITE|FMODE_EXCL) and
   device-scan side (now: FMODE_READ, before: FMODE_READ|FMODE_EXCL)

> scan race with with mkfs may result worse?

  In the above example, the mkfs.<vfs|btrfs> failed immediately without
  the patch and with the patch it is successful.

Thanks, Anand

> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2023/03/23
> 
> 
>> +	 * Since the device scan is solely for reading purposes, there is
>> +	 * no need for FMODE_EXCL. Additionally, the devices are read again
>> +	 * during the mount process. It is ok to get some inconsistent
>> +	 * values temporarily, as the device paths of the fsid are the only
>> +	 * required information for assembling the volume.
>> +	 */
>>   	bdev = blkdev_get_by_path(path, flags, holder);
>>   	if (IS_ERR(bdev))
>>   		return ERR_CAST(bdev);
>> -- 
>> 2.38.1
> 
> 

  reply	other threads:[~2023-03-23 13:19 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-23  7:56 [PATCH] btrfs: fix mkfs/mount/check failures due to race with systemd-udevd scan Anand Jain
2023-03-23 11:57 ` Wang Yugui
2023-03-23 13:14   ` Anand Jain [this message]
2023-03-23 13:27     ` Wang Yugui
2023-03-23 18:27       ` David Sterba
2023-03-23 18:24 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cbac9a8b-7db4-dc54-1f1d-4dc48e5dfcc9@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=oliver.sang@intel.com \
    --cc=sherry.yang@oracle.com \
    --cc=wangyugui@e16-tech.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.