All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: NeilBrown <neilb@suse.de>, Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Neal Gompa <ngompa13@gmail.com>,
	Wang Yugui <wangyugui@e16-tech.com>,
	Christoph Hellwig <hch@infradead.org>,
	Josef Bacik <josef@toxicpanda.com>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Chuck Lever <chuck.lever@oracle.com>, Chris Mason <clm@fb.com>,
	David Sterba <dsterba@suse.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-nfs@vger.kernel.org,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
Date: Fri, 30 Jul 2021 13:25:54 +0800	[thread overview]
Message-ID: <341403c0-a7a7-f6c8-5ef6-2d966b1907a8@gmx.com> (raw)
In-Reply-To: <162761259105.21659.4838403432058511846@noble.neil.brown.name>



On 2021/7/30 上午10:36, NeilBrown wrote:
>
> I've been pondering all the excellent feedback, and what I have learnt
> from examining the code in btrfs, and I have developed a different
> perspective.

Great! Some new developers into the btrfs realm!

>
> Maybe "subvol" is a poor choice of name because it conjures up
> connections with the Volumes in LVM, and btrfs subvols are very different
> things.  Btrfs subvols are really just subtrees that can be treated as a
> unit for operations like "clone" or "destroy".
>
> As such, they don't really deserve separate st_dev numbers.
>
> Maybe the different st_dev numbers were introduced as a "cheap" way to
> extend to size of the inode-number space.  Like many "cheap" things, it
> has hidden costs.
>
> Maybe objects in different subvols should still be given different inode
> numbers.  This would be problematic on 32bit systems, but much less so on
> 64bit systems.
>
> The patch below, which is just a proof-of-concept, changes btrfs to
> report a uniform st_dev, and different (64bit) st_ino in different subvols.
>
> It has problems:
>   - it will break any 32bit readdir and 32bit stat.  I don't know how big
>     a problem that is these days (ino_t in the kernel is "unsigned long",
>     not "unsigned long long). That surprised me).
>   - It might break some user-space expectations.  One thing I have learnt
>     is not to make any assumption about what other people might expect.

Wouldn't any filesystem boundary check fail to stop at subvolume boundary?

Then it will go through the full btrfs subvolumes/snapshots, which can
be super slow.

>
> However, it would be quite easy to make this opt-in (or opt-out) with a
> mount option, so that people who need the current inode numbers and will
> accept the current breakage can keep working.
>
> I think this approach would be a net-win for NFS export, whether BTRFS
> supports it directly or not.  I might post a patch which modifies NFS to
> intuit improved inode numbers for btrfs exports....

Some extra ideas, but not familiar with VFS enough to be sure.

Can we generate "fake" superblock for each subvolume?
Like using the subolume UUID to replace the FSID of each subvolume.
Could that migrate the problem?

Thanks,
Qu

>
> So: how would this break your use-case??
>
> Thanks,
> NeilBrown
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 0117d867ecf8..8dc58c848502 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6020,6 +6020,37 @@ static int btrfs_opendir(struct inode *inode, struct file *file)
>   	return 0;
>   }
>
> +static u64 btrfs_make_inum(struct btrfs_key *root, struct btrfs_key *ino)
> +{
> +	u64 rootid = root->objectid;
> +	u64 inoid = ino->objectid;
> +	int shift = 64-8;
> +
> +	if (ino->type == BTRFS_ROOT_ITEM_KEY) {
> +		/* This is a subvol root found during readdir. */
> +		rootid = inoid;
> +		inoid = BTRFS_FIRST_FREE_OBJECTID;
> +	}
> +	if (rootid == BTRFS_FS_TREE_OBJECTID)
> +		/* this is main vol, not subvol (I think) */
> +		return inoid;
> +	/* store the rootid in the high bits of the inum.  This
> +	 * will break if 32bit inums are required - we cannot know
> +	 */
> +	while (rootid) {
> +		inoid ^= (rootid & 0xff) << shift;
> +		rootid >>= 8;
> +		shift -= 8;
> +	}
> +	return inoid;
> +}
> +
> +static inline u64 btrfs_ino_to_inum(struct inode *inode)
> +{
> +	return btrfs_make_inum(&BTRFS_I(inode)->root->root_key,
> +			       &BTRFS_I(inode)->location);
> +}
> +
>   struct dir_entry {
>   	u64 ino;
>   	u64 offset;
> @@ -6045,6 +6076,49 @@ static int btrfs_filldir(void *addr, int entries, struct dir_context *ctx)
>   	return 0;
>   }
>
> +static inline bool btrfs_dir_emit_dot(struct file *file,
> +				      struct dir_context *ctx)
> +{
> +	return ctx->actor(ctx, ".", 1, ctx->pos,
> +			  btrfs_ino_to_inum(file->f_path.dentry->d_inode),
> +			  DT_DIR) == 0;
> +}
> +
> +static inline ino_t btrfs_parent_ino(struct dentry *dentry)
> +{
> +	ino_t res;
> +
> +	/*
> +	 * Don't strictly need d_lock here? If the parent ino could change
> +	 * then surely we'd have a deeper race in the caller?
> +	 */
> +	spin_lock(&dentry->d_lock);
> +	res = btrfs_ino_to_inum(dentry->d_parent->d_inode);
> +	spin_unlock(&dentry->d_lock);
> +	return res;
> +}
> +
> +static inline bool btrfs_dir_emit_dotdot(struct file *file,
> +					 struct dir_context *ctx)
> +{
> +	return ctx->actor(ctx, "..", 2, ctx->pos,
> +			  btrfs_parent_ino(file->f_path.dentry), DT_DIR) == 0;
> +}
> +static inline bool btrfs_dir_emit_dots(struct file *file,
> +				       struct dir_context *ctx)
> +{
> +	if (ctx->pos == 0) {
> +		if (!btrfs_dir_emit_dot(file, ctx))
> +			return false;
> +		ctx->pos = 1;
> +	}
> +	if (ctx->pos == 1) {
> +		if (!btrfs_dir_emit_dotdot(file, ctx))
> +			return false;
> +		ctx->pos = 2;
> +	}
> +	return true;
> +}
>   static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>   {
>   	struct inode *inode = file_inode(file);
> @@ -6067,7 +6141,7 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>   	bool put = false;
>   	struct btrfs_key location;
>
> -	if (!dir_emit_dots(file, ctx))
> +	if (!btrfs_dir_emit_dots(file, ctx))
>   		return 0;
>
>   	path = btrfs_alloc_path();
> @@ -6136,7 +6210,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>   		put_unaligned(fs_ftype_to_dtype(btrfs_dir_type(leaf, di)),
>   				&entry->type);
>   		btrfs_dir_item_key_to_cpu(leaf, di, &location);
> -		put_unaligned(location.objectid, &entry->ino);
> +		put_unaligned(btrfs_make_inum(&root->root_key, &location),
> +			      &entry->ino);
>   		put_unaligned(found_key.offset, &entry->offset);
>   		entries++;
>   		addr += sizeof(struct dir_entry) + name_len;
> @@ -9193,7 +9268,7 @@ static int btrfs_getattr(struct user_namespace *mnt_userns,
>   				  STATX_ATTR_NODUMP);
>
>   	generic_fillattr(&init_user_ns, inode, stat);
> -	stat->dev = BTRFS_I(inode)->root->anon_dev;
> +	stat->ino = btrfs_ino_to_inum(inode);
>
>   	spin_lock(&BTRFS_I(inode)->lock);
>   	delalloc_bytes = BTRFS_I(inode)->new_delalloc_bytes;
>

  reply	other threads:[~2021-07-30  5:26 UTC|newest]

Thread overview: 127+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
2021-07-27 22:37 ` [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points NeilBrown
2021-07-28 10:13   ` Amir Goldstein
2021-07-29  0:28     ` NeilBrown
2021-07-29  5:27       ` Amir Goldstein
2021-08-06  7:52         ` Miklos Szeredi
2021-08-06  8:08           ` Amir Goldstein
2021-08-06  8:18             ` Miklos Szeredi
2021-07-28 19:17   ` J. Bruce Fields
2021-07-28 22:25     ` NeilBrown
2021-07-27 22:37 ` [PATCH 04/11] VFS: export lookup_mnt() NeilBrown
2021-07-30  0:31   ` Al Viro
2021-07-30  5:33     ` NeilBrown
2021-07-27 22:37 ` [PATCH 01/11] VFS: show correct dev num in mountinfo NeilBrown
2021-07-30  0:25   ` Al Viro
2021-07-30  5:28     ` NeilBrown
2021-07-30  5:54       ` Miklos Szeredi
2021-07-30  6:13         ` NeilBrown
2021-07-30  7:18           ` Miklos Szeredi
2021-07-30  7:33             ` NeilBrown
2021-07-30  7:59               ` Miklos Szeredi
2021-08-02  4:18                 ` A Third perspective on BTRFS nfsd subvol dev/inode number issues NeilBrown
2021-08-02  5:25                   ` Al Viro
2021-08-02  5:40                     ` NeilBrown
2021-08-02  7:54                       ` Amir Goldstein
2021-08-02 13:53                         ` Josef Bacik
2021-08-03 22:29                           ` Qu Wenruo
2021-08-02 14:47                         ` Frank Filz
2021-08-02 21:24                         ` NeilBrown
2021-08-02  7:15                   ` Martin Steigerwald
2021-08-02 21:40                     ` NeilBrown
2021-08-02 12:39                   ` J. Bruce Fields
2021-08-02 20:32                     ` Patrick Goetz
2021-08-02 20:41                       ` J. Bruce Fields
2021-08-02 21:10                     ` NeilBrown
2021-08-02 21:50                       ` J. Bruce Fields
2021-08-02 21:59                         ` NeilBrown
2021-08-02 22:14                           ` J. Bruce Fields
2021-08-02 22:36                             ` NeilBrown
2021-08-03  0:15                               ` J. Bruce Fields
2021-07-27 22:37 ` [PATCH 03/11] VFS: pass lookup_flags into follow_down() NeilBrown
2021-07-27 22:37 ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots NeilBrown
2021-07-28  8:37   ` kernel test robot
2021-07-28  8:37     ` kernel test robot
2021-07-28  8:37   ` [RFC PATCH] btrfs: btrfs_mountpoint_expiry_timeout can be static kernel test robot
2021-07-28  8:37     ` kernel test robot
2021-07-28 13:12   ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots Christian Brauner
2021-07-29  0:43     ` NeilBrown
2021-07-29 14:38       ` Christian Brauner
2021-07-31  6:25   ` [btrfs] 5874902268: xfstests.btrfs.202.fail kernel test robot
2021-07-31  6:25     ` kernel test robot
2021-07-27 22:37 ` [PATCH 06/11] nfsd: include a vfsmount in struct svc_fh NeilBrown
2021-07-27 22:37 ` [PATCH 10/11] btrfs: introduce mapping function from location to inum NeilBrown
2021-07-27 22:37 ` [PATCH 02/11] VFS: allow d_automount to create in-place bind-mount NeilBrown
2021-07-27 22:37 ` [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points NeilBrown
2021-07-28 19:15   ` J. Bruce Fields
2021-07-28 22:29     ` NeilBrown
2021-07-30  0:42   ` Al Viro
2021-07-30  5:43     ` NeilBrown
2021-07-27 22:37 ` [PATCH 08/11] nfsd: change get_parent_attributes() to nfsd_get_mounted_on() NeilBrown
2021-07-27 22:37 ` [PATCH 05/11] VFS: new function: mount_is_internal() NeilBrown
2021-07-28  2:16   ` Al Viro
2021-07-28  3:32     ` NeilBrown
2021-07-30  0:34       ` Al Viro
2021-07-28  2:19 ` [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly Al Viro
2021-07-28  4:58 ` Wang Yugui
2021-07-28  6:04   ` Wang Yugui
2021-07-28  7:01     ` NeilBrown
2021-07-28 12:26       ` Neal Gompa
2021-07-28 19:14         ` J. Bruce Fields
2021-07-29  1:29           ` Zygo Blaxell
2021-07-29  1:43             ` NeilBrown
2021-07-29 23:20               ` Zygo Blaxell
2021-07-28 22:50         ` NeilBrown
2021-07-29  2:37           ` Zygo Blaxell
2021-07-29  3:36             ` NeilBrown
2021-07-29 23:20               ` Zygo Blaxell
2021-07-30  2:36                 ` NeilBrown
2021-07-30  5:25                   ` Qu Wenruo [this message]
2021-07-30  5:31                     ` Qu Wenruo
2021-07-30  5:53                       ` Amir Goldstein
2021-07-30  6:00                       ` NeilBrown
2021-07-30  6:09                         ` Qu Wenruo
2021-07-30  5:58                     ` NeilBrown
2021-07-30  6:23                       ` Qu Wenruo
2021-07-30  6:53                         ` NeilBrown
2021-07-30  7:09                           ` Qu Wenruo
2021-07-30 18:15                             ` Zygo Blaxell
2021-07-30 15:17                         ` J. Bruce Fields
2021-07-30 15:48                           ` Josef Bacik
2021-07-30 16:25                             ` Forza
2021-07-30 17:43                             ` Zygo Blaxell
2021-07-30  5:28                   ` Amir Goldstein
2021-07-28 13:43       ` g.btrfs
2021-07-29  1:39         ` NeilBrown
2021-07-29  9:28           ` Graham Cobb
2021-07-28  7:06   ` NeilBrown
2021-07-28  9:36     ` Wang Yugui
2021-07-28 19:35 ` J. Bruce Fields
2021-07-28 21:30   ` Josef Bacik
2021-07-30  0:13     ` Al Viro
2021-07-30  6:08       ` NeilBrown
2021-08-13  1:45 ` [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export NeilBrown
2021-08-13 14:55   ` Josef Bacik
2021-08-15  7:39   ` Goffredo Baroncelli
2021-08-15 19:35     ` Roman Mamedov
2021-08-15 21:03       ` Goffredo Baroncelli
2021-08-15 21:53         ` NeilBrown
2021-08-17 19:34           ` Goffredo Baroncelli
2021-08-17 21:39             ` NeilBrown
2021-08-18 17:24               ` Goffredo Baroncelli
2021-08-15 22:17       ` NeilBrown
2021-08-19  8:01         ` Amir Goldstein
2021-08-20  3:21           ` NeilBrown
2021-08-20  6:23             ` Amir Goldstein
2021-08-23  4:05         ` [PATCH v2] BTRFS/NFSD: " NeilBrown
2021-08-23  8:17           ` kernel test robot
2021-08-23  8:17             ` kernel test robot
2021-08-18 14:54   ` [PATCH] VFS/BTRFS/NFSD: " Wang Yugui
2021-08-18 21:46     ` NeilBrown
2021-08-19  2:19       ` Zygo Blaxell
2021-08-20  2:54         ` NeilBrown
2021-08-22 19:29           ` Zygo Blaxell
2021-08-23  5:51             ` NeilBrown
2021-08-23 23:22             ` NeilBrown
2021-08-25  2:06               ` Zygo Blaxell
2021-08-23  0:57         ` Wang Yugui

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=341403c0-a7a7-f6c8-5ef6-2d966b1907a8@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=bfields@fieldses.org \
    --cc=ce3g8jdj@umail.furryterror.org \
    --cc=chuck.lever@oracle.com \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=hch@infradead.org \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=ngompa13@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wangyugui@e16-tech.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.