All of lore.kernel.org
 help / color / mirror / Atom feed
From: Goffredo Baroncelli <kreijack@libero.it>
To: Roman Mamedov <rm@romanrm.net>, NeilBrown <neilb@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>,
	Josef Bacik <josef@toxicpanda.com>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Chuck Lever <chuck.lever@oracle.com>, Chris Mason <clm@fb.com>,
	David Sterba <dsterba@suse.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
Date: Sun, 15 Aug 2021 23:03:19 +0200	[thread overview]
Message-ID: <ee167ffe-ad11-ea95-1bd5-c43f273b345a@libero.it> (raw)
In-Reply-To: <20210816003505.7b3e9861@natsu>

On 8/15/21 9:35 PM, Roman Mamedov wrote:
> On Sun, 15 Aug 2021 09:39:08 +0200
> Goffredo Baroncelli <kreijack@libero.it> wrote:
> 
>> I am sure that it was discussed already but I was unable to find any track
>> of this discussion. But if the problem is the collision between the inode
>> number of different subvolume in the nfd export, is it simpler if the export
>> is truncated to the subvolume boundary ? It would be more coherent with the
>> current behavior of vfs+nfsd.
> 
> See this bugreport thread which started it all:
> https://www.spinics.net/lists/linux-btrfs/msg111172.html
> 
> In there the reporting user replied that it is strongly not feasible for them
> to export each individual snapshot.

Thanks for pointing that.

However looking at the 'exports' man page, it seems that NFS has already an
option to cover these cases: 'crossmnt'.

If NFSd detects a "child" filesystem (i.e. a filesystem mounted inside an already
exported one) and the "parent" filesystem is marked as 'crossmnt',  the client mount
the parent AND the child filesystem with two separate mounts, so there is not problem of inode collision.

I tested it mounting two nested ext4 filesystem, and there isn't any problem of inode collision
(even if there are two different files with the same inode number).

---------
# mount -o loop disk2 test3/
# echo 123 >test3/one
# mkdir test3/test4
# sudo mount -o loop disk3 test3/test4/
# echo 123 >test3/test4/one
# ls -liR test3/
test3/:
total 24
11 drwx------ 2 root  root  16384 Aug 15 22:27 lost+found
12 -rw-r--r-- 1 ghigo ghigo     4 Aug 15 22:29 one
  2 drwxr-xrwx 3 root  root   4096 Aug 15 22:46 test4

test3/test4:
total 20
11 drwx------ 2 root  root  16384 Aug 15 22:45 lost+found
12 -rw-r--r-- 1 ghigo ghigo     4 Aug 15 22:46 one

# egrep test3 /etc/exports
/tmp/test3 *(rw,no_subtree_check,crossmnt)

# mount 192.168.1.27:/tmp/test3 3
ls -lRi 3
3:
total 24
11 drwx------ 2 root  root  16384 Aug 15 22:27 lost+found
12 -rw-r--r-- 1 ghigo ghigo     4 Aug 15 22:29 one
  2 drwxr-xrwx 3 root  root   4096 Aug 15 22:46 test4

3/test4:
total 20
11 drwx------ 2 root  root  16384 Aug 15 22:45 lost+found
12 -rw-r--r-- 1 ghigo ghigo     4 Aug 15 22:46 one

# mount | egrep 192
192.168.1.27:/tmp/test3 on /tmp/3 type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.27,local_lock=none,addr=192.168.1.27)
192.168.1.27:/tmp/test3/test4 on /tmp/3/test4 type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.27,local_lock=none,addr=192.168.1.27)


---------------

I tried to mount even with "nfsvers=3", and it seems to work.

However the tests above are related to ext4; in fact it doesn't work with btrfs, but I think this is more a implementation problem than a strategy problem.
What I means is that NFS already has a way to mount different parts of the fs-tree with different mounts (form a client POV). I think that this strategy should be used when NFSd exports a BTRFS filesystem:
- if the 'crossmnt' is NOT Passed, the export should ends at the subvolume boundary (or allow inode collision)
- if the 'crossmnt' is passed, the client should automatically mounts each nested subvolume as a separate mount

> 
>> In fact in btrfs a subvolume is a complete filesystem, with an "own
>> synthetic" device. We could like or not this solution, but this solution is
>> the more aligned to the unix standard, where for each filesystem there is a
>> pair (device, inode-set). NFS (by default) avoids to cross the boundary
>> between the filesystems. So why in BTRFS this should be different ?
> 
>  From the user point of view subvolumes are basically directories; that they
> are "complete filesystems"* is merely a low-level implementation detail.
> 
> * well except they are not, as you cannot 'dd' a subvolume to another
> blockdevice.
> 
>> Why don't rename "ino_uniquifier" as "ino_and_subvolume" and leave to the
>> filesystem the work to combine the inode and the subvolume-id ?
>>
>> I am worried that the logic is split between the filesystem, which
>> synthesizes the ino_uniquifier, and to NFS which combine to the inode. I am
>> thinking that this combination is filesystem specific; for BTRFS is a simple
>> xor but for other filesystem may be a more complex operation, so leaving an
>> half in the filesystem and another half to the NFS seems to not optimal if
>> other filesystem needs to use ino_uniquifier.
> 
> I wondered a bit myself, what are the downsides of just doing the
> uniquefication inside Btrfs, not leaving that to NFSD?
> 
> I mean not even adding the extra stat field, just return the inode itself with
> that already applied. Surely cannot be any worse collision-wise, than
> different subvolumes straight up having the same inode numbers as right now?
> 
> Or is it a performance concern, always doing more work, for something which
> only NFSD has needed so far.
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

  reply	other threads:[~2021-08-15 21:03 UTC|newest]

Thread overview: 127+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
2021-07-27 22:37 ` [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points NeilBrown
2021-07-28 10:13   ` Amir Goldstein
2021-07-29  0:28     ` NeilBrown
2021-07-29  5:27       ` Amir Goldstein
2021-08-06  7:52         ` Miklos Szeredi
2021-08-06  8:08           ` Amir Goldstein
2021-08-06  8:18             ` Miklos Szeredi
2021-07-28 19:17   ` J. Bruce Fields
2021-07-28 22:25     ` NeilBrown
2021-07-27 22:37 ` [PATCH 04/11] VFS: export lookup_mnt() NeilBrown
2021-07-30  0:31   ` Al Viro
2021-07-30  5:33     ` NeilBrown
2021-07-27 22:37 ` [PATCH 01/11] VFS: show correct dev num in mountinfo NeilBrown
2021-07-30  0:25   ` Al Viro
2021-07-30  5:28     ` NeilBrown
2021-07-30  5:54       ` Miklos Szeredi
2021-07-30  6:13         ` NeilBrown
2021-07-30  7:18           ` Miklos Szeredi
2021-07-30  7:33             ` NeilBrown
2021-07-30  7:59               ` Miklos Szeredi
2021-08-02  4:18                 ` A Third perspective on BTRFS nfsd subvol dev/inode number issues NeilBrown
2021-08-02  5:25                   ` Al Viro
2021-08-02  5:40                     ` NeilBrown
2021-08-02  7:54                       ` Amir Goldstein
2021-08-02 13:53                         ` Josef Bacik
2021-08-03 22:29                           ` Qu Wenruo
2021-08-02 14:47                         ` Frank Filz
2021-08-02 21:24                         ` NeilBrown
2021-08-02  7:15                   ` Martin Steigerwald
2021-08-02 21:40                     ` NeilBrown
2021-08-02 12:39                   ` J. Bruce Fields
2021-08-02 20:32                     ` Patrick Goetz
2021-08-02 20:41                       ` J. Bruce Fields
2021-08-02 21:10                     ` NeilBrown
2021-08-02 21:50                       ` J. Bruce Fields
2021-08-02 21:59                         ` NeilBrown
2021-08-02 22:14                           ` J. Bruce Fields
2021-08-02 22:36                             ` NeilBrown
2021-08-03  0:15                               ` J. Bruce Fields
2021-07-27 22:37 ` [PATCH 03/11] VFS: pass lookup_flags into follow_down() NeilBrown
2021-07-27 22:37 ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots NeilBrown
2021-07-28  8:37   ` kernel test robot
2021-07-28  8:37     ` kernel test robot
2021-07-28  8:37   ` [RFC PATCH] btrfs: btrfs_mountpoint_expiry_timeout can be static kernel test robot
2021-07-28  8:37     ` kernel test robot
2021-07-28 13:12   ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots Christian Brauner
2021-07-29  0:43     ` NeilBrown
2021-07-29 14:38       ` Christian Brauner
2021-07-31  6:25   ` [btrfs] 5874902268: xfstests.btrfs.202.fail kernel test robot
2021-07-31  6:25     ` kernel test robot
2021-07-27 22:37 ` [PATCH 06/11] nfsd: include a vfsmount in struct svc_fh NeilBrown
2021-07-27 22:37 ` [PATCH 10/11] btrfs: introduce mapping function from location to inum NeilBrown
2021-07-27 22:37 ` [PATCH 02/11] VFS: allow d_automount to create in-place bind-mount NeilBrown
2021-07-27 22:37 ` [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points NeilBrown
2021-07-28 19:15   ` J. Bruce Fields
2021-07-28 22:29     ` NeilBrown
2021-07-30  0:42   ` Al Viro
2021-07-30  5:43     ` NeilBrown
2021-07-27 22:37 ` [PATCH 08/11] nfsd: change get_parent_attributes() to nfsd_get_mounted_on() NeilBrown
2021-07-27 22:37 ` [PATCH 05/11] VFS: new function: mount_is_internal() NeilBrown
2021-07-28  2:16   ` Al Viro
2021-07-28  3:32     ` NeilBrown
2021-07-30  0:34       ` Al Viro
2021-07-28  2:19 ` [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly Al Viro
2021-07-28  4:58 ` Wang Yugui
2021-07-28  6:04   ` Wang Yugui
2021-07-28  7:01     ` NeilBrown
2021-07-28 12:26       ` Neal Gompa
2021-07-28 19:14         ` J. Bruce Fields
2021-07-29  1:29           ` Zygo Blaxell
2021-07-29  1:43             ` NeilBrown
2021-07-29 23:20               ` Zygo Blaxell
2021-07-28 22:50         ` NeilBrown
2021-07-29  2:37           ` Zygo Blaxell
2021-07-29  3:36             ` NeilBrown
2021-07-29 23:20               ` Zygo Blaxell
2021-07-30  2:36                 ` NeilBrown
2021-07-30  5:25                   ` Qu Wenruo
2021-07-30  5:31                     ` Qu Wenruo
2021-07-30  5:53                       ` Amir Goldstein
2021-07-30  6:00                       ` NeilBrown
2021-07-30  6:09                         ` Qu Wenruo
2021-07-30  5:58                     ` NeilBrown
2021-07-30  6:23                       ` Qu Wenruo
2021-07-30  6:53                         ` NeilBrown
2021-07-30  7:09                           ` Qu Wenruo
2021-07-30 18:15                             ` Zygo Blaxell
2021-07-30 15:17                         ` J. Bruce Fields
2021-07-30 15:48                           ` Josef Bacik
2021-07-30 16:25                             ` Forza
2021-07-30 17:43                             ` Zygo Blaxell
2021-07-30  5:28                   ` Amir Goldstein
2021-07-28 13:43       ` g.btrfs
2021-07-29  1:39         ` NeilBrown
2021-07-29  9:28           ` Graham Cobb
2021-07-28  7:06   ` NeilBrown
2021-07-28  9:36     ` Wang Yugui
2021-07-28 19:35 ` J. Bruce Fields
2021-07-28 21:30   ` Josef Bacik
2021-07-30  0:13     ` Al Viro
2021-07-30  6:08       ` NeilBrown
2021-08-13  1:45 ` [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export NeilBrown
2021-08-13 14:55   ` Josef Bacik
2021-08-15  7:39   ` Goffredo Baroncelli
2021-08-15 19:35     ` Roman Mamedov
2021-08-15 21:03       ` Goffredo Baroncelli [this message]
2021-08-15 21:53         ` NeilBrown
2021-08-17 19:34           ` Goffredo Baroncelli
2021-08-17 21:39             ` NeilBrown
2021-08-18 17:24               ` Goffredo Baroncelli
2021-08-15 22:17       ` NeilBrown
2021-08-19  8:01         ` Amir Goldstein
2021-08-20  3:21           ` NeilBrown
2021-08-20  6:23             ` Amir Goldstein
2021-08-23  4:05         ` [PATCH v2] BTRFS/NFSD: " NeilBrown
2021-08-23  8:17           ` kernel test robot
2021-08-23  8:17             ` kernel test robot
2021-08-18 14:54   ` [PATCH] VFS/BTRFS/NFSD: " Wang Yugui
2021-08-18 21:46     ` NeilBrown
2021-08-19  2:19       ` Zygo Blaxell
2021-08-20  2:54         ` NeilBrown
2021-08-22 19:29           ` Zygo Blaxell
2021-08-23  5:51             ` NeilBrown
2021-08-23 23:22             ` NeilBrown
2021-08-25  2:06               ` Zygo Blaxell
2021-08-23  0:57         ` Wang Yugui

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ee167ffe-ad11-ea95-1bd5-c43f273b345a@libero.it \
    --to=kreijack@libero.it \
    --cc=bfields@fieldses.org \
    --cc=chuck.lever@oracle.com \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=hch@infradead.org \
    --cc=josef@toxicpanda.com \
    --cc=kreijack@inwind.it \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=rm@romanrm.net \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.