* nfs subvolume access?
@ 2021-03-10 7:46 ` Ulli Horlacher
  2021-03-10 7:59 ` Hugo Mills
  ` (3 more replies)
  0 siblings, 4 replies; 56+ messages in thread
From: Ulli Horlacher @ 2021-03-10 7:46 UTC (permalink / raw)
To: linux-btrfs

When I try to access a btrfs filesystem via nfs, I get the error:

root@tsmsrvi:~# mount tsmsrvj:/data/fex /nfs/tsmsrvj/fex
root@tsmsrvi:~# time find /nfs/tsmsrvj/fex | wc -l
find: File system loop detected; '/nfs/tsmsrvj/fex/spool' is part of the same file system loop as '/nfs/tsmsrvj/fex'.
1
root@tsmsrvi:~#

On tsmsrvj I have in /etc/exports:

/data/fex tsmsrvi(rw,async,no_subtree_check,no_root_squash)

This is a btrfs subvolume with snapshots:

root@tsmsrvj:~# btrfs subvolume list /data
ID 257 gen 35 top level 5 path fex
ID 270 gen 36 top level 257 path fex/spool
ID 271 gen 21 top level 270 path fex/spool/.snapshot/2021-03-07_1453.test
ID 272 gen 23 top level 270 path fex/spool/.snapshot/2021-03-07_1531.test
ID 273 gen 25 top level 270 path fex/spool/.snapshot/2021-03-07_1532.test
ID 274 gen 27 top level 270 path fex/spool/.snapshot/2021-03-07_1718.test

root@tsmsrvj:~# find /data/fex | wc -l
489887
root@tsmsrvj:~#

What must I add to /etc/exports to enable subvolume access for the nfs
client?

tsmsrvi and tsmsrvj (nfs client and server) both run Ubuntu 20.04 with
btrfs-progs v5.4.1

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<20210310074620.GA2158@tik.uni-stuttgart.de>

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: nfs subvolume access?
From: Hugo Mills @ 2021-03-10 7:59 UTC
To: linux-btrfs

On Wed, Mar 10, 2021 at 08:46:20AM +0100, Ulli Horlacher wrote:
> When I try to access a btrfs filesystem via nfs, I get the error:
>
> root@tsmsrvi:~# mount tsmsrvj:/data/fex /nfs/tsmsrvj/fex
> root@tsmsrvi:~# time find /nfs/tsmsrvj/fex | wc -l
> find: File system loop detected; '/nfs/tsmsrvj/fex/spool' is part of the same file system loop as '/nfs/tsmsrvj/fex'.
> 1
> root@tsmsrvi:~#
>
>
>
> On tsmsrvj I have in /etc/exports:
>
> /data/fex tsmsrvi(rw,async,no_subtree_check,no_root_squash)
>
> This is a btrfs subvolume with snapshots:
>
> root@tsmsrvj:~# btrfs subvolume list /data
> ID 257 gen 35 top level 5 path fex
> ID 270 gen 36 top level 257 path fex/spool
> ID 271 gen 21 top level 270 path fex/spool/.snapshot/2021-03-07_1453.test
> ID 272 gen 23 top level 270 path fex/spool/.snapshot/2021-03-07_1531.test
> ID 273 gen 25 top level 270 path fex/spool/.snapshot/2021-03-07_1532.test
> ID 274 gen 27 top level 270 path fex/spool/.snapshot/2021-03-07_1718.test
>
> root@tsmsrvj:~# find /data/fex | wc -l
> 489887
> root@tsmsrvj:~#
>
> What must I add to /etc/exports to enable subvolume access for the nfs
> client?
>
> tsmsrvi and tsmsrvj (nfs client and server) both run Ubuntu 20.04 with
> btrfs-progs v5.4.1

I can't remember if this is why, but I've had to put a distinct
fsid field in each separate subvolume being exported:

/srv/nfs/home -rw,async,fsid=0x1730,no_subtree_check,no_root_squash

It doesn't matter what value you use, as long as each one's
different.

Hugo.

--
Hugo Mills             | Alert status mauve ocelot: Slight chance of
hugo@... carfax.org.uk | brimstone. Be prepared to make a nice cup of tea.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
* Re: nfs subvolume access?
From: Ulli Horlacher @ 2021-03-10 8:09 UTC
To: linux-btrfs

On Wed 2021-03-10 (07:59), Hugo Mills wrote:

> > On tsmsrvj I have in /etc/exports:
> >
> > /data/fex tsmsrvi(rw,async,no_subtree_check,no_root_squash)
> >
> > This is a btrfs subvolume with snapshots:
> >
> > root@tsmsrvj:~# btrfs subvolume list /data
> > ID 257 gen 35 top level 5 path fex
> > ID 270 gen 36 top level 257 path fex/spool
> > ID 271 gen 21 top level 270 path fex/spool/.snapshot/2021-03-07_1453.test
> > ID 272 gen 23 top level 270 path fex/spool/.snapshot/2021-03-07_1531.test
> > ID 273 gen 25 top level 270 path fex/spool/.snapshot/2021-03-07_1532.test
> > ID 274 gen 27 top level 270 path fex/spool/.snapshot/2021-03-07_1718.test
> >
> > root@tsmsrvj:~# find /data/fex | wc -l
> > 489887

> I can't remember if this is why, but I've had to put a distinct
> fsid field in each separate subvolume being exported:
>
> /srv/nfs/home -rw,async,fsid=0x1730,no_subtree_check,no_root_squash

I must export EACH subvolume?!
The snapshots are generated automatically (via cron)!
I cannot add them to /etc/exports

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<20210310075957.GG22502@savella.carfax.org.uk>
* Re: Re: nfs subvolume access?
From: Graham Cobb @ 2021-03-10 9:35 UTC
To: linux-btrfs

On 10/03/2021 08:09, Ulli Horlacher wrote:
> On Wed 2021-03-10 (07:59), Hugo Mills wrote:
>
>>> On tsmsrvj I have in /etc/exports:
>>>
>>> /data/fex tsmsrvi(rw,async,no_subtree_check,no_root_squash)
>>>
>>> This is a btrfs subvolume with snapshots:
>>>
>>> root@tsmsrvj:~# btrfs subvolume list /data
>>> ID 257 gen 35 top level 5 path fex
>>> ID 270 gen 36 top level 257 path fex/spool
>>> ID 271 gen 21 top level 270 path fex/spool/.snapshot/2021-03-07_1453.test
>>> ID 272 gen 23 top level 270 path fex/spool/.snapshot/2021-03-07_1531.test
>>> ID 273 gen 25 top level 270 path fex/spool/.snapshot/2021-03-07_1532.test
>>> ID 274 gen 27 top level 270 path fex/spool/.snapshot/2021-03-07_1718.test
>>>
>>> root@tsmsrvj:~# find /data/fex | wc -l
>>> 489887
>
>> I can't remember if this is why, but I've had to put a distinct
>> fsid field in each separate subvolume being exported:
>>
>> /srv/nfs/home -rw,async,fsid=0x1730,no_subtree_check,no_root_squash
>
> I must export EACH subvolume?!

I have had similar problems. I *think* the current case is that modern
NFS, using NFS V4, can cope with the whole disk being accessible without
giving each subvolume its own FSID (which I have stopped doing).

HOWEVER, I think that find (and anything else which uses fsids and inode
numbers) will see subvolumes as having duplicated inodes.

> The snapshots are generated automatically (via cron)!
> I cannot add them to /etc/exports

Well, you could write some scripts... but I don't think it is necessary.
I *think* it is only necessary if you want `find` to be able to cross
between subvolumes on the NFS mounted disks.

However, I am NOT an NFS expert, nor have I done a lot of work on this.
I might be wrong. But I do NFS mount my snapshots disk remotely and use
it. And I do see occasional complaints from find, but I live with it.
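
The "you could write some scripts" remark combined with the per-subvolume fsid suggestion could look roughly like the sketch below. It is only an illustration: the function name `exports_for_subvolumes`, the choice of the btrfs subvolume ID as the fsid value, and the `/data` mount root are assumptions made here, and nothing in the thread confirms that per-subvolume exports remove the loop error.

```python
import re

# Matches one row of `btrfs subvolume list` output as shown in the thread.
SUBVOL_RE = re.compile(r"^ID (\d+) gen \d+ top level \d+ path (.+)$")

DEFAULT_OPTIONS = "rw,async,no_subtree_check,no_root_squash"


def exports_for_subvolumes(listing, mount_root, client, options=DEFAULT_OPTIONS):
    """Return one /etc/exports line per subvolume, each with a distinct fsid."""
    lines = []
    for row in listing.splitlines():
        match = SUBVOL_RE.match(row.strip())
        if not match:
            continue  # skip blank lines and anything that is not a subvolume row
        subvol_id, path = int(match.group(1)), match.group(2)
        # Assumption: reuse the subvolume ID as the fsid, because it is already
        # unique within one btrfs filesystem. Any other unique number would do.
        lines.append(f"{mount_root}/{path} {client}({options},fsid={subvol_id})")
    return lines


# Sample rows copied from the listing earlier in the thread.
SAMPLE = """\
ID 257 gen 35 top level 5 path fex
ID 270 gen 36 top level 257 path fex/spool
ID 271 gen 21 top level 270 path fex/spool/.snapshot/2021-03-07_1453.test
"""

if __name__ == "__main__":
    for line in exports_for_subvolumes(SAMPLE, "/data", "tsmsrvi"):
        print(line)
```

A cron job that creates snapshots could regenerate the export list with something like this and then reload the exports, but that wiring is left out on purpose since it depends on the site.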
* Re: nfs subvolume access?
From: Ulli Horlacher @ 2021-03-10 15:55 UTC
To: linux-btrfs

On Wed 2021-03-10 (09:35), Graham Cobb wrote:

> >>> root@tsmsrvj:~# find /data/fex | wc -l
> >>> 489887
> >
> >> I can't remember if this is why, but I've had to put a distinct
> >> fsid field in each separate subvolume being exported:
> >>
> >> /srv/nfs/home -rw,async,fsid=0x1730,no_subtree_check,no_root_squash
> >
> > I must export EACH subvolume?!
>
> I have had similar problems. I *think* the current case is that modern
> NFS, using NFS V4, can cope with the whole disk being accessible without
> giving each subvolume its own FSID (which I have stopped doing).

I cannot use NFS4 (for several reasons). I must use NFS3

> > The snapshots are generated automatically (via cron)!
> > I cannot add them to /etc/exports
>
> Well, you could write some scripts... but I don't think it is necessary.
> I *think* it is only necessary if you want `find` to be able to cross
> between subvolumes on the NFS mounted disks.

It is not only a find problem:

root@fex:/nfs/tsmsrvj/fex# ls -R
:
spool
ls: ./spool: not listing already-listed directory

And as I wrote: there is no such problem with Ubuntu 18.04!
So, is it a btrfs or a nfs bug?

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<5bded122-8adf-e5e7-dceb-37a3875f1a4b@cobb.uk.net>
* Re: nfs subvolume access?
From: Forza @ 2021-03-10 17:29 UTC
To: Ulli Horlacher, linux-btrfs

---- From: Ulli Horlacher <framstag@rus.uni-stuttgart.de> -- Sent: 2021-03-10 - 16:55 ----

> On Wed 2021-03-10 (09:35), Graham Cobb wrote:
>
>> >>> root@tsmsrvj:~# find /data/fex | wc -l
>> >>> 489887
>> >
>> >> I can't remember if this is why, but I've had to put a distinct
>> >> fsid field in each separate subvolume being exported:
>> >>
>> >> /srv/nfs/home -rw,async,fsid=0x1730,no_subtree_check,no_root_squash
>> >
>> > I must export EACH subvolume?!
>>
>> I have had similar problems. I *think* the current case is that modern
>> NFS, using NFS V4, can cope with the whole disk being accessible without
>> giving each subvolume its own FSID (which I have stopped doing).
>
> I cannot use NFS4 (for several reasons). I must use NFS3
>
>
>> > The snapshots are generated automatically (via cron)!
>> > I cannot add them to /etc/exports
>>
>> Well, you could write some scripts... but I don't think it is necessary.
>> I *think* it is only necessary if you want `find` to be able to cross
>> between subvolumes on the NFS mounted disks.
>
> It is not only a find problem:
>
> root@fex:/nfs/tsmsrvj/fex# ls -R
> :
> spool
> ls: ./spool: not listing already-listed directory
>
>
> And as I wrote: there is no such problem with Ubuntu 18.04!
> So, is it a btrfs or a nfs bug?
>
>

Did you try the fsid on the export? (not separate exports for all subvols)
Without it the NFS server tries to enumerate it from the filesystem itself,
which can cause weird issues. It is good practice to always use fsid on all
exports in any case.

At least with NFS4 server on my Ubuntu NFS servers at work, there are no
issues with subvols for clients that mount with vers=3

You may want to enable debug logging on your server.
https://wiki.tnonline.net/w/Blog/NFS_Server_Logging

/Forza
* Re: nfs subvolume access?
From: Ulli Horlacher @ 2021-03-10 17:46 UTC
To: linux-btrfs

On Wed 2021-03-10 (18:29), Forza wrote:

> Did you try the fsid on the export?

Yes:

root@tsmsrvj:/etc# grep tsm exports
/data/fex tsmsrvi(rw,async,no_subtree_check,no_root_squash,fsid=0x0011)

root@tsmsrvj:/etc# exportfs -va
exporting fex.rus.uni-stuttgart.de:/data/fex
exporting tsmsrvi.rus.uni-stuttgart.de:/data/fex

root@tsmsrvi:~# umount /nfs/tsmsrvj/fex
root@tsmsrvi:~# mount -o nfsvers=3,proto=tcp tsmsrvj:/data/fex /nfs/tsmsrvj/fex
root@tsmsrvi:~# find /nfs/tsmsrvj/fex
/nfs/tsmsrvj/fex
find: File system loop detected; '/nfs/tsmsrvj/fex/spool' is part of the same file system loop as '/nfs/tsmsrvj/fex'.

> You may want to enable debug logging on your server.
> https://wiki.tnonline.net/w/Blog/NFS_Server_Logging

root@tsmsrvj:/etc# rpcdebug -m nfsd all
nfsd sock fh export svc proc fileop auth repcache xdr lockd

root@tsmsrvj:/var/log# tailf kern.log
2021-03-10 18:45:17 [106259.649850] nfsd_dispatch: vers 3 proc 1
2021-03-10 18:45:17 [106259.649854] nfsd: GETATTR(3) 8: 00010001 00000011 00000000 00000000 00000000 00000000
2021-03-10 18:45:17 [106259.649856] nfsd: fh_verify(8: 00010001 00000011 00000000 00000000 00000000 00000000)
2021-03-10 18:45:17 [106259.650306] nfsd_dispatch: vers 3 proc 4
2021-03-10 18:45:17 [106259.650310] nfsd: ACCESS(3) 8: 00010001 00000011 00000000 00000000 00000000 00000000 0x1f
2021-03-10 18:45:17 [106259.650313] nfsd: fh_verify(8: 00010001 00000011 00000000 00000000 00000000 00000000)
2021-03-10 18:45:17 [106259.650869] nfsd_dispatch: vers 3 proc 17
2021-03-10 18:45:17 [106259.650874] nfsd: READDIR+(3) 8: 00010001 00000011 00000000 00000000 00000000 00000000 32768 bytes at 0
2021-03-10 18:45:17 [106259.650877] nfsd: fh_verify(8: 00010001 00000011 00000000 00000000 00000000 00000000)
2021-03-10 18:45:17 [106259.650883] nfsd: fh_verify(8: 00010001 00000011 00000000 00000000 00000000 00000000)
2021-03-10 18:45:17 [106259.650903] nfsd: fh_compose(exp 00:31/256 /fex, ino=256)
2021-03-10 18:45:17 [106259.650907] nfsd: fh_compose(exp 00:31/256 /, ino=256)
2021-03-10 18:45:17 [106259.651454] nfsd_dispatch: vers 3 proc 3
2021-03-10 18:45:17 [106259.651459] nfsd: LOOKUP(3) 8: 00010001 00000011 00000000 00000000 00000000 00000000 spool
2021-03-10 18:45:17 [106259.651463] nfsd: fh_verify(8: 00010001 00000011 00000000 00000000 00000000 00000000)
2021-03-10 18:45:17 [106259.651471] nfsd: nfsd_lookup(fh 8: 00010001 00000011 00000000 00000000 00000000 00000000, spool)
2021-03-10 18:45:17 [106259.651477] nfsd: fh_compose(exp 00:31/256 fex/spool, ino=256)

Hmmm... and now?

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<55bb7f3.9ce44d1.1781d2fedd6@tnonline.net>
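
One way to read the debug log above is to pull out only the `fh_compose` entries and compare them. The sketch below does that for the three entries in the log. The helper name `composed_handles` is made up for this note, and treating `00:31/256` as "device and export-root inode" is an interpretation of the log format, not something stated in the thread.

```python
import re

# Captures the export tag, the path, and the inode of each fh_compose entry.
FH_COMPOSE_RE = re.compile(r"fh_compose\(exp (\S+) ([^,\s]+), ino=(\d+)\)")


def composed_handles(log_text):
    """Return (export, path, inode) for every fh_compose entry in the log."""
    return [(exp, path, int(ino)) for exp, path, ino in FH_COMPOSE_RE.findall(log_text)]


# The three fh_compose lines copied from the kern.log excerpt.
LOG = """\
2021-03-10 18:45:17 [106259.650903] nfsd: fh_compose(exp 00:31/256 /fex, ino=256)
2021-03-10 18:45:17 [106259.650907] nfsd: fh_compose(exp 00:31/256 /, ino=256)
2021-03-10 18:45:17 [106259.651477] nfsd: fh_compose(exp 00:31/256 fex/spool, ino=256)
"""

if __name__ == "__main__":
    handles = composed_handles(LOG)
    for exp, path, ino in handles:
        print(exp, path, ino)
    # If the export root and the nested subvolume collapse into one pair,
    # a client has nothing left to tell the two directories apart.
    distinct = {(exp, ino) for exp, _path, ino in handles}
    print("distinct (export, inode) pairs:", len(distinct))
```

In this excerpt the export root and `fex/spool` both come back with inode 256 under the same export tag, which fits the client-side complaint that `spool` is "part of the same file system loop" as its parent.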
* Re: nfs subvolume access?
From: Ulli Horlacher @ 2021-03-10 8:17 UTC
To: linux-btrfs

On Wed 2021-03-10 (08:46), Ulli Horlacher wrote:

> When I try to access a btrfs filesystem via nfs, I get the error:
>
> root@tsmsrvi:~# mount tsmsrvj:/data/fex /nfs/tsmsrvj/fex
> root@tsmsrvi:~# time find /nfs/tsmsrvj/fex | wc -l
> find: File system loop detected; '/nfs/tsmsrvj/fex/spool' is part of the same file system loop as '/nfs/tsmsrvj/fex'.
> 1

> tsmsrvi and tsmsrvj (nfs client and server) both run Ubuntu 20.04 with
> btrfs-progs v5.4.1

On Ubuntu 18.04 this setup works without errors:

root@mutter:/backup/rsync# grep tandem /etc/exports
/backup/rsync/tandem 176.9.135.138(rw,async,no_subtree_check,no_root_squash)

root@mutter:/backup/rsync# btrfs subvolume list /backup/rsync | grep tandem
ID 257 gen 62652 top level 5 path tandem
ID 5898 gen 62284 top level 257 path tandem/.snapshot/2021-03-01_0300.rsync
ID 5906 gen 62284 top level 257 path tandem/.snapshot/2021-03-02_0300.rsync
ID 5914 gen 62284 top level 257 path tandem/.snapshot/2021-03-03_0300.rsync
ID 5924 gen 62284 top level 257 path tandem/.snapshot/2021-03-04_0300.rsync
ID 5932 gen 62284 top level 257 path tandem/.snapshot/2021-03-05_0300.rsync
ID 5941 gen 62284 top level 257 path tandem/.snapshot/2021-03-06_0300.rsync
ID 5950 gen 62284 top level 257 path tandem/.snapshot/2021-03-07_0300.rsync
ID 5962 gen 62413 top level 257 path tandem/.snapshot/2021-03-08_0300.rsync
ID 5970 gen 62522 top level 257 path tandem/.snapshot/2021-03-09_0300.rsync
ID 5978 gen 62626 top level 257 path tandem/.snapshot/2021-03-10_0300.rsync

root@mutter:/backup/rsync# btrfs version
btrfs-progs v4.15.1

root@tandem:/backup# mount | grep backup
mutter:/backup/rsync/tandem on /backup type nfs (ro,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=1,sec=sys,mountaddr=176.9.68.251,mountvers=3,mountport=52943,mountproto=tcp,local_lock=none,addr=176.9.68.251)

root@tandem:/backup# ls -l .snapshot/
total 0
drwxr-xr-x 1 root root 392 Mar 1 03:00 2021-03-01_0300.rsync
drwxr-xr-x 1 root root 392 Mar 2 03:00 2021-03-02_0300.rsync
drwxr-xr-x 1 root root 392 Mar 3 03:00 2021-03-03_0300.rsync
drwxr-xr-x 1 root root 392 Mar 4 03:00 2021-03-04_0300.rsync
drwxr-xr-x 1 root root 392 Mar 5 03:00 2021-03-05_0300.rsync
drwxr-xr-x 1 root root 392 Mar 6 03:00 2021-03-06_0300.rsync
drwxr-xr-x 1 root root 392 Mar 7 03:00 2021-03-07_0300.rsync
drwxr-xr-x 1 root root 392 Mar 8 03:00 2021-03-08_0300.rsync
drwxr-xr-x 1 root root 392 Mar 9 03:00 2021-03-09_0300.rsync
drwxr-xr-x 1 root root 392 Mar 10 03:00 2021-03-10_0300.rsync

So, it is an issue with the newer btrfs version on Ubuntu 20.04?

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<20210310074620.GA2158@tik.uni-stuttgart.de>
* Re: nfs subvolume access?
From: Ulli Horlacher @ 2021-03-11 7:46 UTC
To: linux-btrfs

On Wed 2021-03-10 (08:46), Ulli Horlacher wrote:

> When I try to access a btrfs filesystem via nfs, I get the error:
>
> root@tsmsrvi:~# mount tsmsrvj:/data/fex /nfs/tsmsrvj/fex
> root@tsmsrvi:~# time find /nfs/tsmsrvj/fex | wc -l
> find: File system loop detected; '/nfs/tsmsrvj/fex/spool' is part of the same file system loop as '/nfs/tsmsrvj/fex'.

It is even worse:

root@tsmsrvj:# grep localhost /etc/exports
/data/fex localhost(rw,async,no_subtree_check,no_root_squash)

root@tsmsrvj:# mount localhost:/data/fex /nfs/localhost/fex

root@tsmsrvj:# du -s /data/fex
64282240 /data/fex

root@tsmsrvj:# du -s /nfs/localhost/fex
du: WARNING: Circular directory structure.
This almost certainly means that you have a corrupted file system.
NOTIFY YOUR SYSTEM MANAGER.
The following directory is part of the cycle:
/nfs/localhost/fex/spool

0 /nfs/localhost/fex

root@tsmsrvj:# btrfs subvolume list /data
ID 257 gen 42 top level 5 path fex
ID 270 gen 42 top level 257 path fex/spool
ID 271 gen 21 top level 270 path fex/spool/.snapshot/2021-03-07_1453.test
ID 272 gen 23 top level 270 path fex/spool/.snapshot/2021-03-07_1531.test
ID 273 gen 25 top level 270 path fex/spool/.snapshot/2021-03-07_1532.test
ID 274 gen 27 top level 270 path fex/spool/.snapshot/2021-03-07_1718.test

root@tsmsrvj:# uname -a
Linux tsmsrvj 5.4.0-66-generic #74-Ubuntu SMP Wed Jan 27 22:54:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

root@tsmsrvj:# btrfs version
btrfs-progs v5.4.1

root@tsmsrvj:# dpkg -l | grep nfs-
ii nfs-common 1:1.3.4-2.5ubuntu3.3 amd64 NFS support files common to client and server
ii nfs-kernel-server 1:1.3.4-2.5ubuntu3.3 amd64 support for NFS kernel server

The same bug appears if nfs server and client are different hosts or the
client is an older Ubuntu 18.04 system.

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<20210310074620.GA2158@tik.uni-stuttgart.de>
* cannot use btrfs for nfs server
From: Ulli Horlacher @ 2021-07-08 22:17 UTC
To: linux-btrfs

I have waited some time and some Ubuntu updates, but the bug is still there:

On Thu 2021-03-11 (08:46), Ulli Horlacher wrote:
> On Wed 2021-03-10 (08:46), Ulli Horlacher wrote:
>
> > When I try to access a btrfs filesystem via nfs, I get the error:
> >
> > root@tsmsrvi:~# mount tsmsrvj:/data/fex /nfs/tsmsrvj/fex
> > root@tsmsrvi:~# time find /nfs/tsmsrvj/fex | wc -l
> > find: File system loop detected; '/nfs/tsmsrvj/fex/spool' is part of the same file system loop as '/nfs/tsmsrvj/fex'.
>
> It is even worse:
>
> root@tsmsrvj:# grep localhost /etc/exports
> /data/fex localhost(rw,async,no_subtree_check,no_root_squash)
>
> root@tsmsrvj:# mount localhost:/data/fex /nfs/localhost/fex
>
> root@tsmsrvj:# du -s /data/fex
> 64282240 /data/fex
>
> root@tsmsrvj:# du -s /nfs/localhost/fex
> du: WARNING: Circular directory structure.
> This almost certainly means that you have a corrupted file system.
> NOTIFY YOUR SYSTEM MANAGER.
> The following directory is part of the cycle:
> /nfs/localhost/fex/spool
>
> 0 /nfs/localhost/fex
>
> root@tsmsrvj:# btrfs subvolume list /data
> ID 257 gen 42 top level 5 path fex
> ID 270 gen 42 top level 257 path fex/spool
> ID 271 gen 21 top level 270 path fex/spool/.snapshot/2021-03-07_1453.test
> ID 272 gen 23 top level 270 path fex/spool/.snapshot/2021-03-07_1531.test
> ID 273 gen 25 top level 270 path fex/spool/.snapshot/2021-03-07_1532.test
> ID 274 gen 27 top level 270 path fex/spool/.snapshot/2021-03-07_1718.test

root@tsmsrvj:~# uname -a
Linux tsmsrvj 5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

root@tsmsrvj:~# btrfs version
btrfs-progs v5.4.1

root@tsmsrvj:~# dpkg -l | grep nfs-
ii nfs-common 1:1.3.4-2.5ubuntu3.4 amd64 NFS support files common to client and server
ii nfs-kernel-server 1:1.3.4-2.5ubuntu3.4 amd64 support for NFS kernel server

This makes btrfs with snapshots unusable as a nfs server :-(

How/where can I escalate it further?

My Ubuntu bug report has been ignored :-(

https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1918599

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<20210311074636.GA28705@tik.uni-stuttgart.de>
* Re: cannot use btrfs for nfs server
From: Graham Cobb @ 2021-07-09 0:05 UTC
To: linux-btrfs

On 08/07/2021 23:17, Ulli Horlacher wrote:
>
> I have waited some time and some Ubuntu updates, but the bug is still there:

Yes: find and du get confused about seeing inode numbers reused in what
they think is a single filesystem.

However, the filesystems are not actually corrupted, and all normal file
and directory actions work correctly. The loops and cycles are not there -
but find and du can't tell that.

I use NFS mounts of btrfs disks all the time and have never had any real
problem - just find and du confused.

You can eliminate the problems by exporting and mounting single
subvolumes only - making sure that there are no nested subvolumes
exported, or that the subvolumes are all mounted individually.

> This makes btrfs with snapshots unusable as a nfs server :-(

No, it doesn't. I use it ALL the time: my main data lives on btrfs
servers and is exported to the clients. I use tools like btrbk and
btrfs-snapshot-aware-rsnapshot on the server and then access those btrfs
snapshots from the clients over NFS as well (for example to retrieve
accidentally deleted files).

You just have to be careful with subvolume structure and what you mount
where. And I recommend only using find and du operations on the server,
not the client.

> How/where can I escalate it further?

Try complaining to NFS. It might be that it would work better if NFS
assigned different NFS filesystem IDs to each subvolume - I don't know.
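
The confusion described here ("inode numbers reused in what they think is a single filesystem") can be illustrated with a small model of a loop-detecting tree walker. This is a sketch under stated assumptions, not the actual logic of find or du: the function `find_false_loops` and the device numbers 49, 50 and 51 are invented for the example. Only inode 256 comes from the thread, where the debug log reports it for both `/fex` and `fex/spool`.

```python
from collections import namedtuple

# Minimal stand-in for the two stat fields a tree walker compares.
Stat = namedtuple("Stat", ["st_dev", "st_ino"])


def find_false_loops(path_stats):
    """Flag directories whose (st_dev, st_ino) equals that of an ancestor.

    `path_stats` is a list of (path, Stat) pairs in traversal order,
    with every parent listed before its children.
    """
    flagged = []
    ancestors = []  # stack of (path, (st_dev, st_ino)) for the current branch
    for path, st in path_stats:
        # Drop stack entries that are not ancestors of the current path.
        while ancestors and not path.startswith(ancestors[-1][0].rstrip("/") + "/"):
            ancestors.pop()
        key = (st.st_dev, st.st_ino)
        if any(key == seen for _p, seen in ancestors):
            flagged.append(path)  # looks like a cycle, so the walker refuses to descend
            continue
        ancestors.append((path, key))
    return flagged


# Hypothetical server-side view: each subvolume reports its own device number.
SERVER_VIEW = [
    ("/data/fex", Stat(50, 256)),
    ("/data/fex/spool", Stat(51, 256)),
]

# Hypothetical client-side view: one NFS mount, one device number for everything.
CLIENT_VIEW = [
    ("/nfs/tsmsrvj/fex", Stat(49, 256)),
    ("/nfs/tsmsrvj/fex/spool", Stat(49, 256)),
]

if __name__ == "__main__":
    print("server:", find_false_loops(SERVER_VIEW))
    print("client:", find_false_loops(CLIENT_VIEW))
```

With distinct device numbers the identical inode is harmless. Once both directories share a device number, the child is indistinguishable from its parent and gets reported as a cycle even though no cycle exists.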
* Re: cannot use btrfs for nfs server
From: NeilBrown @ 2021-07-09 4:05 UTC
To: Graham Cobb; +Cc: linux-btrfs

On Fri, 09 Jul 2021, Graham Cobb wrote:
>
> > How/where can I escalate it further?
>
> Try complaining to NFS. It might be that it would work better if NFS
> assigned different NFS filesystem IDs to each subvolume - I don't know.
>
>

Better than complaining...:

Apply the patch you can find at
https://lore.kernel.org/linux-nfs/162457725899.28671.14092003979067994699@noble.neil.brown.name/T/#mc4752a019af79cbb166d5338d6ed0db141832546

then apply the fix described at
https://lore.kernel.org/linux-nfs/162457725899.28671.14092003979067994699@noble.neil.brown.name/T/#mc26984e10e7837e28aca3209fcb03b38a4df6fe7

which I think is shown in more detail in a subsequent message in the
thread.

Then confirm for yourself that it works.

Then reply to that thread (or send a new message to linux-nfs) saying
something like:

Hi, I've been having problems with NFS and btrfs too. I found this
patch and it works really well for me. Any chance we can get it
included upstream?

That might spur us on to further action - enthusiasm is much better than
complaints :-)

(the problem is not that NFS doesn't assign different filesystem IDs,
the problem is that NFSd doesn't tell NFS that there are different
volumes).

NeilBrown
* Re: cannot use btrfs for nfs server
From: Ulli Horlacher @ 2021-07-09 6:53 UTC
To: linux-btrfs

On Fri 2021-07-09 (01:05), Graham Cobb wrote:
> On 08/07/2021 23:17, Ulli Horlacher wrote:
>
> >
> > I have waited some time and some Ubuntu updates, but the bug is still there:
>
> Yes: find and du get confused about seeing inode numbers reused in what
> they think is a single filesystem.

A lot of tools aren't working correctly any more, even ls:

root@tsmsrvj:~# ls -R /nfs/localhost/fex | wc
ls: /nfs/localhost/fex/spool: not listing already-listed directory

In consequence many cronjobs and monitoring tools will fail :-(

> You can eliminate the problems by exporting and mounting single
> subvolumes only

This is not possible at our site, we use rotating snapshots created by a
cronjob.

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<56c40592-0937-060a-5f8a-969d8a88d541@cobb.uk.net>
* Re: cannot use btrfs for nfs server
From: Forza @ 2021-07-09 7:23 UTC
To: Ulli Horlacher, linux-btrfs

Hello everyone,

---- From: Ulli Horlacher <framstag@rus.uni-stuttgart.de> -- Sent: 2021-07-09 - 08:53 ----

> On Fri 2021-07-09 (01:05), Graham Cobb wrote:
>> On 08/07/2021 23:17, Ulli Horlacher wrote:
>>
>> >
>> > I have waited some time and some Ubuntu updates, but the bug is still there:
>>
>> Yes: find and du get confused about seeing inode numbers reused in what
>> they think is a single filesystem.
>
> A lot of tools aren't working correctly any more, even ls:
>
> root@tsmsrvj:~# ls -R /nfs/localhost/fex | wc
> ls: /nfs/localhost/fex/spool: not listing already-listed directory
>
> In consequence many cronjobs and monitoring tools will fail :-(
>
>
>> You can eliminate the problems by exporting and mounting single
>> subvolumes only
>
> This is not possible at our site, we use rotating snapshots created by a
> cronjob.
>
>

Have you tried using the fsid= export option in /etc/exports?

Example:
/media/nfs/ 192.168.0.*(fsid=20000001,rw,sync,no_subtree_check,no_root_squash)

We're using this with Btrfs subvols without issues. We use NFSv4 so I do
not know how this works with NFSv3.

Example:
## On the Ubuntu NFS server:
# btrfs sub list -o .
ID 5384 gen 345641 top level 258 path volume/nfs_ssd/132bbc3e-aed1-15a5-f30d-9515e490e62c/subvol1
ID 5385 gen 345640 top level 258 path volume/nfs_ssd/132bbc3e-aed1-15a5-f30d-9515e490e62c/subvol2

## On the NFS client:
[09:20 srv01 132bbc3e-aed1-15a5-f30d-9515e490e62c]# ll
total 0
drwxr-xr-x 1 root root 6 Jul 9 09:17 subvol1
drwxr-xr-x 1 root root 0 Jul 9 09:17 subvol2
[09:20 srv01 132bbc3e-aed1-15a5-f30d-9515e490e62c]# touch subvol1/foo
[09:20 srv01 132bbc3e-aed1-15a5-f30d-9515e490e62c]# touch subvol2/bar
[09:20 srv01 132bbc3e-aed1-15a5-f30d-9515e490e62c]# touch foobar
[09:20 srv01 132bbc3e-aed1-15a5-f30d-9515e490e62c]# ll -R
.:
total 0
-rw-r--r-- 1 root root 0 Jul 9 09:20 foobar
drwxr-xr-x 1 root root 12 Jul 9 09:20 subvol1
drwxr-xr-x 1 root root 6 Jul 9 09:20 subvol2

./subvol1:
total 0
-rw-r--r-- 1 root root 0 Jul 9 09:17 bar
-rw-r--r-- 1 root root 0 Jul 9 09:20 foo

./subvol2:
total 0
-rw-r--r-- 1 root root 0 Jul 9 09:20 bar
* Re: cannot use btrfs for nfs server
From: Hugo Mills @ 2021-07-09 7:24 UTC
To: Forza; +Cc: Ulli Horlacher, linux-btrfs

I'm using it on NFSv3 and it works fine for me.

Hugo.

On Fri, Jul 09, 2021 at 09:23:14AM +0200, Forza wrote:
> Hello everyone,
>
> ---- From: Ulli Horlacher <framstag@rus.uni-stuttgart.de> -- Sent: 2021-07-09 - 08:53 ----
>
> > On Fri 2021-07-09 (01:05), Graham Cobb wrote:
> >> On 08/07/2021 23:17, Ulli Horlacher wrote:
> >>
> >> >
> >> > I have waited some time and some Ubuntu updates, but the bug is still there:
> >>
> >> Yes: find and du get confused about seeing inode numbers reused in what
> >> they think is a single filesystem.
> >
> > A lot of tools aren't working correctly any more, even ls:
> >
> > root@tsmsrvj:~# ls -R /nfs/localhost/fex | wc
> > ls: /nfs/localhost/fex/spool: not listing already-listed directory
> >
> > In consequence many cronjobs and monitoring tools will fail :-(
> >
> >
> >> You can eliminate the problems by exporting and mounting single
> >> subvolumes only
> >
> > This is not possible at our site, we use rotating snapshots created by a
> > cronjob.
> >
> >
>
> Have you tried using the fsid= export option in /etc/exports?
>
> Example:
> /media/nfs/ 192.168.0.*(fsid=20000001,rw,sync,no_subtree_check,no_root_squash)
>
> We're using this with Btrfs subvols without issues. We use NFSv4 so I do not know how this works with NFSv3.
>
> Example:
> ## On the Ubuntu NFS server:
> # btrfs sub list -o .
> ID 5384 gen 345641 top level 258 path volume/nfs_ssd/132bbc3e-aed1-15a5-f30d-9515e490e62c/subvol1
> ID 5385 gen 345640 top level 258 path volume/nfs_ssd/132bbc3e-aed1-15a5-f30d-9515e490e62c/subvol2
>
> ## On the NFS client:
> [09:20 srv01 132bbc3e-aed1-15a5-f30d-9515e490e62c]# ll
> total 0
> drwxr-xr-x 1 root root 6 Jul 9 09:17 subvol1
> drwxr-xr-x 1 root root 0 Jul 9 09:17 subvol2
> [09:20 srv01 132bbc3e-aed1-15a5-f30d-9515e490e62c]# touch subvol1/foo
> [09:20 srv01 132bbc3e-aed1-15a5-f30d-9515e490e62c]# touch subvol2/bar
> [09:20 srv01 132bbc3e-aed1-15a5-f30d-9515e490e62c]# touch foobar
> [09:20 srv01 132bbc3e-aed1-15a5-f30d-9515e490e62c]# ll -R
> .:
> total 0
> -rw-r--r-- 1 root root 0 Jul 9 09:20 foobar
> drwxr-xr-x 1 root root 12 Jul 9 09:20 subvol1
> drwxr-xr-x 1 root root 6 Jul 9 09:20 subvol2
>
> ./subvol1:
> total 0
> -rw-r--r-- 1 root root 0 Jul 9 09:17 bar
> -rw-r--r-- 1 root root 0 Jul 9 09:20 foo
>
> ./subvol2:
> total 0
> -rw-r--r-- 1 root root 0 Jul 9 09:20 bar
>
>
>

--
Hugo Mills             | Modern medicine does not treat causes: headaches are
hugo@... carfax.org.uk | not caused by a paracetamol deficiency.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
* Re: cannot use btrfs for nfs server
  2021-07-09  7:23 ` Forza
  2021-07-09  7:24   ` Hugo Mills
@ 2021-07-09  7:34   ` Ulli Horlacher
  2021-07-09 16:30     ` Chris Murphy
  1 sibling, 1 reply; 56+ messages in thread
From: Ulli Horlacher @ 2021-07-09 7:34 UTC (permalink / raw)
  To: linux-btrfs

On Fri 2021-07-09 (09:23), Forza wrote:

> > In consequence many cronjobs and monitoring tools will fail :-(
> >
> >> You can eliminate the problems by exporting and mounting single
> >> subvolumes only
> >
> > This is not possible at our site, we use rotating snapshots created by a
> > cronjob.

> Have you tried using the fsid= export option in /etc/exports?

I have tested it with localhost:

root@tsmsrvj:/# grep localhost /etc/exports
/data/fex localhost(rw,async,no_subtree_check,no_root_squash,fsid=20000001)

root@tsmsrvj:/# mount -v localhost:/data/fex /nfs/localhost/fex
mount.nfs: timeout set for Fri Jul 9 09:32:55 2021
mount.nfs: trying text-based options 'vers=4.2,addr=127.0.0.1,clientaddr=127.0.0.1'

root@tsmsrvj:/# mount | grep localhost
localhost:/data/fex on /nfs/localhost/fex type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,local_lock=none,addr=127.0.0.1)

root@tsmsrvj:/# du -s /nfs/localhost/fex
du: WARNING: Circular directory structure.
This almost certainly means that you have a corrupted file system.
NOTIFY YOUR SYSTEM MANAGER.
The following directory is part of the cycle:
  /nfs/localhost/fex/spool

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<475ccf1.ca37f515.17a8a262a72@tnonline.net>

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server
  2021-07-09  7:34 ` Ulli Horlacher
@ 2021-07-09 16:30   ` Chris Murphy
  2021-07-10  6:35     ` Ulli Horlacher
  0 siblings, 1 reply; 56+ messages in thread
From: Chris Murphy @ 2021-07-09 16:30 UTC (permalink / raw)
  To: Btrfs BTRFS

On Fri, Jul 9, 2021 at 1:34 AM Ulli Horlacher
<framstag@rus.uni-stuttgart.de> wrote:

>
> root@tsmsrvj:/# du -s /nfs/localhost/fex
> du: WARNING: Circular directory structure.
> This almost certainly means that you have a corrupted file system.
> NOTIFY YOUR SYSTEM MANAGER.
> The following directory is part of the cycle:
>   /nfs/localhost/fex/spool

What do you get for:

btrfs subvolume list -to /nfs/localhost/fex

--
Chris Murphy

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server
  2021-07-09 16:30 ` Chris Murphy
@ 2021-07-10  6:35   ` Ulli Horlacher
  2021-07-11 11:41     ` Forza
  0 siblings, 1 reply; 56+ messages in thread
From: Ulli Horlacher @ 2021-07-10 6:35 UTC (permalink / raw)
  To: Btrfs BTRFS

On Fri 2021-07-09 (10:30), Chris Murphy wrote:
> On Fri, Jul 9, 2021 at 1:34 AM Ulli Horlacher
> <framstag@rus.uni-stuttgart.de> wrote:
>
> >
> > root@tsmsrvj:/# du -s /nfs/localhost/fex
> > du: WARNING: Circular directory structure.
> > This almost certainly means that you have a corrupted file system.
> > NOTIFY YOUR SYSTEM MANAGER.
> > The following directory is part of the cycle:
> >   /nfs/localhost/fex/spool
>
> What do you get for:
>
> btrfs subvolume list -to /nfs/localhost/fex

root@tsmsrvj:~# btrfs subvolume list -to /nfs/localhost/fex
ERROR: not a btrfs filesystem: /nfs/localhost/fex
ERROR: can't access '/nfs/localhost/fex'

root@tsmsrvj:~# mount | grep localhost
localhost:/data/fex on /nfs/localhost/fex type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,local_lock=none,addr=127.0.0.1)

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<CAJCQCtR=Xar+0pD9ivhk-kfrWxTxbJpVYu3z8A617GKshf2AsA@mail.gmail.com>

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server
  2021-07-10  6:35 ` Ulli Horlacher
@ 2021-07-11 11:41   ` Forza
  2021-07-12  7:17     ` Ulli Horlacher
  0 siblings, 1 reply; 56+ messages in thread
From: Forza @ 2021-07-11 11:41 UTC (permalink / raw)
  To: Btrfs BTRFS

On 2021-07-10 08:35, Ulli Horlacher wrote:
> On Fri 2021-07-09 (10:30), Chris Murphy wrote:
>> On Fri, Jul 9, 2021 at 1:34 AM Ulli Horlacher
>> <framstag@rus.uni-stuttgart.de> wrote:
>>
>>>
>>> root@tsmsrvj:/# du -s /nfs/localhost/fex
>>> du: WARNING: Circular directory structure.
>>> This almost certainly means that you have a corrupted file system.
>>> NOTIFY YOUR SYSTEM MANAGER.
>>> The following directory is part of the cycle:
>>>   /nfs/localhost/fex/spool
>>
>> What do you get for:
>>
>> btrfs subvolume list -to /nfs/localhost/fex
>
> root@tsmsrvj:~# btrfs subvolume list -to /nfs/localhost/fex
> ERROR: not a btrfs filesystem: /nfs/localhost/fex
> ERROR: can't access '/nfs/localhost/fex'
>
>
> root@tsmsrvj:~# mount | grep localhost
> localhost:/data/fex on /nfs/localhost/fex type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,local_lock=none,addr=127.0.0.1)
>
>

I think you should have run it on the btrfs filesystem, not on the NFS mount:

btrfs subvolume list -to /data/fex

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server
  2021-07-11 11:41 ` Forza
@ 2021-07-12  7:17   ` Ulli Horlacher
  0 siblings, 0 replies; 56+ messages in thread
From: Ulli Horlacher @ 2021-07-12 7:17 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sun 2021-07-11 (13:41), Forza wrote:

> btrfs subvolume list -to /data/fex

root@tsmsrvj:/# btrfs subvolume list -to /data/fex
ID   gen   top level  path
--   ---   ---------  ----
270  1471  257        fex/spool

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<2fd105cb-c097-63e8-0c43-049dceeb93c9@tnonline.net>

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server
  2021-07-09  6:53 ` Ulli Horlacher
  2021-07-09  7:23   ` Forza
@ 2021-07-09 16:35   ` Chris Murphy
  2021-07-10  6:56     ` Ulli Horlacher
  1 sibling, 1 reply; 56+ messages in thread
From: Chris Murphy @ 2021-07-09 16:35 UTC (permalink / raw)
  To: Btrfs BTRFS

On Fri, Jul 9, 2021 at 12:53 AM Ulli Horlacher
<framstag@rus.uni-stuttgart.de> wrote:
>
> On Fri 2021-07-09 (01:05), Graham Cobb wrote:
> > You can eliminate the problems by exporting and mounting single
> > subvolumes only
>
> This is not possible at our site, we use rotating snapshots created by a
> cronjob.

These two things sound orthogonal to me. You can have a:

<FS_TREE>/fex

which is mounted via fstab using -o subvol=fex /nfs/localhost/fex

And you can separately snapshot fex from the top-level, mounted
anywhere you want, but I kinda like putting such things in /run/
because then they're not in the way for more routine/interactive
locations like /media or /mnt.

But I don't really understand your workflow, or what the fstab or
subvolume setup looks like. Are you able to share the cron job script,
the fstab, and the full subvolume listing? btrfs subvolume list -ta
/nfs/localhost/fex ?

--
Chris Murphy

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server 2021-07-09 16:35 ` Chris Murphy @ 2021-07-10 6:56 ` Ulli Horlacher 2021-07-10 22:17 ` Chris Murphy 0 siblings, 1 reply; 56+ messages in thread From: Ulli Horlacher @ 2021-07-10 6:56 UTC (permalink / raw) To: Btrfs BTRFS On Fri 2021-07-09 (10:35), Chris Murphy wrote: > But I don't really understand your workflow, or what the fstab or > subvolume setup looks like. Are you able to share the cron job script, > the fstab, and the full subvolume listing? btrfs subvolume list -ta > /nfs/localhost/fex ? /nfs/localhost/fex is just a test setup on a test server. The production server does not use nfs so far, but we plan to migrate from local disks to nfs. But before we do it, btrfs via nfs MUST work without problems and error messages. /nfs/localhost/fex is not in /etc/fstab, I have mounted it manually, as I wrote in my previous mails. It is just a test. root@tsmsrvj:# grep local /etc/exports /data/fex localhost(rw,async,no_subtree_check,no_root_squash,fsid=20000001) root@tsmsrvj:# mount -v localhost:/data/fex /nfs/localhost/fex mount.nfs: timeout set for Sat Jul 10 08:47:57 2021 mount.nfs: trying text-based options 'vers=4.2,addr=127.0.0.1,clientaddr=127.0.0.1' root@tsmsrvj:# snaprotate -v test 5 /data/fex/spool $ btrfs subvolume snapshot -r /data/fex/spool /data/fex/spool/.snapshot/2021-07-10_0849.test Create a readonly snapshot of '/data/fex/spool' in '/data/fex/spool/.snapshot/2021-07-10_0849.test' root@tsmsrvj:# snaprotate -l /data/fex/spool/.snapshot/2021-03-07_1453.test /data/fex/spool/.snapshot/2021-03-07_1531.test /data/fex/spool/.snapshot/2021-03-07_1532.test /data/fex/spool/.snapshot/2021-03-07_1718.test /data/fex/spool/.snapshot/2021-07-10_0849.test root@tsmsrvj:# btrfs subvolume list /data ID 257 gen 1466 top level 5 path fex ID 270 gen 1471 top level 257 path fex/spool ID 271 gen 21 top level 270 path fex/spool/.snapshot/2021-03-07_1453.test ID 272 gen 23 top level 270 path fex/spool/.snapshot/2021-03-07_1531.test ID 273 
gen 25 top level 270 path fex/spool/.snapshot/2021-03-07_1532.test ID 274 gen 27 top level 270 path fex/spool/.snapshot/2021-03-07_1718.test ID 394 gen 1470 top level 270 path fex/spool/.snapshot/2021-07-10_0849.test We cannot move the snapshots to a different directory. Our workflow depends on snaprotate: http://fex.belwue.de/linuxtools/snaprotate.html -- Ullrich Horlacher Server und Virtualisierung Rechenzentrum TIK Universitaet Stuttgart E-Mail: horlacher@tik.uni-stuttgart.de Allmandring 30a Tel: ++49-711-68565868 70569 Stuttgart (Germany) WWW: http://www.tik.uni-stuttgart.de/ REF:<CAJCQCtQvak-28B7eUf5zRnAeGK27qZaF-1ZZt=OAHk+2KmfsWQ@mail.gmail.com> ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server
  2021-07-10  6:56 ` Ulli Horlacher
@ 2021-07-10 22:17   ` Chris Murphy
  2021-07-12  7:25     ` Ulli Horlacher
  0 siblings, 1 reply; 56+ messages in thread
From: Chris Murphy @ 2021-07-10 22:17 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sat, Jul 10, 2021 at 12:56 AM Ulli Horlacher
<framstag@rus.uni-stuttgart.de> wrote:

> root@tsmsrvj:# snaprotate -v test 5 /data/fex/spool
> $ btrfs subvolume snapshot -r /data/fex/spool /data/fex/spool/.snapshot/2021-07-10_0849.test
> Create a readonly snapshot of '/data/fex/spool' in '/data/fex/spool/.snapshot/2021-07-10_0849.test'

I think this might be the source of the problem. Nested snapshots are
not a good idea, it causes various kinds of confusion. It's not any
different if you do an LVM snapshot and nest a bind mount of one file
system in another. I have no idea how NFS works, but it sounds to me
like it's getting confused when finding the same file system inodes
multiple times, and that's just what happens with snapshots, whether
Btrfs or some other snapshotting mechanism.

> We cannot move the snapshots to a different directory. Our workflow
> depends on snaprotate:
>
> http://fex.belwue.de/linuxtools/snaprotate.html

OK does the problem happen if you have no nested snapshots (no nested
subvolumes of any kind) in the NFS export path?

If the problem doesn't happen, then either the tool you've chosen needs
to be enhanced so it will create snapshots somewhere else, which Btrfs
supports, or you need to find another tool that can.

--
Chris Murphy

^ permalink raw reply [flat|nested] 56+ messages in thread
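To answer the question of whether an export path contains nested subvolumes, the plain output of `btrfs subvolume list` (lines of the form `ID ... gen ... top level ... path ...`, as posted earlier in the thread) can be filtered by path prefix. The following Python sketch assumes that line format. The helper names `parse_subvolume_list` and `nested_below` are hypothetical, and the paths it compares are relative to the filesystem top level (`fex`, not `/data/fex`).

```python
import re
from typing import Dict, List

# Matches lines such as: "ID 270 gen 1471 top level 257 path fex/spool"
_LINE = re.compile(r"^ID (\d+) gen (\d+) top level (\d+) path (.+)$")


def parse_subvolume_list(text: str) -> List[Dict[str, object]]:
    """Parse the default (non-table) output of 'btrfs subvolume list'."""
    subvols = []
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            subvols.append({
                "id": int(m.group(1)),
                "gen": int(m.group(2)),
                "top_level": int(m.group(3)),
                "path": m.group(4),
            })
    return subvols


def nested_below(subvols: List[Dict[str, object]], export_path: str) -> List[str]:
    """Return paths of subvolumes located strictly below export_path."""
    prefix = export_path.rstrip("/") + "/"
    return [str(s["path"]) for s in subvols if str(s["path"]).startswith(prefix)]
```

Applied to the listing posted for `/data`, `nested_below(subvols, "fex")` would return `fex/spool` and every snapshot under `fex/spool/.snapshot`, which are the directories an NFS client has to cross into.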
* Re: cannot use btrfs for nfs server 2021-07-10 22:17 ` Chris Murphy @ 2021-07-12 7:25 ` Ulli Horlacher 2021-07-12 13:06 ` Graham Cobb 0 siblings, 1 reply; 56+ messages in thread From: Ulli Horlacher @ 2021-07-12 7:25 UTC (permalink / raw) To: Btrfs BTRFS On Sat 2021-07-10 (16:17), Chris Murphy wrote: > On Sat, Jul 10, 2021 at 12:56 AM Ulli Horlacher > <framstag@rus.uni-stuttgart.de> wrote: > > > root@tsmsrvj:# snaprotate -v test 5 /data/fex/spool > > $ btrfs subvolume snapshot -r /data/fex/spool /data/fex/spool/.snapshot/2021-07-10_0849.test > > Create a readonly snapshot of '/data/fex/spool' in '/data/fex/spool/.snapshot/2021-07-10_0849.test' > > I think this might be the source of the problem. Nested snapshots are > not a good idea, it causes various kinds of confusion. I do not have nested snapshots anywhere. /data/fex/spool is not a snapshot. /data/fex/spool/.snapshot/2021-07-10_0849.test is a simple snapshot of the btrfs subvolume /data/fex/spool > > We cannot move the snapshots to a different directory. Our workflow > > depends on snaprotate: > > > > http://fex.belwue.de/linuxtools/snaprotate.html > > OK does the problem happen if you have no nested snapshots (no nested > subvolumes of any kind) in the NFS export path? > > If the problem doesn't happen, then either the tool you've chosen needs > to be enhanced so it will create snapshots somewhere else, which Btrfs > supports, or you need to find another tool that can. Without snapshots there is no problem, but we need access to the snapshots on the nfs clients for backup/recovery like Netapp offers it. But Netapp is EXPENSIVE :-} If we cannot handle it with btrfs, then we have to switch to ZFS. 
-- Ullrich Horlacher Server und Virtualisierung Rechenzentrum TIK Universitaet Stuttgart E-Mail: horlacher@tik.uni-stuttgart.de Allmandring 30a Tel: ++49-711-68565868 70569 Stuttgart (Germany) WWW: http://www.tik.uni-stuttgart.de/ REF:<CAJCQCtQn0=8KiB=2garN8k2NRd1PO3HBnrMNvmqssSfKT2-UXQ@mail.gmail.com> ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server 2021-07-12 7:25 ` Ulli Horlacher @ 2021-07-12 13:06 ` Graham Cobb 2021-07-12 16:16 ` Ulli Horlacher 0 siblings, 1 reply; 56+ messages in thread From: Graham Cobb @ 2021-07-12 13:06 UTC (permalink / raw) To: Btrfs BTRFS On 12/07/2021 08:25, Ulli Horlacher wrote: > On Sat 2021-07-10 (16:17), Chris Murphy wrote: >> On Sat, Jul 10, 2021 at 12:56 AM Ulli Horlacher >> <framstag@rus.uni-stuttgart.de> wrote: >> >>> root@tsmsrvj:# snaprotate -v test 5 /data/fex/spool >>> $ btrfs subvolume snapshot -r /data/fex/spool /data/fex/spool/.snapshot/2021-07-10_0849.test >>> Create a readonly snapshot of '/data/fex/spool' in '/data/fex/spool/.snapshot/2021-07-10_0849.test' >> >> I think this might be the source of the problem. Nested snapshots are >> not a good idea, it causes various kinds of confusion. > > I do not have nested snapshots anywhere. > /data/fex/spool is not a snapshot. But it is the subvolume which is being snapshotted. What happens if you put the snapshots somewhere that is not part of that subvolume? For example, create /data/fex/snapshots, snapshot /data/fex/spool into a snapshot in /data/fex/snapshots/spool/2021-07-10_0849.test, export /data/fex/snapshots using NFS and mount /data/fex/snapshots on the client? > /data/fex/spool/.snapshot/2021-07-10_0849.test is a simple snapshot of > the btrfs subvolume /data/fex/spool > > >>> We cannot move the snapshots to a different directory. Our workflow >>> depends on snaprotate: >>> >>> http://fex.belwue.de/linuxtools/snaprotate.html Won't snaprotate follow softlinks? ln -s /data/fex/snapshots /data/fex/spool/.snapshot >> >> OK does the problem happen if you have no nested snapshots (no nested >> subvolumes of any kind) in the NFS export path? >> >> If the problem doesn't happen, then either the tool you've chosen needs >> to be enhanced so it will create snapshots somewhere else, which Btrfs >> supports, or you need to find another tool that can. 
> > Without snapshots there is no problem, but we need access to the snapshots > on the nfs clients for backup/recovery like Netapp offers it. > But Netapp is EXPENSIVE :-} My server snapshots data subvolumes into a different part of the tree (in my case I use btrbk) and exports them to clients and the clients can access all the snapshots over NFS perfectly well. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server 2021-07-12 13:06 ` Graham Cobb @ 2021-07-12 16:16 ` Ulli Horlacher 2021-07-12 22:56 ` g.btrfs 0 siblings, 1 reply; 56+ messages in thread From: Ulli Horlacher @ 2021-07-12 16:16 UTC (permalink / raw) To: Btrfs BTRFS On Mon 2021-07-12 (14:06), Graham Cobb wrote: > >>> root@tsmsrvj:# snaprotate -v test 5 /data/fex/spool > >>> $ btrfs subvolume snapshot -r /data/fex/spool /data/fex/spool/.snapshot/2021-07-10_0849.test > >>> Create a readonly snapshot of '/data/fex/spool' in '/data/fex/spool/.snapshot/2021-07-10_0849.test' > >> > >> I think this might be the source of the problem. Nested snapshots are > >> not a good idea, it causes various kinds of confusion. > > > > I do not have nested snapshots anywhere. > > /data/fex/spool is not a snapshot. > > But it is the subvolume which is being snapshotted. What happens if you > put the snapshots somewhere that is not part of that subvolume? For > example, create /data/fex/snapshots, snapshot /data/fex/spool into a > snapshot in /data/fex/snapshots/spool/2021-07-10_0849.test, export > /data/fex/snapshots using NFS and mount /data/fex/snapshots on the client? 
Same problem: root@tsmsrvj:/etc# mount | grep data /dev/sdb1 on /data type btrfs (rw,relatime,space_cache,user_subvol_rm_allowed,subvolid=5,subvol=/) root@tsmsrvj:/etc# mkdir /data/snapshots /nfs/localhost/snapshots root@tsmsrvj:/etc# btrfs subvolume snapshot -r /data/fex/spool /data/snapshots/fex_1 Create a readonly snapshot of '/data/fex/spool' in '/data/snapshots/fex_1' root@tsmsrvj:/etc# btrfs subvolume snapshot -r /data/fex/spool /data/snapshots/fex_2 Create a readonly snapshot of '/data/fex/spool' in '/data/snapshots/fex_2' root@tsmsrvj:/etc# btrfs subvolume list /data ID 257 gen 1558 top level 5 path fex ID 270 gen 1557 top level 257 path fex/spool ID 272 gen 23 top level 270 path fex/spool/.snapshot/2021-03-07_1531.test ID 273 gen 25 top level 270 path fex/spool/.snapshot/2021-03-07_1532.test ID 274 gen 27 top level 270 path fex/spool/.snapshot/2021-03-07_1718.test ID 394 gen 1470 top level 270 path fex/spool/.snapshot/2021-07-10_0849.test ID 399 gen 1554 top level 270 path fex/spool/.snapshot/2021-07-12_1747.test ID 400 gen 1556 top level 5 path snapshots/fex_1 ID 401 gen 1557 top level 5 path snapshots/fex_2 root@tsmsrvj:/etc# grep localhost /etc/exports /data/fex localhost(rw,async,no_subtree_check,no_root_squash,crossmnt) /data/snapshots localhost(rw,async,no_subtree_check,no_root_squash,crossmnt) ## ==> no nested subvolumes! 
different nfs exports

root@tsmsrvj:/etc# mount -o vers=3 localhost:/data/fex /nfs/localhost/fex
root@tsmsrvj:/etc# mount | grep localhost
localhost:/data/fex on /nfs/localhost/fex type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1)

root@tsmsrvj:/etc# mount -o vers=3 localhost:/data/snapshots /nfs/localhost/snapshots
root@tsmsrvj:/etc# mount | grep localhost
localhost:/data/fex on /nfs/localhost/fex type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1)
localhost:/data/fex on /nfs/localhost/snapshots type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1)

## why localhost:/data/fex twice??

root@tsmsrvj:/etc# du -Hs /nfs/localhost/snapshots
du: WARNING: Circular directory structure.
This almost certainly means that you have a corrupted file system.
NOTIFY YOUR SYSTEM MANAGER.
The following directory is part of the cycle:
  /nfs/localhost/snapshots/spool

51425792 /nfs/localhost/snapshots

> >>> We cannot move the snapshots to a different directory. Our workflow
> >>> depends on snaprotate:
> >>>
> >>> http://fex.belwue.de/linuxtools/snaprotate.html
>
> Won't snaprotate follow softlinks? ln -s /data/fex/snapshots
> /data/fex/spool/.snapshot

Yes, it does. The snapshot storage place is just a simple directory, it
does not have to be a subvolume. So a symbolic link is ok, but it does
not help, see above.

> My server snapshots data subvolumes into a different part of the tree
> (in my case I use btrbk) and exports them to clients and the clients can
> access all the snapshots over NFS perfectly well.
It does not work in my test environment with Ubuntu 20.04 and btrfs 5.4.1

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart         E-Mail: horlacher@tik.uni-stuttgart.de
Allmandring 30a                Tel:    ++49-711-68565868
70569 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
REF:<294e8449-383f-1c90-62be-fb618332862e@cobb.uk.net>

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server 2021-07-12 16:16 ` Ulli Horlacher @ 2021-07-12 22:56 ` g.btrfs 2021-07-13 7:37 ` Ulli Horlacher 0 siblings, 1 reply; 56+ messages in thread From: g.btrfs @ 2021-07-12 22:56 UTC (permalink / raw) To: Btrfs BTRFS On 12/07/2021 17:16, Ulli Horlacher wrote: > On Mon 2021-07-12 (14:06), Graham Cobb wrote: > >>>>> root@tsmsrvj:# snaprotate -v test 5 /data/fex/spool >>>>> $ btrfs subvolume snapshot -r /data/fex/spool /data/fex/spool/.snapshot/2021-07-10_0849.test >>>>> Create a readonly snapshot of '/data/fex/spool' in '/data/fex/spool/.snapshot/2021-07-10_0849.test' >>>> >>>> I think this might be the source of the problem. Nested snapshots are >>>> not a good idea, it causes various kinds of confusion. >>> >>> I do not have nested snapshots anywhere. >>> /data/fex/spool is not a snapshot. >> >> But it is the subvolume which is being snapshotted. What happens if you >> put the snapshots somewhere that is not part of that subvolume? For >> example, create /data/fex/snapshots, snapshot /data/fex/spool into a >> snapshot in /data/fex/snapshots/spool/2021-07-10_0849.test, export >> /data/fex/snapshots using NFS and mount /data/fex/snapshots on the client? 
> > Same problem: > > root@tsmsrvj:/etc# mount | grep data > /dev/sdb1 on /data type btrfs (rw,relatime,space_cache,user_subvol_rm_allowed,subvolid=5,subvol=/) > > root@tsmsrvj:/etc# mkdir /data/snapshots /nfs/localhost/snapshots > > root@tsmsrvj:/etc# btrfs subvolume snapshot -r /data/fex/spool /data/snapshots/fex_1 > Create a readonly snapshot of '/data/fex/spool' in '/data/snapshots/fex_1' > > root@tsmsrvj:/etc# btrfs subvolume snapshot -r /data/fex/spool /data/snapshots/fex_2 > Create a readonly snapshot of '/data/fex/spool' in '/data/snapshots/fex_2' > > root@tsmsrvj:/etc# btrfs subvolume list /data > ID 257 gen 1558 top level 5 path fex > ID 270 gen 1557 top level 257 path fex/spool > ID 272 gen 23 top level 270 path fex/spool/.snapshot/2021-03-07_1531.test > ID 273 gen 25 top level 270 path fex/spool/.snapshot/2021-03-07_1532.test > ID 274 gen 27 top level 270 path fex/spool/.snapshot/2021-03-07_1718.test > ID 394 gen 1470 top level 270 path fex/spool/.snapshot/2021-07-10_0849.test > ID 399 gen 1554 top level 270 path fex/spool/.snapshot/2021-07-12_1747.test > ID 400 gen 1556 top level 5 path snapshots/fex_1 > ID 401 gen 1557 top level 5 path snapshots/fex_2 > > root@tsmsrvj:/etc# grep localhost /etc/exports > /data/fex localhost(rw,async,no_subtree_check,no_root_squash,crossmnt) > /data/snapshots localhost(rw,async,no_subtree_check,no_root_squash,crossmnt) > > ## ==> no nested subvolumes! 
different nfs exports > > root@tsmsrvj:/etc# mount -o vers=3 localhost:/data/fex /nfs/localhost/fex > root@tsmsrvj:/etc# mount | grep localhost > localhost:/data/fex on /nfs/localhost/fex type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1) > > root@tsmsrvj:/etc# mount -o vers=3 localhost:/data/snapshots /nfs/localhost/snapshots > root@tsmsrvj:/etc# mount | grep localhost > localhost:/data/fex on /nfs/localhost/fex type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1) > localhost:/data/fex on /nfs/localhost/snapshots type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1) > > ## why localhost:/data/fex twice?? > > root@tsmsrvj:/etc# du -Hs /nfs/localhost/snapshots > du: WARNING: Circular directory structure. > This almost certainly means that you have a corrupted file system. > NOTIFY YOUR SYSTEM MANAGER. > The following directory is part of the cycle: > /nfs/localhost/snapshots/spool Sure. But it makes the useful operations work. du, find, ls -R, etc all work properly on /nfs/localhost/fex. When I go looking in the snapshots I am generally looking for which version of a particular file I need to restore. For example, maybe I want to find an old version of /nfs/localhost/fex/spool/some/file. 
I would then find the best snapshot to use with:

ls -l /nfs/localhost/fex_snapshots/spool_*/some/file

which might show something like:

-rw-r--r-- 1 cobb me 2.8K 2018-04-03 /nfs/localhost/fex_snapshots/spool_20210703/some/file
-rw-r--r-- 1 cobb me    7 2021-07-06 /nfs/localhost/fex_snapshots/spool_20210706/some/file
-rw-r--r-- 1 cobb me   25 2021-07-12 /nfs/localhost/fex_snapshots/spool_20210712/some/file

So I could tell I need to restore the version from spool_20210703 if I
need the one with the old data in it, which got lost a few days ago.

This is exactly how I use NFS to access my btrbk snapshots stored on the
backup server.

Of course, if you need to restore a whole subvolume you are better off
using btrfs send/receive to bring the snapshot back, instead of using
NFS - that preserves the btrfs features like reflinks.

^ permalink raw reply [flat|nested] 56+ messages in thread
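The `ls -l .../spool_*/some/file` lookup described above can also be scripted. The Python sketch below lists every snapshot directory that contains a given relative path, with the file's size and modification time, so the candidate versions can be compared. The function name `file_versions` and the `spool_` prefix default are assumptions taken from the example paths in this message, not from any tool's documentation.

```python
import glob
import os
from typing import List, Tuple


def file_versions(snapshot_dir: str, relpath: str,
                  prefix: str = "spool_") -> List[Tuple[str, int, float]]:
    """Return (snapshot name, size in bytes, mtime) for each snapshot
    under snapshot_dir whose name starts with prefix and which contains
    relpath as a regular file. Results are sorted by snapshot name."""
    versions = []
    pattern = os.path.join(glob.escape(snapshot_dir), glob.escape(prefix) + "*")
    for snap in sorted(glob.glob(pattern)):
        candidate = os.path.join(snap, relpath)
        if os.path.isfile(candidate):
            st = os.stat(candidate)
            versions.append((os.path.basename(snap), st.st_size, st.st_mtime))
    return versions
```

A call such as `file_versions("/nfs/localhost/fex_snapshots", "some/file")` would then give one tuple per snapshot that still holds the file, which is the same information the `ls -l` glob shows.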
* Re: cannot use btrfs for nfs server 2021-07-12 22:56 ` g.btrfs @ 2021-07-13 7:37 ` Ulli Horlacher 2021-07-19 12:06 ` Forza 0 siblings, 1 reply; 56+ messages in thread From: Ulli Horlacher @ 2021-07-13 7:37 UTC (permalink / raw) To: Btrfs BTRFS On Mon 2021-07-12 (23:56), g.btrfs@cobb.uk.net wrote: > > root@tsmsrvj:/etc# du -Hs /nfs/localhost/snapshots > > du: WARNING: Circular directory structure. > > This almost certainly means that you have a corrupted file system. > > NOTIFY YOUR SYSTEM MANAGER. > > The following directory is part of the cycle: > > /nfs/localhost/snapshots/spool > > Sure. But it makes the useful operations work. du, find, ls -R, etc all > work properly on /nfs/localhost/fex. Properly on /nfs/localhost/fex : yes Properly on /nfs/localhost/snapshots : NO And the error messages are annoying! root@tsmsrvj:/etc# exportfs -v /data/fex localhost.localdomain(rw,async,wdelay,crossmnt,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash) /data/snapshots localhost.localdomain(rw,async,wdelay,crossmnt,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash) root@tsmsrvj:/etc# mount -o vers=3 localhost:/data/fex /nfs/localhost/fex root@tsmsrvj:/etc# mount -o vers=3 localhost:/data/snapshots /nfs/localhost/snapshots root@tsmsrvj:/etc# mount | grep localhost localhost:/data/fex on /nfs/localhost/fex type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1) localhost:/data/snapshots on /nfs/localhost/snapshots type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1) root@tsmsrvj:/etc# ls -la /data/snapshots /nfs/localhost/snapshots /data/snapshots: total 16 drwxr-xr-x 1 root root 20 Jul 13 09:19 . 
drwxr-xr-x 1 root root 24 Jul 12 17:42 .. drwxr-xr-x 1 fex fex 261964 Mar 7 14:53 fex_1 drwxr-xr-x 1 fex fex 261964 Mar 7 14:53 fex_2 /nfs/localhost/snapshots: total 4 drwxr-xr-x 1 root root 20 Jul 13 09:19 . drwxr-xr-x 4 root root 4096 Jul 12 17:49 .. drwxr-xr-x 1 fex fex 261964 Mar 7 14:53 fex_1 drwxr-xr-x 1 fex fex 261964 Mar 7 14:53 fex_2 root@tsmsrvj:/etc# du -Hs /nfs/localhost/snapshots du: WARNING: Circular directory structure. This almost certainly means that you have a corrupted file system. NOTIFY YOUR SYSTEM MANAGER. The following directory is part of the cycle: /nfs/localhost/snapshots/fex_1/XXXXXXXXXX@gmail.com du: WARNING: Circular directory structure. This almost certainly means that you have a corrupted file system. NOTIFY YOUR SYSTEM MANAGER. The following directory is part of the cycle: /nfs/localhost/snapshots/fex_2/XXXXXXXXXX@gmail.com 25708064 /nfs/localhost/snapshots root@tsmsrvj:/etc# du -Hs /data/snapshots 25712896 /data/snapshots root@tsmsrvj:/etc# ls -R /nfs/localhost/snapshots | wc -l ls: /nfs/localhost/snapshots/fex_1/XXXXXXXXXX@gmail.com: not listing already-listed directory ls: /nfs/localhost/snapshots/fex_2/XXXXXXXXXX@gmail.com: not listing already-listed directory 128977 root@tsmsrvj:/etc# ls -R /data/snapshots | wc -l 129021 root@tsmsrvj:/etc# ls -aR /nfs/localhost/snapshots | wc -l ls: /nfs/localhost/snapshots/fex_1/XXXXXXXXXX@gmail.com: not listing already-listed directory ls: /nfs/localhost/snapshots/fex_2/XXXXXXXXXX@gmail.com: not listing already-listed directory 281357 root@tsmsrvj:/etc# ls -aR /data/snapshots | wc -l 281427 More debug info: root@tsmsrvj:/data/snapshots# find . >/tmp/local.list root@tsmsrvj:/nfs/localhost/snapshots# find . >/tmp/nfs.list find: File system loop detected; './fex_1/XXXXXXXXXX@gmail.com' is part of the same file system loop as '.'. find: File system loop detected; './fex_2/XXXXXXXXXX@gmail.com' is part of the same file system loop as '.'. 
root@tsmsrvj:/nfs/localhost/snapshots# diff -u /tmp/local.list /tmp/nfs.list --- /tmp/local.list 2021-07-13 09:25:36.388084331 +0200 +++ /tmp/nfs.list 2021-07-13 09:26:02.120793230 +0200 @@ -1,25 +1,5 @@ . ./fex_1 -./fex_1/XXXXXXXXXX@gmail.com -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/alist -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/filename -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/size -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/autodelete -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/keep -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/ip -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/uurl -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/useragent -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/header -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/dkey -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/speed -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/md5sum -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/download -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/error -./fex_1/XXXXXXXXXX@gmail.com/.log -./fex_1/XXXXXXXXXX@gmail.com/.log/fup -./fex_1/XXXXXXXXXX@gmail.com/.log/fop ./fex_1/XXXXXXXXXX@web.de ./fex_1/XXXXXXXXXX@web.de/@LOCALE ./fex_1/XXXXXXXXXX@web.de/.log @@ -97976,26 +97956,6 @@ ./fex_1/.xkeys ./fex_1/.snapshot ./fex_2 -./fex_2/XXXXXXXXXX@gmail.com 
-./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/alist -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/filename -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/size -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/autodelete -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/keep -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/ip -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/uurl -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/useragent -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/header -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/dkey -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/speed -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/md5sum -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/download -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/error -./fex_2/XXXXXXXXXX@gmail.com/.log -./fex_2/XXXXXXXXXX@gmail.com/.log/fup -./fex_2/XXXXXXXXXX@gmail.com/.log/fop ./fex_2/XXXXXXXXXX@web.de ./fex_2/XXXXXXXXXX@web.de/@LOCALE ./fex_2/XXXXXXXXXX@web.de/.log -- Ullrich Horlacher Server und Virtualisierung Rechenzentrum TIK Universitaet Stuttgart E-Mail: horlacher@tik.uni-stuttgart.de Allmandring 30a Tel: ++49-711-68565868 70569 Stuttgart (Germany) WWW: http://www.tik.uni-stuttgart.de/ REF:<8506b846-4c4d-6e8f-09ee-e0f2736aac4e@cobb.uk.net> ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server 2021-07-13 7:37 ` Ulli Horlacher @ 2021-07-19 12:06 ` Forza 2021-07-19 13:07 ` Forza 2021-07-27 11:27 ` Ulli Horlacher 0 siblings, 2 replies; 56+ messages in thread From: Forza @ 2021-07-19 12:06 UTC (permalink / raw) To: Btrfs BTRFS On 2021-07-13 09:37, Ulli Horlacher wrote: > On Mon 2021-07-12 (23:56), g.btrfs@cobb.uk.net wrote: > >>> root@tsmsrvj:/etc# du -Hs /nfs/localhost/snapshots >>> du: WARNING: Circular directory structure. >>> This almost certainly means that you have a corrupted file system. >>> NOTIFY YOUR SYSTEM MANAGER. >>> The following directory is part of the cycle: >>> /nfs/localhost/snapshots/spool >> >> Sure. But it makes the useful operations work. du, find, ls -R, etc all >> work properly on /nfs/localhost/fex. > > Properly on /nfs/localhost/fex : yes > Properly on /nfs/localhost/snapshots : NO > > And the error messages are annoying! > > root@tsmsrvj:/etc# exportfs -v > /data/fex localhost.localdomain(rw,async,wdelay,crossmnt,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash) > /data/snapshots localhost.localdomain(rw,async,wdelay,crossmnt,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash) > > root@tsmsrvj:/etc# mount -o vers=3 localhost:/data/fex /nfs/localhost/fex > root@tsmsrvj:/etc# mount -o vers=3 localhost:/data/snapshots /nfs/localhost/snapshots > root@tsmsrvj:/etc# mount | grep localhost > localhost:/data/fex on /nfs/localhost/fex type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1) > localhost:/data/snapshots on /nfs/localhost/snapshots type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1) > What kind of NFS server is this? 
Aren't UDP mounts legacy and not normally used by default? Can you switch to an nfs4 server and try again? I also still think you should use the fsid export option. > root@tsmsrvj:/etc# ls -la /data/snapshots /nfs/localhost/snapshots > /data/snapshots: > total 16 > drwxr-xr-x 1 root root 20 Jul 13 09:19 . > drwxr-xr-x 1 root root 24 Jul 12 17:42 .. > drwxr-xr-x 1 fex fex 261964 Mar 7 14:53 fex_1 > drwxr-xr-x 1 fex fex 261964 Mar 7 14:53 fex_2 > > /nfs/localhost/snapshots: > total 4 > drwxr-xr-x 1 root root 20 Jul 13 09:19 . > drwxr-xr-x 4 root root 4096 Jul 12 17:49 .. > drwxr-xr-x 1 fex fex 261964 Mar 7 14:53 fex_1 > drwxr-xr-x 1 fex fex 261964 Mar 7 14:53 fex_2 > > root@tsmsrvj:/etc# du -Hs /nfs/localhost/snapshots > du: WARNING: Circular directory structure. > This almost certainly means that you have a corrupted file system. > NOTIFY YOUR SYSTEM MANAGER. 
> The following directory is part of the cycle: > /nfs/localhost/snapshots/fex_2/XXXXXXXXXX@gmail.com > > 25708064 /nfs/localhost/snapshots > > root@tsmsrvj:/etc# du -Hs /data/snapshots > 25712896 /data/snapshots > > root@tsmsrvj:/etc# ls -R /nfs/localhost/snapshots | wc -l > ls: /nfs/localhost/snapshots/fex_1/XXXXXXXXXX@gmail.com: not listing already-listed directory > ls: /nfs/localhost/snapshots/fex_2/XXXXXXXXXX@gmail.com: not listing already-listed directory > 128977 > > root@tsmsrvj:/etc# ls -R /data/snapshots | wc -l > 129021 > > root@tsmsrvj:/etc# ls -aR /nfs/localhost/snapshots | wc -l > ls: /nfs/localhost/snapshots/fex_1/XXXXXXXXXX@gmail.com: not listing already-listed directory > ls: /nfs/localhost/snapshots/fex_2/XXXXXXXXXX@gmail.com: not listing already-listed directory > 281357 > > root@tsmsrvj:/etc# ls -aR /data/snapshots | wc -l > 281427 > > > > More debug info: > > root@tsmsrvj:/data/snapshots# find . >/tmp/local.list > > root@tsmsrvj:/nfs/localhost/snapshots# find . >/tmp/nfs.list > find: File system loop detected; './fex_1/XXXXXXXXXX@gmail.com' is part of the same file system loop as '.'. > find: File system loop detected; './fex_2/XXXXXXXXXX@gmail.com' is part of the same file system loop as '.'. > > root@tsmsrvj:/nfs/localhost/snapshots# diff -u /tmp/local.list /tmp/nfs.list > > --- /tmp/local.list 2021-07-13 09:25:36.388084331 +0200 > +++ /tmp/nfs.list 2021-07-13 09:26:02.120793230 +0200 > @@ -1,25 +1,5 @@ > . 
> ./fex_1 > -./fex_1/XXXXXXXXXX@gmail.com > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/alist > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/filename > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/size > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/autodelete > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/keep > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/ip > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/uurl > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/useragent > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/header > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/dkey > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/speed > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/md5sum > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/download > -./fex_1/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/error > -./fex_1/XXXXXXXXXX@gmail.com/.log > -./fex_1/XXXXXXXXXX@gmail.com/.log/fup > -./fex_1/XXXXXXXXXX@gmail.com/.log/fop > ./fex_1/XXXXXXXXXX@web.de > ./fex_1/XXXXXXXXXX@web.de/@LOCALE > ./fex_1/XXXXXXXXXX@web.de/.log > @@ -97976,26 +97956,6 @@ > ./fex_1/.xkeys > ./fex_1/.snapshot > ./fex_2 > -./fex_2/XXXXXXXXXX@gmail.com > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip > 
-./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/alist > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/filename > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/size > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/autodelete > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/keep > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/ip > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/uurl > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/useragent > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/header > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/dkey > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/speed > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/md5sum > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/download > -./fex_2/XXXXXXXXXX@gmail.com/XXXXXXXXXX@pi2.uni-stuttgart.de/origin-8.5.1-SR2.zip/error > -./fex_2/XXXXXXXXXX@gmail.com/.log > -./fex_2/XXXXXXXXXX@gmail.com/.log/fup > -./fex_2/XXXXXXXXXX@gmail.com/.log/fop > ./fex_2/XXXXXXXXXX@web.de > ./fex_2/XXXXXXXXXX@web.de/@LOCALE > ./fex_2/XXXXXXXXXX@web.de/.log > ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server 2021-07-19 12:06 ` Forza @ 2021-07-19 13:07 ` Forza 2021-07-19 13:35 ` Forza 2021-07-27 11:27 ` Ulli Horlacher 1 sibling, 1 reply; 56+ messages in thread From: Forza @ 2021-07-19 13:07 UTC (permalink / raw) To: Btrfs BTRFS On 2021-07-19 14:06, Forza wrote: > > > On 2021-07-13 09:37, Ulli Horlacher wrote: >> On Mon 2021-07-12 (23:56), g.btrfs@cobb.uk.net wrote: >> >>>> root@tsmsrvj:/etc# du -Hs /nfs/localhost/snapshots >>>> du: WARNING: Circular directory structure. >>>> This almost certainly means that you have a corrupted file system. >>>> NOTIFY YOUR SYSTEM MANAGER. >>>> The following directory is part of the cycle: >>>> /nfs/localhost/snapshots/spool >>> >>> Sure. But it makes the useful operations work. du, find, ls -R, etc all >>> work properly on /nfs/localhost/fex. >> >> Properly on /nfs/localhost/fex : yes >> Properly on /nfs/localhost/snapshots : NO >> >> And the error messages are annoying! >> >> root@tsmsrvj:/etc# exportfs -v >> /data/fex >> localhost.localdomain(rw,async,wdelay,crossmnt,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash) >> >> /data/snapshots >> localhost.localdomain(rw,async,wdelay,crossmnt,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash) >> >> >> root@tsmsrvj:/etc# mount -o vers=3 localhost:/data/fex /nfs/localhost/fex >> root@tsmsrvj:/etc# mount -o vers=3 localhost:/data/snapshots >> /nfs/localhost/snapshots >> root@tsmsrvj:/etc# mount | grep localhost >> localhost:/data/fex on /nfs/localhost/fex type nfs >> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1) >> >> localhost:/data/snapshots on /nfs/localhost/snapshots type nfs >> 
(rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1) >> >> > > What kind of NFS server is this? Isn't UDP mounts legacy and not > normally used by default? > > Can you switch to an nfs4 server and try again? I also still think you > should use fsid export option. > > > I'm replying to myself here because I booted up a VM with Fedora 34 and tested a similar setup as Mr Horlacher's and can reproduce the errors. Setup: 1) create a subvolume /mnt/rootvol/nfs 2) create some snapshots: btrfs sub snap /mnt/rootvol/nfs /mnt/rootvol/nfs/.snapshots/nfs-1 btrfs sub snap /mnt/rootvol/nfs /mnt/rootvol/nfs/.snapshots/nfs-2 3) export as: /mnt/rootvol/nfs/ *(fsid=1234,no_root_squash) 4) mount -o vers=4 localhost:/mnt/rootvol/nfs /media/nfs-mnt/ 5) "du -sh /media/nfs-mnt" fails with "WARNING: Circular directory structure." 6) "ls -alR /mnt/nfs-mnt" fails with "not listing already-listed directory" In addition I have tried with various export options such as crossmnt, nohide and subtree_check. They do not improve the situation. Also the behaviour is the same with nfs3 as with nfs4. Full outputs are available at https://paste.ee/p/pkHLh ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server 2021-07-19 13:07 ` Forza @ 2021-07-19 13:35 ` Forza 0 siblings, 0 replies; 56+ messages in thread From: Forza @ 2021-07-19 13:35 UTC (permalink / raw) To: Btrfs BTRFS On 2021-07-19 15:07, Forza wrote: > > > On 2021-07-19 14:06, Forza wrote: >> >> >> On 2021-07-13 09:37, Ulli Horlacher wrote: >>> On Mon 2021-07-12 (23:56), g.btrfs@cobb.uk.net wrote: >>> >>>>> root@tsmsrvj:/etc# du -Hs /nfs/localhost/snapshots >>>>> du: WARNING: Circular directory structure. >>>>> This almost certainly means that you have a corrupted file system. >>>>> NOTIFY YOUR SYSTEM MANAGER. >>>>> The following directory is part of the cycle: >>>>> /nfs/localhost/snapshots/spool >>>> >>>> Sure. But it makes the useful operations work. du, find, ls -R, etc all >>>> work properly on /nfs/localhost/fex. >>> >>> Properly on /nfs/localhost/fex : yes >>> Properly on /nfs/localhost/snapshots : NO >>> >>> And the error messages are annoying! >>> >>> root@tsmsrvj:/etc# exportfs -v >>> /data/fex >>> localhost.localdomain(rw,async,wdelay,crossmnt,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash) >>> >>> /data/snapshots >>> localhost.localdomain(rw,async,wdelay,crossmnt,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash) >>> >>> >>> root@tsmsrvj:/etc# mount -o vers=3 localhost:/data/fex >>> /nfs/localhost/fex >>> root@tsmsrvj:/etc# mount -o vers=3 localhost:/data/snapshots >>> /nfs/localhost/snapshots >>> root@tsmsrvj:/etc# mount | grep localhost >>> localhost:/data/fex on /nfs/localhost/fex type nfs >>> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1) >>> >>> localhost:/data/snapshots on /nfs/localhost/snapshots type nfs >>> 
(rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1) >>> >>> >> >> What kind of NFS server is this? Isn't UDP mounts legacy and not >> normally used by default? >> >> Can you switch to an nfs4 server and try again? I also still think you >> should use fsid export option. >> >> >> > I'm replying to myself here because I booted up a VM with Fedora 34 and > tested a similar setup as Mr Horlacher's and can reproduce the errors. > > Setup: > 1) create a subvolume /mnt/rootvol/nfs > 2) create some snapshots: > btrfs sub snap /mnt/rootvol/nfs /mnt/rootvol/nfs/.snapshots/nfs-1 > btrfs sub snap /mnt/rootvol/nfs /mnt/rootvol/nfs/.snapshots/nfs-2 > > 3) export as: > /mnt/rootvol/nfs/ *(fsid=1234,no_root_squash) > > 4) mount -o vers=4 localhost:/mnt/rootvol/nfs /media/nfs-mnt/ > 5) "du -sh /media/nfs-mnt" fails with > "WARNING: Circular directory structure." > > 6) "ls -alR /mnt/nfs-mnt" fails with > "not listing already-listed directory" > > In addition I have tried with various export options such as crossmnt, > nohide and subtree_check. They do not improve the situation. > > Also the behaviour is the same with nfs3 as with nfs4. > > Full outputs are available at https://paste.ee/p/pkHLh Perhaps the problem is that inode numbers are re-used inside snapshots and that nfsd doesn't understand how to handle this properly? # ls -ila /media/nfs-mnt/ total 0 256 drwxr-xr-x. 1 root root 80 Jul 19 14:17 . 270 drwxr-xr-x. 1 root root 14 Jul 19 14:21 .. 259 -rw-r--r--. 1 root root 0 Jul 19 14:17 bar 261 -rw-r--r--. 1 root root 0 Jul 19 14:17 file1 262 -rw-r--r--. 1 root root 0 Jul 19 14:17 file2 263 -rw-r--r--. 1 root root 0 Jul 19 14:17 file3 258 -rw-r--r--. 1 root root 0 Jul 19 14:17 foo 257 drwxr-xr-x. 1 root root 30 Jul 19 15:02 .snapshots 260 -rw-r--r--. 
1 root root 0 Jul 19 14:17 somefiles # ls -ila /media/nfs-mnt/.snapshots/nfs-2/ total 0 256 drwxr-xr-x. 1 root root 80 Jul 19 14:17 . 257 drwxr-xr-x. 1 root root 30 Jul 19 15:02 .. 259 -rw-r--r--. 1 root root 0 Jul 19 14:17 bar 261 -rw-r--r--. 1 root root 0 Jul 19 14:17 file1 262 -rw-r--r--. 1 root root 0 Jul 19 14:17 file2 263 -rw-r--r--. 1 root root 0 Jul 19 14:17 file3 258 -rw-r--r--. 1 root root 0 Jul 19 14:17 foo 257 drwxr-xr-x. 1 root root 10 Jul 19 14:17 .snapshots 260 -rw-r--r--. 1 root root 0 Jul 19 14:17 somefiles Using nfs4 exports and specifying each snapshot as its own fsid does not work either. ### /etc/exports /mnt/rootvol/nfs/ *(fsid=root,no_root_squash,no_subtree_check) /mnt/rootvol/nfs/.snapshots/nfs-1 *(fsid=1000,no_root_squash,no_subtree_check) /mnt/rootvol/nfs/.snapshots/nfs-2 *(fsid=2000,no_root_squash,no_subtree_check) /mnt/rootvol/nfs/.snapshots/nfs-3 *(fsid=3000,no_root_squash,no_subtree_check) # ls -laRi nfs-mnt/ nfs-mnt/: total 0 256 drwxr-xr-x. 1 root root 80 Jul 19 14:17 . 270 drwxr-xr-x. 1 root root 14 Jul 19 14:21 .. 259 -rw-r--r--. 1 root root 0 Jul 19 14:17 bar 261 -rw-r--r--. 1 root root 0 Jul 19 14:17 file1 262 -rw-r--r--. 1 root root 0 Jul 19 14:17 file2 263 -rw-r--r--. 1 root root 0 Jul 19 14:17 file3 258 -rw-r--r--. 1 root root 0 Jul 19 14:17 foo 257 drwxr-xr-x. 1 root root 30 Jul 19 15:02 .snapshots 260 -rw-r--r--. 1 root root 0 Jul 19 14:17 somefiles nfs-mnt/.snapshots: total 0 257 drwxr-xr-x. 1 root root 30 Jul 19 15:02 . 256 drwxr-xr-x. 1 root root 80 Jul 19 14:17 .. 256 drwxr-xr-x. 1 root root 56 Jul 19 15:02 nfs-1 256 drwxr-xr-x. 1 root root 80 Jul 19 14:17 nfs-2 256 drwxr-xr-x. 1 root root 86 Jul 19 15:03 nfs-3 ls: nfs-mnt/.snapshots/nfs-1: not listing already-listed directory ls: nfs-mnt/.snapshots/nfs-2: not listing already-listed directory ls: nfs-mnt/.snapshots/nfs-3: not listing already-listed directory ^ permalink raw reply [flat|nested] 56+ messages in thread
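Editorial sketch, not part of the original message: the listings above show every snapshot root with inode number 256, the same as the top of the mount. Tools such as find and du identify a directory roughly by its (st_dev, st_ino) pair and report a loop when a directory carries the same pair as one of its ancestors. The minimal Python simulation below illustrates that check. The device numbers (40, 41, 42, 50) are made up for illustration, and the function is a simplification, not the actual code of find or du.

```python
# Simplified model of the ancestor check behind "File system loop detected".
# A directory is identified by (st_dev, st_ino); meeting the pair of an
# ancestor again is treated as a loop.

def find_loops(tree, path, ids, ancestors=()):
    """Yield every path whose (dev, ino) pair equals that of an ancestor."""
    ident = ids[path]
    if ident in ancestors:
        yield path
        return
    for child in tree.get(path, ()):
        yield from find_loops(tree, child, ids, ancestors + (ident,))

# Directory layout taken from the listing above.
tree = {
    "nfs-mnt": ["nfs-mnt/.snapshots"],
    "nfs-mnt/.snapshots": [
        "nfs-mnt/.snapshots/nfs-1",
        "nfs-mnt/.snapshots/nfs-2",
    ],
}

# As seen over NFS: one device number for everything (50 is hypothetical),
# and every subvolume root reports inode 256.
ids_nfs = {
    "nfs-mnt": (50, 256),
    "nfs-mnt/.snapshots": (50, 257),
    "nfs-mnt/.snapshots/nfs-1": (50, 256),
    "nfs-mnt/.snapshots/nfs-2": (50, 256),
}

# As seen locally: btrfs reports a different st_dev per subvolume
# (40, 41, 42 are hypothetical), so the pairs stay unique.
ids_local = {
    "nfs-mnt": (40, 256),
    "nfs-mnt/.snapshots": (40, 257),
    "nfs-mnt/.snapshots/nfs-1": (41, 256),
    "nfs-mnt/.snapshots/nfs-2": (42, 256),
}

loops_nfs = list(find_loops(tree, "nfs-mnt", ids_nfs))
loops_local = list(find_loops(tree, "nfs-mnt", ids_local))
print("NFS view:  ", loops_nfs)
print("local view:", loops_local)
```

In this model the NFS view flags both snapshot roots while the local view flags nothing, which matches the pattern of "not listing already-listed directory" messages reported above.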
* Re: cannot use btrfs for nfs server 2021-07-19 12:06 ` Forza 2021-07-19 13:07 ` Forza @ 2021-07-27 11:27 ` Ulli Horlacher 1 sibling, 0 replies; 56+ messages in thread From: Ulli Horlacher @ 2021-07-27 11:27 UTC (permalink / raw) To: Btrfs BTRFS On Mon 2021-07-19 (14:06), Forza wrote: > > And the error messages are annoying! > > > > root@tsmsrvj:/etc# exportfs -v > > /data/fex localhost.localdomain(rw,async,wdelay,crossmnt,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash) > > /data/snapshots localhost.localdomain(rw,async,wdelay,crossmnt,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash) > > > > root@tsmsrvj:/etc# mount -o vers=3 localhost:/data/fex /nfs/localhost/fex > > root@tsmsrvj:/etc# mount -o vers=3 localhost:/data/snapshots /nfs/localhost/snapshots > > root@tsmsrvj:/etc# mount | grep localhost > > localhost:/data/fex on /nfs/localhost/fex type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1) > > localhost:/data/snapshots on /nfs/localhost/snapshots type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=37961,mountproto=udp,local_lock=none,addr=127.0.0.1) > > > > What kind of NFS server is this? Default Ubuntu kernel NFS-server. > Isn't UDP mounts legacy and not normally used by default? See above, I am using tcp! > Can you switch to an nfs4 server and try again? I also still think you > should use fsid export option. No change. The error is still there. 
-- Ullrich Horlacher Server und Virtualisierung Rechenzentrum TIK Universitaet Stuttgart E-Mail: horlacher@tik.uni-stuttgart.de Allmandring 30a Tel: ++49-711-68565868 70569 Stuttgart (Germany) WWW: http://www.tik.uni-stuttgart.de/ REF:<2b53b9dd-4353-a73e-59b3-c87b6419ebf4@tnonline.net> ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server 2021-07-08 22:17 ` cannot use btrfs for nfs server Ulli Horlacher 2021-07-09 0:05 ` Graham Cobb @ 2021-07-09 16:06 ` Lord Vader 2021-07-10 7:03 ` Ulli Horlacher 1 sibling, 1 reply; 56+ messages in thread From: Lord Vader @ 2021-07-09 16:06 UTC (permalink / raw) To: linux-btrfs On Fri, 9 Jul 2021 at 01:18, Ulli Horlacher <framstag@rus.uni-stuttgart.de> wrote: > I have waited some time and some Ubuntu updates, but the bug is still there: > > > When I try to access a btrfs filesystem via nfs, I get the error: > > > root@tsmsrvi:~# mount tsmsrvj:/data/fex /nfs/tsmsrvj/fex > > > root@tsmsrvi:~# time find /nfs/tsmsrvj/fex | wc -l > > > find: File system loop detected; '/nfs/tsmsrvj/fex/spool' is part of the same file system loop as '/nfs/tsmsrvj/fex'. Can you try exporting the NFS share with the 'crossmnt' option? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: cannot use btrfs for nfs server 2021-07-09 16:06 ` Lord Vader @ 2021-07-10 7:03 ` Ulli Horlacher 0 siblings, 0 replies; 56+ messages in thread From: Ulli Horlacher @ 2021-07-10 7:03 UTC (permalink / raw) To: linux-btrfs On Fri 2021-07-09 (19:06), Lord Vader wrote: > On Fri, 9 Jul 2021 at 01:18, Ulli Horlacher > <framstag@rus.uni-stuttgart.de> wrote: > > > I have waited some time and some Ubuntu updates, but the bug is still there: > > > > When I try to access a btrfs filesystem via nfs, I get the error: > > > > root@tsmsrvi:~# mount tsmsrvj:/data/fex /nfs/tsmsrvj/fex > > > > root@tsmsrvi:~# time find /nfs/tsmsrvj/fex | wc -l > > > > find: File system loop detected; '/nfs/tsmsrvj/fex/spool' is part of the same file system loop as '/nfs/tsmsrvj/fex'. > > Can you try exporting NFS share with 'crossmnt' option? root@tsmsrvj:/etc# exportfs -v /data/fex localhost.localdomain(rw,async,wdelay,crossmnt,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash) root@tsmsrvj:/etc# mount -v localhost:/data/fex /nfs/localhost/fex mount.nfs: timeout set for Sat Jul 10 09:02:31 2021 mount.nfs: trying text-based options 'vers=4.2,addr=127.0.0.1,clientaddr=127.0.0.1' root@tsmsrvj:/etc# du -s /nfs/localhost/fex du: WARNING: Circular directory structure. This almost certainly means that you have a corrupted file system. NOTIFY YOUR SYSTEM MANAGER. The following directory is part of the cycle: /nfs/localhost/fex/spool -- Ullrich Horlacher Server und Virtualisierung Rechenzentrum TIK Universitaet Stuttgart E-Mail: horlacher@tik.uni-stuttgart.de Allmandring 30a Tel: ++49-711-68565868 70569 Stuttgart (Germany) WWW: http://www.tik.uni-stuttgart.de/ REF:<CAMnT83vyufNCMDQQnyYi-k8dOft3_bc_2L-rgHOBzeWgKqPt2A@mail.gmail.com> ^ permalink raw reply [flat|nested] 56+ messages in thread
[parent not found: <162632387205.13764.6196748476850020429@noble.neil.brown.name>]
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. [not found] ` <162632387205.13764.6196748476850020429@noble.neil.brown.name> @ 2021-07-15 14:09 ` Josef Bacik 2021-07-15 16:45 ` Christoph Hellwig 2021-07-15 23:02 ` NeilBrown 2021-07-15 15:45 ` J. Bruce Fields 1 sibling, 2 replies; 56+ messages in thread From: Josef Bacik @ 2021-07-15 14:09 UTC (permalink / raw) To: NeilBrown, J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba Cc: linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On 7/15/21 12:37 AM, NeilBrown wrote: > > Hi all, > the problem this patch addresses has been discussed on both the NFS list > and the BTRFS list, so I'm sending this to both. I'd be very happy for > people experiencing the problem (NFS export of BTRFS subvols) who are > in a position to rebuild the kernel on their NFS server to test this > and report success (or otherwise). > > While I've tried to write this patch so that it *could* land upstream > (and could definitely land in a distro franken-kernel if needed), I'm > not completely sure it *should* land upstream. It puts some deep > knowledge of BTRFS into NFSD code. This could be removed later once > proper APIs are designed and provided. I can see arguments either way > and wonder what others think. > > BTRFS developers: please examine the various claims I have made about > BTRFS and correct any that are wrong. The observation that > getdents can report the same inode number for unrelated files > (a file and a subvol in my case) is ... interesting. > > NFSD developers: please comment on anything else. > > Others: as I said: testing would be great! :-) > > Subject: [PATCH] NFSD: handle BTRFS subvolumes better. > > A single BTRFS mount can present as multiple "volumes". i.e. multiple > sets of objects with potentially overlapping inode number spaces. > The st_dev presented to user-space via the stat(2) family of calls is > different for each internal volume, as is the f_fsid reported by > statfs(). 
> > However nfsd doesn't look at st_dev or the fsid (other than for the > export point - typically the mount point), so it doesn't notice the > different filesystems. Importantly, it doesn't report a different fsid > to the NFS client. > > This leads to the NFS client reusing inode numbers, and applications > like "find" and "du" complaining, particularly when they find a > directory with the same st_ino and st_dev as an ancestor. This > typically happens with the root of a sub-volume as the root of every > volume in BTRFS has the same inode number (256). > > To fix this, we need to report a different fsid for each subvolume, but > need to use the same fsid that we currently use for the top-level > volume. Changing this (by rebooting a server to new code), might > confuse the client. I don't think it would be a major problem (stale > filehandles shouldn't happen), but it is best avoided. > > Determining the fsid to use is a bit awkward.... > > There is limited space in the protocol (32 bits for NFSv3, 64 for NFSv4) > so we cannot append the subvolume fsid. The best option seems to be to > hash it in. This patch uses a simple 'xor', but possibly a Jenkins hash > would be better. > > For BTRFS (and other) filesystems the current fsid is a hash (xor) of > the uuid provided from userspace by mountd. This is derived from the > statfs fsid. If we use the statfs fsid for subvolumes and xor this in, > we risk erasing useful unique information. So I have chosen not to use > the statfs fsid. > > Ideally we should have an API for the filesystem to report if it uses > multiple subvolumes, and to provide a unique identifier for each. For > now, this patch calls exportfs_encode_fh(). If the returned fsid type > is NOT one of those used by BTRFS, then we assume the st_fsid cannot > change, and use the current behaviour. 
> > If the type IS one that BTRFS uses, we use intimate knowledge of BTRFS > to extract the root_object_id from the filehandle and record that with > the export information. Then when exporting an fsid, we check if > subvolumes are enabled and if the current dentry has a different > root_object_id to the exported volume. If it does, the root_object_id > is hashed (xor) into the reported fsid. > > When an NFSv4 client sees that the fsid has changed, it will ask for the > MOUNTED_ON_FILEID. With the Linux NFS client, this is visible to > userspace as an automount point, until content within the directory is > accessed and the automount is triggered. Currently the MOUNTED_ON_FILEID > for these subvolume roots is the same as that of the root - 256. This will > cause find et al. to complain until the automount actually gets mounted. > > So this patch reports the MOUNTED_ON_FILEID in such cases to be a magic > number that appears to be appropriate for BTRFS: > BTRFS_FIRST_FREE_OBJECTID - 1 > > Again, we really want an API to get this from the filesystem. Changing > it later has no cost, so we don't need any commitment from the btrfs team > that this is what they will provide if/when we do get such an API. > > This same problem (of an automount point with a duplicate inode number) > also exists for NFSv3. This problem cannot be resolved completely on > the server as NFSv3 doesn't have a well defined "MOUNTED_ON_FILEID" > concept, but we can come close. The inode number returned by READDIR is > likely to be the mounted-on-fileid. With READDIR_PLUS, two fileids are > returned, the one from the readdir, and (optionally) another from > 'stat'. Linux-NFS checks these match and if not, it treats the first as > a mounted-on-fileid. > > Interestingly BTRFS getdents() *DOES* report a different inode number > for subvol roots than is returned by stat(). These aren't actually > unique (!!!!) but in at least one case, they are different from > ancestors, so this is sufficient. 
> > NFSD currently SUPPRESSES the stat information if the inode number is > different. This is because there is room for a file to be renamed between > the readdir call and the lookup_one_len() prior to getattr, and the > results could be confusing. However for the case of a BTRFS filesystem > with an inode number of 256, the value of reporting the difference seems > to exceed the cost of any confusion caused by a race (if that is even > possible in this case). > So this patch allows the two fileids to be different when 256 is found > on BTRFS. > > With this patch a 'du' or 'find' in an NFS-mounted btrfs filesystem > which has snapshot subvols works correctly for both NFSv4 and NFSv3. > Fortunately the problematic programs tend to trigger READDIR_PLUS and so > benefit from the detection of the MOUNTED_ON_FILEID which it provides. > > Signed-off-by: NeilBrown <neilb@suse.de> I'm going to restate what I think the problem is you're having just so I'm sure we're on the same page. 1. We export a btrfs volume via nfsd that has multiple subvolumes. 2. We run find, and when we stat a file, nfsd doesn't send along our bogus st_dev, it sends its own thing (I assume?). This confuses du/find because you get the same inode number with different parents. Is this correct? If that's the case then it'd be relatively straightforward to add another callback into export_operations to grab this fsid right? Hell we could simply return the objectid of the root since that's unique across the entire file system. We already do our magic FH encoding to make sure we keep all this straight for NFS, another callback to give that info isn't going to kill us. Thanks, Josef ^ permalink raw reply [flat|nested] 56+ messages in thread
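Editorial sketch, not part of the original message: the quoted patch description says the subvolume's root_object_id is hashed into the reported fsid with a plain xor whenever the object lives in a different subvolume than the export point. The Python below models only that idea as described in the text, not the kernel code itself. The function name reported_fsid, the base fsid 0x1234 and the 64-bit mask are assumptions for illustration; the subvolume IDs 257, 270 and 271 are the ones listed by "btrfs subvolume list" at the start of the thread.

```python
# Model of "xor the subvolume root id into the export fsid".
# Names and the base fsid value are hypothetical.

MASK64 = (1 << 64) - 1  # NFSv4 has room for a 64-bit fsid

def reported_fsid(export_fsid, export_root_id, dentry_root_id):
    """Return the fsid to report for an object in the given subvolume."""
    if dentry_root_id == export_root_id:
        # Same subvolume as the export point: keep the existing fsid so
        # that clients of an upgraded server see no change.
        return export_fsid & MASK64
    # Different subvolume: mix its root object id into the fsid.
    return (export_fsid ^ dentry_root_id) & MASK64

export_fsid = 0x1234      # hypothetical fsid of the export point
export_root = 257         # subvolume "fex"
subvolumes = [257, 270, 271]

fsids = {sv: reported_fsid(export_fsid, export_root, sv) for sv in subvolumes}
for sv, fsid in fsids.items():
    print(f"subvol {sv}: fsid {fsid:#x}")
```

With one fixed base value, xor maps distinct subvolume IDs to distinct fsids. The reason the description mentions a stronger hash is presumably collisions with the fsids of other, unrelated exports, which this sketch does not model.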
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-15 14:09 ` [PATCH/RFC] NFSD: handle BTRFS subvolumes better Josef Bacik @ 2021-07-15 16:45 ` Christoph Hellwig 2021-07-15 17:11 ` Josef Bacik 2021-07-15 23:02 ` NeilBrown 1 sibling, 1 reply; 56+ messages in thread From: Christoph Hellwig @ 2021-07-15 16:45 UTC (permalink / raw) To: Josef Bacik Cc: NeilBrown, J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Thu, Jul 15, 2021 at 10:09:37AM -0400, Josef Bacik wrote: > I'm going to restate what I think the problem is you're having just so I'm > sure we're on the same page. > > 1. We export a btrfs volume via nfsd that has multiple subvolumes. > 2. We run find, and when we stat a file, nfsd doesn't send along our bogus > st_dev, it sends its own thing (I assume?). This confuses du/find because > you get the same inode number with different parents. > > Is this correct? If that's the case then it'd be relatively straightforward > to add another callback into export_operations to grab this fsid right? > Hell we could simply return the objectid of the root since that's unique > across the entire file system. We already do our magic FH encoding to make > sure we keep all this straight for NFS, another callback to give that info > isn't going to kill us. Thanks, Hell no. btrfs is broken plain and simple, and we've been arguing about this for years without progress. btrfs needs to stop claiming different st_dev inside the same mount, otherwise hell is going to break loose left right and center, and this is just one of the many cases where it does. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-15 16:45 ` Christoph Hellwig @ 2021-07-15 17:11 ` Josef Bacik 2021-07-15 17:24 ` Christoph Hellwig 0 siblings, 1 reply; 56+ messages in thread From: Josef Bacik @ 2021-07-15 17:11 UTC (permalink / raw) To: Christoph Hellwig Cc: NeilBrown, J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On 7/15/21 12:45 PM, Christoph Hellwig wrote: > On Thu, Jul 15, 2021 at 10:09:37AM -0400, Josef Bacik wrote: >> I'm going to restate what I think the problem is you're having just so I'm >> sure we're on the same page. >> >> 1. We export a btrfs volume via nfsd that has multiple subvolumes. >> 2. We run find, and when we stat a file, nfsd doesn't send along our bogus >> st_dev, it sends it's own thing (I assume?). This confuses du/find because >> you get the same inode number with different parents. >> >> Is this correct? If that's the case then it' be relatively straightforward >> to add another callback into export_operations to grab this fsid right? >> Hell we could simply return the objectid of the root since that's unique >> across the entire file system. We already do our magic FH encoding to make >> sure we keep all this straight for NFS, another callback to give that info >> isn't going to kill us. Thanks, > > Hell no. btrfs is broken plain and simple, and we've been arguing about > this for years without progress. btrfs needs to stop claiming different > st_dev inside the same mount, otherwise hell is going to break lose left > right and center, and this is just one of the many cases where it does. > Because there's no alternative. We need a way to tell userspace they've wandered into a different inode namespace. There's no argument that what we're doing is ugly, but there's never been a clear "do X instead". Just a lot of whinging that btrfs is broken. This makes userspace happy and is simple and straightforward. 
I'm open to alternatives, but there have been 0 workable alternatives proposed in the last decade of complaining about it. Thanks, Josef ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-15 17:11 ` Josef Bacik @ 2021-07-15 17:24 ` Christoph Hellwig 2021-07-15 18:01 ` Josef Bacik 0 siblings, 1 reply; 56+ messages in thread From: Christoph Hellwig @ 2021-07-15 17:24 UTC (permalink / raw) To: Josef Bacik Cc: Christoph Hellwig, NeilBrown, J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Thu, Jul 15, 2021 at 01:11:29PM -0400, Josef Bacik wrote: > Because there's no alternative. We need a way to tell userspace they've > wandered into a different inode namespace. There's no argument that what > we're doing is ugly, but there's never been a clear "do X instead". Just a > lot of whinging that btrfs is broken. This makes userspace happy and is > simple and straightforward. I'm open to alternatives, but there have been 0 > workable alternatives proposed in the last decade of complaining about it. Make sure we cross a vfsmount when crossing the "st_dev" domain so that it is properly reported. Suggested many times and ignored all the time because it requires a bit of work. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-15 17:24 ` Christoph Hellwig @ 2021-07-15 18:01 ` Josef Bacik 2021-07-15 22:37 ` NeilBrown ` (2 more replies) 0 siblings, 3 replies; 56+ messages in thread From: Josef Bacik @ 2021-07-15 18:01 UTC (permalink / raw) To: Christoph Hellwig Cc: NeilBrown, J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On 7/15/21 1:24 PM, Christoph Hellwig wrote: > On Thu, Jul 15, 2021 at 01:11:29PM -0400, Josef Bacik wrote: >> Because there's no alternative. We need a way to tell userspace they've >> wandered into a different inode namespace. There's no argument that what >> we're doing is ugly, but there's never been a clear "do X instead". Just a >> lot of whinging that btrfs is broken. This makes userspace happy and is >> simple and straightforward. I'm open to alternatives, but there have been 0 >> workable alternatives proposed in the last decade of complaining about it. > > Make sure we cross a vfsmount when crossing the "st_dev" domain so > that it is properly reported. Suggested many times and ignored all > the time beause it requires a bit of work. > You keep telling me this but forgetting that I did all this work when you originally suggested it. The problem I ran into was that the automount stuff requires that we have a completely different superblock for every vfsmount. This is fine for things like nfs or samba where the automount literally points to a completely different mount, but doesn't work for btrfs where it's on the same file system. If you have 1000 subvolumes and run sync() you're going to write the superblock 1000 times for the same file system. You are going to reclaim inodes on the same file system 1000 times. You are going to reclaim dcache on the same filesystem 1000 times. You are also going to pin 1000 dentries/inodes into memory whenever you wander into these things because the super is going to hold them open. 
This is not a workable solution. It's not a matter of simply tying into existing infrastructure, we'd have to completely rework how the VFS deals with this stuff in order to be reasonable. And when I brought this up to Al he told me I was insane and we absolutely had to have a different SB for every vfsmount, which means we can't use vfsmount for this, which means we don't have any other options. Thanks, Josef ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-15 18:01 ` Josef Bacik @ 2021-07-15 22:37 ` NeilBrown 2021-07-19 15:40 ` Josef Bacik 2021-07-19 15:49 ` J. Bruce Fields 2021-07-19 9:16 ` Christoph Hellwig 2021-07-20 22:10 ` J. Bruce Fields 2 siblings, 2 replies; 56+ messages in thread From: NeilBrown @ 2021-07-15 22:37 UTC (permalink / raw) To: Josef Bacik Cc: Christoph Hellwig, J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Fri, 16 Jul 2021, Josef Bacik wrote: > On 7/15/21 1:24 PM, Christoph Hellwig wrote: > > On Thu, Jul 15, 2021 at 01:11:29PM -0400, Josef Bacik wrote: > >> Because there's no alternative. We need a way to tell userspace they've > >> wandered into a different inode namespace. There's no argument that what > >> we're doing is ugly, but there's never been a clear "do X instead". Just a > >> lot of whinging that btrfs is broken. This makes userspace happy and is > >> simple and straightforward. I'm open to alternatives, but there have been 0 > >> workable alternatives proposed in the last decade of complaining about it. > > > > Make sure we cross a vfsmount when crossing the "st_dev" domain so > > that it is properly reported. Suggested many times and ignored all > > the time beause it requires a bit of work. > > > > You keep telling me this but forgetting that I did all this work when you > originally suggested it. The problem I ran into was the automount stuff > requires that we have a completely different superblock for every vfsmount. > This is fine for things like nfs or samba where the automount literally points > to a completely different mount, but doesn't work for btrfs where it's on the > same file system. If you have 1000 subvolumes and run sync() you're going to > write the superblock 1000 times for the same file system. You are going to > reclaim inodes on the same file system 1000 times. You are going to reclaim > dcache on the same filesytem 1000 times. 
You are also going to pin 1000 > dentries/inodes into memory whenever you wander into these things because the > super is going to hold them open. > > This is not a workable solution. It's not a matter of simply tying into > existing infrastructure, we'd have to completely rework how the VFS deals with > this stuff in order to be reasonable. And when I brought this up to Al he told > me I was insane and we absolutely had to have a different SB for every vfsmount, > which means we can't use vfsmount for this, which means we don't have any other > options. Thanks, When I was first looking at this, I thought that separate vfsmnts and auto-mounting was the way to go "just like NFS". NFS still shares a lot between the multiple superblock - certainly it shares the same connection to the server. But I dropped the idea when Bruce pointed out that nfsd is not set up to export auto-mounted filesystems. It needs to be able to find a filesystem given a UUID (extracted from a filehandle), and it does this by walking through the mount table to find one that matches. So unless all btrfs subvols were mounted all the time (which I wouldn't propose), it would need major work to fix. NFSv4 describes the fsid as having a "major" and "minor" component. We've never treated these as having an important meaning - just extra bits to encode uniqueness in. Maybe we should have used "major" for the vfsmnt, and kept "minor" for the subvol..... The idea for a single vfsmnt exposing multiple inode-name-spaces does appeal to me. The "st_dev" is just part of the name, and already a fairly blurry part. Thanks to bind mounts, multiple mounts can have the same st_dev. I see no intrinsic reason that a single mount should not have multiple fsids, provided that a coherent picture is provided to userspace which doesn't contain too many surprises. NeilBrown ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-15 22:37 ` NeilBrown @ 2021-07-19 15:40 ` Josef Bacik 2021-07-19 20:00 ` J. Bruce Fields 2021-07-19 15:49 ` J. Bruce Fields 1 sibling, 1 reply; 56+ messages in thread From: Josef Bacik @ 2021-07-19 15:40 UTC (permalink / raw) To: NeilBrown Cc: Christoph Hellwig, J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On 7/15/21 6:37 PM, NeilBrown wrote: > On Fri, 16 Jul 2021, Josef Bacik wrote: >> On 7/15/21 1:24 PM, Christoph Hellwig wrote: >>> On Thu, Jul 15, 2021 at 01:11:29PM -0400, Josef Bacik wrote: >>>> Because there's no alternative. We need a way to tell userspace they've >>>> wandered into a different inode namespace. There's no argument that what >>>> we're doing is ugly, but there's never been a clear "do X instead". Just a >>>> lot of whinging that btrfs is broken. This makes userspace happy and is >>>> simple and straightforward. I'm open to alternatives, but there have been 0 >>>> workable alternatives proposed in the last decade of complaining about it. >>> >>> Make sure we cross a vfsmount when crossing the "st_dev" domain so >>> that it is properly reported. Suggested many times and ignored all >>> the time beause it requires a bit of work. >>> >> >> You keep telling me this but forgetting that I did all this work when you >> originally suggested it. The problem I ran into was the automount stuff >> requires that we have a completely different superblock for every vfsmount. >> This is fine for things like nfs or samba where the automount literally points >> to a completely different mount, but doesn't work for btrfs where it's on the >> same file system. If you have 1000 subvolumes and run sync() you're going to >> write the superblock 1000 times for the same file system. You are going to >> reclaim inodes on the same file system 1000 times. You are going to reclaim >> dcache on the same filesytem 1000 times. 
You are also going to pin 1000 >> dentries/inodes into memory whenever you wander into these things because the >> super is going to hold them open. >> >> This is not a workable solution. It's not a matter of simply tying into >> existing infrastructure, we'd have to completely rework how the VFS deals with >> this stuff in order to be reasonable. And when I brought this up to Al he told >> me I was insane and we absolutely had to have a different SB for every vfsmount, >> which means we can't use vfsmount for this, which means we don't have any other >> options. Thanks, > > When I was first looking at this, I thought that separate vfsmnts > and auto-mounting was the way to go "just like NFS". NFS still shares a > lot between the multiple superblock - certainly it shares the same > connection to the server. > > But I dropped the idea when Bruce pointed out that nfsd is not set up to > export auto-mounted filesystems. It needs to be able to find a > filesystem given a UUID (extracted from a filehandle), and it does this > by walking through the mount table to find one that matches. So unless > all btrfs subvols were mounted all the time (which I wouldn't propose), > it would need major work to fix. > > NFSv4 describes the fsid as having a "major" and "minor" component. > We've never treated these as having an important meaning - just extra > bits to encode uniqueness in. Maybe we should have used "major" for the > vfsmnt, and kept "minor" for the subvol..... > > The idea for a single vfsmnt exposing multiple inode-name-spaces does > appeal to me. The "st_dev" is just part of the name, and already a > fairly blurry part. Thanks to bind mounts, multiple mounts can have the > same st_dev. I see no intrinsic reason that a single mount should not > have multiple fsids, provided that a coherent picture is provided to > userspace which doesn't contain too many surprises. 
> Ok so setting aside btrfs for the moment, how does NFS deal with exporting a directory that has multiple other file systems under that tree? I assume the same sort of problem doesn't occur, but why is that? Is it because it's a different vfsmount/sb or is there some other magic making this work? Thanks, Josef ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-19 15:40 ` Josef Bacik @ 2021-07-19 20:00 ` J. Bruce Fields 2021-07-19 20:44 ` Josef Bacik 0 siblings, 1 reply; 56+ messages in thread From: J. Bruce Fields @ 2021-07-19 20:00 UTC (permalink / raw) To: Josef Bacik Cc: NeilBrown, Christoph Hellwig, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Mon, Jul 19, 2021 at 11:40:28AM -0400, Josef Bacik wrote: > Ok so setting aside btrfs for the moment, how does NFS deal with > exporting a directory that has multiple other file systems under > that tree? I assume the same sort of problem doesn't occur, but why > is that? Is it because it's a different vfsmount/sb or is there > some other magic making this work? Thanks, There are two main ways an NFS client can look up a file: by name or by filehandle. The former's the normal filesystem directory lookup that we're used to. If the name refers to a mountpoint, the server can cross into the mounted filesystem like anyone else. It's the lookup by filehandle that's interesting. Typically the filehandle includes a UUID and an inode number. The server looks up the UUID with some help from mountd, and that gives a superblock that nfsd can use for the inode lookup. As Neil says, mountd does that basically by searching among mounted filesystems for one with that uuid. So if you wanted to be able to handle a uuid for a filesystem that's not even mounted yet, you'd need some new mechanism to look up such uuids. That's something we don't currently support but that we'd need to support if BTRFS subvolumes were automounted. (And it might have other uses as well.) But I'm not entirely sure if that answers your question.... --b. ^ permalink raw reply [flat|nested] 56+ messages in thread
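[Editor's sketch] Bruce's description of lookup by filehandle can be illustrated with a toy model. It is only a sketch under stated assumptions: the filehandle is reduced to a (UUID, inode) pair and the mount table to a list, and the names `FileHandle`, `Mount`, `resolve_filesystem` and the UUID strings are hypothetical, not the real nfsd or mountd data structures.

```python
from typing import NamedTuple, Optional


class FileHandle(NamedTuple):
    fs_uuid: str   # identifies the filesystem
    ino: int       # identifies the file within that filesystem


class Mount(NamedTuple):
    mountpoint: str
    fs_uuid: str


def resolve_filesystem(fh: FileHandle, mount_table: list) -> Optional[Mount]:
    """Search the currently mounted filesystems for one with a matching UUID."""
    for mount in mount_table:
        if mount.fs_uuid == fh.fs_uuid:
            return mount
    # Nothing mounted carries this UUID, so the handle cannot be resolved.
    return None


mount_table = [Mount("/data", "uuid-data")]

# A handle into a mounted filesystem resolves to that mount.
mounted = resolve_filesystem(FileHandle("uuid-data", 300), mount_table)

# A handle into a subvolume that would only appear after an automount
# (hypothetical "uuid-subvol") is not in the table and cannot be found.
not_mounted = resolve_filesystem(FileHandle("uuid-subvol", 300), mount_table)
```

The second lookup returning `None` corresponds to the gap Bruce describes: a filehandle for a filesystem that is not mounted yet needs some new lookup mechanism.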
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-19 20:00 ` J. Bruce Fields @ 2021-07-19 20:44 ` Josef Bacik 2021-07-19 23:53 ` NeilBrown 0 siblings, 1 reply; 56+ messages in thread From: Josef Bacik @ 2021-07-19 20:44 UTC (permalink / raw) To: J. Bruce Fields Cc: NeilBrown, Christoph Hellwig, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On 7/19/21 4:00 PM, J. Bruce Fields wrote: > On Mon, Jul 19, 2021 at 11:40:28AM -0400, Josef Bacik wrote: >> Ok so setting aside btrfs for the moment, how does NFS deal with >> exporting a directory that has multiple other file systems under >> that tree? I assume the same sort of problem doesn't occur, but why >> is that? Is it because it's a different vfsmount/sb or is there >> some other magic making this work? Thanks, > > There are two main ways an NFS client can look up a file: by name or by > filehandle. The former's the normal filesystem directory lookup that > we're used to. If the name refers to a mountpoint, the server can cross > into the mounted filesystem like anyone else. > > It's the lookup by filehandle that's interesting. Typically the > filehandle includes a UUID and an inode number. The server looks up the > UUID with some help from mountd, and that gives a superblock that nfsd > can use for the inode lookup. > > As Neil says, mountd does that basically by searching among mounted > filesystems for one with that uuid. > > So if you wanted to be able to handle a uuid for a filesystem that's not > even mounted yet, you'd need some new mechanism to look up such uuids. > > That's something we don't currently support but that we'd need to > support if BTRFS subvolumes were automounted. (And it might have other > uses as well.) > > But I'm not entirely sure if that answers your question.... 
> Right, because btrfs handles the filehandles ourselves properly with the export_operations and we encode the subvolume id's into those things to make sure we can always do the proper lookup. I suppose the real problem is that NFS is exposing the inode->i_ino to the client without understanding that it's on a different subvolume. Our trick of simply allocating an anonymous bdev every time you wander into a subvolume to get a unique st_dev doesn't help you guys because you are looking for mounted file systems. I'm not concerned about the FH case, because for that it's already been crafted by btrfs and we know what to do with it, so it's always going to be correct. The actual problem is that we can do getattr(/file1) getattr(/snap/file1) on the client and the NFS server just blind sends i_ino with the same fsid because / and /snap are the same fsid. Which brings us back to what HCH is complaining about. In his view if we had a vfsmount for /snap then you would know that it was a different fs. However that would only actually work if we generated a completely different superblock and thus gave /snap a unique fsid, right? If we did the automount thing, and the NFS server went down and came back up and got a getattr(/snap/file1) from a previously generated FH it would still work right, because it would come into the export_operations with the format that btrfs is expecting and it would be able to do the lookup. This FH lookup would do the automount magic it needs to and then NFS would have the fsid it needs, correct? Thanks, Josef ^ permalink raw reply [flat|nested] 56+ messages in thread
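[Editor's sketch] The getattr collision Josef describes can be shown with a small model. Everything here is an assumption made for illustration: `getattr_reply`, the `subvol_aware` switch, the export fsid string, and the inode and subvolume numbers are hypothetical. The sketch relies only on the point made above, that the server sends i_ino with the same fsid for / and /snap.

```python
def getattr_reply(export_fsid, subvol_id, i_ino, subvol_aware):
    """Return the (fsid, fileid) pair a client would use to identify a file.

    With subvol_aware=False the server reports one fsid for the whole export,
    which models the behaviour complained about in the thread.
    """
    fsid = (export_fsid, subvol_id) if subvol_aware else (export_fsid, 0)
    return (fsid, i_ino)


EXPORT_FSID = "export-1"  # hypothetical fsid of the exported btrfs volume

# /file1 lives in the top-level subvolume, /snap/file1 in a snapshot of it.
# The snapshot copy is assumed to have the same inode number as the original.
file1 = dict(subvol_id=5, i_ino=300)
snap_file1 = dict(subvol_id=271, i_ino=300)

# Same fsid and same fileid: the client cannot tell the two files apart.
blind = (
    getattr_reply(EXPORT_FSID, subvol_aware=False, **file1),
    getattr_reply(EXPORT_FSID, subvol_aware=False, **snap_file1),
)

# If the subvolume is folded into the fsid, the identities differ.
aware = (
    getattr_reply(EXPORT_FSID, subvol_aware=True, **file1),
    getattr_reply(EXPORT_FSID, subvol_aware=True, **snap_file1),
)
```

Folding the subvolume id into the fsid is just one way to make the pairs distinct in this model; the thread is about which mechanism should do that in practice.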
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-19 20:44 ` Josef Bacik @ 2021-07-19 23:53 ` NeilBrown 0 siblings, 0 replies; 56+ messages in thread From: NeilBrown @ 2021-07-19 23:53 UTC (permalink / raw) To: Josef Bacik Cc: J. Bruce Fields, Christoph Hellwig, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Tue, 20 Jul 2021, Josef Bacik wrote: > On 7/19/21 4:00 PM, J. Bruce Fields wrote: > > On Mon, Jul 19, 2021 at 11:40:28AM -0400, Josef Bacik wrote: > >> Ok so setting aside btrfs for the moment, how does NFS deal with > >> exporting a directory that has multiple other file systems under > >> that tree? I assume the same sort of problem doesn't occur, but why > >> is that? Is it because it's a different vfsmount/sb or is there > >> some other magic making this work? Thanks, > > > > There are two main ways an NFS client can look up a file: by name or by > > filehandle. The former's the normal filesystem directory lookup that > > we're used to. If the name refers to a mountpoint, the server can cross > > into the mounted filesystem like anyone else. > > > > It's the lookup by filehandle that's interesting. Typically the > > filehandle includes a UUID and an inode number. The server looks up the > > UUID with some help from mountd, and that gives a superblock that nfsd > > can use for the inode lookup. > > > > As Neil says, mountd does that basically by searching among mounted > > filesystems for one with that uuid. > > > > So if you wanted to be able to handle a uuid for a filesystem that's not > > even mounted yet, you'd need some new mechanism to look up such uuids. > > > > That's something we don't currently support but that we'd need to > > support if BTRFS subvolumes were automounted. (And it might have other > > uses as well.) > > > > But I'm not entirely sure if that answers your question.... 
> > > > Right, because btrfs handles the filehandles ourselves properly with the > export_operations and we encode the subvolume id's into those things to make > sure we can always do the proper lookup. > > I suppose the real problem is that NFS is exposing the inode->i_ino to the > client without understanding that it's on a different subvolume. > > Our trick of simply allocating an anonymous bdev every time you wander into a > subvolume to get a unique st_dev doesn't help you guys because you are looking > for mounted file systems. > > I'm not concerned about the FH case, because for that it's already been crafted > by btrfs and we know what to do with it, so it's always going to be correct. > > The actual problem is that we can do > > getattr(/file1) > getattr(/snap/file1) > > on the client and the NFS server just blind sends i_ino with the same fsid > because / and /snap are the same fsid. > > Which brings us back to what HCH is complaining about. In his view if we had a > vfsmount for /snap then you would know that it was a different fs. However that > would only actually work if we generated a completely different superblock and > thus gave /snap a unique fsid, right? No, I don't think it needs to be a different superblock to have a vfsmount. (I don't know if it does to keep HCH happy). If I "mount --bind /snap /snap" then I've created a vfsmnt with the upper and lower directories identical - same inode, same superblock. This is an existence-proof that you don't need a separate super-block. > > If we did the automount thing, and the NFS server went down and came back up and > got a getattr(/snap/file1) from a previously generated FH it would still work > right, because it would come into the export_operations with the format that > btrfs is expecting and it would be able to do the lookup. This FH lookup would > do the automount magic it needs to and then NFS would have the fsid it needs, > correct? Thanks, Not quite. 
An NFS filehandle (as generated by linux-nfsd) has two components (plus a header). The filesystem-part and the file-part. The filesystem-part is managed by userspace (/usr/sbin/mountd). The code relies on every filesystem appearing in /proc/self/mounts. The bytes chosen are either based on the uuid reported by 'libblkid', or the fsid reported by statfs(), based on a black-list of filesystems for which libblkid is not useful. This list includes btrfs. The file-part is managed in the kernel using export_operations. For any given 'struct path' in the kernel, a filehandle is generated (conceptually) by finding the closest vfsmnt (close to inode, far from root) and asking user-space to map that. Then passing the inode to the filesystem and asking it to map that. So, in your example, if /snap were a mount point, the kernel would ask mountd to determine the filesystem-part of /snap, and the fact that the file-part from btrfs contained the objectid for snap would just be redundant information. If /snap couldn't be found in /proc/self/mounts after a server restart, the filehandle would be stale. If btrfs were to use automounts and create the vfsmnts that one might normally expect, then nfsd would need there to be two different sorts of mount points, ideally visible in /proc/mounts (maybe a new flag that appears in the list of mount options? "internal" ??). - there needs to be the current mountpoint which is expected to be present after a reboot, and is likely to introduce a new filesystem, and - there are these "new" mountpoints which are on-demand and expose something that is (in some sense) part of the same filesystem. The key property that NFSd would depend on is that these mount points do NOT introduce a new name-space for file-handles (in the sense of export_operations). To expand on that last point: - If a filehandle is requested for an inode above the "new" mountpoint and another "below" the new mountpoint, they are guaranteed to be different. 
- If a filehandle that was "below" the new mountpoint is passed to exportfs_decode_fh() together with the vfsmnt that was *above* the mountpoint, then it somehow does "the right thing". Probably that would require changing exportfs_decode_fh() to return a 'struct path' rather than just a 'struct dentry *'. When nfsd detected one of these "internal" mountpoints during a lookup, it would *not* call-out to user-space to create a new export, but it *would* ensure that a new fsid was reported for all inodes in the new vfsmnt. NeilBrown ^ permalink raw reply [flat|nested] 56+ messages in thread
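[Editor's sketch] Neil's two-part filehandle and his proposed "internal" mountpoints can be modelled roughly as follows. This is a thought experiment in code, not an existing nfsd interface: the `internal` flag is the hypothetical mount option floated above, and `Mount`, `covers`, `closest_mount`, `encode_fh`, `reported_fsid` and all values are invented for illustration.

```python
from typing import NamedTuple


class Mount(NamedTuple):
    mountpoint: str
    fs_part: str            # filesystem-part that mountd would choose
    internal: bool = False  # hypothetical flag for on-demand subvolume mounts


def covers(mountpoint, path):
    """True if path is at or below mountpoint."""
    return mountpoint == "/" or path == mountpoint or path.startswith(mountpoint + "/")


def closest_mount(path, mounts, skip_internal=False):
    """Pick the mount closest to the inode, optionally ignoring internal ones."""
    candidates = [
        m for m in mounts
        if covers(m.mountpoint, path) and not (skip_internal and m.internal)
    ]
    return max(candidates, key=lambda m: len(m.mountpoint))


def encode_fh(path, file_part, mounts):
    """Filehandle = filesystem-part of the real export + file-part from the fs.

    Internal mounts do not introduce a new filehandle name-space, so they are
    skipped when choosing the filesystem-part.
    """
    export = closest_mount(path, mounts, skip_internal=True)
    return (export.fs_part, file_part)


def reported_fsid(path, mounts):
    """A new fsid is reported below an internal mount, without a new export."""
    nearest = closest_mount(path, mounts)
    export = closest_mount(path, mounts, skip_internal=True)
    return (export.fs_part, nearest.mountpoint if nearest.internal else None)


mounts = [Mount("/", "uuid-root"), Mount("/snap", "uuid-root", internal=True)]

# The file-part is assumed to be (subvolume id, inode), as btrfs encodes it.
fh_above = encode_fh("/file1", (5, 300), mounts)
fh_below = encode_fh("/snap/file1", (271, 300), mounts)
```

In this model the two filehandles share a filesystem-part but differ in the file-part, and `reported_fsid` differs on either side of `/snap`, which is the pair of properties Neil lists.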
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-15 22:37 ` NeilBrown 2021-07-19 15:40 ` Josef Bacik @ 2021-07-19 15:49 ` J. Bruce Fields 2021-07-20 0:02 ` NeilBrown 1 sibling, 1 reply; 56+ messages in thread From: J. Bruce Fields @ 2021-07-19 15:49 UTC (permalink / raw) To: NeilBrown Cc: Josef Bacik, Christoph Hellwig, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Fri, Jul 16, 2021 at 08:37:07AM +1000, NeilBrown wrote: > On Fri, 16 Jul 2021, Josef Bacik wrote: > > On 7/15/21 1:24 PM, Christoph Hellwig wrote: > > > On Thu, Jul 15, 2021 at 01:11:29PM -0400, Josef Bacik wrote: > > >> Because there's no alternative. We need a way to tell userspace they've > > >> wandered into a different inode namespace. There's no argument that what > > >> we're doing is ugly, but there's never been a clear "do X instead". Just a > > >> lot of whinging that btrfs is broken. This makes userspace happy and is > > >> simple and straightforward. I'm open to alternatives, but there have been 0 > > >> workable alternatives proposed in the last decade of complaining about it. > > > > > > Make sure we cross a vfsmount when crossing the "st_dev" domain so > > > that it is properly reported. Suggested many times and ignored all > > > the time beause it requires a bit of work. > > > > > > > You keep telling me this but forgetting that I did all this work when you > > originally suggested it. The problem I ran into was the automount stuff > > requires that we have a completely different superblock for every vfsmount. > > This is fine for things like nfs or samba where the automount literally points > > to a completely different mount, but doesn't work for btrfs where it's on the > > same file system. If you have 1000 subvolumes and run sync() you're going to > > write the superblock 1000 times for the same file system. You are going to > > reclaim inodes on the same file system 1000 times. 
You are going to reclaim > > dcache on the same filesytem 1000 times. You are also going to pin 1000 > > dentries/inodes into memory whenever you wander into these things because the > > super is going to hold them open. > > > > This is not a workable solution. It's not a matter of simply tying into > > existing infrastructure, we'd have to completely rework how the VFS deals with > > this stuff in order to be reasonable. And when I brought this up to Al he told > > me I was insane and we absolutely had to have a different SB for every vfsmount, > > which means we can't use vfsmount for this, which means we don't have any other > > options. Thanks, > > When I was first looking at this, I thought that separate vfsmnts > and auto-mounting was the way to go "just like NFS". NFS still shares a > lot between the multiple superblock - certainly it shares the same > connection to the server. > > But I dropped the idea when Bruce pointed out that nfsd is not set up to > export auto-mounted filesystems. Yes. I wish it was.... But we'd need some way to look up a not-currently-mounted filesystem by filehandle: > It needs to be able to find a > filesystem given a UUID (extracted from a filehandle), and it does this > by walking through the mount table to find one that matches. So unless > all btrfs subvols were mounted all the time (which I wouldn't propose), > it would need major work to fix. > > NFSv4 describes the fsid as having a "major" and "minor" component. > We've never treated these as having an important meaning - just extra > bits to encode uniqueness in. Maybe we should have used "major" for the > vfsmnt, and kept "minor" for the subvol..... So nfsd would use the "major" ID to find the parent export, and then btrfs would use the "minor" ID to identify the subvolume? --b. > The idea for a single vfsmnt exposing multiple inode-name-spaces does > appeal to me. The "st_dev" is just part of the name, and already a > fairly blurry part. 
Thanks to bind mounts, multiple mounts can have the > same st_dev. I see no intrinsic reason that a single mount should not > have multiple fsids, provided that a coherent picture is provided to > userspace which doesn't contain too many surprises. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-19 15:49 ` J. Bruce Fields @ 2021-07-20 0:02 ` NeilBrown 0 siblings, 0 replies; 56+ messages in thread From: NeilBrown @ 2021-07-20 0:02 UTC (permalink / raw) To: J. Bruce Fields Cc: Josef Bacik, Christoph Hellwig, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Tue, 20 Jul 2021, J. Bruce Fields wrote: > On Fri, Jul 16, 2021 at 08:37:07AM +1000, NeilBrown wrote: > > On Fri, 16 Jul 2021, Josef Bacik wrote: > > > On 7/15/21 1:24 PM, Christoph Hellwig wrote: > > > > On Thu, Jul 15, 2021 at 01:11:29PM -0400, Josef Bacik wrote: > > > >> Because there's no alternative. We need a way to tell userspace they've > > > >> wandered into a different inode namespace. There's no argument that what > > > >> we're doing is ugly, but there's never been a clear "do X instead". Just a > > > >> lot of whinging that btrfs is broken. This makes userspace happy and is > > > >> simple and straightforward. I'm open to alternatives, but there have been 0 > > > >> workable alternatives proposed in the last decade of complaining about it. > > > > > > > > Make sure we cross a vfsmount when crossing the "st_dev" domain so > > > > that it is properly reported. Suggested many times and ignored all > > > > the time beause it requires a bit of work. > > > > > > > > > > You keep telling me this but forgetting that I did all this work when you > > > originally suggested it. The problem I ran into was the automount stuff > > > requires that we have a completely different superblock for every vfsmount. > > > This is fine for things like nfs or samba where the automount literally points > > > to a completely different mount, but doesn't work for btrfs where it's on the > > > same file system. If you have 1000 subvolumes and run sync() you're going to > > > write the superblock 1000 times for the same file system. You are going to > > > reclaim inodes on the same file system 1000 times. 
You are going to reclaim > > > dcache on the same filesytem 1000 times. You are also going to pin 1000 > > > dentries/inodes into memory whenever you wander into these things because the > > > super is going to hold them open. > > > > > > This is not a workable solution. It's not a matter of simply tying into > > > existing infrastructure, we'd have to completely rework how the VFS deals with > > > this stuff in order to be reasonable. And when I brought this up to Al he told > > > me I was insane and we absolutely had to have a different SB for every vfsmount, > > > which means we can't use vfsmount for this, which means we don't have any other > > > options. Thanks, > > > > When I was first looking at this, I thought that separate vfsmnts > > and auto-mounting was the way to go "just like NFS". NFS still shares a > > lot between the multiple superblock - certainly it shares the same > > connection to the server. > > > > But I dropped the idea when Bruce pointed out that nfsd is not set up to > > export auto-mounted filesystems. > > Yes. I wish it was.... But we'd need some way to look a > not-currently-mounted filesystem by filehandle: > > > It needs to be able to find a > > filesystem given a UUID (extracted from a filehandle), and it does this > > by walking through the mount table to find one that matches. So unless > > all btrfs subvols were mounted all the time (which I wouldn't propose), > > it would need major work to fix. > > > > NFSv4 describes the fsid as having a "major" and "minor" component. > > We've never treated these as having an important meaning - just extra > > bits to encode uniqueness in. Maybe we should have used "major" for the > > vfsmnt, and kept "minor" for the subvol..... > > So nfsd would use the "major" ID to find the parent export, and then > btrfs would use the "minor" ID to identify the subvolume? Maybe, though I don't think it would be really useful - just a thought-bubble. 
As the spec doesn't define any behaviour of these two numbers, there is no point trying to impose any. But (as described in another email) I think we do need to clearly differentiate between "volume" and "subvolume" in the Linux API. We cannot really use "different mount point" to mean "different volume" as bind mounts broke that model long ago. I think that "different st_dev" means "different subvolume" is a core requirement as many applications assume that. So the question is "how to determine if two objects in different subvolumes are still in the same volume". This is something that nfsd needs to know. NeilBrown ^ permalink raw reply [flat|nested] 56+ messages in thread
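Editorial sketch: the point above, that many applications treat a change in st_dev as a boundary and treat the (st_dev, st_ino) pair as an object's identity, can be illustrated with a few lines of Python. This is only a sketch under stated assumptions, not how find or du are actually implemented. The names `first_loop` and `crosses_boundary` are hypothetical, and the device and inode numbers in the usage note are made up for illustration.

```python
from typing import Iterable, Optional, Tuple

# Identity of a file as many tools see it: (st_dev, st_ino).
FileId = Tuple[int, int]


def first_loop(path_ids: Iterable[FileId]) -> Optional[int]:
    """Return the index of the first entry on a root-to-leaf path whose
    (st_dev, st_ino) pair already appeared higher up, or None if no
    entry repeats. A repeat is what a tree walker would report as a
    file system loop."""
    seen = set()
    for index, file_id in enumerate(path_ids):
        if file_id in seen:
            return index
        seen.add(file_id)
    return None


def crosses_boundary(parent: FileId, child: FileId) -> bool:
    """True if the child has a different st_dev than its parent, i.e. the
    walker has entered what it will treat as another filesystem."""
    return parent[0] != child[0]
```

For example, assuming the export root and a nested subvolume root both report inode 256, `first_loop([(40, 256), (40, 257), (40, 256)])` flags index 2 when both are seen with the same device number, while `first_loop([(40, 256), (41, 256)])` finds no loop once the nested subvolume reports a distinct st_dev.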
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-15 18:01 ` Josef Bacik 2021-07-15 22:37 ` NeilBrown @ 2021-07-19 9:16 ` Christoph Hellwig 2021-07-19 23:54 ` NeilBrown 2021-07-20 22:10 ` J. Bruce Fields 2 siblings, 1 reply; 56+ messages in thread From: Christoph Hellwig @ 2021-07-19 9:16 UTC (permalink / raw) To: Josef Bacik Cc: Christoph Hellwig, NeilBrown, J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Thu, Jul 15, 2021 at 02:01:11PM -0400, Josef Bacik wrote: > This is not a workable solution. It's not a matter of simply tying into > existing infrastructure, we'd have to completely rework how the VFS deals > with this stuff in order to be reasonable. And when I brought this up to Al > he told me I was insane and we absolutely had to have a different SB for > every vfsmount, which means we can't use vfsmount for this, which means we > don't have any other options. Thanks, Then fix the problem another way. The problem is known, old and keeps breaking stuff. Don't paper over it, fix it. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-19 9:16 ` Christoph Hellwig @ 2021-07-19 23:54 ` NeilBrown 2021-07-20 6:23 ` Christoph Hellwig 0 siblings, 1 reply; 56+ messages in thread From: NeilBrown @ 2021-07-19 23:54 UTC (permalink / raw) To: Christoph Hellwig Cc: Josef Bacik, Christoph Hellwig, J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Mon, 19 Jul 2021, Christoph Hellwig wrote: > On Thu, Jul 15, 2021 at 02:01:11PM -0400, Josef Bacik wrote: > > This is not a workable solution. It's not a matter of simply tying into > > existing infrastructure, we'd have to completely rework how the VFS deals > > with this stuff in order to be reasonable. And when I brought this up to Al > > he told me I was insane and we absolutely had to have a different SB for > > every vfsmount, which means we can't use vfsmount for this, which means we > > don't have any other options. Thanks, > > Then fix the problem another way. The problem is known, old and keeps > breaking stuff. Don't paper over it, fix it. Do you have any pointers to other breakage caused by this particular behaviour of btrfs? It would help to have all requirements clearly on the table while designing a solution. Thanks, NeilBrown ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-19 23:54 ` NeilBrown @ 2021-07-20 6:23 ` Christoph Hellwig 2021-07-20 7:17 ` NeilBrown 0 siblings, 1 reply; 56+ messages in thread From: Christoph Hellwig @ 2021-07-20 6:23 UTC (permalink / raw) To: NeilBrown Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Tue, Jul 20, 2021 at 09:54:44AM +1000, NeilBrown wrote: > Do you have any pointers to other breakage caused by this particular > behaviour of btrfs? It would to have all requirements clearly on the > table while designing a solution. A quick google find: https://lore.kernel.org/linux-btrfs/b5e7e64a-741c-baee-bc4d-cd51ca9b3a38@gmail.com/T/ https://savannah.gnu.org/bugs/?50859 https://github.com/coreos/bugs/issues/301 https://bugs.kde.org/show_bug.cgi?id=317127 https://github.com/borgbackup/borg/issues/4009 https://bugs.python.org/issue37339 http://mail.openjdk.java.net/pipermail/nio-dev/2017-June/004292.html and that is just the first 2 or three pages of trivial search results. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-20 6:23 ` Christoph Hellwig @ 2021-07-20 7:17 ` NeilBrown 2021-07-20 8:00 ` Christoph Hellwig 0 siblings, 1 reply; 56+ messages in thread From: NeilBrown @ 2021-07-20 7:17 UTC (permalink / raw) To: Christoph Hellwig Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Tue, 20 Jul 2021, Christoph Hellwig wrote: > On Tue, Jul 20, 2021 at 09:54:44AM +1000, NeilBrown wrote: > > Do you have any pointers to other breakage caused by this particular > > behaviour of btrfs? It would to have all requirements clearly on the > > table while designing a solution. > > A quick google find: > > https://lore.kernel.org/linux-btrfs/b5e7e64a-741c-baee-bc4d-cd51ca9b3a38@gmail.com/T/ > https://savannah.gnu.org/bugs/?50859 > https://github.com/coreos/bugs/issues/301 > https://bugs.kde.org/show_bug.cgi?id=317127 > https://github.com/borgbackup/borg/issues/4009 > https://bugs.python.org/issue37339 > http://mail.openjdk.java.net/pipermail/nio-dev/2017-June/004292.html > > and that is just the first 2 or three pages of trivial search results. > Thanks a lot for these! Very helpful. The details vary, but the core problem seems to be that the device number found in /proc/self/mountinfo is the same for all mounts from a given btrfs filesystem, no matter which subvol happens to be found at or beneath that mountpoint. So it can even be that 'stat' on a mountpoint returns different numbers to what is found for that mountpoint in /proc/self/mountinfo. To address these issues we would need to: 1/ make every btrfs subvol which is not already a mountpoint into an automount point which mounts the subvol (similar to the use of automount in NFS). 2/ either give each subvol a separate 'struct super_block' (which is apparently a bad idea) or change show_mountinfo() to allow an alternate dev_t to be used. e.g. 
some new s_op which is given mnt->mnt_root and returns a dev_t. If the new s_op is not available, sb->s_dev is used. For nfsd to be able to work with this, those automount points need to have an inode in the parent filesystem with a distinct inode number, and the mount must be marked in some way that nfsd can tell that it is "internal". Possibly a helper function that tests if mnt_parent has the same mnt.mnt_sb would be sufficient, though it might be nice to export this fact to user-space somehow. Also exportfs_decode_fh() needs to be enhanced, probably to return a 'struct path'. Does anything there seem unreasonable to you? Thanks, NeilBrown ^ permalink raw reply [flat|nested] 56+ messages in thread
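Editorial sketch: the mismatch described above, where the device number listed for a mount in /proc/self/mountinfo can differ from what stat returns on that mountpoint, can be checked from user space roughly as follows. This is a minimal sketch, not part of any proposed patch. The function names are hypothetical, and the parser relies only on the documented mountinfo layout (third field is major:minor, fifth field is the mount point).

```python
import os
from typing import Dict, List


def parse_mountinfo(text: str) -> Dict[str, int]:
    """Map each mount point to the dev_t listed for it in mountinfo text.
    If a mount point appears more than once, the last entry wins."""
    devs: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a well-formed mountinfo line
        major, minor = fields[2].split(":")
        devs[fields[4]] = os.makedev(int(major), int(minor))
    return devs


def mismatched_mounts(mountinfo_devs: Dict[str, int],
                      stat_devs: Dict[str, int]) -> List[str]:
    """Return mount points whose stat st_dev differs from the dev_t
    that mountinfo lists for them."""
    return [mount_point for mount_point, dev in mountinfo_devs.items()
            if mount_point in stat_devs and stat_devs[mount_point] != dev]
```

On a live system the inputs could be built with `parse_mountinfo(open("/proc/self/mountinfo").read())` and `{mp: os.stat(mp).st_dev for mp in devs}`. Whether a given btrfs mount actually shows a mismatch depends on which subvolume is mounted there.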
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-20 7:17 ` NeilBrown @ 2021-07-20 8:00 ` Christoph Hellwig 2021-07-20 23:11 ` NeilBrown 0 siblings, 1 reply; 56+ messages in thread From: Christoph Hellwig @ 2021-07-20 8:00 UTC (permalink / raw) To: NeilBrown Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Tue, Jul 20, 2021 at 05:17:12PM +1000, NeilBrown wrote: > Does anything there seem unreasonable to you? This is what I've been asking for for years. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-20 8:00 ` Christoph Hellwig @ 2021-07-20 23:11 ` NeilBrown 0 siblings, 0 replies; 56+ messages in thread From: NeilBrown @ 2021-07-20 23:11 UTC (permalink / raw) To: Christoph Hellwig Cc: Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Tue, 20 Jul 2021, Christoph Hellwig wrote: > On Tue, Jul 20, 2021 at 05:17:12PM +1000, NeilBrown wrote: > > Does anything there seem unreasonable to you? > > This is what I've been asking for for years. > > Excellent - we seem to be on the same page. I'll aim to have some preliminary patches for review within a week. Thanks, NeilBrown ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-15 18:01 ` Josef Bacik 2021-07-15 22:37 ` NeilBrown 2021-07-19 9:16 ` Christoph Hellwig @ 2021-07-20 22:10 ` J. Bruce Fields 2 siblings, 0 replies; 56+ messages in thread From: J. Bruce Fields @ 2021-07-20 22:10 UTC (permalink / raw) To: Josef Bacik Cc: Christoph Hellwig, NeilBrown, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Thu, Jul 15, 2021 at 02:01:11PM -0400, Josef Bacik wrote: > The problem I ran into was the automount stuff requires that we have a > completely different superblock for every vfsmount. This is fine for > things like nfs or samba where the automount literally points to a > completely different mount, but doesn't work for btrfs where it's on > the same file system. If you have 1000 subvolumes and run sync() > you're going to write the superblock 1000 times for the same file > system. Dumb question: why do you have to write the superblock 1000 times, and why is that slower than writing to 1000 different filesystems? > You are > going to reclaim inodes on the same file system 1000 times. You are > going to reclaim dcache on the same filesytem 1000 times. You are > also going to pin 1000 dentries/inodes into memory whenever you > wander into these things because the super is going to hold them > open. That last part at least is the same for the 1000-different-filesystems case, isn't it? --b. > This is not a workable solution. It's not a matter of simply tying > into existing infrastructure, we'd have to completely rework how the > VFS deals with this stuff in order to be reasonable. And when I > brought this up to Al he told me I was insane and we absolutely had > to have a different SB for every vfsmount, which means we can't use > vfsmount for this, which means we don't have any other options. > Thanks, > > Josef ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-15 14:09 ` [PATCH/RFC] NFSD: handle BTRFS subvolumes better Josef Bacik 2021-07-15 16:45 ` Christoph Hellwig @ 2021-07-15 23:02 ` NeilBrown 1 sibling, 0 replies; 56+ messages in thread From: NeilBrown @ 2021-07-15 23:02 UTC (permalink / raw) To: Josef Bacik Cc: J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Fri, 16 Jul 2021, Josef Bacik wrote: > > I'm going to restate what I think the problem is you're having just so I'm sure > we're on the same page. > > 1. We export a btrfs volume via nfsd that has multiple subvolumes. > 2. We run find, and when we stat a file, nfsd doesn't send along our bogus > st_dev, it sends it's own thing (I assume?). This confuses du/find because you > get the same inode number with different parents. > > Is this correct? If that's the case then it' be relatively straightforward to > add another callback into export_operations to grab this fsid right? Hell we > could simply return the objectid of the root since that's unique across the > entire file system. We already do our magic FH encoding to make sure we keep > all this straight for NFS, another callback to give that info isn't going to > kill us. Thanks, Fairly close. As well as the fsid I need a "mounted-on" inode number, so one callback to provide both would do. If zero was reported, that would be equivalent to not providing the callback. - Is "u64" always enough for the subvol-id? - Should we make these details available to user-space with a new STATX flag? - Should it be a new export_operations callback, or new fields in "struct kstat" ?? ... though having asked those questions, I begin to wonder if I took a wrong turn. I can already get some fsid information from statfs, though it is only 64 bits and for BTRFS it combines the filesystem uuid and the subvol id. For that reason I avoided it. But I'm already caching the fsid for the export-point.
If, when I find a different fsid lower down, I xor the result with the export-point fsid, the result would be fairly clean (the xor difference between the two subvol ids) and could be safely mixed into the fsid we currently report. So all I REALLY need from btrfs is a "mounted-on" inode number, matching what readdir() reports. I wouldn't argue AGAINST getting cleaner fsid information. A 128-bit uuid and a 64bit subvol id would be ideal. I'd rather see them as new STATX flags than a new export_operations callback. NeilBrown ^ permalink raw reply [flat|nested] 56+ messages in thread
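Editorial sketch: the xor idea above can be written out in a few lines. This is a sketch under loudly stated assumptions, not nfsd code. It assumes, as the message implies, that the 64-bit statfs fsid behaves like a per-filesystem value xored with the subvolume id; it also assumes that xor is the way the difference gets "mixed into" the fsid currently reported, which the message does not specify. The function names and all numeric values are hypothetical.

```python
MASK64 = (1 << 64) - 1  # the statfs fsid is treated as a 64-bit value


def subvol_delta(export_fsid: int, found_fsid: int) -> int:
    """Xor of the export-point fsid with an fsid found lower down.
    Under the assumed model (fsid = per-filesystem value ^ subvol id),
    the per-filesystem part cancels and only the xor of the two
    subvolume ids remains."""
    return (export_fsid ^ found_fsid) & MASK64


def reported_fsid(base_fsid: int, export_fsid: int, found_fsid: int) -> int:
    """Mix the delta into the fsid currently reported for the export.
    Xor is an assumed mixing step: objects in the export's own
    subvolume keep the existing value because their delta is zero."""
    return (base_fsid ^ subvol_delta(export_fsid, found_fsid)) & MASK64
```

With a made-up per-filesystem value `u`, an export point in subvolume 257 and an object in subvolume 270, `subvol_delta(u ^ 257, u ^ 270)` is `257 ^ 270`, independent of `u`. The top-level fsid stays unchanged, which matches the stated goal of not disturbing what clients already see for the export itself.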
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. [not found] ` <162632387205.13764.6196748476850020429@noble.neil.brown.name> 2021-07-15 14:09 ` [PATCH/RFC] NFSD: handle BTRFS subvolumes better Josef Bacik @ 2021-07-15 15:45 ` J. Bruce Fields 2021-07-15 23:08 ` NeilBrown 1 sibling, 1 reply; 56+ messages in thread From: J. Bruce Fields @ 2021-07-15 15:45 UTC (permalink / raw) To: NeilBrown Cc: Chuck Lever, Chris Mason, Josef Bacik, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Thu, Jul 15, 2021 at 02:37:52PM +1000, NeilBrown wrote: > To fix this, we need to report a different fsid for each subvolume, but > need to use the same fsid that we currently use for the top-level > volume. Changing this (by rebooting a server to new code), might > confuse the client. I don't think it would be a major problem (stale > filehandles shouldn't happen), but it is best avoided. ... > Again, we really want an API to get this from the filesystem. Changing > it later has no cost, so we don't need any commitment from the btrfs team > that this is what they will provide if/when we do get such an API. "No cost" makes me a little nervous, are we sure nobody will notice the mounted-on-fileid changing? Fileid and fsid changes I'd worry about more, though I wouldn't rule it out if that'd stand in the way of a bug fix. Thanks for looking into this. --b. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better. 2021-07-15 15:45 ` J. Bruce Fields @ 2021-07-15 23:08 ` NeilBrown 0 siblings, 0 replies; 56+ messages in thread From: NeilBrown @ 2021-07-15 23:08 UTC (permalink / raw) To: J. Bruce Fields Cc: Chuck Lever, Chris Mason, Josef Bacik, David Sterba, linux-nfs, Wang Yugui, Ulli Horlacher, linux-btrfs On Fri, 16 Jul 2021, J. Bruce Fields wrote: > On Thu, Jul 15, 2021 at 02:37:52PM +1000, NeilBrown wrote: > > To fix this, we need to report a different fsid for each subvolume, but > > need to use the same fsid that we currently use for the top-level > > volume. Changing this (by rebooting a server to new code), might > > confuse the client. I don't think it would be a major problem (stale > > filehandles shouldn't happen), but it is best avoided. > ... > > Again, we really want an API to get this from the filesystem. Changing > > it later has no cost, so we don't need any commitment from the btrfs team > > that this is what they will provide if/when we do get such an API. > > "No cost" makes me a little nervous, are we sure nobody will notice the > mountd-on-fileid changing? One cannot be 100% sure, but I cannot see how anything would depend on it being stable. Certainly the kernel doesn't. 'ls -i' doesn't report it - even as "ls -if". "find -inum xx" cannot see it. Obviously readdir() will see it but if any application put much weight on the number, it could already get confused when btrfs returns non-unique numbers as I mentioned. I certainly wouldn't lose sleep over changing it. NeilBrown > > Fileid and fsid changes I'd worry about more, though I wouldn't rule it > out if that'd stand in the way of a bug fix. > > Thanks for looking into this. > > --b. > > ^ permalink raw reply [flat|nested] 56+ messages in thread