linux-fsdevel.vger.kernel.org archive mirror
* xattr names for unprivileged stacking?
@ 2020-07-28 10:55 Dr. David Alan Gilbert
  2020-07-28 13:08 ` Greg Kurz
  0 siblings, 1 reply; 62+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-28 10:55 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: stefanha, groug, mszeredi, vgoyal, gscrivan, dwalsh, chirantan

Hi,
  Are there any standards for mapping xattr names/classes when
a restricted view of the filesystem needs to think it's root?

e.g. VMs that mount host filesystems, remote filesystems etc and the
client kernel tries to set a trusted. or security. xattr and you want
to store that on an underlying normal filesystem, but your
VM system doesn't want to have CAP_SYS_ADMIN and/or doesn't want to
interfere with the real host's security.

I can see some existing examples:

  9p in qemu
     maps system.posix_acl_* to user.virtfs.system.posix_acl_*
          stops the guest accessing any user.virtfs.*

   overlayfs
      uses trusted.overlay.* on upper layer and blocks that from 
           clients

   fuse-overlayfs
      uses trusted.overlay.* for compatibility if it has perms,
      otherwise falls back to user.fuseoverlayfs.*

   crosvm's virtiofs
      maps "security.sehash" to "user.virtiofs.security.sehash"
      and blocks the guest from accessing user.virtiofs.*

Does anyone know of any others?

It all seems quite ad hoc;  these all fall to bits when you
stack them or when you write a filesystem using one of these
schemes and then mount it with another.
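
To make the stacking problem concrete, here is a minimal sketch of a
prefix scheme of the kind listed above. The prefix, the set of remapped
classes and all function names are assumptions chosen for illustration;
this is not code from any of the projects mentioned.

```python
# Hypothetical prefix scheme, loosely modelled on the examples above.
PRIVATE_PREFIX = "user.virtiofs."             # assumption: the layer's private namespace
REMAPPED_CLASSES = ("security.", "trusted.")  # assumption: classes hidden from the host


class Blocked(Exception):
    """Raised when the client touches the layer's private namespace."""


def to_lower_name(client_name: str) -> str:
    """Translate an xattr name from the client's view to the underlying filesystem."""
    if client_name.startswith(PRIVATE_PREFIX):
        raise Blocked(client_name)            # client may not see or set private names
    if client_name.startswith(REMAPPED_CLASSES):
        return PRIVATE_PREFIX + client_name   # stash privileged classes under user.*
    return client_name                        # everything else passes through


def stacked(client_name: str, layers: int) -> str:
    """Apply the same scheme `layers` times, innermost layer first."""
    name = client_name
    for _ in range(layers):
        name = to_lower_name(name)
    return name
```

With one layer, "security.selinux" becomes "user.virtiofs.security.selinux".
With two identical layers, the outer one sees a name in its own private
namespace and refuses it, which is the stacking failure described above.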

(I'm about to do a similar mapping for virtiofs's C daemon)

Thanks in advance,

Dave 

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: xattr names for unprivileged stacking?
  2020-07-28 10:55 xattr names for unprivileged stacking? Dr. David Alan Gilbert
@ 2020-07-28 13:08 ` Greg Kurz
  2020-07-28 13:55   ` Christian Schoenebeck
  0 siblings, 1 reply; 62+ messages in thread
From: Greg Kurz @ 2020-07-28 13:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan, dwalsh,
	chirantan, Christian Schoenebeck

On Tue, 28 Jul 2020 11:55:03 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> Hi,
>   Are there any standards for mapping xattr names/classes when
> a restricted view of the filesystem needs to think it's root?
> 
> e.g. VMs that mount host filesystems, remote filesystems etc and the
> client kernel tries to set a trusted. or security. xattr and you want
> to store that on an underlying normal filesystem, but your
> VM system doesn't want to have CAP_SYS_ADMIN and/or doesn't want to
> interfere with the real hosts security.
> 
> I can see some existing examples:
> 
>   9p in qemu
>      maps system.posix_acl_* to user.virtfs.system.posix_acl_*
>           stops the guest accessing any user.virtfs.*
> 
>    overlayfs
>       uses trusted.overlay.* on upper layer and blocks that from 
>            clients
> 
>    fuse-overlayfs
>       uses trusted.overlay.* for compatibiltiy if it has perms,
>       otherwise falls back to user.fuseoverlayfs.*
> 
>    crosvm's virtiofs
>       maps "security.sehash" to "user.virtiofs.security.sehash"
>       and blocks the guest from accessing user.virtiofs.*
> 
> Does anyone know of any others?
> 

Hi Dave,

Sorry, I'm not aware of any other example.

Cc'ing Christian Schoenebeck, the new 9p maintainer in QEMU in case
he has some information to share in this area.

Cheers,

--
Greg

> It all seems quite adhoc;  these all fall to bits when you
> stack them or when you write a filesystem using one of these
> schemes and then mount it with another.
> 
> (I'm about to do a similar mapping for virtiofs's C daemon)
> 
> Thanks in advance,
> 
> Dave 
> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 



* Re: xattr names for unprivileged stacking?
  2020-07-28 13:08 ` Greg Kurz
@ 2020-07-28 13:55   ` Christian Schoenebeck
  2020-08-04 11:28     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 62+ messages in thread
From: Christian Schoenebeck @ 2020-07-28 13:55 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Dr. David Alan Gilbert, linux-fsdevel, stefanha, mszeredi,
	vgoyal, gscrivan, dwalsh, chirantan

On Tuesday, 28 July 2020 15:08:59 CEST Greg Kurz wrote:
> On Tue, 28 Jul 2020 11:55:03 +0100
> 
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > Hi,
> > 
> >   Are there any standards for mapping xattr names/classes when
> > 
> > a restricted view of the filesystem needs to think it's root?
> > 
> > e.g. VMs that mount host filesystems, remote filesystems etc and the
> > client kernel tries to set a trusted. or security. xattr and you want
> > to store that on an underlying normal filesystem, but your
> > VM system doesn't want to have CAP_SYS_ADMIN and/or doesn't want to
> > interfere with the real hosts security.
> > 
> > I can see some existing examples:
> >   9p in qemu
> >   
> >      maps system.posix_acl_* to user.virtfs.system.posix_acl_*
> >      
> >           stops the guest accessing any user.virtfs.*

These are not remapped, but the 'local' 9pfs fs driver also actively
interprets:

	user.virtfs.uid
	user.virtfs.gid
	user.virtfs.mode
	user.virtfs.rdev
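
A minimal sketch of how a server could overlay such attributes on the
host's stat values. The encoding used here (one little-endian unsigned
integer per attribute, 32 bits wide except for a 64-bit rdev) and the
helper names are assumptions for illustration only; check the QEMU
source before relying on any of it.

```python
import struct

# Attribute names from the list above; the integer widths are an assumption.
FORMATS = {
    "user.virtfs.uid": "<I",
    "user.virtfs.gid": "<I",
    "user.virtfs.mode": "<I",
    "user.virtfs.rdev": "<Q",
}


def encode(name: str, value: int) -> bytes:
    """Serialise one credential value for storage in an xattr."""
    return struct.pack(FORMATS[name], value)


def decode(name: str, raw: bytes) -> int:
    """Parse one credential value read back from an xattr."""
    (value,) = struct.unpack(FORMATS[name], raw)
    return value


def guest_view(xattrs: dict, host_stat: dict) -> dict:
    """Return the stat fields the guest should see.

    Values stored in user.virtfs.* override the host's; fields without a
    stored attribute fall back to the host value.
    """
    view = dict(host_stat)
    for field in ("uid", "gid", "mode", "rdev"):
        raw = xattrs.get("user.virtfs." + field)
        if raw is not None:
            view[field] = decode("user.virtfs." + field, raw)
    return view
```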

> >    overlayfs
> >    
> >       uses trusted.overlay.* on upper layer and blocks that from
> >       
> >            clients
> >    
> >    fuse-overlayfs
> >    
> >       uses trusted.overlay.* for compatibiltiy if it has perms,
> >       otherwise falls back to user.fuseoverlayfs.*
> >    
> >    crosvm's virtiofs
> >    
> >       maps "security.sehash" to "user.virtiofs.security.sehash"
> >       and blocks the guest from accessing user.virtiofs.*
> > 
> > Does anyone know of any others?

Well, that depends on how widely you draw the scope here. For instance, Samba has a
bunch of VFS modules which also use, and hence prohibit, certain xattrs. For
example, to support (NTFS) alternate data streams (a.k.a. resource forks)
of Windows clients it uses user.DosStream.*:

https://www.samba.org/samba/docs/current/man-html/vfs_streams_xattr.8.html

as well as "user.DOSATTRIB".

And as macOS heavily relies on resource forks (i.e. macOS doesn't work without 
them), there are a bunch of xattr remappings in the dedicated Apple VFS 
module, like "aapl_*":

https://www.samba.org/samba/docs/current/man-html/vfs_fruit.8.html
https://github.com/samba-team/samba/blob/master/source3/modules/vfs_fruit.c

Best regards,
Christian Schoenebeck




* Re: xattr names for unprivileged stacking?
  2020-07-28 13:55   ` Christian Schoenebeck
@ 2020-08-04 11:28     ` Dr. David Alan Gilbert
  2020-08-04 13:51       ` Christian Schoenebeck
  0 siblings, 1 reply; 62+ messages in thread
From: Dr. David Alan Gilbert @ 2020-08-04 11:28 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Greg Kurz, linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan,
	dwalsh, chirantan

* Christian Schoenebeck (qemu_oss@crudebyte.com) wrote:
> On Dienstag, 28. Juli 2020 15:08:59 CEST Greg Kurz wrote:
> > On Tue, 28 Jul 2020 11:55:03 +0100
> > 
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > Hi,
> > > 
> > >   Are there any standards for mapping xattr names/classes when
> > > 
> > > a restricted view of the filesystem needs to think it's root?
> > > 
> > > e.g. VMs that mount host filesystems, remote filesystems etc and the
> > > client kernel tries to set a trusted. or security. xattr and you want
> > > to store that on an underlying normal filesystem, but your
> > > VM system doesn't want to have CAP_SYS_ADMIN and/or doesn't want to
> > > interfere with the real hosts security.
> > > 
> > > I can see some existing examples:
> > >   9p in qemu
> > >   
> > >      maps system.posix_acl_* to user.virtfs.system.posix_acl_*
> > >      
> > >           stops the guest accessing any user.virtfs.*
> 
> Not that they were remapped, but the 'local' 9pfs fs driver also actively 
> interprets:
> 
> 	user.virtfs.uid
> 	user.virtfs.gid
> 	user.virtfs.mode
> 	user.virtfs.rdev
> 
> > >    overlayfs
> > >    
> > >       uses trusted.overlay.* on upper layer and blocks that from
> > >       
> > >            clients
> > >    
> > >    fuse-overlayfs
> > >    
> > >       uses trusted.overlay.* for compatibiltiy if it has perms,
> > >       otherwise falls back to user.fuseoverlayfs.*
> > >    
> > >    crosvm's virtiofs
> > >    
> > >       maps "security.sehash" to "user.virtiofs.security.sehash"
> > >       and blocks the guest from accessing user.virtiofs.*
> > > 
> > > Does anyone know of any others?
> 
> Well, depends on how large you draw the scope here. For instance Samba has a 
> bunch VFS modules which also uses and hence prohibits certain xattrs. For 
> instance for supporting (NTFS) alternate data streams (a.k.a. resource forks) 
> of Windows clients it uses user.DosStream.*:
> 
> https://www.samba.org/samba/docs/current/man-html/vfs_streams_xattr.8.html
> 
> as well as "user.DOSATTRIB".
> 
> And as macOS heavily relies on resource forks (i.e. macOS doesn't work without 
> them), there are a bunch of xattr remappings in the dedicated Apple VFS 
> module, like "aapl_*":
> 
> https://www.samba.org/samba/docs/current/man-html/vfs_fruit.8.html
> https://github.com/samba-team/samba/blob/master/source3/modules/vfs_fruit.c

Thanks;  what I've added to virtiofsd at the moment is a generic
remapping thing that lets me add any prefix and block/drop any xattr.
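
As a rough illustration of what a generic remapper of this kind might
look like, here is a first-match-wins rule table. The rule format, the
action names and the example rules are hypothetical and are not
virtiofsd's actual configuration syntax.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    action: str        # hypothetical actions: "prefix", "block" or "pass"
    match: str         # client-side name prefix to match; "" matches everything
    prepend: str = ""  # string put in front of the name by "prefix" rules


def remap(rules: List[Rule], client_name: str) -> Optional[str]:
    """Return the name to use on the host, or None if the access is refused."""
    for rule in rules:
        if not client_name.startswith(rule.match):
            continue
        if rule.action == "block":
            return None
        if rule.action == "prefix":
            return rule.prepend + client_name
        if rule.action == "pass":
            return client_name
        raise ValueError("unknown action: " + rule.action)
    return None  # no rule matched: refuse by default


# Example policy: hide the private namespace, stash privileged classes in it.
EXAMPLE_RULES = [
    Rule("block", "user.virtiofs."),
    Rule("prefix", "trusted.", "user.virtiofs."),
    Rule("prefix", "security.", "user.virtiofs."),
    Rule("pass", ""),
]
```

Putting the "block" rule first matters: with first-match-wins ordering
it keeps the client from reaching names that the later "prefix" rules
produce.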

The other samba-ism I found was mvxattr(1), which lets you rename xattrs
on a directory tree, which is quite useful.

Dave


> Best regards,
> Christian Schoenebeck
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: xattr names for unprivileged stacking?
  2020-08-04 11:28     ` Dr. David Alan Gilbert
@ 2020-08-04 13:51       ` Christian Schoenebeck
  2020-08-12 11:18         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 62+ messages in thread
From: Christian Schoenebeck @ 2020-08-04 13:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Greg Kurz, linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan,
	dwalsh, chirantan

On Tuesday, 4 August 2020 13:28:01 CEST Dr. David Alan Gilbert wrote:
> > Well, depends on how large you draw the scope here. For instance Samba has
> > a bunch VFS modules which also uses and hence prohibits certain xattrs.
> > For instance for supporting (NTFS) alternate data streams (a.k.a.
> > resource forks) of Windows clients it uses user.DosStream.*:
> > 
> > https://www.samba.org/samba/docs/current/man-html/vfs_streams_xattr.8.html
> > 
> > as well as "user.DOSATTRIB".
> > 
> > And as macOS heavily relies on resource forks (i.e. macOS doesn't work
> > without them), there are a bunch of xattr remappings in the dedicated
> > Apple VFS module, like "aapl_*":
> > 
> > https://www.samba.org/samba/docs/current/man-html/vfs_fruit.8.html
> > https://github.com/samba-team/samba/blob/master/source3/modules/vfs_fruit.c
> 
> Thanks;  what I've added to virtiofsd at the moment is a generic
> remapping thing that lets me add any prefix and block/drop any xattr.

Right, it absolutely makes sense to make it configurable. There are too many use
cases for xattrs, and the precise xattr names are often configurable as well,
as with the Samba VFS modules mentioned above.

> The other samba-ism I found was mvxattr(1) which lets you rename xattr's
> ona  directory tree; which is quite useful.

Haven't seen that before, interesting.

BTW, I have plans for adding support for file forks [1] (a.k.a. alternate
streams, a.k.a. resource forks) on the Linux kernel side, so I will probably come
up with an RFC in a couple of weeks to see whether there would be acceptance for
that at all and, if so, in which form.

That would raise a similar problem for virtiofsd in the long term, as file
forks have a namespace of their own.

[1] https://en.wikipedia.org/wiki/Fork_(file_system)

Best regards,
Christian Schoenebeck




* Re: xattr names for unprivileged stacking?
  2020-08-04 13:51       ` Christian Schoenebeck
@ 2020-08-12 11:18         ` Dr. David Alan Gilbert
  2020-08-12 13:34           ` Christian Schoenebeck
  0 siblings, 1 reply; 62+ messages in thread
From: Dr. David Alan Gilbert @ 2020-08-12 11:18 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Greg Kurz, linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan,
	dwalsh, chirantan

* Christian Schoenebeck (qemu_oss@crudebyte.com) wrote:
> On Dienstag, 4. August 2020 13:28:01 CEST Dr. David Alan Gilbert wrote:
> > > Well, depends on how large you draw the scope here. For instance Samba has
> > > a bunch VFS modules which also uses and hence prohibits certain xattrs.
> > > For instance for supporting (NTFS) alternate data streams (a.k.a.
> > > resource forks) of Windows clients it uses user.DosStream.*:
> > > 
> > > https://www.samba.org/samba/docs/current/man-html/vfs_streams_xattr.8.html
> > > 
> > > as well as "user.DOSATTRIB".
> > > 
> > > And as macOS heavily relies on resource forks (i.e. macOS doesn't work
> > > without them), there are a bunch of xattr remappings in the dedicated
> > > Apple VFS module, like "aapl_*":
> > > 
> > > https://www.samba.org/samba/docs/current/man-html/vfs_fruit.8.html
> > > https://github.com/samba-team/samba/blob/master/source3/modules/vfs_fruit.c
> > 
> > Thanks;  what I've added to virtiofsd at the moment is a generic
> > remapping thing that lets me add any prefix and block/drop any xattr.
> 
> Right, makes absolutely sense to make it configurable. There are too many use 
> cases for xattrs, and the precise xattr names are often configurable as well, 
> like with the mentioned Samba VFS modules.
> 
> > The other samba-ism I found was mvxattr(1) which lets you rename xattr's
> > ona  directory tree; which is quite useful.
> 
> Haven't seen that before, interesting.
> 
> BTW, I have plans for adding support for file forks [1] (a.k.a. alternate 
> streams, a.k.a. resource forks) on Linux kernel side, so I will probably come 
> up with an RFC in couple weeks to see whether there would be acceptance for 
> that at all and if yes in which form.
> 
> That would open a similar problematic to virtiofsd on the long term, as file 
> forks have a namespace on their own.

Yeh I'm sure that'll need wiring into lots of things in weird ways!
I guess the main difference between an extended attribute and a
file-fork is that you can access the fork using an fd and it feels more
like a file?

Dave


> [1] https://en.wikipedia.org/wiki/Fork_(file_system)
> 
> Best regards,
> Christian Schoenebeck
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: xattr names for unprivileged stacking?
  2020-08-12 11:18         ` Dr. David Alan Gilbert
@ 2020-08-12 13:34           ` Christian Schoenebeck
  2020-08-12 14:33             ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 62+ messages in thread
From: Christian Schoenebeck @ 2020-08-12 13:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Greg Kurz, linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan,
	dwalsh, chirantan

On Wednesday, 12 August 2020 13:18:19 CEST Dr. David Alan Gilbert wrote:
> * Christian Schoenebeck (qemu_oss@crudebyte.com) wrote:
> > On Dienstag, 4. August 2020 13:28:01 CEST Dr. David Alan Gilbert wrote:
> > > > Well, depends on how large you draw the scope here. For instance Samba
> > > > has
> > > > a bunch VFS modules which also uses and hence prohibits certain
> > > > xattrs.
> > > > For instance for supporting (NTFS) alternate data streams (a.k.a.
> > > > resource forks) of Windows clients it uses user.DosStream.*:
> > > > 
> > > > https://www.samba.org/samba/docs/current/man-html/vfs_streams_xattr.8.html
> > > > 
> > > > as well as "user.DOSATTRIB".
> > > > 
> > > > And as macOS heavily relies on resource forks (i.e. macOS doesn't work
> > > > without them), there are a bunch of xattr remappings in the dedicated
> > > > Apple VFS module, like "aapl_*":
> > > > 
> > > > https://www.samba.org/samba/docs/current/man-html/vfs_fruit.8.html
> > > > https://github.com/samba-team/samba/blob/master/source3/modules/vfs_fruit.c
> > > 
> > > Thanks;  what I've added to virtiofsd at the moment is a generic
> > > remapping thing that lets me add any prefix and block/drop any xattr.
> > 
> > Right, makes absolutely sense to make it configurable. There are too many
> > use cases for xattrs, and the precise xattr names are often configurable
> > as well, like with the mentioned Samba VFS modules.
> > 
> > > The other samba-ism I found was mvxattr(1) which lets you rename xattr's
> > > ona  directory tree; which is quite useful.
> > 
> > Haven't seen that before, interesting.
> > 
> > BTW, I have plans for adding support for file forks [1] (a.k.a. alternate
> > streams, a.k.a. resource forks) on Linux kernel side, so I will probably
> > come up with an RFC in couple weeks to see whether there would be
> > acceptance for that at all and if yes in which form.
> > 
> > That would open a similar problematic to virtiofsd on the long term, as
> > file forks have a namespace on their own.
> 
> Yeh I'm sure that'll need wiring into lots of things in weird ways!
> I guess the main difference between an extended attribute and a
> file-fork is that you can access the fork using an fd and it feels more
> like a file?

Well, that's a very short reduction of its purpose, but it is a common core 
feature, yes.

xattrs are only suitable for very small data (currently <= 64 KiB on Linux),
whereas file forks can be as large as any regular file. And yes, forks
commonly work with an fd, so they allow you to do all kinds of I/O operations on
them. In theory, though, forks could even be used with any other
function that accepts an fd.
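
For reference, the 64 KiB figure corresponds to XATTR_SIZE_MAX in
<linux/limits.h>; individual filesystems may impose a lower limit. A
small sketch of the resulting size decision follows, where the function
name and the "xattr"/"fork" labels are purely illustrative:

```python
XATTR_SIZE_MAX = 65536  # from <linux/limits.h>: largest value the xattr API accepts


def store_as(value: bytes) -> str:
    """Say where a blob of this size could live (illustrative labels only)."""
    # The limit applies to the whole value: an xattr is written in one call,
    # so there is no way to grow it past the limit by appending.
    return "xattr" if len(value) <= XATTR_SIZE_MAX else "fork"
```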

The main issue is that file forks are not in POSIX. So every OS currently has 
its own concept and API, which probably makes a consensus more difficult for 
Linux.

For instance Solaris allows you to set different ownership and permissions on 
forks as well. It does not allow you to create sub-forks though, nor directory 
structures for forks.

On macOS there was (or actually still is) even a quite complex API which 
separated forks into "resource forks" and "data forks", where resource forks 
were typically used as components of an application binary (e.g. menu 
structure, icons, executable binary modules, text and translations). So 
resource forks not only had names, they also had predefined 16-bit type 
identifiers:
https://en.wikipedia.org/wiki/Resource_fork

Best regards,
Christian Schoenebeck




* Re: xattr names for unprivileged stacking?
  2020-08-12 13:34           ` Christian Schoenebeck
@ 2020-08-12 14:33             ` Dr. David Alan Gilbert
  2020-08-13  9:01               ` Christian Schoenebeck
  0 siblings, 1 reply; 62+ messages in thread
From: Dr. David Alan Gilbert @ 2020-08-12 14:33 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Greg Kurz, linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan,
	dwalsh, chirantan

* Christian Schoenebeck (qemu_oss@crudebyte.com) wrote:
> On Mittwoch, 12. August 2020 13:18:19 CEST Dr. David Alan Gilbert wrote:
> > * Christian Schoenebeck (qemu_oss@crudebyte.com) wrote:
> > > On Dienstag, 4. August 2020 13:28:01 CEST Dr. David Alan Gilbert wrote:
> > > > > Well, depends on how large you draw the scope here. For instance Samba
> > > > > has
> > > > > a bunch VFS modules which also uses and hence prohibits certain
> > > > > xattrs.
> > > > > For instance for supporting (NTFS) alternate data streams (a.k.a.
> > > > > resource forks) of Windows clients it uses user.DosStream.*:
> > > > > 
> > > > > https://www.samba.org/samba/docs/current/man-html/vfs_streams_xattr.8.html
> > > > > 
> > > > > as well as "user.DOSATTRIB".
> > > > > 
> > > > > And as macOS heavily relies on resource forks (i.e. macOS doesn't work
> > > > > without them), there are a bunch of xattr remappings in the dedicated
> > > > > Apple VFS module, like "aapl_*":
> > > > > 
> > > > > https://www.samba.org/samba/docs/current/man-html/vfs_fruit.8.html
> > > > > https://github.com/samba-team/samba/blob/master/source3/modules/vfs_fruit.c
> > > > 
> > > > Thanks;  what I've added to virtiofsd at the moment is a generic
> > > > remapping thing that lets me add any prefix and block/drop any xattr.
> > > 
> > > Right, makes absolutely sense to make it configurable. There are too many
> > > use cases for xattrs, and the precise xattr names are often configurable
> > > as well, like with the mentioned Samba VFS modules.
> > > 
> > > > The other samba-ism I found was mvxattr(1) which lets you rename xattr's
> > > > ona  directory tree; which is quite useful.
> > > 
> > > Haven't seen that before, interesting.
> > > 
> > > BTW, I have plans for adding support for file forks [1] (a.k.a. alternate
> > > streams, a.k.a. resource forks) on Linux kernel side, so I will probably
> > > come up with an RFC in couple weeks to see whether there would be
> > > acceptance for that at all and if yes in which form.
> > > 
> > > That would open a similar problematic to virtiofsd on the long term, as
> > > file forks have a namespace on their own.
> > 
> > Yeh I'm sure that'll need wiring into lots of things in weird ways!
> > I guess the main difference between an extended attribute and a
> > file-fork is that you can access the fork using an fd and it feels more
> > like a file?
> 
> Well, that's a very short reduction of its purpose, but it is a common core 
> feature, yes.
> 
> xattrs are only suitable for very small data (currently <= 64 kiB on Linux), 
> whereas file forks can be as large as any regular file. And yes, forks 
> commonly work with fd, so they allow you to do all kinds of I/O operations on 
> them. Theoretically though you could even allow to use forks with any other 
> function that accepts an fd.
> 
> The main issue is that file forks are not in POSIX. So every OS currently has 
> its own concept and API, which probably makes a consensus more difficult for 
> Linux.
> 
> For instance Solaris allows you to set different ownership and permissions on 
> forks as well. It does not allow you to create sub-forks though, nor directory 
> structures for forks.

Yeh that's quite a change in semantics.

> On macOS there was (or actually still is) even a quite complex API which 
> separated forks into "resource forks" and "data forks", where resource forks 
> were typically used as components of an application binary (e.g. menu 
> structure, icons, executable binary modules, text and translations). So 
> resource forks not only had names, they also had predefined 16-bit type 
> identifiers:
> https://en.wikipedia.org/wiki/Resource_fork

Yeh, lots of different ways.

In a way, if you had a way to drop the 64 KiB limit on xattr, then you
could have one type of object, but then add new ways of accessing them
as forks.

Dave

> Best regards,
> Christian Schoenebeck
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: xattr names for unprivileged stacking?
  2020-08-12 14:33             ` Dr. David Alan Gilbert
@ 2020-08-13  9:01               ` Christian Schoenebeck
  2020-08-16 22:56                 ` Dave Chinner
  0 siblings, 1 reply; 62+ messages in thread
From: Christian Schoenebeck @ 2020-08-13  9:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Greg Kurz, linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan,
	dwalsh, chirantan

On Wednesday, 12 August 2020 16:33:23 CEST Dr. David Alan Gilbert wrote:
> > On macOS there was (or actually still is) even a quite complex API which
> > separated forks into "resource forks" and "data forks", where resource
> > forks were typically used as components of an application binary (e.g.
> > menu structure, icons, executable binary modules, text and translations).
> > So resource forks not only had names, they also had predefined 16-bit
> > type identifiers:
> > https://en.wikipedia.org/wiki/Resource_fork
> 
> Yeh, lots of different ways.
> 
> In a way, if you had a way to drop the 64kiB limit on xattr, then you
> could have one type of object, but then add new ways of accessing them
> as forks.

That's yet another question: should xattrs and forks share the same data space
and namespace, or rather be orthogonal to each other?

Say forks would (one day) have their own ownership and permissions, then
restricted environments would want to project forks' permissions onto xattrs,
which would suggest an orthogonal approach (i.e. forks having their own
xattrs).

OTOH a shared namespace would allow a smooth transition for heterogeneous
systems and their apps from in-memory-only xattrs towards I/O based forks.

Another option: a shared namespace, but allowing forks to have subforks. That
would e.g. allow restricted environments to project permissions onto subforks,
with the latter in turn being accessible by the xattr API at the same time.

Or yet another option: a shared data space, but nesting the namespace of one
side under a prefix on the other side (e.g. fork "foo" <=> xattr "fork.foo").
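
The last option can be sketched as a pair of name translations. The
"fork." prefix is the one from the example above; the function names are
made up for illustration.

```python
from typing import Optional

FORK_PREFIX = "fork."  # prefix from the example above


def fork_to_xattr(fork_name: str) -> str:
    """Name under which a fork would appear through the xattr API."""
    return FORK_PREFIX + fork_name


def xattr_to_fork(xattr_name: str) -> Optional[str]:
    """Fork behind an xattr name, or None for an ordinary xattr."""
    if xattr_name.startswith(FORK_PREFIX) and len(xattr_name) > len(FORK_PREFIX):
        return xattr_name[len(FORK_PREFIX):]
    return None
```

A subfork would nest naturally under such a scheme (fork "foo.bar" <=>
xattr "fork.foo.bar"), but only if fork names themselves are not allowed
to contain the separator.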

Best regards,
Christian Schoenebeck




* Re: xattr names for unprivileged stacking?
  2020-08-13  9:01               ` Christian Schoenebeck
@ 2020-08-16 22:56                 ` Dave Chinner
  2020-08-16 23:09                   ` Matthew Wilcox
  0 siblings, 1 reply; 62+ messages in thread
From: Dave Chinner @ 2020-08-16 22:56 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel, stefanha,
	mszeredi, vgoyal, gscrivan, dwalsh, chirantan

On Thu, Aug 13, 2020 at 11:01:36AM +0200, Christian Schoenebeck wrote:
> On Mittwoch, 12. August 2020 16:33:23 CEST Dr. David Alan Gilbert wrote:
> > > On macOS there was (or actually still is) even a quite complex API which
> > > separated forks into "resource forks" and "data forks", where resource
> > > forks were typically used as components of an application binary (e.g.
> > > menu structure, icons, executable binary modules, text and translations).
> > > So resource forks not only had names, they also had predefined 16-bit
> > > type identifiers:
> > > https://en.wikipedia.org/wiki/Resource_fork
> > 
> > Yeh, lots of different ways.
> > 
> > In a way, if you had a way to drop the 64kiB limit on xattr, then you
> > could have one type of object, but then add new ways of accessing them
> > as forks.
> 
> That's yet another question: should xattrs and forks share the same data- and 
> namespace, or rather be orthogonal to each other.

Completely orthogonal. Alternate data streams are not xattrs, and
xattrs are not ADS....

Indeed, most filesystems will not be able to implement ADS as
xattrs. xattrs are implemented as atomically journalled metadata on
most filesystems; they cannot be used like a seekable file by
userspace at all. If you want ADS to masquerade as an xattr, then
you have to graft the entire file IO path onto filesystem xattrs,
and that just ain't gonna work without a -lot- of development in
every filesystem that wants to support ADS.
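
The userspace side of that mismatch can be sketched as follows: the
xattr calls take and return whole values with no offset, so emulating a
positioned write means rewriting the entire value. The helper names are
hypothetical; os.getxattr() and os.setxattr() are the Linux-only Python
wrappers around the real syscalls.

```python
import os


def patch_value(old: bytes, offset: int, data: bytes) -> bytes:
    """Build the complete new value needed to place `data` at `offset`."""
    if offset > len(old):
        old = old + b"\0" * (offset - len(old))  # zero-fill the gap
    return old[:offset] + data + old[offset + len(data):]


def pwrite_xattr(path: str, name: str, offset: int, data: bytes) -> None:
    """Emulate pwrite() on an xattr: read everything, patch, write everything."""
    old = os.getxattr(path, name)                            # whole value in
    os.setxattr(path, name, patch_value(old, offset, data))  # whole value out
```

Each setxattr() replaces the value as a unit, but the read-modify-write
pair above is not atomic as a whole, and its cost grows with the size of
the value rather than the size of the change.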

We've already got a perfectly good presentation layer for user data
files that are accessed by file descriptors (i.e. directories
containing files), so that should be the presentation layer you seek
to extend.

IOWs, trying to abuse xattrs for ADS support is a non-starter.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xattr names for unprivileged stacking?
  2020-08-16 22:56                 ` Dave Chinner
@ 2020-08-16 23:09                   ` Matthew Wilcox
  2020-08-17  0:29                     ` Dave Chinner
  0 siblings, 1 reply; 62+ messages in thread
From: Matthew Wilcox @ 2020-08-16 23:09 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christian Schoenebeck, Dr. David Alan Gilbert, Greg Kurz,
	linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan, dwalsh,
	chirantan

On Mon, Aug 17, 2020 at 08:56:20AM +1000, Dave Chinner wrote:
> Indeed, most filesystems will not be able to implement ADS as
> xattrs. xattrs are implemented as atomicly journalled metadata on
> most filesytems, they cannot be used like a seekable file by
> userspace at all. If you want ADS to masquerade as an xattr, then
> you have to graft the entire file IO path onto filesytsem xattrs,
> and that just ain't gonna work without a -lot- of development in
> every filesystem that wants to support ADS.
> 
> We've already got a perfectly good presentation layer for user data
> files that are accessed by file descriptors (i.e. directories
> containing files), so that should be the presentation layer you seek
> to extend.
> 
> IOWs, trying to use abuse xattrs for ADS support is a non-starter.

One thing Dave didn't mention is that a directory can have xattrs,
forks and files (and acls).  So your presentation layer needs to not
confuse one thing for another.

I don't understand why a fork would be permitted to have its own
permissions.  That makes no sense.  Silly Solaris.


* Re: xattr names for unprivileged stacking?
  2020-08-16 23:09                   ` Matthew Wilcox
@ 2020-08-17  0:29                     ` Dave Chinner
  2020-08-17 10:37                       ` file forks vs. xattr (was: xattr names for unprivileged stacking?) Christian Schoenebeck
  2020-08-27 15:22                       ` xattr names for unprivileged stacking? Matthew Wilcox
  0 siblings, 2 replies; 62+ messages in thread
From: Dave Chinner @ 2020-08-17  0:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christian Schoenebeck, Dr. David Alan Gilbert, Greg Kurz,
	linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan, dwalsh,
	chirantan

On Mon, Aug 17, 2020 at 12:09:08AM +0100, Matthew Wilcox wrote:
> On Mon, Aug 17, 2020 at 08:56:20AM +1000, Dave Chinner wrote:
> > Indeed, most filesystems will not be able to implement ADS as
> > xattrs. xattrs are implemented as atomicly journalled metadata on
> > most filesytems, they cannot be used like a seekable file by
> > userspace at all. If you want ADS to masquerade as an xattr, then
> > you have to graft the entire file IO path onto filesytsem xattrs,
> > and that just ain't gonna work without a -lot- of development in
> > every filesystem that wants to support ADS.
> > 
> > We've already got a perfectly good presentation layer for user data
> > files that are accessed by file descriptors (i.e. directories
> > containing files), so that should be the presentation layer you seek
> > to extend.
> > 
> > IOWs, trying to use abuse xattrs for ADS support is a non-starter.
> 
> One thing Dave didn't mention is that a directory can have xattrs,
> forks and files (and acls).  So your presentation layer needs to not
> confuse one thing for another.

I'd stop calling these "forks" already, too. The user wants
"alternate data streams", while a "resource fork" is an internal
filesystem implementation detail used to provide ADS
functionality...

e.g. an XFS inode has a "data fork" which contains the extent tree
that points at user data.  This is a seekable fork. Directories
are also implemented internally in the data fork as directories are
seekable.

OTOH, the XFS inode has an "attr fork" which contains a key-value
btree which only supports record based operations, i.e. records
can only be atomically updated via transactions. This is not a
seekable data store. xattrs are stored in this data store. The
key-value store supports multiple namespaces (e.g. system vs user)
so things like ACLs and security information can be stored as xattrs
and not be visible as user xattrs.
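
To illustrate the record-based nature of the xattr interface from
userspace, here is a minimal sketch using the existing fsetxattr(2) and
fgetxattr(2) calls. The helper name xattr_roundtrip() and the 256 byte
buffer are assumptions made for this example only; the point is that a
value is written and read back as one unit, with no file offset involved:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/*
 * Hypothetical helper: store 'value' under xattr 'name' on 'fd' and read
 * it back. Returns 0 on success or a negative errno value on failure.
 */
static int xattr_roundtrip(int fd, const char *name, const char *value)
{
	char buf[256];
	size_t len;
	ssize_t got;

	assert(name != NULL && value != NULL);
	len = strlen(value);
	if (len > sizeof(buf))
		return -E2BIG;

	/* The whole value is replaced at once; there is no offset argument. */
	if (fsetxattr(fd, name, value, len, 0) < 0)
		return -errno;

	/* Likewise, the value is fetched as a single record. */
	got = fgetxattr(fd, name, buf, sizeof(buf));
	if (got < 0)
		return -errno;

	if ((size_t)got != len || memcmp(buf, value, len) != 0)
		return -EIO;
	return 0;
}
```

Called on a freshly created file with a name such as "user.demo.origin",
this returns 0 where user xattrs are supported and a negative errno
(e.g. -ENOTSUP) where they are not.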

On the gripping hand, the XFS inode also has a virtual "COW fork"
which is used to track data fork regions that are in the process of
undergoing a copy-on-write operation. This is a shadow extent tree
that tracks the new location of the data until writeback occurs and
then the new location is atomically swapped back into the data
fork. This fork does not get exposed to userspace, nor does it ever
end up on disk - users do not know this fork even exists.

IOWs, historically speaking, a "fork" is something that is used to
implement different storage types and address spaces within an
inode, it's not a feature that is exposed to users and userspace.

To implement ADS, we'd likely consider adding a new physical inode
"ADS fork" which, internally, maps to a separate directory
structure. This provides us with the ADS namespace for each inode
and a mechanism that instantiates a physical inode per ADS. IOWs,
each ADS can be referenced by the VFS natively and independently as
an inode (native "file as a directory" semantics). Hence existing
create/unlink APIs work for managing ADS, readdir() can list all
your ADS, you can keep per ADS xattrs, etc....

IOWs, with a filesystem inode fork implementation like this for ADS,
all we really need is for the VFS to pass a magic command to
->lookup() to tell us to use the ADS namespace attached to the inode
rather than use the primary inode type/state to perform the
operation.

Hence all the ADS support infrastructure is essentially dentry cache
infrastructure allowing a dentry to be both a file and directory,
and providing the pathname resolution that recognises an ADS
redirection. Name that however you want - we've got to do an on-disk
format change to support ADS, so we can tell the VFS we support ADS
or not. And we have no cares about existing names in the filesystem
conflicting with the ADS pathname identifier because it's a mkfs
time decision. Given that special flags are needed for the openat()
call to resolve an ADS (e.g. O_ALT), we know if we should parse the
ADS identifier as an ADS the moment it is seen...

> I don't understand why a fork would be permitted to have its own
> permissions.  That makes no sense.  Silly Solaris.

I can't think of a reason why, either, but the above implementation
for XFS would support it if the presentation layer allows it... :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-17  0:29                     ` Dave Chinner
@ 2020-08-17 10:37                       ` Christian Schoenebeck
  2020-08-23 23:40                         ` Dave Chinner
  2020-08-27 15:22                       ` xattr names for unprivileged stacking? Matthew Wilcox
  1 sibling, 1 reply; 62+ messages in thread
From: Christian Schoenebeck @ 2020-08-17 10:37 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Matthew Wilcox, Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	stefanha, mszeredi, vgoyal, gscrivan, dwalsh, chirantan

On Montag, 17. August 2020 00:56:20 CEST Dave Chinner wrote:
> > That's yet another question: should xattrs and forks share the same data-
> > and namespace, or rather be orthogonal to each other.
> 
> Completely orthogonal. Alternate data streams are not xattrs, and
> xattrs are not ADS....

Agreed. Their key features (atomic small data vs. non-atomic large data) and 
their typical use cases are probably too different for trying to stitch them 
somehow in an erroneous way into a shared space. Plus it would actually be 
beneficial if forks had their own xattrs.

On Montag, 17. August 2020 02:29:30 CEST Dave Chinner wrote:
> I'd stop calling these "forks" already, too. The user wants
> "alternate data streams", while a "resource fork" is an internal
> filesystem implementation detail used to provide ADS
> functionality...

The common terminology can certainly still be argued. I understand that from 
fs implementation perspective "fork" is probably ambiguous. But from public 
API (i.e. user space side) perspective the term "fork" does make sense, and so 
far I have not seen a better general term for this. Plus the ambiguous aspects 
on fs side are not exposed to the public side.

The term "alternate data stream" suggests that this is just about the raw data 
stream, but that's probably not what this feature will end up being limited 
to. E.g. I think they will have their own permissions on the long term (see 
below). Plus the term ADS is ATM somewhat sticky to the Microsoft universe.

> IOWs, with a filesystem inode fork implementation like this for ADS,
> all we really need is for the VFS to pass a magic command to
> ->lookup() to tell us to use the ADS namespace attached to the inode
> rather than use the primary inode type/state to perform the
> operation.

IMO starting with a minimalistic approach, in the way Solaris developers 
originally introduced forks, would make sense for Linux as well:

- Adding a new option O_FORK to fcntl.h (Solaris uses O_XATTR, not a good
  idea for Linux though for reasons discussed).

- (Mis)using existing APIs for accessing forks (i.e. *at() functions):

	/* open fork 'foo' of file 'sheet.pdf' */

	int fdfile = open("sheet.pdf", O_PATH);
	int fdfork = openat(fdfile, "foo", O_FORK);
	/* continue with regular file I/O on fdfork now ... */

	and

	/* list all forks of file 'sheet.pdf' */

	int fdfile = open("sheet.pdf", O_PATH);
	int fdlist = openat(fdfile, ".", O_RDONLY|O_FORK);
	DIR* dir = fdopendir(fdlist);
	struct dirent* dent;
	while ((dent = readdir(dir))) {
		...
	}

- Permissions and ownership: Same as the file for simplicity as starting 
  point for the first version (see below).

- No subforks as starting point, and hence path separator '/' inside fork 
  names would be prohibited initially to avoid future clashes.
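
The naming rule in the last point could be sketched as a small validation
helper. Both fork_name_is_valid() and the FORK_NAME_MAX limit are
hypothetical; nothing like them exists in the kernel or libc, and reserving
"." for the listing case simply follows the openat(fdfile, ".", ...) example
above:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumption for this sketch: same length limit as a regular file name. */
#define FORK_NAME_MAX 255

/*
 * Hypothetical check for a flat (no subforks) fork namespace: reject empty
 * names, over-long names, the reserved "." and "..", and anything that
 * contains the path separator '/'.
 */
static bool fork_name_is_valid(const char *name)
{
	size_t len;

	assert(name != NULL);
	len = strlen(name);
	if (len == 0 || len > FORK_NAME_MAX)
		return false;
	if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
		return false;
	return strchr(name, '/') == NULL;
}
```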

> Hence all the ADS support infrastructure is essentially dentry cache
> infrastructure allowing a dentry to be both a file and directory,
> and providing the pathname resolution that recognises an ADS
> redirection. Name that however you want - we've got to do an on-disk
> format change to support ADS, so we can tell the VFS we support ADS
> or not. And we have no cares about existing names in the filesystem
> conflicting with the ADS pathname identifier because it's a mkfs
> time decision. Given that special flags are needed for the openat()
> call to resolve an ADS (e.g. O_ALT), we know if we should parse the
> ADS identifier as an ADS the moment it is seen...

So you think there should be a built-in full qualified path name resolution to 
forks right from the start? E.g. like on Windows "C:\some\where\sheet.pdf:foo" 
-> fork "foo" of file "sheet.pdf"?

> > I don't understand why a fork would be permitted to have its own
> > permissions.  That makes no sense.  Silly Solaris.
> 
> I can't think of a reason why, either, but the above implementation
> for XFS would support it if the presentation layer allows it... :)

I would definitely not add this right from the start of course, but on the 
long term it actually does make sense for them to have their own permissions, 
simply because there are already applications for that:

E.g. on some systems forks are used to tag files for security relevant issues, 
for instance where the file originated from (a trusted vs. untrusted source). 
If it was an untrusted source, the user is made aware about this circumstance 
by the system when attempting to open the file. In this use case the fork 
would probably have more restrictive permissions than the actual file.

OTOH forks are used to extend existing files in non-obtrusive way. Say you 
have some sort of (e.g. huge) master file, and a team works on that file. Then 
the individual people would attach their changes solely as forks to the master 
file with their ownership, probably even with complex ACLs, to prevent certain 
users from touching (or even reading) others' changes. In this use case the 
master file might be read-only for most people, while the individual forks 
could be anywhere between more permissive and more restrictive.

Best regards,
Christian Schoenebeck



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-17 10:37                       ` file forks vs. xattr (was: xattr names for unprivileged stacking?) Christian Schoenebeck
@ 2020-08-23 23:40                         ` Dave Chinner
  2020-08-24 15:30                           ` Christian Schoenebeck
  0 siblings, 1 reply; 62+ messages in thread
From: Dave Chinner @ 2020-08-23 23:40 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Matthew Wilcox, Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	stefanha, mszeredi, vgoyal, gscrivan, dwalsh, chirantan

On Mon, Aug 17, 2020 at 12:37:17PM +0200, Christian Schoenebeck wrote:
> On Montag, 17. August 2020 00:56:20 CEST Dave Chinner wrote:
> > > That's yet another question: should xattrs and forks share the same data-
> > > and namespace, or rather be orthogonal to each other.
> > 
> > Completely orthogonal. Alternate data streams are not xattrs, and
> > xattrs are not ADS....
> 
> Agreed. Their key features (atomic small data vs. non-atomic large data) and 
> their typical use cases are probably too different for trying to stitch them 
> somehow in an erroneous way into a shared space. Plus it would actually be 
> beneficial if forks had their own xattrs.
> 
> On Montag, 17. August 2020 02:29:30 CEST Dave Chinner wrote:
> > I'd stop calling these "forks" already, too. The user wants
> > "alternate data streams", while a "resource fork" is an internal
> > filesystem implementation detail used to provide ADS
> > functionality...
> 
> The common terminology can certainly still be argued. I understand that from 
> fs implementation perspective "fork" is probably ambiguous. But from public 
> API (i.e. user space side) perspective the term "fork" does make sense, and so 
> far I have not seen a better general term for this. Plus the ambiguous aspects 
> on fs side are not exposed to the public side.
> 
> The term "alternate data stream" suggests that this is just about the raw data 
> stream, but that's probably not what this feature will end up being limited 
> to. E.g. I think they will have their own permissions on the long term (see 
> below). Plus the term ADS is ATM somewhat sticky to the Microsoft universe.

ADS is the Windows term, which is where the majority of people who
use or want ADS come from. Novell called them "multiple data
streams", and Solaris 9 implemented "extended attributes" (ADS)
using inode forks. Apple allows a "data fork" (user data), "resource
forks" (ADS) and now "named forks" which they then used to implement
extended attributes.  Not the Solaris ones, the Linux style fixed
length key-value xattrs.

Quite frankly, the naming in this area is a complete and utter mess,
and the only clear, unambiguous name for this feature is "alternate
data streams". I don't care that it's something that comes from an
MS background - if your only argument against it is "Microsoft!"
then you're on pretty shaky ground...

> > IOWs, with a filesystem inode fork implementation like this for ADS,
> > all we really need is for the VFS to pass a magic command to
> > ->lookup() to tell us to use the ADS namespace attached to the inode
> > rather than use the primary inode type/state to perform the
> > operation.
> 
> IMO starting with a minimalistic approach, in a way Solaris developers 
> originally introduced forks, would IMO make sense for Linux as well:

<snip>

That's pretty much what the proposed O_ALT did, except it used a
fully qualified path name to define the ADS to open.

> - No subforks as starting point, and hence path separator '/' inside fork 
>   names would be prohibited initially to avoid future clashes.

Can't do that - changing the behaviour of the ADS name handling is
effectively an on-disk filesystem format change. i.e. if we allow it
in future kernels, then we have to mark the filesystem as "/" being
valid so that older kernels and repair utilities won't consider this
as invalid/corrupt and trash the ADS associated with the name.

IOWs, we either support it from the start, or we never support it.

> > Hence all the ADS support infrastructure is essentially dentry cache
> > infrastructure allowing a dentry to be both a file and directory,
> > and providing the pathname resolution that recognises an ADS
> > redirection. Name that however you want - we've got to do an on-disk
> > format change to support ADS, so we can tell the VFS we support ADS
> > or not. And we have no cares about existing names in the filesystem
> > conflicting with the ADS pathname identifier because it's a mkfs
> > time decision. Given that special flags are needed for the openat()
> > call to resolve an ADS (e.g. O_ALT), we know if we should parse the
> > ADS identifier as an ADS the moment it is seen...
> 
> So you think there should be a built-in full qualified path name resolution to 
> forks right from the start? E.g. like on Windows "C:\some\where\sheet.pdf:foo" 
> -> fork "foo" of file "sheet.pdf"?

No. I really don't care how the user interface works. That's for
people who write the syscalls to argue about.

What I was describing is how the internal kernel implementation -
the interaction between the VFS and the filesystem - needs to work.
ADS needs to be supported in some way by the VFS; if ADS are going
to be seekable user data files, then they have to be implemented as
path/dentry/inode tuples that a struct file can point to. IOWs,
internally they need to be seen as first class VFS citizens, and the
VFS needs mechanisms to tell the filesystem to look up the ADS
namespace rather than the inode itself....

> > > I don't understand why a fork would be permitted to have its own
> > > permissions.  That makes no sense.  Silly Solaris.
> > 
> > I can't think of a reason why, either, but the above implementation
> > for XFS would support it if the presentation layer allows it... :)
> 
> I would definitely not add this right from the start of course, but on the 
> long term it actually does make sense for them to have their own permissions, 
> simply because there are already applications for that:
> 
> E.g. on some systems forks are used to tag files for security relevant issues, 
> for instance where the file originated from (a trusted vs. untrusted source). 

Key-value data like that is what the security xattr namespace is for, not
ADS....

> If it was an untrusted source, the user is made aware about this circumstance 
> by the system when attempting to open the file. In this use case the fork 
> would probably have more restrictive permissions than the actual file.

That requires opening the user data fork to walk the ADS to find
key-value pairs that tell it it must not open the file.  We already
have infrastructure for this sort of thing via LSMs that store their
own private key-value data in the security xattrs namespace that
users can't modify. If you have security permission data that is
larger than can be stored in an xattr, then you've got bigger
problems than a lack of ADS.

OTOH, storing the merkle tree data for fsverity would be a perfect
use for a hidden ADS stream that the user cannot see or modify. The
current fsverity implementation is a nasty hack that stores the
merkle tree data in the same file but hides it beyond EOF so that
only the kernel can access it directly. That only works for a single
non-user data stream, though, so if we wanted more file-offset based
integrity or security data, we've got nowhere to put it.

IOWs, now that I think about it, we should be allowing non-user
per-ADS permissions to be set right from the start because I can
think of several filesystem/kernel internal features that could make
use of such functionality that we would want to remain hidden from
users.

> OTOH forks are used to extend existing files in non-obtrusive way. Say you 
> have some sort of (e.g. huge) master file, and a team works on that file. Then 
> the individual people would attach their changes solely as forks to the master 
> file with their ownership, probably even with complex ACLs, to prevent certain 
> users from touching (or even reading) others' changes. In this use case the 
> master file might be read-only for most people, while the individual forks 
> could be anywhere between more permissive and more restrictive.

You're demonstrating the exact reasons why ADS have traditionally
been considered harmful by Linux developers.  You can do all that
with normal directories and files - you do not need ADS to implement
a fully functional multi-user content management system.

ADS does not make constructs like this simpler or easier for
applications to implement or manage. e.g. If you use traditional
directories and files, you don't need to modify backup applications
and file manipulation tools to correctly copy such constructs....

Keep in mind that you are not going to get universal support for ADS
any time soon as most filesystems will require on-disk format
changes to support them. Further, you are going to have to wait for
the entire OS ecosystem to grow support for ADS (e.g. cp, tar,
rsync, file, etc) before you can actually use it sanely in
production systems. Even if we implement kernel support right now,
it will be years before it will be widely available and supported at
an OS/distro level...

IOWs, applications that want to do "ADS-like" stuff are going to
have to be written for the lowest common denominator (i.e. no ADS
support at all) for a long time yet.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-23 23:40                         ` Dave Chinner
@ 2020-08-24 15:30                           ` Christian Schoenebeck
  2020-08-24 20:01                             ` Miklos Szeredi
                                               ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Christian Schoenebeck @ 2020-08-24 15:30 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Matthew Wilcox, Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	stefanha, mszeredi, vgoyal, gscrivan, dwalsh, chirantan,
	Miklos Szeredi

On Montag, 24. August 2020 01:40:06 CEST Dave Chinner wrote:
> On Mon, Aug 17, 2020 at 12:37:17PM +0200, Christian Schoenebeck wrote:
> > On Montag, 17. August 2020 00:56:20 CEST Dave Chinner wrote:
> > > IOWs, with a filesystem inode fork implementation like this for ADS,
> > > all we really need is for the VFS to pass a magic command to
> > > ->lookup() to tell us to use the ADS namespace attached to the inode
> > > rather than use the primary inode type/state to perform the
> > > operation.
> > 
> > IMO starting with a minimalistic approach, in a way Solaris developers
> > originally introduced forks, would IMO make sense for Linux as well:
> 
> <snip>
> 
> That's pretty much what the proposed O_ALT did, except it used a
> fully qualified path name to define the ADS to open.

Hu, you're right! There is indeed a somewhat congruent effort & discussion
going on in parallel. Pulling in Miklos into CC for that reason:
https://lore.kernel.org/lkml/CAJfpegtNP8rQSS4Z14Ja4x-TOnejdhDRTsmmDD-Cccy2pkfVVw@mail.gmail.com/

However the motivation of that other thread's PR was rather a procfs-like
system as a unified way to retrieve implementation specific info from an
underlying fs, and the file fork aspect would just be a 'side product'.

Core motivation of that other thread (scroll down a bit):
https://lore.kernel.org/lkml/52483.1597190733@warthog.procyon.org.uk/

> > On Montag, 17. August 2020 02:29:30 CEST Dave Chinner wrote:
> > > I'd stop calling these "forks" already, too. The user wants
> > > "alternate data streams", while a "resource fork" is an internal
> > > filesystem implementation detail used to provide ADS
> > > functionality...
> > 
> > The common terminology can certainly still be argued. I understand that
> > from fs implementation perspective "fork" is probably ambiguous. But from
> > public API (i.e. user space side) perspective the term "fork" does make
> > sense, and so far I have not seen a better general term for this. Plus
> > the ambiguous aspects on fs side are not exposed to the public side.
> > 
> > The term "alternate data stream" suggests that this is just about the raw
> > data stream, but that's probably not what this feature will end up being
> > limited to. E.g. I think they will have their own permissions on the long
> > term (see below). Plus the term ADS is ATM somewhat sticky to the
> > Microsoft universe.
> 
> ADS is the Windows term, which is where the majority of people who
> use or want ADS come from. Novell called them "multiple data
> streams", and Solaris 9 implemented "extended attributes" (ADS)
> using inode forks. Apple allows a "data fork" (user data), "resource
> forks" (ADS) and now "named forks" which they then used to implement
> extended attributes.  Not the Solaris ones, the Linux style fixed
> length key-value xattrs.
> 
> Quite frankly, the naming in this area is a complete and utter mess,

Absolutely!

> and the only clear, unambiguous name for this feature is "alternate
> data streams". I don't care that it's something that comes from an
> MS background - if your only argument against it is "Microsoft!"
> then you're on pretty shaky ground...

It wasn't. My main argument really was, quote: 'The term "alternate data
stream" suggests that this is just about the raw data stream, but that's
probably not what this feature will end up being limited to. E.g. I think they
will have their own permissions on the long term ...'

> > - No subforks as starting point, and hence path separator '/' inside fork
> >   names would be prohibited initially to avoid future clashes.
> 
> Can't do that - changing the behaviour of the ADS name handling is
> effectively an on-disk filesystem format change. i.e. if we allow it
> in future kernels, then we have to mark the filesystem as "/" being
> valid so that older kernels and repair utilities won't consider this
> as invalid/corrupt and trash the ADS associated with the name.
> 
> IOWs, we either support it from the start, or we never support it.

You have a point there. OTOH I don't think this would be a show stopper. This
feature set will introduce backward incompatibility anyway.

If somebody really would need to run an ancient kernel on a fs that already
contains subforks, then this fs could also be accessed via pass-through fs
inside VM guest & host running a more recent kernel, ... or by accessing it
remotely via fileserver, etc. There are options.

> > > Hence all the ADS support infrastructure is essentially dentry cache
> > > infrastructure allowing a dentry to be both a file and directory,
> > > and providing the pathname resolution that recognises an ADS
> > > redirection. Name that however you want - we've got to do an on-disk
> > > format change to support ADS, so we can tell the VFS we support ADS
> > > or not. And we have no cares about existing names in the filesystem
> > > conflicting with the ADS pathname identifier because it's a mkfs
> > > time decision. Given that special flags are needed for the openat()
> > > call to resolve an ADS (e.g. O_ALT), we know if we should parse the
> > > ADS identifier as an ADS the moment it is seen...
> > 
> > So you think there should be a built-in full qualified path name
> > resolution to forks right from the start? E.g. like on Windows
> > "C:\some\where\sheet.pdf:foo" -> fork "foo" of file "sheet.pdf"?
> 
> No. I really don't care how the user interface works. That's for
> people who write the syscalls to argue about.

Actually I did not have user space in mind either, it was more about the
dentry cache which made me think that a built-in path resolution right from
the start would make sense. But OTOH the Linux dentry cache at its heart only
maintains a first-order relationship to calculate the lookup hashes, i.e.:

	dentry_hash = hash(dentry_ptr, child_name);

So it would not really be required to have a full qualified path resolution.
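
A toy illustration of such a first-order lookup key follows. It is not the
kernel's actual dcache hashing; struct toy_dentry, toy_dentry_hash() and the
use of FNV-1a are assumptions chosen to keep the example self-contained. The
key depends only on the parent's identity and the child's name, so a fork
could be looked up relative to its file's dentry without resolving a full
path:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for a dentry. */
struct toy_dentry {
	struct toy_dentry *parent;
	const char *name;
};

/*
 * FNV-1a over the child name, seeded with the parent's address, so the
 * same child name under different parents yields different keys.
 */
static uint64_t toy_dentry_hash(const struct toy_dentry *parent,
				const char *child_name)
{
	uint64_t h = 0xcbf29ce484222325ULL ^ (uint64_t)(uintptr_t)parent;

	assert(child_name != NULL);
	for (const unsigned char *p = (const unsigned char *)child_name; *p; p++) {
		h ^= *p;
		h *= 0x100000001b3ULL;
	}
	return h;
}
```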

But yet again, in that other thread about that fs meta info API, the argument
was if there was no built-in path resolution right from the start, then user
space apps and libs would start building their own path name resolution on
top of openat(), which might end up in a mess for the ecosystem. They have a
strong argument there.

But as they already pointed out, it would be a problem to actually agree about
a delimiter between the filename and the fork name portion. Miklos suggested a
double/triple slash, but I agree with the others that this would cause
misbehaviours with all sorts of existing applications:
https://lore.kernel.org/lkml/c013f32e-3931-f832-5857-2537a0b3d634@schaufler-ca.com/

They also came up with some other questions that we have not discussed here:
https://lore.kernel.org/lkml/20200812143957.GQ1236603@ZenIV.linux.org.uk/
https://lore.kernel.org/lkml/20200812213041.GV1236603@ZenIV.linux.org.uk/

> What I was describing is how the internal kernel implementation -
> the interaction between the VFS and the filesystem - needs to work.
> ADS needs to be supported in some way by the VFS; if ADS are going
> to be seekable user data files, then they have to be implemented as
> path/dentry/inode tuples that a struct file can point to. IOWs,
> internally they need to be seen as first class VFS citizens, and the
> VFS needs mechanisms to tell the filesystem to look up the ADS
> namespace rather than the inode itself....

Yes, sure.

> > > > I don't understand why a fork would be permitted to have its own
> > > > permissions.  That makes no sense.  Silly Solaris.
> > > 
> > > I can't think of a reason why, either, but the above implementation
> > > for XFS would support it if the presentation layer allows it... :)
> > 
> > I would definitely not add this right from the start of course, but on the
> > long term it actually does make sense for them to have their own
> > permissions, simply because there are already applications for that:
> > 
> > E.g. on some systems forks are used to tag files for security relevant
> > issues, for instance where the file originated from (a trusted vs.
> > untrusted source).
> 
> Key-value data like that is what the security xattr namespace is for, not
> ADS....

If it was only about storing a boolean like security.trusted = YES,
then you would be right. However that example actually stores info which could
easily exceed the 4k limit of Linux xattrs, e.g. it stores the original URI of
the source.

> IOWs, now that I think about it, we should be allowing non-user
> per-ADS permissions to be set right from the start because I can
> think of several filesystem/kernel internal features that could make
> use of such functionality that we would want to remain hidden from
> users.

Right, actually while reading through that other thread, I realized that my
initial attitude, that is kicking off with a very limited feature set, is
probably counterproductive, as they pointed out you'd easily end up handling
such forks as something completely different than regular directories and
files, so you would probably deviate from a unified VFS code base, start
adding new structs, adding exceptions, etc.

> > OTOH forks are used to extend existing files in non-obtrusive way. Say you
> > have some sort of (e.g. huge) master file, and a team works on that file.
> > Then the individual people would attach their changes solely as forks to
> > the master file with their ownership, probably even with complex ACLs, to
> > prevent certain users from touching (or even reading) others' changes.
> > In this use case the master file might be read-only for most people, while
> > the individual forks could be anywhere between more permissive and more
> > restrictive.
> 
> You're demonstrating the exact reasons why ADS have traditionally
> been considered harmful by Linux developers.  You can do all that
> with normal directories and files - you do not need ADS to implement
> a fully functional multi-user content management system.

You're talking from a system-level-dev POV. Just by realizing that this
example could also be mapped into a regular directory structure does not mean
it would be better, nor friendlier from user-POV. From user POV it is one
file, that you would present to the user as directory instead.

---

Ok, maybe I should make this more clear with another example: one major use
case for forks/ADS is extending (e.g. proprietary) binary file formats with
new features. Say company B is developing an editor application that supports
working directly with a binary media file (format) of another company A. And
say that company B's application has some features that don't exist in the app
of company A.

What shall it do? B could try adding their own chunks to the binary file
somewhere, but what happens in practice is that when another user now opens
the file with app A, it would often end up either refusing to open the file at
all, or it would crash, or it would simply drop and lose the info stored
previously by app B once the user saves the file again with app A. With
certain versions of app A it might work, with other versions it doesn't.
That's a nightmare to maintain.

By storing those extended features as named fork, e.g. "com.Bcorp.featureX",
you can easily circumvent that problem. App A still only works on the main
stream. So it can still safely open the file, and it would neither modify nor
drop the file's feature extensions of company B.

> Keep in mind that you are not going to get universal support for ADS
> any time soon as most filesystems will require on-disk format
> changes to support them. Further, you are going to have to wait for
> the entire OS ecosystem to grow support for ADS (e.g. cp, tar,
> rsync, file, etc) before you can actually use it sanely in
> production systems. Even if we implement kernel support right now,
> it will be years before it will be widely available and supported at
> an OS/distro level...

Sure, that's a chicken-and-egg problem.

Being realistic, I don't expect that forks are something that would be landing
in Linux very soon. I think it is an effort that will take its time, probably
as a Linux-test-fork / PoC for quite a while, up to a point where a common
acceptance is reached.

But file forks already exist on other systems for multiple good reasons. So I
think it makes sense to drive the effort on Linux as well.

Best regards,
Christian Schoenebeck



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-24 15:30                           ` Christian Schoenebeck
@ 2020-08-24 20:01                             ` Miklos Szeredi
  2020-08-24 21:26                             ` Frank van der Linden
  2020-08-24 22:29                             ` Theodore Y. Ts'o
  2 siblings, 0 replies; 62+ messages in thread
From: Miklos Szeredi @ 2020-08-24 20:01 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Dave Chinner, Matthew Wilcox, Dr. David Alan Gilbert, Greg Kurz,
	linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal,
	Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

On Mon, Aug 24, 2020 at 5:30 PM Christian Schoenebeck
<qemu_oss@crudebyte.com> wrote:
>

> Hu, you're right! There is indeed a somewhat congruent effort & discussion
> going on in parallel. Pulling in Miklos into CC for that reason:
> https://lore.kernel.org/lkml/CAJfpegtNP8rQSS4Z14Ja4x-TOnejdhDRTsmmDD-Cccy2pkfVVw@mail.gmail.com/
>
> However the motivation of that other thread's PR was rather a procfs-like
> system as a unified way to retrieve implementation specific info from an
> underlying fs, and the file fork aspect would just be a 'side product'.

The motivation is a consistent interface for accessing file related
data, whatever that be.

> But as they already pointed out, it would be a problem to actually agree about
> a delimiter between the filename and the fork name portion. Miklos suggested
> a double/triple slash, but I agree with other ones that this would render
> misbehaviours with all sorts of existing applications:
> https://lore.kernel.org/lkml/c013f32e-3931-f832-5857-2537a0b3d634@schaufler-ca.com/

That argument starts like this:

 - Path resolution has allowed multiple slashes in UNIX systems for 50
years, so everyone got used to building paths by concatenating things
ending in slashes and beginning in slashes and putting more slashes in
the middle.

This can't be argued with; we probably have to live with that for 50
more years (if we are lucky).

The argument continues so:

 - Because everyone got lazy, we can't introduce a new interface with
new rules, because all those lazy programmers won't be bothered to fix
their ways and will use the old practices while trying to use the new
interface, which will break their new apps.

Huh?  Can't they just fix those broken apps, then?

Yeah, yeah, I know it's not as simple as that, as the path can come
from application A, while the new interface is used by application B.
But this would only be a real backward compatibility issue if the new
interface is used without consideration for such cases (i.e. clean up
paths coming from untrusted sources before using it with the new
interface).

So no, I don't buy that argument.   Anyway, starting with just path
resolution starting at the target file makes sense as a first step.

The most important thing, I think, is to not fragment the interface
further.  So O_ALT should allow not just one application (like ADS)
but should have a top level directory for selecting between the
various data sources.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-24 15:30                           ` Christian Schoenebeck
  2020-08-24 20:01                             ` Miklos Szeredi
@ 2020-08-24 21:26                             ` Frank van der Linden
  2020-08-24 22:29                             ` Theodore Y. Ts'o
  2 siblings, 0 replies; 62+ messages in thread
From: Frank van der Linden @ 2020-08-24 21:26 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Dave Chinner, Matthew Wilcox, Dr. David Alan Gilbert, Greg Kurz,
	linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan, dwalsh,
	chirantan, Miklos Szeredi

On Mon, Aug 24, 2020 at 05:30:18PM +0200, Christian Schoenebeck wrote:
> On Montag, 24. August 2020 01:40:06 CEST Dave Chinner wrote:
> > On Mon, Aug 17, 2020 at 12:37:17PM +0200, Christian Schoenebeck wrote:
> > > On Montag, 17. August 2020 00:56:20 CEST Dave Chinner wrote:
> > > > IOWs, with a filesystem inode fork implementation like this for ADS,
> > > > all we really need is for the VFS to pass a magic command to
> > > > ->lookup() to tell us to use the ADS namespace attached to the inode
> > > > rather than use the primary inode type/state to perform the
> > > > operation.
> > >
> > > IMO starting with a minimalistic approach, in a way Solaris developers
> >
> > > originally introduced forks, would IMO make sense for Linux as well:
> > <snip>
> >
> > That's pretty much what the proposed O_ALT did, except it used a
> > fully qualified path name to define the ADS to open.
> 
> Hu, you're right! There is indeed a somewhat congruent effort & discussion
> going on in parallel. Pulling in Miklos into CC for that reason:
> https://lore.kernel.org/lkml/CAJfpegtNP8rQSS4Z14Ja4x-TOnejdhDRTsmmDD-Cccy2pkfVVw@mail.gmail.com/
> 
> However the motivation of that other thread's PR was rather a procfs-like
> system as a unified way to retrieve implementation specific info from an
> underlying fs, and the file fork aspect would just be a 'side product'.
> 
> Core motivation of that other thread (scroll down a bit):
> https://lore.kernel.org/lkml/52483.1597190733@warthog.procyon.org.uk/
> 
> > > On Montag, 17. August 2020 02:29:30 CEST Dave Chinner wrote:
> > > > I'd stop calling these "forks" already, too. The user wants
> > > > "alternate data streams", while a "resource fork" is an internal
> > > > filesystem implementation detail used to provide ADS
> > > > functionality...
> > >
> > > The common terminology can certainly still be argued. I understand that
> > > from fs implementation perspective "fork" is probably ambiguous. But from
> > > public API (i.e. user space side) perspective the term "fork" does make
> > > sense, and so far I have not seen a better general term for this. Plus
> > > the ambiguous aspects on fs side are not exposed to the public side.
> > >
> > > The term "alternate data stream" suggests that this is just about the raw
> > > data stream, but that's probably not what this feature will end up being
> > > limited to. E.g. I think they will have their own permissions on the long
> > > term (see below). Plus the term ADS is ATM somewhat sticky to the
> > > Microsoft universe.
> > ADS is the Windows term, which is where the majority of people who
> > use or want ADS come from. Novell called them "multiple data
> > streams", and solaris 9 implemented "extended attributes" (ADS)
> > using inode forks. Apple allows a "data fork" (user data), "resource
> > forks" (ADS) and now "named forks" which they then used to implement
> > extended attributes.  Not the solaris ones, the linux style fixed
> > length key-value xattrs.
> >
> > Quite frankly, the naming in this area is a complete and utter mess,
> 
> Absolutely!
> 
> > and the only clear, unambiguous name for this feature is "alternate
> > data streams". I don't care that it's something that comes from an
> > MS background - if your only argument against it is "Microsoft!"
> > then you're on pretty shakey ground...
> 
> It wasn't. My main argument really was, quote: 'The term "alternate data
> stream" suggests that this is just about the raw data stream, but that's
> probably not what this feature will end up being limited to. E.g. I think they
> will have their own permissions on the long term ...'
> 
> > > - No subforks as starting point, and hence path separator '/' inside fork
> > >
> > >   names would be prohibited initially to avoid future clashes.
> >
> > Can't do that - changing the behaviour of the ADS name handling is
> > effectively an on-disk filesystem format change. i.e. if we allow it
> > in future kernels, then we have to mark the filesystem as "/" being
> > valid so that older kernels and repair utilities won't consider this
> > as invalid/corrupt and trash the ADS associated with the name.
> >
> > IOWs, we either support it from the start, or we never support it.
> 
> You have a point there. OTOH I don't think this would be a show stopper. This
> feature set will introduce backward incompatibility anyway.
> 
> If somebody really would need to run an ancient kernel on a fs that already
> contains subforks, then this fs could also be accessed via pass-through fs
> inside VM guest & host running a more recent kernel, ... or by accessing it
> remotely via fileserver, etc. There are options.
> 
> > > > Hence all the ADS support infrastructure is essentially dentry cache
> > > > infrastructure allowing a dentry to be both a file and directory,
> > > > and providing the pathname resolution that recognises an ADS
> > > > redirection. Name that however you want - we've got to do an on-disk
> > > > format change to support ADS, so we can tell the VFS we support ADS
> > > > or not. And we have no cares about existing names in the filesystem
> > > > conflicting with the ADS pathname identifier because it's a mkfs
> > > > time decision. Given that special flags are needed for the openat()
> > > > call to resolve an ADS (e.g. O_ALT), we know if we should parse the
> > > > ADS identifier as an ADS the moment it is seen...
> > >
> > > So you think there should be a built-in full qualified path name
> > > resolution to forks right from the start? E.g. like on Windows
> > > "C:\some\where\sheet.pdf:foo" -> fork "foo" of file "sheet.pdf"?
> >
> > No. I really don't care how the user interface works. That's for
> > people who write the syscalls to argue about.
> 
> Actually I did not have user space in mind either, it was more about the
> dentry cache which made me thinking that a built-in path resolution right from
> the start would make sense. But OTOH the Linux dentry cache at its heart only
> maintains a first-order relationship to calculate the lookup hashes, i.e.:
> 
>         dentry_hash = hash(dentry_ptr, child_name);
> 
> So it would not really be required to have a full qualified path resolution.
> 
> But yet again, in that other thread about that fs meta info API, the argument
> was if there was no built-in path resolution right from the start, then user
> space apps and libs would start building their own path name resolution on
> top of openat(), which might end up in a mess for the ecosystem. They have a
> strong argument there.
> 
> But as they already pointed out, it would be a problem to actually agree about
> a delimiter between the filename and the fork name portion. Miklos suggested
> a double/triple slash, but I agree with other ones that this would render
> misbehaviours with all sorts of existing applications:
> https://lore.kernel.org/lkml/c013f32e-3931-f832-5857-2537a0b3d634@schaufler-ca.com/
> 
> They also came up with some other questions that we have not discussed here:
> https://lore.kernel.org/lkml/20200812143957.GQ1236603@ZenIV.linux.org.uk/
> https://lore.kernel.org/lkml/20200812213041.GV1236603@ZenIV.linux.org.uk/
> 
> > What I was describing is how the internal kernel implementation -
> > the interaction between the VFS and the filesystem - needs to work.
> > ADS needs to be supported in some way by the VFS; if ADS are going
> > to be seekable user data files, then they have to be implemented as
> > path/dentry/inode tuples that a struct file can point to. IOWs,
> > internally they need to be seen as first class VFS citizens, and the
> > VFS needs mechanisms to tell the filesystem to look up the ADS
> > namespace rather than the inode itself....
> 
> Yes, sure.
> 
> > > > > I don't understand why a fork would be permitted to have its own
> > > > > permissions.  That makes no sense.  Silly Solaris.
> > > >
> > > > I can't think of a reason why, either, but the above implementation
> > > > for XFS would support it if the presentation layer allows it... :)
> > >
> > > I would definitely not add this right from the start of course, but on the
> > > long term it actually does make senses for them having their own
> > > permissions, simply because there are already applications for that:
> > >
> > > E.g. on some systems forks are used to tag files for security relevant
> > > issues, for instance where the file originated from (a trusted vs.
> > > untrusted source).
> > Key-value data like is what the security xattr namespace is for, not
> > ADS....
> 
> If it was only about storing a boolean like security.trusted = YES,
> then you were right. However that example actually stores info which could
> easily exceed the 4k limit of Linux xattrs, e.g. it stores the original URI of
> the source.
> 
> > IOWs, now that I think about it, we should be allowing non-user
> > per-ADS permissions to be set right from the start because I can
> > think of several filesystem/kernel internal features that could make
> > use of such functionality that we would want to remain hidden from
> > users.
> 
> Right, actually while reading through that other thread, I realized that my
> initial attitude, that is kicking off with a very limited feature set, is
> probably contra productive, as they pointed out you'd easily end up handling
> such forks as something completely different than regular directories and
> files, so you would probably deviate from a unified VFS code base, start
> adding new structs, adding exceptions, etc.
> 
> > > OTOH forks are used to extend existing files in non-obtrusive way. Say you
> > > have some sort of (e.g. huge) master file, and a team works on that file.
> > > Then the individual people would attach their changes solely as forks to
> > > the master file with their ownership, probably even with complex ACLs, to
> > > prevent certain users from touching (or even reading) other ones changes.
> > > In this use case the master file might be readonly for most people, while
> > > the individual forks being anywhere between more permissive or more
> > > restrictive.
> >
> > You're demonstrating the exact reasons why ADS have traditionally
> > been considered harmful by Linux developers.  You can do all that
> > with normal directories and files - you do not need ADS to implement
> > a fully functional multi-user content management system.
> 
> You're talking from a system-level-dev POV. Just by realizing that this
> example could also be mapped into a regular directory structure does not mean
> it would be better, nor friendlier from user-POV. From user POV it is one
> file, that you would present to the user as directory instead.
> 
> ---
> 
> Ok, maybe I should make this more clear with another example: one major use
> case for forks/ADS is extending (e.g. proprietary) binary file formats with
> new features. Say company B is developing an editor application that supports
> working directly with a binary media file (format) of another company A. And
> say that company B's application has some feature that don't exist in app of
> company A.
> 
> What shall it do? B could try adding their own chunks to the binary file
> somewhere, but what happens in practice is that when another user now opens
> the file with app A, it would often end up either refusing to open the file at
> all, or it would crash, or it would simply drop and lose the info stored
> previously by app B once the user saves the file again with app A. With
> certain versions of app A it might work, with other versions it doesn't.
> That's a nightmare to maintain.
> 
> By storing those extended features as named fork, e.g. "com.Bcorp.featureX",
> you can easily circumvent that problem. App A still only works on the main
> stream. So it can still safely open the file, and it would neither modify nor
> drop the file's feature extensions of company B.
> 
> > Keep in mind that you are not going to get universal support for ADS
> > any time soon as most filesystems will require on-disk format
> > changes to support them. Further, you are going to have to wait for
> > the entire OS ecosystem to grow support for ADS (e.g. cp, tar,
> > rsync, file, etc) before you can actually use it sanely in
> > production systems. Even if we implement kernel support right now,
> > it will be years before it will be widely available and supported at
> > an OS/distro level...
> 
> Sure, that's a chicken egg problem.
> 
> Being realistic, I don't expect that forks are something that would be landing
> in Linux very soon. I think it is an effort that will take its time, probably
> as a Linux-test-fork / PoC for quite a while, up to a point where a common
> acceptance is reached.
> 
> But file forks already exist on other systems for multiple good reasons. So I
> think it makes sense to thrive the effort on Linux as well.
> 
> Best regards,
> Christian Schoenebeck

Just wanted to echo some of the sentiments in this thread, especially those
posted by Christian, so I'm replying to his message.

I agree with him and Linus that the Solaris interface of:

ffd = open("foo", O_RDONLY);
afd = openat(ffd, "attrpath", O_XATTR|O_RDWR);

..is the best starting point. It's simple, it's clean, it doesn't overload
path separators. And hey, if you like doing it with path separators, put
a library function on top of it that uses them :-)
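
For illustration, a minimal sketch of what such a library function could
look like in C. The name open_stream() and the ':' delimiter are
hypothetical choices made only for this example; nothing in this thread
settles on either. O_XATTR is the Solaris/illumos openat() flag; Linux does
not define it, so on Linux the sketch simply fails with ENOTSUP.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: open stream/attribute "attr" of file "file" from a
 * single "file:attr" spec. The ':' delimiter is an assumption of this sketch.
 * O_CREAT is not handled here (no mode argument is passed through).
 * Returns a file descriptor, or -1 with errno set.
 */
int open_stream(const char *spec, int flags)
{
    assert(spec != NULL);

    /* The last ':' separates the file name from the stream name. */
    const char *sep = strrchr(spec, ':');
    if (sep == NULL || sep == spec || sep[1] == '\0') {
        errno = EINVAL;
        return -1;
    }

#ifdef O_XATTR
    /* Copy the file part so it can be NUL-terminated. */
    size_t flen = (size_t)(sep - spec);
    char *file = malloc(flen + 1);
    if (file == NULL)
        return -1;
    memcpy(file, spec, flen);
    file[flen] = '\0';

    int ffd = open(file, O_RDONLY);
    int saved = errno;
    free(file);
    if (ffd < 0) {
        errno = saved;
        return -1;
    }

    /* Second step of the Solaris interface: resolve the name in the
     * attribute namespace of the already opened file. */
    int afd = openat(ffd, sep + 1, O_XATTR | flags);
    saved = errno;
    close(ffd);
    errno = saved;
    return afd;
#else
    (void)flags;
    errno = ENOTSUP;    /* no O_XATTR on this platform */
    return -1;
#endif
}
```

The delimiter parsing lives entirely in user space here, so the kernel
interface itself never has to give any character in a path a new meaning.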

When I implemented "user." xattrs for NFS, I noticed these things:

* Extended attributes have no common caching, so each filesystem implements
  its own, which is a waste.
* There is quite a bit of k(v)alloc-ing and copying going on, and it's hard
  to avoid.
* Given that, the upper size limit is understandable, but still feels kind of
  arbitrary.

So, it would be great to have alternate data streams, and put xattrs on top
of them. Essentially they'd be streams with reserved names that are always
locked for the reader or writer and only allow reads/writes at offset 0,
and always truncate on write.
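
To make those semantics a bit more concrete, here is a rough user-space
sketch that uses an ordinary file as a stand-in for such a reserved stream.
It is not a kernel design: the names stream_set() and stream_get() are
hypothetical, and flock() is only an assumed way to model "always locked
for the reader or writer".

```c
#define _DEFAULT_SOURCE   /* for flock() and pread()/pwrite() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Replace the whole value: take an exclusive lock, truncate, then write
 * starting at offset 0. Returns 0 on success, -1 on failure.
 */
int stream_set(const char *path, const void *val, size_t len)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    int rc = -1;
    if (flock(fd, LOCK_EX) == 0 && ftruncate(fd, 0) == 0) {
        ssize_t n = pwrite(fd, val, len, 0);
        if (n >= 0 && (size_t)n == len)
            rc = 0;
    }
    close(fd);  /* closing the only descriptor also drops the lock */
    return rc;
}

/*
 * Read the value under a shared lock, always from offset 0.
 * Returns the number of bytes read, or -1 on failure.
 */
ssize_t stream_get(const char *path, void *buf, size_t buflen)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = -1;
    if (flock(fd, LOCK_SH) == 0)
        n = pread(fd, buf, buflen, 0);
    close(fd);
    return n;
}
```

Writing a short value after a long one leaves only the short value behind,
which is the xattr-like "set replaces everything" behaviour.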

It would also mean that caching now naturally happens in the page cache,
so no need for each filesystem to have separate caches anymore.

Is it worth it given the code churn involved? Good question. I think, if
done right, the code could end up looking a lot cleaner. But it's
a long road to get there, and there are many issues that need to be solved.
So who knows.

Lastly, I think I saw someone say that it was a bit weird to have permissions
per stream/attribute, like Solaris has. I don't know. In a way, the current
"streams" already have different permissions - it's just hardcoded at the
top level ("user", "trusted", etc).

- Frank

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-24 15:30                           ` Christian Schoenebeck
  2020-08-24 20:01                             ` Miklos Szeredi
  2020-08-24 21:26                             ` Frank van der Linden
@ 2020-08-24 22:29                             ` Theodore Y. Ts'o
  2020-08-25 15:12                               ` Christian Schoenebeck
  2 siblings, 1 reply; 62+ messages in thread
From: Theodore Y. Ts'o @ 2020-08-24 22:29 UTC (permalink / raw)
  To: Christian Schoenebeck, Frank van der Linden
  Cc: Dave Chinner, Matthew Wilcox, Dr. David Alan Gilbert, Greg Kurz,
	linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan, dwalsh,
	chirantan, Miklos Szeredi

On Mon, Aug 24, 2020 at 05:30:18PM +0200, Christian Schoenebeck wrote:
> Ok, maybe I should make this more clear with another example: one major use
> case for forks/ADS is extending (e.g. proprietary) binary file formats with
> new features. Say company B is developing an editor application that supports
> working directly with a binary media file (format) of another company A. And
> say that company B's application has some feature that don't exist in app of
> company A.

But that's going to happen today (company B's feature silently getting
dropped) when using data forks/ADS if the file is sent via zip,
http/https, compressed using gzip, xz, bzip2, etc.  I remember that
world when I had to deal with MacOS files decades ago, and it was
a total mess.

> > Keep in mind that you are not going to get universal support for ADS
> > any time soon as most filesystems will require on-disk format
> > changes to support them. Further, you are going to have to wait for
> > the entire OS ecosystem to grow support for ADS (e.g. cp, tar,
> > rsync, file, etc) before you can actually use it sanely in
> > production systems. Even if we implement kernel support right now,
> > it will be years before it will be widely available and supported at
> > an OS/distro level...
> 
> Sure, that's a chicken egg problem.
> 
> Being realistic, I don't expect that forks are something that would be landing
> in Linux very soon. I think it is an effort that will take its time, probably
> as a Linux-test-fork / PoC for quite a while, up to a point where a common
> acceptance is reached.

We're talking *decades*.  It's not enough for new protocol specs for
https, rsync, nfs, etc., to be modified, and then implemented.  It's
not enough for file formats for zip, xz, gzip, etc., to be created;
all of this new software has to be deployed throughout the entire
ecosystem.  People don't upgrade server software quickly; look how long
IPv6 has taken to be adopted!

In that amount of time, it's going to be easier to implement a more
modular application container format which allows for new features to
be added into a file --- for example, such as ISO/IEC 26300....

> But file forks already exist on other systems for multiple good reasons. So I
> think it makes sense to thrive the effort on Linux as well.

They aren't actually used all that often with Windows/Windows Office.
That's why you can upload/download a docx file via https, or check it
into git, etc. without it being broken.  (Try doing that with an
old-style MacOS file with resource forks; what a nightmare....)

The only place where you really see use of forks/ADS is in places
where interoperability isn't a big deal, such as MacOS executables, or
back when a certain company with monopolistic tendencies was trying to
lock desktop users into their OS....

On Mon, Aug 24, 2020 at 09:26:56PM +0000, Frank van der Linden wrote:
> I agree with him and Linus that the Solaris interface of:
> 
> ffd = open("foo", O_RDONLY);
> afd = openat(ffd, "attrpath", O_XATTR|O_RDWR);
> 
> ..is the best starting point. It's simple, it's clean, it doesn't overload
> path separators. And hey, if you like doing it with path separators, put
> a library function on top of it that uses them :-)

The Solaris interface is pretty clean, but if we really want to
do this (and from above, I'm not a fan), there is one thing that I
would drop from the Solaris API, and that's the ability to use chdirat
to cd into a directory which is inside a file's ADS.  There were
malware authors who were using this to go to town, since most shells
didn't know about ADS, and so it was a *great* way to hide setuid root
binaries, files that were being prepared for exfiltration from the
corporation's intranet, etc.

So let's learn from Solaris's mistake, and let's not.  Solaris may
have that feature, for Windows compatibility, but I'm not aware of any
enterprise Unix software that has used it, for all of the reasons
discussed above.

					- Ted


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-24 22:29                             ` Theodore Y. Ts'o
@ 2020-08-25 15:12                               ` Christian Schoenebeck
  2020-08-25 15:32                                 ` Miklos Szeredi
  0 siblings, 1 reply; 62+ messages in thread
From: Christian Schoenebeck @ 2020-08-25 15:12 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Frank van der Linden, Dave Chinner, Matthew Wilcox,
	Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel, stefanha,
	mszeredi, vgoyal, gscrivan, dwalsh, chirantan, Miklos Szeredi

On Montag, 24. August 2020 22:01:43 CEST Miklos Szeredi wrote:
> > But as they already pointed out, it would be a problem to actually agree
> > about a delimiter between the filename and the fork name portion. Miklos
> > suggested a double/triple slash, but I agree with other ones that this
> > would render misbehaviours with all sorts of existing applications:
> > https://lore.kernel.org/lkml/c013f32e-3931-f832-5857-2537a0b3d634@schaufler-ca.com/
> That argument starts like this:
> 
>  - Path resolution has allowed multiple slashes in UNIX systems for 50
> years, so everyone got used to building paths by concatenating things
> ending in slashes and beginning in slashes and putting more slashes in
> the middle.
> 
> This can't be argued with, we probably have to live with that for 50
> more years  (if we are lucky).
> 
> The argument continues so:
> 
>  - Because everyone got lazy, we can't introduce a new interface with
> new rules, because all those lazy programmers won't be bothered to fix
> their ways and will use the old practices while trying to use the new
> interface, which will break their new apps.
> 
> Huh?  Can't they just fix those broken apps, then?

Picking a delimiter is a huge problem. No matter which delimiter you choose, 
you will always find people screaming out loud.

However the slash character is in fact one that would cause much more trouble 
than other alternatives. The sheer amount of code that blindly concatenates 
paths without eliminating redundant slashes already renders it an unrealistic 
candidate IMO.

I can give you another argument which might be more convincing to you: say you 
maintain a middleware lib that takes a path as argument somewhere, and that 
lib now gets path="/foo//bar". How could that lib judge whether it should a) 
eliminate the double slash, or rather b) it was really meant to be fork "bar" 
of file "foo" and hence shall pass the string as-is to underlying 
framework(s)? Simply: It can't, as it requires knowledge from either upper or 
lower end that the lib in the middle might not have.
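
A minimal sketch of that dilemma in C, assuming '//' were chosen as the
fork delimiter. The helper names collapse_slashes() and has_double_slash()
are hypothetical and not taken from any real library; the first one just
mimics the slash cleanup that a lot of existing path-handling code performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical middleware helper: collapse runs of '/' into a single '/'.
 * Writes at most dstlen bytes including the terminating NUL.
 * Returns 0 on success, -1 if dst is too small.
 */
int collapse_slashes(const char *src, char *dst, size_t dstlen)
{
    size_t n = 0;
    char prev = '\0';

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        if (*src == '/' && prev == '/')
            continue;           /* drop the redundant slash */
        if (n + 1 >= dstlen)
            return -1;          /* no room for this byte plus the NUL */
        dst[n++] = *src;
        prev = *src;
    }
    if (n >= dstlen)
        return -1;
    dst[n] = '\0';
    return 0;
}

/* Returns 1 if the path contains "//", the assumed fork delimiter. */
int has_double_slash(const char *path)
{
    return strstr(path, "//") != NULL;
}
```

Given "/foo//bar", collapse_slashes() produces "/foo/bar", and after that
has_double_slash() can no longer tell that a fork might have been meant. A
library that normalizes first has silently discarded the fork reference; a
library that does not may pass on an accidental double slash as one.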

Whatever the delimiter would be, it should at least be an ASCII character (or 
a sequence of them). If you'd pick a binary or some odd Unicode character, the 
outcome would be that each shell would remap their own ASCII delimiter on top 
of it, and that's actually the opposite of what a built-in path resolution 
should accomplish, right?

> The most important thing, I think, is to not fragment the interface
> further.  So O_ALT should allow not just one application (like ADS)
> but should have a top level directory for selecting between the
> various data sources.

Well, that's what name spaces are for IMO. So you would probably reserve some 
prefixes for system purposes, like it is already done for Linux xattrs. Or do 
you see any advantage for adding a dedicated directory layer in between 
instead?

On Dienstag, 25. August 2020 00:29:24 CEST Theodore Y. Ts'o wrote:
> On Mon, Aug 24, 2020 at 05:30:18PM +0200, Christian Schoenebeck wrote:
> > Being realistic, I don't expect that forks are something that would be
> > landing in Linux very soon. I think it is an effort that will take its
> > time, probably as a Linux-test-fork / PoC for quite a while, up to a
> > point where a common acceptance is reached.
> 
> We're talking *decades*.  It's not enough for new protocol specs for
> https, rsync, nfs, etc., to be modified, and then implemented.  It's
> not enough for file formats for zip, xz, gzip, etc., to be created;
> all of this new software has to be deployed throughout the entire
> ecosystem.  People don't upgrade server software quickly; look how long
> IPv6 has taken to be adopted!

I am not endorsing forks for every app and user on the planet. Like other 
people who come up with new kernel features, I am just suggesting it because I 
am actually using them in a heterogeneous network already, and the Linux nodes 
are the missing brick not supporting them yet.

My personal plan (if there is viable acceptance):

- At least basic support for forks inside the Linux kernel (e.g. *at() 
  functions).

- Adding support to ZFS, which provides most of it under the hood already.

- Modifying some Samba VFS modules to support those forks natively.

- Minor additions to QEMU.

That's already sufficient as a starting point for my existing use cases. Other 
user space apps involved (e.g. on macOS) are already supporting forks.

> In that amount of time, it's going to be easier to implement a more
> modular application container format which allows for new features to
> be added into a file --- for example, such as ISO/IEC 26300....

That would provide an alternative for only a small portion of use cases of 
forks. E.g. even in the example I mentioned about adding features to a foreign 
file format, the point of using forks in that example was that you are *not* 
in charge of file format aspects. Companies of proprietary file formats are 
often not very keen to open their format to other companies at all.

And even if you would limit the problem to the open source world, it is 
unlikely that every OSS app would adopt and switch to ODF format for several 
reasons.

Just look at how many media container formats are out there. Regardless of the 
individual codecs and precise payload being used, theoretically they could all 
have switched to exactly one container format, right? It just didn't happen.

> > Ok, maybe I should make this more clear with another example: one major
> > use case for forks/ADS is extending (e.g. proprietary) binary file
> > formats with new features. Say company B is developing an editor
> > application that supports working directly with a binary media file
> > (format) of another company A. And say that company B's application has
> > some feature that don't exist in app of company A.
> 
> But that's going to happen today (company B's feature silently getting
> dropped) when using data forks/ADS if the file is sent via zip,
> http/https, compressed using gzip, xz, bzip2, etc.  I remember that
> world when I had to deal with MacOS files decades ago, and it was
> a total mess.

That's because you were transferring files between a system that supported 
forks and a system that did not support them. And that's exactly the problem I 
want to fix, because right now I have to remap forks on Linux to separate 
files to avoid losing data, or use archives, or other hacks.

Really, it is not about convincing people to use or not use forks. They are 
already there. It is about integrating Linux into such an existing 
infrastructure appropriately.

> They aren't actually used all that often with Windows/Windows Office.
> That's why you can upload/download a docx file via https, or check it
> into git, etc. without it being broken.  (Try doing that with an
> old-style MacOS file with resource forks; what a nightmare....)

You zip them, upload, download, unzip.

And right, you don't need forks for every file that you share with other 
people. But I don't see why that would put Linux support for forks into 
question.

On Montag, 24. August 2020 23:26:56 CEST Frank van der Linden wrote:
> When I implemented for NFS "user." xattrs, I noticed these things:
> 
> * Extended attributes have no common caching, so each filesystem implements
>   its own, which is a waste.
> * There is quite a bit of k(v)alloc-ing and copying going on, and it's hard
>   to avoid.
> * Given that, the upper size limit is understandable, but still feels kind
>   of arbitrary.
> 
> So, it would be great to have alternate data streams, and put xattrs on top
> of them. Essentially they'd be streams with reserved names that are always
> locked for the reader or writer and only allow reads/writes at offset 0,
> and always truncate on write.

That would unify the existing Linux xattrs and forks, and probably simplify 
the code base, right. But I can imagine that some people would not be happy 
with this, as you basically force them to opt into that 'fork'-enabled code 
when they actually just want to use the old xattrs.

Personally I would leave them orthogonal, i.e. a fork could have xattrs on its 
own.

Best regards,
Christian Schoenebeck



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-25 15:12                               ` Christian Schoenebeck
@ 2020-08-25 15:32                                 ` Miklos Szeredi
  2020-08-27 12:02                                   ` Christian Schoenebeck
  0 siblings, 1 reply; 62+ messages in thread
From: Miklos Szeredi @ 2020-08-25 15:32 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Theodore Y. Ts'o, Frank van der Linden, Dave Chinner,
	Matthew Wilcox, Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Tue, Aug 25, 2020 at 5:12 PM Christian Schoenebeck
<qemu_oss@crudebyte.com> wrote:

> I can give you another argument which might be more convincing to you: say you
> maintain a middleware lib that takes a path as argument somewhere, and that
> lib now gets path="/foo//bar". How could that lib judge whether it should a)
> eliminate the double slash, or rather b) it was really meant to be fork "bar"
> of file "foo" and hence shall pass the string as-is to underlying
> framework(s)? Simply: It can't, as it requires knowledge from either upper or
> lower end that the lib in the middle might not have.

Nobody needs to care, only the level that actually wants to handle the
alternative namespace.  And then that level absolutely *must* call
into a level that it knows does handle the alternative namespace.

Yeah, it's not going to suddenly start to  work by putting "foo//bar"
into an open file dialogue or whatever.   That's not the point, adding
that  new interface is to enable *new* functionality not to change
existing functionality.  That's the point that people don't seem to
get.

> > The most important thing, I think, is to not fragment the interface
> > further.  So O_ALT should allow not just one application (like ADS)
> > but should have a top level directory for selecting between the
> > various data sources.
>
> Well, that's what name spaces are for IMO. So you would probably reserve some
> prefixes for system purposes, like it is already done for Linux xattrs. Or do
> you see any advantage for adding a dedicated directory layer in between
> instead?

You mean some reserved prefixes for ADS?  Bleh.

No, xattr is not the model we should be following.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-25 15:32                                 ` Miklos Szeredi
@ 2020-08-27 12:02                                   ` Christian Schoenebeck
  2020-08-27 12:25                                     ` Matthew Wilcox
  0 siblings, 1 reply; 62+ messages in thread
From: Christian Schoenebeck @ 2020-08-27 12:02 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Theodore Y. Ts'o, Frank van der Linden, Dave Chinner,
	Matthew Wilcox, Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Tuesday, 25 August 2020 17:32:15 CEST Miklos Szeredi wrote:
> On Tue, Aug 25, 2020 at 5:12 PM Christian Schoenebeck
> 
> <qemu_oss@crudebyte.com> wrote:
> > I can give you another argument which might be more convincing to you: say
> > you maintain a middleware lib that takes a path as argument somewhere,
> > and that lib now gets path="/foo//bar". How could that lib judge whether
> > it should a) eliminate the double slash, or rather b) it was really meant
> > to be fork "bar" of file "foo" and hence shall pass the string as-is to
> > underlying
> > framework(s)? Simply: It can't, as it requires knowledge from either upper
> > or lower end that the lib in the middle might not have.
> 
> Nobody needs to care, only the level that actually wants to handle the
> alternative namespace.  And then that level absolutely *must* call
> into a level that it knows does handle the alternative namespace.
> 
> Yeah, it's not going to suddenly start to  work by putting "foo//bar"
> into an open file dialogue or whatever.   That's not the point, adding
> that  new interface is to enable *new* functionality not to change
> existing functionality.  That's the point that people don't seem to
> get.

I think you are underestimating the negative impact an n-times-slash delimiter
would introduce. Middleware functionalities rely on unambiguous path name
resolution for being able to transform paths without asking another level how
it shall a) parse and b) interpret individual components of a path.

It would not be as simple as saying: they are now broken, let's fix them.
Path transformations happen so often, on all levels of any system, that
if you introduced a dependency for them (i.e. a simple path transformation
would need to ask e.g. a storage backend for help), it would slow down
overall performance tremendously, especially as such requests are typically
non-deterministic.

E.g. it is very common for a middleware function to transform a path into a 
list, and "/a/b//c/d" would now be ambiguous:

    "/a/b//c/d" -> [ "a", "b", "c", "d" ]

    or

    "/a/b//c/d" -> [ "a", "b", [ "c", "d" ] ]

You can't simply pass either one option to the next level, because it would 
break behaviour:

    foreach (dir_entry in [ "a", "b", "c", "d" ]) {
        dirAction(dir_entry)
    }

is different than:

    foreach (dir_entry in [ "a", "b" ]) {
        dirAction(dir_entry)
        foreach (fork_entry in [ "c", "d" ]) {
            forkAction(fork_entry)
        }
    }

Hence that simple path transformation would need to ask for help to resolve
the ambiguity, which might take anything from a few microseconds up to
several seconds; then multiply that duration by the usual number of
individual path transformations involved in just a very simple task.
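
A minimal Python sketch of that ambiguity. The helper names split_plain and
split_with_forks are hypothetical and exist only for illustration; the second
one assumes the "//" fork delimiter discussed in this thread, which no kernel
implements:

```python
def split_plain(path):
    """Classic tokenization: empty components produced by '//' are dropped."""
    return [c for c in path.split("/") if c]


def split_with_forks(path):
    """Hypothetical tokenization: the first '//' starts a fork sub-list."""
    head, sep, tail = path.partition("//")
    dirs = [c for c in head.split("/") if c]
    if not sep:
        # No double slash at all: both tokenizers agree.
        return dirs
    return dirs + [[c for c in tail.split("/") if c]]


path = "/a/b//c/d"
print(split_plain(path))       # ['a', 'b', 'c', 'd']
print(split_with_forks(path))  # ['a', 'b', ['c', 'd']]
```

Both results are derived from the same input string, so a layer in the middle
cannot tell from the string alone which of the two was intended.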

---

What I could imagine as delimiter instead; slash-caret:

    /var/foo.pdf/^/forkname

I also like Microsoft's colon pick, as it would make shell interactions more 
slick:

	/var/foo.pdf:forkname

However I am aware that the colon would probably be too drastic, as colons are
often used to separate individual paths in a list for instance.

> > > The most important thing, I think, is to not fragment the interface
> > > further.  So O_ALT should allow not just one application (like ADS)
> > > but should have a top level directory for selecting between the
> > > various data sources.
> > 
> > Well, that's what name spaces are for IMO. So you would probably reserve
> > some prefixes for system purposes, like it is already done for Linux
> > xattrs. Or do you see any advantage for adding a dedicated directory
> > layer in between instead?
> 
> You mean some reserved prefixes for ADS?  Bleh.
> 
> No, xattr is not the model we should be following.

Maybe. I am actually undecided about that. As that fs meta info PR recently
showed, there might be other future use cases for this interface that one
probably cannot foresee today; and a dedicated toplevel directory to choose
between them would also make the kernel-internal bindings cleaner. So you
might have for instance:

	/var/foo.pdf/^/alt/forkname   # for common ADS (incl. macOS data forks)

	/var/foo.pdf/^/res/forkname   # for mapping macOS resource forks

	/var/foo.pdf/^/meta/forkname  # for accessing fs implementation info

Best regards,
Christian Schoenebeck



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-27 12:02                                   ` Christian Schoenebeck
@ 2020-08-27 12:25                                     ` Matthew Wilcox
  2020-08-27 13:48                                       ` Christian Schoenebeck
  0 siblings, 1 reply; 62+ messages in thread
From: Matthew Wilcox @ 2020-08-27 12:25 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Miklos Szeredi, Theodore Y. Ts'o, Frank van der Linden,
	Dave Chinner, Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Thu, Aug 27, 2020 at 02:02:42PM +0200, Christian Schoenebeck wrote:
> What I could imagine as delimiter instead; slash-caret:
> 
>     /var/foo.pdf/^/forkname

Any ascii character is going to be used in some actual customer workload.
I suggest we use a unicode character instead.

/var/foo.pdf/💩/badidea


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-27 12:25                                     ` Matthew Wilcox
@ 2020-08-27 13:48                                       ` Christian Schoenebeck
  2020-08-27 14:01                                         ` Matthew Wilcox
  0 siblings, 1 reply; 62+ messages in thread
From: Christian Schoenebeck @ 2020-08-27 13:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Miklos Szeredi, Theodore Y. Ts'o, Frank van der Linden,
	Dave Chinner, Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Thursday, 27 August 2020 14:25:55 CEST Matthew Wilcox wrote:
> On Thu, Aug 27, 2020 at 02:02:42PM +0200, Christian Schoenebeck wrote:
> > What I could imagine as delimiter instead; slash-caret:
> >     /var/foo.pdf/^/forkname
> 
> Any ascii character is going to be used in some actual customer workload.

Not exactly. "/foo/^/bar" is already a valid path today. So every Linux system 
(incl. all libs/apps) must be capable to deal with that path already, so it 
would not introduce a tokenization problem.

The caret character is not reserved by any filesystem either:
https://en.wikipedia.org/wiki/Filename

The only change a caret delimiter would bring is a very minor change in
semantics: apps would no longer be allowed to create dirs/files named exactly
"^". But I find that a very small restriction compared to the negative impact
of other delimiter options, i.e.:

	touch /some/where/^          # error if forks enabled, OK otherwise
	touch /some/where/^whatever  # always OK

So if you have apps that need to access dirs/files called *exactly* "^", that
would be easy to fix. And if you don't want to, you just keep the kernel's
support for forks disabled and preserve the old semantics of "^".
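
A minimal Python sketch of how a user-space layer could tokenize such a path,
assuming the proposed rule that only a component that is exactly "^" acts as
the delimiter. The function name split_fork is hypothetical; nothing like it
exists in any library or in the kernel:

```python
def split_fork(path, delimiter="^"):
    """Split 'file/^/fork' into (file, fork); fork is None without a delimiter."""
    parts = path.split("/")
    if delimiter not in parts:
        # '^whatever' is an ordinary name: only an exact '^' component counts.
        return path, None
    i = parts.index(delimiter)
    fork = "/".join(parts[i + 1:])
    if not fork:
        # Mirrors 'touch /some/where/^' failing when forks are enabled.
        raise ValueError("a bare '^' component names no fork")
    return "/".join(parts[:i]), fork


print(split_fork("/var/foo.pdf/^/forkname"))  # ('/var/foo.pdf', 'forkname')
print(split_fork("/some/where/^whatever"))    # ('/some/where/^whatever', None)
```

The split needs no lookup in another layer, because "/" stays the only
component separator.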

> I suggest we use a unicode character instead.
> 
> /var/foo.pdf/💩/badidea

Like I mentioned before, if you'd pick a unicode character (or binary), then 
each shell will map their own ASCII-sequence on top of that. Because shell 
users want ASCII. Which would defeat the primary purpose: a unified path 
resolution.

Then even if you'd pick unicode, that would raise new questions and problems; 
e.g. utf-8, utf-16, utf-32? Character normalization required? How do you 
ensure each layer will use the same encoding?

Best regards,
Christian Schoenebeck



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-27 13:48                                       ` Christian Schoenebeck
@ 2020-08-27 14:01                                         ` Matthew Wilcox
  2020-08-27 14:23                                           ` Christian Schoenebeck
  0 siblings, 1 reply; 62+ messages in thread
From: Matthew Wilcox @ 2020-08-27 14:01 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Miklos Szeredi, Theodore Y. Ts'o, Frank van der Linden,
	Dave Chinner, Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Thu, Aug 27, 2020 at 03:48:57PM +0200, Christian Schoenebeck wrote:
> On Thursday, 27 August 2020 14:25:55 CEST Matthew Wilcox wrote:
> > On Thu, Aug 27, 2020 at 02:02:42PM +0200, Christian Schoenebeck wrote:
> > > What I could imagine as delimiter instead; slash-caret:
> > >     /var/foo.pdf/^/forkname
> > 
> > Any ascii character is going to be used in some actual customer workload.
> 
> Not exactly. "/foo/^/bar" is already a valid path today. So every Linux system 
> (incl. all libs/apps) must be capable to deal with that path already, so it 
> would not introduce a tokenization problem.

That's exactly the point.  I can guarantee you that some customer is
already using a file named exactly '^'.

> > I suggest we use a unicode character instead.
> > 
> > /var/foo.pdf/💩/badidea
> 
> Like I mentioned before, if you'd pick a unicode character (or binary), then 
> each shell will map their own ASCII-sequence on top of that. Because shell 
> users want ASCII. Which would defeat the primary purpose: a unified path 
> resolution.

You misunderstood.  This was my way of telling you that your idea is shit.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-27 14:01                                         ` Matthew Wilcox
@ 2020-08-27 14:23                                           ` Christian Schoenebeck
  2020-08-27 14:25                                             ` Matthew Wilcox
  2020-08-27 14:44                                             ` Al Viro
  0 siblings, 2 replies; 62+ messages in thread
From: Christian Schoenebeck @ 2020-08-27 14:23 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Miklos Szeredi, Theodore Y. Ts'o, Frank van der Linden,
	Dave Chinner, Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Thursday, 27 August 2020 16:01:07 CEST Matthew Wilcox wrote:
> On Thu, Aug 27, 2020 at 03:48:57PM +0200, Christian Schoenebeck wrote:
> > On Thursday, 27 August 2020 14:25:55 CEST Matthew Wilcox wrote:
> > > On Thu, Aug 27, 2020 at 02:02:42PM +0200, Christian Schoenebeck wrote:
> > > > What I could imagine as delimiter instead; slash-caret:
> > > >     /var/foo.pdf/^/forkname
> > > 
> > > Any ascii character is going to be used in some actual customer
> > > workload.
> > 
> > Not exactly. "/foo/^/bar" is already a valid path today. So every Linux
> > system (incl. all libs/apps) must be capable to deal with that path
> > already, so it would not introduce a tokenization problem.
> 
> That's exactly the point.  I can guarantee you that some customer is
> already using a file named exactly '^'.

You are contradicting yourself. Ditching the idea because a file "^" might 
exist, implies ditching your idea of "💩" as it might already exist as well.

> > > I suggest we use a unicode character instead.
> > > 
> > > /var/foo.pdf/💩/badidea
> > 
> > Like I mentioned before, if you'd pick a unicode character (or binary),
> > then each shell will map their own ASCII-sequence on top of that. Because
> > shell users want ASCII. Which would defeat the primary purpose: a unified
> > path resolution.
> 
> You misunderstood.  This was my way of telling you that your idea is shit.

You are invited to make better suggestions. But one thing please: don't start
getting offensive.

No matter which delimiter you choose, something will break. It is just about
how much it will break and how likely that will be in practice, not if.

If you are concerned about not breaking anything: keep forks disabled.

Best regards,
Christian Schoenebeck



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-27 14:23                                           ` Christian Schoenebeck
@ 2020-08-27 14:25                                             ` Matthew Wilcox
  2020-08-27 14:44                                             ` Al Viro
  1 sibling, 0 replies; 62+ messages in thread
From: Matthew Wilcox @ 2020-08-27 14:25 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Miklos Szeredi, Theodore Y. Ts'o, Frank van der Linden,
	Dave Chinner, Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Thu, Aug 27, 2020 at 04:23:24PM +0200, Christian Schoenebeck wrote:
> On Thursday, 27 August 2020 16:01:07 CEST Matthew Wilcox wrote:
> > On Thu, Aug 27, 2020 at 03:48:57PM +0200, Christian Schoenebeck wrote:
> > > On Thursday, 27 August 2020 14:25:55 CEST Matthew Wilcox wrote:
> > > > On Thu, Aug 27, 2020 at 02:02:42PM +0200, Christian Schoenebeck wrote:
> > > > > What I could imagine as delimiter instead; slash-caret:
> > > > >     /var/foo.pdf/^/forkname
> > > > 
> > > > Any ascii character is going to be used in some actual customer
> > > > workload.
> > > 
> > > Not exactly. "/foo/^/bar" is already a valid path today. So every Linux
> > > system (incl. all libs/apps) must be capable to deal with that path
> > > already, so it would not introduce a tokenization problem.
> > 
> > That's exactly the point.  I can guarantee you that some customer is
> > already using a file named exactly '^'.
> 
> You are contradicting yourself. Ditching the idea because a file "^" might 
> exist, implies ditching your idea of "💩" as it might already exist as well.

That's because THIS IS A SHIT IDEA.

> > You misunderstood.  This was my way of telling you that your idea is shit.
> 
> You are invited to make better suggestions. But one thing please: don't start
> getting offensive.

Oh, fuck off.

> No matter which delimiter you choose, something will break. It is just about
> how much it will break and how likely that will be in practice, not if.
> 
> If you are concerned about not breaking anything: keep forks disabled.

My way of keeping forks disabled is to tell you to fuck off.  You
can keep fucking off until you get there.  Then fuck off some more.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-27 14:23                                           ` Christian Schoenebeck
  2020-08-27 14:25                                             ` Matthew Wilcox
@ 2020-08-27 14:44                                             ` Al Viro
  2020-08-27 16:29                                               ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 62+ messages in thread
From: Al Viro @ 2020-08-27 14:44 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Matthew Wilcox, Miklos Szeredi, Theodore Y. Ts'o,
	Frank van der Linden, Dave Chinner, Dr. David Alan Gilbert,
	Greg Kurz, linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi,
	Vivek Goyal, Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

On Thu, Aug 27, 2020 at 04:23:24PM +0200, Christian Schoenebeck wrote:

> You are invited to make better suggestions. But one thing please: don't start
> getting offensive.
> 
> No matter which delimiter you choose, something will break. It is just about
> how much it will break and how likely that will be in practice, not if.

... which means NAK.  We don't break userland without very good reasons and
support for anyone's pet feature is not one of those.  It's as simple as
that.

> If you are concerned about not breaking anything: keep forks disabled.

s/disabled/out of tree/

One general note: the arguments along the lines of "don't enable that,
then" are either ignorant or actively dishonest; it really doesn't work
that way, as we'd learnt quite a few times by now.  There's no such
thing as "optional feature" - *any* feature, no matter how useless,
might end up a dependency (no matter how needless) of something that
would force distros to enable it.  We'd been down that road too many
times to keep pretending that it doesn't happen.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-17  0:29                     ` Dave Chinner
  2020-08-17 10:37                       ` file forks vs. xattr (was: xattr names for unprivileged stacking?) Christian Schoenebeck
@ 2020-08-27 15:22                       ` Matthew Wilcox
  2020-08-27 22:24                         ` Dave Chinner
  1 sibling, 1 reply; 62+ messages in thread
From: Matthew Wilcox @ 2020-08-27 15:22 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christian Schoenebeck, Dr. David Alan Gilbert, Greg Kurz,
	linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan, dwalsh,
	chirantan

On Mon, Aug 17, 2020 at 10:29:30AM +1000, Dave Chinner wrote:
> To implement ADS, we'd likely consider adding a new physical inode
> "ADS fork" which, internally, maps to a separate directory
> structure. This provides us with the ADS namespace for each inode
> and a mechanism that instantiates a physical inode per ADS. IOWs,
> each ADS can be referenced by the VFS natively and independently as
> an inode (native "file as a directory" semantics). Hence existing
> create/unlink APIs work for managing ADS, readdir() can list all
> your ADS, you can keep per ADS xattrs, etc....
> 
> IOWs, with a filesystem inode fork implementation like this for ADS,
> all we really need is for the VFS to pass a magic command to
> ->lookup() to tell us to use the ADS namespace attached to the inode
> rather than use the primary inode type/state to perform the
> operation.
> 
> Hence all the ADS support infrastructure is essentially dentry cache
> infrastructure allowing a dentry to be both a file and directory,
> and providing the pathname resolution that recognises an ADS
> redirection. Name that however you want - we've got to do an on-disk
> format change to support ADS, so we can tell the VFS we support ADS
> or not. And we have no cares about existing names in the filesystem
> conflicting with the ADS pathname identifier because it's a mkfs
> time decision. Given that special flags are needed for the openat()
> call to resolve an ADS (e.g. O_ALT), we know if we should parse the
> ADS identifier as an ADS the moment it is seen...

I think this is equivalent to saying "Linux will never support ADS".
Al has some choice words on having the dentry cache support objects which
are both files and directories.  You run into some "fun" locking issues.
And there's lots of things you just don't want to permit, like mounting
a new filesystem on top of some ADS, or chrooting a process into an ADS,
or renaming an ADS into a different file.

I think what would satisfy people is allowing actual "alternate data
streams" to exist in files.  You always start out by opening a file,
then the presentation layer is a syscall that lets you enumerate the
data streams available for this file, and another syscall that returns
an fd for one of those streams.

As long as nobody gets the bright idea to be able to link that fd into
the directory structure somewhere, this should avoid any problems with
unwanted things being done to an ADS.  Chunks of your implementation
described above should be fine for this.

I thought through some of this a while back, and came up with this list:

> Work as expected:
> mmap(), read(), write(), close(), splice(), sendfile(), fallocate(),
> ftruncate(), dup(), dup2(), dup3(), utimensat(), futimens(), select(),
> poll(), lseek(), fcntl() (F_DUPFD, F_GETFD, F_GETFL, F_SETFL, F_SETLK,
> F_SETLKW, F_GETLK, F_GETOWN, F_SETOWN, F_GETSIG, F_SETSIG, F_SETLEASE,
> F_GETLEASE)
>
> Return error if fd refers to the non-default stream:
> linkat(), symlinkat(), mknodat(), mkdirat()
>
> Remove a stream from a file:
> unlinkat()
>
> Open an existing stream in a file or create a new stream in a file:
> openat()
>
> fstat()
> st_ino may be different for different names.  st_dev may be different.
> st_mode will match the object for files, even if it is changed after
> creation.  For directories, it will match except that execute permission
> will be removed and S_IFMT will be S_IFREG (do we want to define a
> new S_ISSTRM?).  st_nlink will be 1.  st_uid and st_gid will match.
> It will have its own st_atime/st_mtime/st_ctime.  Accessing a stream
> will not update its parent's atime/mtime/ctime.
>
> renameat()
> If olddirfd + oldpath refers to a stream then newdirfd + newpath must
> refer to a stream within the same parent object.  If that stream exists,
> it is removed.  If olddirfd + oldpath does not refer to a stream,
> then newdirfd + newpath must not refer to a stream.
>
> The two file specifications must resolve to the same parent object.
> It is possible to use renameat() to rename a stream within an object,
> but not to move a stream from one object to another.  If newpath refers
> to an existing named stream, it is removed.
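
The renameat() rule in the list above can be modelled as a small predicate.
This is only a sketch of the stated rule, not kernel code; the Target type and
the rename_allowed name are hypothetical:

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    parent: str            # the file or directory object that owns the stream
    stream: Optional[str]  # None means the object itself (default stream)


def rename_allowed(old: Target, new: Target) -> bool:
    """Return True if the rename obeys the stream rules listed above."""
    if old.stream is None:
        # An ordinary object must not be renamed into a stream.
        return new.stream is None
    # A stream may only be renamed to a stream within the same parent object.
    return new.stream is not None and new.parent == old.parent


print(rename_allowed(Target("/a", "s1"), Target("/a", "s2")))  # True
print(rename_allowed(Target("/a", "s1"), Target("/b", "s1")))  # False
print(rename_allowed(Target("/a", None), Target("/b", "s1")))  # False
```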

I don't seem to have come up with an actual syscall for enumerating the
stream names.  I kind of think a fresh syscall might be the right way to
go.

For the benefit of shell scripts, I think an argument to 'cat' to open
an ADS and an lsads command should be enough.

Oh, and I would think we might want i_blocks of the 'host' inode to
reflect the blocks allocated to all the data streams attached to the
inode.  That should address at least parts of the data exfiltration
concern.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-27 14:44                                             ` Al Viro
@ 2020-08-27 16:29                                               ` Dr. David Alan Gilbert
  2020-08-27 16:35                                                 ` Matthew Wilcox
  2020-08-28  9:11                                                 ` Christian Schoenebeck
  0 siblings, 2 replies; 62+ messages in thread
From: Dr. David Alan Gilbert @ 2020-08-27 16:29 UTC (permalink / raw)
  To: Al Viro
  Cc: Christian Schoenebeck, Matthew Wilcox, Miklos Szeredi,
	Theodore Y. Ts'o, Frank van der Linden, Dave Chinner,
	Greg Kurz, linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi,
	Vivek Goyal, Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

* Al Viro (viro@zeniv.linux.org.uk) wrote:
> On Thu, Aug 27, 2020 at 04:23:24PM +0200, Christian Schoenebeck wrote:
> 
> > You are invited to make better suggestions. But one thing please: don't start
> > getting offensive.
> > 
> > No matter which delimiter you choose, something will break. It is just about
> > how much it will break and how likely that will be in practice, not if.
> 
> ... which means NAK.  We don't break userland without very good reasons and
> support for anyone's pet feature is not one of those.  It's as simple as
> that.

I'm curious how much people expect to use these forks from existing
programs - do people expect to be able to do something and edit a fork
using their favorite editor or cat/grep/etc them?

I say that because if they do, then having a special syscall to open
the fork won't fly; and while I agree that any form of suffix is a lost
cause, I wonder what else is possible (although if it wasn't for the
internal difficulties, I do have a soft spot for things that look like
both files and directories showing the forks; but I realise I'm weird
there).

Dave

> > If you are concerned about not breaking anything: keep forks disabled.
> 
> s/disabled/out of tree/
> 
> One general note: the arguments along the lines of "don't enable that,
> then" are either ignorant or actively dishonest; it really doesn't work
> that way, as we'd learnt quite a few times by now.  There's no such
> thing as "optional feature" - *any* feature, no matter how useless,
> might end up a dependency (no matter how needless) of something that
> would force distros to enable it.  We'd been down that road too many
> times to keep pretending that it doesn't happen.
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-27 16:29                                               ` Dr. David Alan Gilbert
@ 2020-08-27 16:35                                                 ` Matthew Wilcox
  2020-08-28  9:11                                                 ` Christian Schoenebeck
  1 sibling, 0 replies; 62+ messages in thread
From: Matthew Wilcox @ 2020-08-27 16:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Al Viro, Miklos Szeredi, Theodore Y. Ts'o,
	Frank van der Linden, Dave Chinner, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Thu, Aug 27, 2020 at 05:29:35PM +0100, Dr. David Alan Gilbert wrote:
> * Al Viro (viro@zeniv.linux.org.uk) wrote:
> > On Thu, Aug 27, 2020 at 04:23:24PM +0200, Christian Schoenebeck wrote:
> > 
> > > You are invited to make better suggestions. But one thing please: don't start
> > > getting offensive.
> > > 
> > > No matter which delimiter you choose, something will break. It is just about
> > > how much it will break and how likely that will be in practice, not if.
> > 
> > ... which means NAK.  We don't break userland without very good reasons and
> > support for anyone's pet feature is not one of those.  It's as simple as
> > that.
> 
> I'm curious how much people expect to use these forks from existing
> programs - do people expect to be able to do something and edit a fork
> using their favorite editor or cat/grep/etc them?
> 
> I say that because if they do, then having a special syscall to open
> the fork won't fly; and while I agree that any form of suffix is a lost
> cause, I wonder what else is possible (although if it wasn't for the
> internal difficulties, I do have a soft spot for things that look like
> both files and directories showing the forks; but I realise I'm weird
> there).

I also have fond memories of !SquashFS but the problem is that some
people want named streams on _directories_, which means that these
directories need to be both directories-of-files and directories-of-streams.
That's harder to disambiguate.

I think providing two new tools (or variants on existing tools) --
streamcat and streamls should be enough to enable operating on named
streams from the command line.  If other tools want to provide the ability
to operate on named streams directly, that would be up to that tool.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-27 15:22                       ` xattr names for unprivileged stacking? Matthew Wilcox
@ 2020-08-27 22:24                         ` Dave Chinner
  2020-08-29 16:07                           ` Matthew Wilcox
  0 siblings, 1 reply; 62+ messages in thread
From: Dave Chinner @ 2020-08-27 22:24 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christian Schoenebeck, Dr. David Alan Gilbert, Greg Kurz,
	linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan, dwalsh,
	chirantan

On Thu, Aug 27, 2020 at 04:22:07PM +0100, Matthew Wilcox wrote:
> On Mon, Aug 17, 2020 at 10:29:30AM +1000, Dave Chinner wrote:
> > To implement ADS, we'd likely consider adding a new physical inode
> > "ADS fork" which, internally, maps to a separate directory
> > structure. This provides us with the ADS namespace for each inode
> > and a mechanism that instantiates a physical inode per ADS. IOWs,
> > each ADS can be referenced by the VFS natively and independently as
> > an inode (native "file as a directory" semantics). Hence existing
> > create/unlink APIs work for managing ADS, readdir() can list all
> > your ADS, you can keep per ADS xattrs, etc....
> > 
> > IOWs, with a filesystem inode fork implementation like this for ADS,
> > all we really need is for the VFS to pass a magic command to
> > ->lookup() to tell us to use the ADS namespace attached to the inode
> > rather than use the primary inode type/state to perform the
> > operation.
> > 
> > Hence all the ADS support infrastructure is essentially dentry cache
> > infrastructure allowing a dentry to be both a file and directory,
> > and providing the pathname resolution that recognises an ADS
> > redirection. Name that however you want - we've got to do an on-disk
> > format change to support ADS, so we can tell the VFS we support ADS
> > or not. And we have no cares about existing names in the filesystem
> > conflicting with the ADS pathname identifier because it's a mkfs
> > time decision. Given that special flags are needed for the openat()
> > call to resolve an ADS (e.g. O_ALT), we know if we should parse the
> > ADS identifier as an ADS the moment it is seen...
> 
> I think this is equivalent to saying "Linux will never support ADS".
> Al has some choice words on having the dentry cache support objects which
> are both files and directories.  You run into some "fun" locking issues.
> And there's lots of things you just don't want to permit, like mounting
> a new filesystem on top of some ADS, or chrooting a process into an ADS,
> or renaming an ADS into a different file.

I know all this. My point is that the behaviour required by ADS
objects is that of a seekable data file. That requires a struct file
that points at a struct inode, page cache mapping, etc to all work
as they currently do. It also means that how ADS are managed and
presented to userspace is entirely a VFS construct. Indeed,
everything you mention above is functionality controlled/implemented
by the VFS via the dentry cache...

> I think what would satisfy people is allowing actual "alternate data
> streams" to exist in files.  You always start out by opening a file,
> then the presentation layer is a syscall that lets you enumerate the
> data streams available for this file, and another syscall that returns
> an fd for one of those streams.

You could do this with a getdents_at() syscall that has an AT_ALT
flag or something like that. i.e. iterate the streams on the inode
(whether it be a regular file or a directory!) and report them as
dirents to userspace. Userspace can then openat2(fd, name, O_ALT)
and there is your user API.

The VFS can deal with openat2(fd, stream_name, O_ALT) however it
wants - it doesn't need the dentry cache pathwalk here - just vector
straight to the filesystem's ->lookup mechanism on the inode
attached to the "dirfd" passed in. 

AFAICT, the dentry cache only needs to be involved if we want to
-cache- the ADS namespace. I don't think we need to cache the ADS
namespace as long as the inode is cached by the filesystem - just
let the fs do an inode cache lookup and instantiation for
ADS inodes (e.g. as XFS already does for internal inode accesses
during bulkstat, quotacheck, etc). We don't cache the xattr
namespaces in the VFS - the filesystem is responsible for doing that
if required - so I don't think this would be a problem for ADS
access...

The fact that ADS inodes would not be in the dentry cache and hence
not visible to pathwalks at all then means that all of the issues
such as mounting over them, chroot, etc don't exist in the first
place...

> As long as nobody gets the bright idea to be able to link that fd into
> the directory structure somewhere, this should avoid any problems with
> unwanted things being done to an ADS.  Chunks of your implementation
> described above should be fine for this.

I can see the need for rename and linkat linking O_TMPFILE fd's into
ADS names, though. e.g. to be able to do safe overwrites of ADS
data.

From a fs management POV, we'll also want to be able to do things
like defrag ADS inodes, which means we'll need to be able to do
atomic inode operations (e.g. swap extents) between O_TMPFILE inodes
and ADS inodes, etc. So in addition to the VFS interfaces, there's a
bunch of filesystem admin stuff that will need to be made ADS aware,
and it's likely there will be fs specific ioctls that need to be
modified/added to manipulate ADS inodes directly...

> I thought through some of this a while back, and came up with this list:
> 
> > Work as expected:
> > mmap(), read(), write(), close(), splice(), sendfile(), fallocate(),
> > ftruncate(), dup(), dup2(), dup3(), utimensat(), futimens(), select(),
> > poll(), lseek(), fcntl(): F_DUPFD, F_GETFD, F_GETFL, F_SETFL, F_SETLK,
> > F_SETLKW, F_GETLK, F_GETOWN, F_SETOWN, F_GETSIG, F_SETSIG, F_SETLEASE,
> > F_GETLEASE
> >
> > Return error if fd refers to the non-default stream:
> > linkat(), symlinkat(), mknodat(), mkdirat()
> >
> > Remove a stream from a file:
> > unlinkat()
> >
> > Open an existing stream in a file or create a new stream in a file:
> > openat()
> >
> > fstat()
> > st_ino may be different for different names.  st_dev may be different.
> > st_mode will match the object for files, even if it is changed after
> > creation.  For directories, it will match except that execute permission
> > will be removed and S_IFMT will be S_IFREG (do we want to define a
> > new S_ISSTRM?).  st_nlink will be 1.  st_uid and st_gid will match.
> > It will have its own st_atime/st_mtime/st_ctime.  Accessing a stream
> > will not update its parent's atime/mtime/ctime.
> >
> > renameat()
> > If olddirfd + oldpath refers to a stream then newdirfd + newpath must
> > refer to a stream within the same parent object.  If that stream exists,
> > it is removed.  If olddirfd + oldpath does not refer to a stream,
> > then newdirfd + newpath must not refer to a stream.
> >
> > The two file specifications must resolve to the same parent object.
> > It is possible to use renameat() to rename a stream within an object,
> > but not to move a stream from one object to another.  If newpath refers
> > to an existing named stream, it is removed.
> 
> I don't seem to have come up with an actual syscall for enumerating the
> stream names.  I kind of think a fresh syscall might be the right way to
> go.

Why reinvent the wheel? getdents_at() seems like the right interface
to use here because it matches up with all the other *at(AT_ALT)
style interfaces we'd be using to operate on ADS... :P

> For the benefit of shell scripts, I think an argument to 'cat' to open
> an ADS and an lsads command should be enough.
> 
> Oh, and I would think we might want i_blocks of the 'host' inode to
> reflect the blocks allocated to all the data streams attached to the
> inode.  That should address at least parts of the data exfiltration
> concern.

I think that's a problem, because metadata blocks that are invisible
to userspace are also accounted to the inode block count, so a user
cannot know if the difference between the data file size and the
block count stat() reports is block mapping metadata, xattrs,
speculative delayed allocation reservations, etc. It's just not a
useful signal because it's already so overloaded with invisible
stuff....

It also means that every block map modification to an ADS inode also
has to lock and modify the host inode. That's going to mean adding a
heap of complexity to the filesystem transaction models because now
there are two independent inodes that have to be locked when doing a
single inode operation instead of largely being a simple drop in...

IOWs, if ADS visibility is required (which I don't think anyone will
argue against) I'd suggest that statx() has a flag added to indicate
ADS exist on the inode. Then it's easy to discover through a
standard interface....
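
For illustration only: statx(2) already reports per-file attribute bits in
stx_attributes, together with stx_attributes_mask, which says which bits the
filesystem actually supports. A "has ADS" bit does not exist; the sketch
below only shows how a caller would interpret any such bit, with the bit
value passed in by the caller rather than invented here. The names
query_attr() and enum attr_state are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum attr_state { ATTR_UNKNOWN = -1, ATTR_CLEAR = 0, ATTR_SET = 1 };

/*
 * Interpret one attribute bit from a statx() result.
 * `attributes` and `attributes_mask` correspond to stx_attributes and
 * stx_attributes_mask. If the filesystem does not report the bit in its
 * mask, the answer is "unknown" rather than "clear".
 */
static enum attr_state query_attr(uint64_t attributes,
				  uint64_t attributes_mask,
				  uint64_t bit)
{
	/* The caller must pass exactly one bit. */
	assert(bit != 0 && (bit & (bit - 1)) == 0);

	if ((attributes_mask & bit) == 0)
		return ATTR_UNKNOWN;
	return (attributes & bit) ? ATTR_SET : ATTR_CLEAR;
}
```

The three-way result matters for discovery: a scanner that treats
"unsupported" as "no streams present" would miss them on filesystems that
never set the mask bit.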

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-27 16:29                                               ` Dr. David Alan Gilbert
  2020-08-27 16:35                                                 ` Matthew Wilcox
@ 2020-08-28  9:11                                                 ` Christian Schoenebeck
  2020-08-28 14:46                                                   ` Theodore Y. Ts'o
  1 sibling, 1 reply; 62+ messages in thread
From: Christian Schoenebeck @ 2020-08-28  9:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Miklos Szeredi
  Cc: Al Viro, Theodore Y. Ts'o, Frank van der Linden,
	Dave Chinner, Greg Kurz, linux-fsdevel, Stefan Hajnoczi,
	Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano, Daniel J Walsh,
	Chirantan Ekbote

On Thursday, 27 August 2020 16:44:52 CEST Al Viro wrote:
> > No matter which delimiter you'd choose, something will break. It is just
> > about how much will it break and how likely it'll be in practice, not if.
> ... which means NAK.  We don't break userland without very good reasons and
> support for anyone's pet feature is not one of those.  It's as simple as
> that.
> 
> > If you are concerned about not breaking anything: keep forks disabled.
> 
> s/disabled/out of tree/
> 
> One general note: the arguments along the lines of "don't enable that,
> then" are either ignorant or actively dishonest; it really doesn't work
> that way, as we'd learnt quite a few times by now.  There's no such
> thing as "optional feature" - *any* feature, no matter how useless,
> might end up a dependency (no matter how needless) of something that
> would force distros to enable it.  We'd been down that road too many
> times to keep pretending that it doesn't happen.

Well, it could be an option per mounted fs, but I know -> NAK.

On Thursday, 27 August 2020 18:29:35 CEST Dr. David Alan Gilbert wrote:
> * Al Viro (viro@zeniv.linux.org.uk) wrote:
> > On Thu, Aug 27, 2020 at 04:23:24PM +0200, Christian Schoenebeck wrote:
> > > Be invited for making better suggestions. But one thing please: don't
> > > start getting offending.
> > > 
> > > No matter which delimiter you'd choose, something will break. It is just
> > > about how much will it break and how likely it'll be in practice, not if.
> > 
> > ... which means NAK.  We don't break userland without very good reasons and
> > support for anyone's pet feature is not one of those.  It's as simple as
> > that.
> 
> I'm curious how much people expect to use these forks from existing
> programs - do people expect to be able to do something and edit a fork
> using their favorite editor or cat/grep/etc them?

Built-in path resolution would be nice, but its absence would not be a show
stopper for such common utils. For instance, on Solaris there is:

runat <filename> <cmd> ...

which works something like fchdir(); execv(); you lose some flexibility, but
in practice it is still OK.

> I say that because if they do, then having a special syscall to open
> the fork won't fly; and while I agree that any form of suffix is a lost
> cause, I wonder what else is possible (although if it wasn't for the
> internal difficulties, I do have a soft spot for things that look like
> both files and directories showing the forks; but I realise I'm weird
> there).

It seems to be both a file & dir feature on all systems that have that 
concept. So people would expect it for dirs on Linux as well.

Best regards,
Christian Schoenebeck



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
  2020-08-28  9:11                                                 ` Christian Schoenebeck
@ 2020-08-28 14:46                                                   ` Theodore Y. Ts'o
  0 siblings, 0 replies; 62+ messages in thread
From: Theodore Y. Ts'o @ 2020-08-28 14:46 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Dr. David Alan Gilbert, Miklos Szeredi, Al Viro,
	Frank van der Linden, Dave Chinner, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Fri, Aug 28, 2020 at 11:11:15AM +0200, Christian Schoenebeck wrote:
> 
> Built-in path resolution would be nice, but it won't be a show stopper for 
> such common utils if not. For instance on Solaris there is:
> 
> runat <filename> <cmd> ...
> 
> which works something like fchdir(); execv(); you lose some flexibility, but 
> in practice still OK.

And we know from the Solaris experience that it was used *much* more
by malware authors (since most Unix security scanners didn't know
about forks) than any legitimate users.

Which is another way of saying, it's a bad idea --- unless you are a
malware author.

                                        - Ted

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-27 22:24                         ` Dave Chinner
@ 2020-08-29 16:07                           ` Matthew Wilcox
  2020-08-29 16:13                             ` Al Viro
  0 siblings, 1 reply; 62+ messages in thread
From: Matthew Wilcox @ 2020-08-29 16:07 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christian Schoenebeck, Dr. David Alan Gilbert, Greg Kurz,
	linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan, dwalsh,
	chirantan

On Fri, Aug 28, 2020 at 08:24:57AM +1000, Dave Chinner wrote:
> On Thu, Aug 27, 2020 at 04:22:07PM +0100, Matthew Wilcox wrote:
> > On Mon, Aug 17, 2020 at 10:29:30AM +1000, Dave Chinner wrote:
> > > To implement ADS, we'd likely consider adding a new physical inode
> > > "ADS fork" which, internally, maps to a separate directory
> > > structure. This provides us with the ADS namespace for each inode
> > > and a mechanism that instantiates a physical inode per ADS. IOWs,
> > > each ADS can be referenced by the VFS natively and independently as
> > > an inode (native "file as a directory" semantics). Hence existing
> > > create/unlink APIs work for managing ADS, readdir() can list all
> > > your ADS, you can keep per ADS xattrs, etc....
> > > 
> > > IOWs, with a filesystem inode fork implementation like this for ADS,
> > > all we really need is for the VFS to pass a magic command to
> > > ->lookup() to tell us to use the ADS namespace attached to the inode
> > > rather than use the primary inode type/state to perform the
> > > operation.
> > > 
> > > Hence all the ADS support infrastructure is essentially dentry cache
> > > infrastructure allowing a dentry to be both a file and directory,
> > > and providing the pathname resolution that recognises an ADS
> > > redirection. Name that however you want - we've got to do an on-disk
> > > format change to support ADS, so we can tell the VFS we support ADS
> > > or not. And we have no cares about existing names in the filesystem
> > > conflicting with the ADS pathname identifier because it's a mkfs
> > > time decision. Given that special flags are needed for the openat()
> > > call to resolve an ADS (e.g. O_ALT), we know if we should parse the
> > > ADS identifier as an ADS the moment it is seen...
> > 
> > I think this is equivalent to saying "Linux will never support ADS".
> > Al has some choice words on having the dentry cache support objects which
> > are both files and directories.  You run into some "fun" locking issues.
> > And there's lots of things you just don't want to permit, like mounting
> > a new filesystem on top of some ADS, or chrooting a process into an ADS,
> > or renaming an ADS into a different file.
> 
> I know all this. My point is that the behaviour required by ADS
> objects is that of a seekable data file. That requires a struct file
> that points at a struct inode, page cache mapping, etc to all work
> as they currently do. It also means that how ADS are managed and
> presented to userspace is entirely a VFS construct. Indeed,
> everything you mention above is functionality controlled/implemented
> by the VFS via the dentry cache...

I agree with you that supporting named streams within a file requires
an independent inode for each stream.  I disagree with you that this is
dentry cache infrastructure.  I do not believe in giving each stream
its own dentry.  Either they share the default stream's dentry, or they
have no dentry (mild preference for no dentry).

> > I think what would satisfy people is allowing actual "alternate data
> > streams" to exist in files.  You always start out by opening a file,
> > then the presentation layer is a syscall that lets you enumerate the
> > data streams available for this file, and another syscall that returns
> > an fd for one of those streams.
> 
> You could do this with a getdents_at() syscall that has an AT_ALT
> flag or something like that. i.e. iterate the streams on the inode
> (whether it be a regular file or a directory!) and report them as
> dirents to userspace. Userspace can then openat2(fd, name, O_ALT)
> and there is your user API.

Maybe.  getdents is a little overkill; these things don't have inode
numbers (at least not ones which are meaningful to userspace), or
d_type.  I might be tempted by just read() on an fd like v7 unix.

> The VFS can deal with openat2(fd, stream_name, O_ALT) however it
> wants - it doesn't need the dentry cache pathwalk here - just vector
> straight to the filesystem's ->lookup mechanism on the inode
> attached to the "dirfd" passed in. 
>
> AFAICT, the dentry cache only needs to be involved if we want to
> -cache- the ADS namespace. I don't think we need to cache the ADS
> namespace as long as the inode is cached by the filesystem - just
> let the fs do an inode cache lookup and instantiation for
> ADS inodes (e.g. as XFS already does for internal inode accesses
> during bulkstat, quotacheck, etc). We don't cache the xattr
> namespaces in the VFS - the filesystem is responsible for doing that
> if required - so I don't think this would be a problem for ADS
> access...
> 
> The fact that ADS inodes would not be in the dentry cache and hence
> not visible to pathwalks at all then means that all of the issues
> such as mounting over them, chroot, etc don't exist in the first
> place...

Wait, you've now switched from "this is dentry cache infrastructure"
to "it should not be in the dentry cache".  So I don't understand what
you're arguing for.

> > As long as nobody gets the bright idea to be able to link that fd into
> > the directory structure somewhere, this should avoid any problems with
> > unwanted things being done to an ADS.  Chunks of your implementation
> > described above should be fine for this.
> 
> I can see the need for rename and linkat linking O_TMPFILE fd's into
> ADS names, though. e.g. to be able to do safe overwrites of ADS
> data.

I don't have a problem with being able to create unnamed streams and
then atomically linking them into their containing file.

> From a fs management POV, we'll also want to be able to do things
> like defrag ADS inodes, which means we'll need to be able to do
> atomic inode operations (e.g. swap extents) between O_TMPFILE inodes
> and ADS inodes, etc. So in addition to the VFS interfaces, there's a
> bunch of filesystem admin stuff that will need to be made ADS aware,
> and it's likely there will be fs specific ioctls that need to be
> modified/added to manipulate ADS inodes directly...

Yes, probably.

> > For the benefit of shell scripts, I think an argument to 'cat' to open
> > an ADS and an lsads command should be enough.
> > 
> > Oh, and I would think we might want i_blocks of the 'host' inode to
> > reflect the blocks allocated to all the data streams attached to the
> > inode.  That should address at least parts of the data exfiltration
> > concern.
> 
> I think that's a problem, because metadata blocks that are invisible
> to userspace are also accounted to the inode block count, so a user
> cannot know if the difference between the data file size and the
> block count stat() reports is block mapping metadata, xattrs,
> speculative delayed allocation reservations, etc. It's just not a
> useful signal because it's already so overloaded with invisible
> stuff....

My concern is that 'du' should not have to be made stream-aware to
continue to be accurate.  Yes, all these other things also contribute
to the space being used by a file, so it's not a very reliable signal,
but if you see a vast discrepancy (several gigabytes being used by a
file which is notionally a few hundred bytes), it's suspicious.

> It also means that every block map modification to an ADS inode also
> has to lock and modify the host inode. That's going to mean adding a
> heap of complexity to the filesystem transaction models because now
> there are two independent inodes that have to be locked when doing a
> single inode operation instead of largely being a simple drop in...

It doesn't have to be reflected in the on-disk inode.  As long as
calling stat() returns the number of blocks allocated to all streams
contained in the file, you can implement that any way you want.

> IOWs, if ADS visibility is required (which I don't think anyone will
> argue against) I'd suggest that statx() has a flag added to indicate
> ADS exist on the inode. Then it's easy to discover through a
> standard interface....

The amount of space used has to be visible to unmodified utilities.
We could have an implementation where unmodified utilities walk all
the sub-streams at stat() time while statx() with the appropriate flag
reports disaggregated data (and is more efficient).


I think we have a group of people contributing to this thread who want
the plain "named streams" functionality that you and I are currently
discussing.  And then another group who want something more complex
where the "alternate" contents of the file could be a directory tree
with files and subdirectories and permissions ... essentially mounting
the contents of a ZIP file on top of itself.  And I think that's a level
of complexity we have to step away from.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-29 16:07                           ` Matthew Wilcox
@ 2020-08-29 16:13                             ` Al Viro
  2020-08-29 17:51                               ` Miklos Szeredi
  2020-08-29 19:17                               ` Matthew Wilcox
  0 siblings, 2 replies; 62+ messages in thread
From: Al Viro @ 2020-08-29 16:13 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Dave Chinner, Christian Schoenebeck, Dr. David Alan Gilbert,
	Greg Kurz, linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan,
	dwalsh, chirantan

On Sat, Aug 29, 2020 at 05:07:17PM +0100, Matthew Wilcox wrote:

> I agree with you that supporting named streams within a file requires
> an independent inode for each stream.  I disagree with you that this is
> dentry cache infrastructure.  I do not believe in giving each stream
> its own dentry.  Either they share the default stream's dentry, or they
> have no dentry (mild preference for no dentry).

*blink*

Just how would they have different inodes while sharing a dentry?

> > The fact that ADS inodes would not be in the dentry cache and hence
> > not visible to pathwalks at all then means that all of the issues
> > such as mounting over them, chroot, etc don't exist in the first
> > place...
> 
> Wait, you've now switched from "this is dentry cache infrastructure"
> to "it should not be in the dentry cache".  So I don't understand what
> you're arguing for.

Bloody wonderful, that.  So now we have struct file instances with no dentry
associated with them?  Which would have to be taken into account all over
the place...

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-29 16:13                             ` Al Viro
@ 2020-08-29 17:51                               ` Miklos Szeredi
  2020-08-29 18:04                                 ` Al Viro
  2020-08-29 19:17                               ` Matthew Wilcox
  1 sibling, 1 reply; 62+ messages in thread
From: Miklos Szeredi @ 2020-08-29 17:51 UTC (permalink / raw)
  To: Al Viro
  Cc: Matthew Wilcox, Dave Chinner, Christian Schoenebeck,
	Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Sat, Aug 29, 2020 at 6:14 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Aug 29, 2020 at 05:07:17PM +0100, Matthew Wilcox wrote:
>

> > > The fact that ADS inodes would not be in the dentry cache and hence
> > > not visible to pathwalks at all then means that all of the issues
> > > such as mounting over them, chroot, etc don't exist in the first
> > > place...
> >
> > Wait, you've now switched from "this is dentry cache infrastructure"
> > to "it should not be in the dentry cache".  So I don't understand what
> > you're arguing for.
>
> Bloody wonderful, that.  So now we have struct file instances with no dentry
> associated with them?  Which would have to be taken into account all over
> the place...

It could have a temporary dentry allocated for the lifetime of the
file and dropped on last dput.  I.e. there's a dentry, but no cache.
Yeah, yeah, d_path() issues, however that one will have to be special
cased anyway.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-29 17:51                               ` Miklos Szeredi
@ 2020-08-29 18:04                                 ` Al Viro
  2020-08-29 18:22                                   ` Christian Schoenebeck
  2020-08-29 19:13                                   ` Miklos Szeredi
  0 siblings, 2 replies; 62+ messages in thread
From: Al Viro @ 2020-08-29 18:04 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Matthew Wilcox, Dave Chinner, Christian Schoenebeck,
	Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Sat, Aug 29, 2020 at 07:51:47PM +0200, Miklos Szeredi wrote:
> On Sat, Aug 29, 2020 at 6:14 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Sat, Aug 29, 2020 at 05:07:17PM +0100, Matthew Wilcox wrote:
> >
> 
> > > > The fact that ADS inodes would not be in the dentry cache and hence
> > > > not visible to pathwalks at all then means that all of the issues
> > > > such as mounting over them, chroot, etc don't exist in the first
> > > > place...
> > >
> > > Wait, you've now switched from "this is dentry cache infrastructure"
> > > to "it should not be in the dentry cache".  So I don't understand what
> > > you're arguing for.
> >
> > Bloody wonderful, that.  So now we have struct file instances with no dentry
> > associated with them?  Which would have to be taken into account all over
> > the place...
> 
> It could have a temporary dentry allocated for the lifetime of the
> file and dropped on last dput.  I.e. there's a dentry, but no cache.
> Yeah, yeah, d_path() issues, however that one will have to be special
> cased anyway.

d_path() is the least of the problems, actually.  Directory tree structure on
those, OTOH, is a serious problem.  If you want to have getdents(2) on that
shite, you want an opened descriptor that looks like a directory.  And _that_
opens a large can of worms.  Because now you have fchdir(2) to cope with,
lookups going through /proc/self/fd/<n>/..., etc., etc.

Al, fully expecting "we'll special-case our way out of everything - how hard
could that be?" in response...

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-29 18:04                                 ` Al Viro
@ 2020-08-29 18:22                                   ` Christian Schoenebeck
  2020-08-29 19:13                                   ` Miklos Szeredi
  1 sibling, 0 replies; 62+ messages in thread
From: Christian Schoenebeck @ 2020-08-29 18:22 UTC (permalink / raw)
  To: Al Viro, Dave Chinner
  Cc: Miklos Szeredi, Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Saturday, 29 August 2020 20:04:48 CEST Al Viro wrote:
> On Sat, Aug 29, 2020 at 07:51:47PM +0200, Miklos Szeredi wrote:
> > On Sat, Aug 29, 2020 at 6:14 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > On Sat, Aug 29, 2020 at 05:07:17PM +0100, Matthew Wilcox wrote:
> > > > > The fact that ADS inodes would not be in the dentry cache and hence
> > > > > not visible to pathwalks at all then means that all of the issues
> > > > > such as mounting over them, chroot, etc don't exist in the first
> > > > > place...
> > > > 
> > > > Wait, you've now switched from "this is dentry cache infrastructure"
> > > > to "it should not be in the dentry cache".  So I don't understand what
> > > > you're arguing for.
> > > 
> > > Bloody wonderful, that.  So now we have struct file instances with no
> > > dentry associated with them?  Which would have to be taken into account
> > > all over the place...
> > 
> > It could have a temporary dentry allocated for the lifetime of the
> > file and dropped on last dput.  I.e. there's a dentry, but no cache.
> > Yeah, yeah, d_path() issues, however that one will have to be special
> > cased anyway.
> 
> d_path() is the least of the problems, actually.  Directory tree structure
> on those, OTOH, is a serious problem.  If you want to have getdents(2) on
> that shite, you want an opened descriptor that looks like a directory.  And
> _that_ opens a large can of worms.  Because now you have fchdir(2) to cope
> with, lookups going through /proc/self/fd/<n>/..., etc., etc.
> 
> Al, fully expecting "we'll special-case our way out of everything - how hard
> could that be?" in response...

Independent of what and how all this is presented to user space, I think all 
this will only ever land if it does not deviate too much from the existing 
unified VFS model.

The most relevant change that I see (probably similar to Miklos's view) is
that a user-visible file (or dir) links, kernel-internally, to a dedicated
directory which contains the streams. But as far as the kernel is concerned,
that's a directory, streams are files, they are still inodes, and they are
still part of the dentry cache, etc.

Starting to handle ADS streams as some completely separate new thing in the
model will most certainly just end up with much more code and problems than
adding filters here and there for making certain things inaccessible from user
space (e.g. prohibiting chdir() into that special directory, preventing mounting
things onto ADS files, or whatever other presentation measures might be
desired for security reasons).

And no: stat(mainfile) must still return the block count of the main stream
only, not any aggregated data, otherwise it will break user space. Things like
'du' must explicitly be made ADS aware instead.

Best regards,
Christian Schoenebeck



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-29 18:04                                 ` Al Viro
  2020-08-29 18:22                                   ` Christian Schoenebeck
@ 2020-08-29 19:13                                   ` Miklos Szeredi
  2020-08-29 19:25                                     ` Al Viro
  1 sibling, 1 reply; 62+ messages in thread
From: Miklos Szeredi @ 2020-08-29 19:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Matthew Wilcox, Dave Chinner, Christian Schoenebeck,
	Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Sat, Aug 29, 2020 at 8:04 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Aug 29, 2020 at 07:51:47PM +0200, Miklos Szeredi wrote:
> > On Sat, Aug 29, 2020 at 6:14 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > On Sat, Aug 29, 2020 at 05:07:17PM +0100, Matthew Wilcox wrote:
> > >
> >
> > > > > The fact that ADS inodes would not be in the dentry cache and hence
> > > > > not visible to pathwalks at all then means that all of the issues
> > > > > such as mounting over them, chroot, etc don't exist in the first
> > > > > place...
> > > >
> > > > Wait, you've now switched from "this is dentry cache infrastructure"
> > > > to "it should not be in the dentry cache".  So I don't understand what
> > > > you're arguing for.
> > >
> > > Bloody wonderful, that.  So now we have struct file instances with no dentry
> > > associated with them?  Which would have to be taken into account all over
> > > the place...
> >
> > It could have a temporary dentry allocated for the lifetime of the
> > file and dropped on last dput.  I.e. there's a dentry, but no cache.
> > Yeah, yeah, d_path() issues, however that one will have to be special
> > cased anyway.
>
> d_path() is the least of the problems, actually.  Directory tree structure on
> those, OTOH, is a serious problem.  If you want to have getdents(2) on that
> shite, you want an opened descriptor that looks like a directory.  And _that_
> opens a large can of worms.  Because now you have fchdir(2) to cope with,
> lookups going through /proc/self/fd/<n>/..., etc., etc.

Seriously, nobody wants fchdir().  And getdents() does not imply fchdir().

As for whether we'd need foobarat() on such a beast or let
/proc/self/fd/<n> be dereferenced, I think no.  So comes the argument:
but then we'll break all those libraries and whatnot relying on these
constructs.  Well, sorry, so would we if we didn't introduce this in
the first place.  That's not really breaking anything, it's just
setting expectations.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-29 16:13                             ` Al Viro
  2020-08-29 17:51                               ` Miklos Szeredi
@ 2020-08-29 19:17                               ` Matthew Wilcox
  2020-08-29 19:40                                 ` Al Viro
  1 sibling, 1 reply; 62+ messages in thread
From: Matthew Wilcox @ 2020-08-29 19:17 UTC (permalink / raw)
  To: Al Viro
  Cc: Dave Chinner, Christian Schoenebeck, Dr. David Alan Gilbert,
	Greg Kurz, linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan,
	dwalsh, chirantan

On Sat, Aug 29, 2020 at 05:13:58PM +0100, Al Viro wrote:
> On Sat, Aug 29, 2020 at 05:07:17PM +0100, Matthew Wilcox wrote:
> 
> > I agree with you that supporting named streams within a file requires
> > an independent inode for each stream.  I disagree with you that this is
> > dentry cache infrastructure.  I do not believe in giving each stream
> > its own dentry.  Either they share the default stream's dentry, or they
> > have no dentry (mild preference for no dentry).
> 
> *blink*
> 
> Just how would they have different inodes while sharing a dentry?
> 
> > > The fact that ADS inodes would not be in the dentry cache and hence
> > > not visible to pathwalks at all then means that all of the issues
> > > such as mounting over them, chroot, etc don't exist in the first
> > > place...
> > 
> > Wait, you've now switched from "this is dentry cache infrastructure"
> > to "it should not be in the dentry cache".  So I don't understand what
> > you're arguing for.
> 
> Bloody wonderful, that.  So now we have struct file instances with no dentry
> associated with them?  Which would have to be taken into account all over
> the place...

I probably have the wrong nomenclature for what I'm proposing.

So here's a concrete API.  What questions need to be answered?

fd = open("real", O_RDWR);

// fetch stream names
sfd = open_stream(fd, NULL);
read(sfd, names, length);
close(sfd);

// open the first one
sfd = open_stream(fd, names);
read(sfd, buffer, buflen);
close(sfd);

// create a new anonymous stream
sfd = open_stream(fd, "");
write(sfd, buffer, buflen);
// name it
linkat(sfd, NULL, fd, "newstream", AT_EMPTY_PATH);
close(sfd);

 - Stream names are NUL terminated and may contain any other character.
   If you want to put a '/' in a stream name, that's fine, but there's
   no hierarchy.  Ditto "//../././../../..//./."  It's just a really
   oddly named stream.
 - linkat() will fail if 'fd' does not match where 'sfd' was created.
 - open_stream() always creates a new stream when a zero-length string is
   specified.
 - open_stream() returns ENOENT if there is no stream by that name (ie the
   only way to create a stream is to specify no name, and then name
   it later).
 - sfd inherits the appropriate O_ flags from fd (O_RDWR, O_CLOEXEC, ...)
 - open_stream(sfd) is ENOTTY.
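
To make the name-listing step concrete, here is a minimal userspace
sketch of walking the buffer that read(sfd, names, length) fills in.
It assumes the names arrive packed back to back, each NUL terminated,
the way listxattr(2) packs attribute names; the proposal does not pin
that layout down, so treat it as an assumption.  count_names() and
nth_name() are hypothetical helpers, and open_stream() itself does not
exist today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the names in a buffer of back-to-back NUL-terminated strings
 * (assumed layout, borrowed from listxattr(2)).  A trailing fragment
 * with no terminating NUL is treated as truncated and not counted.
 */
static size_t count_names(const char *buf, size_t len)
{
	size_t n = 0;
	size_t off = 0;

	assert(buf != NULL || len == 0);
	while (off < len) {
		const char *nul = memchr(buf + off, '\0', len - off);

		if (!nul)
			break;	/* truncated final entry: ignore it */
		n++;
		off = (size_t)(nul - buf) + 1;
	}
	return n;
}

/* Return the idx-th complete name, or NULL if there is no such entry. */
static const char *nth_name(const char *buf, size_t len, size_t idx)
{
	size_t off = 0;

	assert(buf != NULL || len == 0);
	while (off < len) {
		const char *nul = memchr(buf + off, '\0', len - off);

		if (!nul)
			return NULL;
		if (idx == 0)
			return buf + off;
		idx--;
		off = (size_t)(nul - buf) + 1;
	}
	return NULL;
}
```

Because names are only ever split on NUL, a name such as "//../." comes
back as one opaque string, which matches the "no hierarchy" rule above.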


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-29 19:13                                   ` Miklos Szeredi
@ 2020-08-29 19:25                                     ` Al Viro
  2020-08-30 19:05                                       ` Miklos Szeredi
  0 siblings, 1 reply; 62+ messages in thread
From: Al Viro @ 2020-08-29 19:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Matthew Wilcox, Dave Chinner, Christian Schoenebeck,
	Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Sat, Aug 29, 2020 at 09:13:24PM +0200, Miklos Szeredi wrote:

> > d_path() is the least of the problems, actually.  Directory tree structure on
> > those, OTOH, is a serious problem.  If you want to have getdents(2) on that
> > shite, you want an opened descriptor that looks like a directory.  And _that_
> > opens a large can of worms.  Because now you have fchdir(2) to cope with,
> > lookups going through /proc/self/fd/<n>/..., etc., etc.
> 
> Seriously, nobody wants fchdir().  And getdents() does not imply fchdir().

Yes, it does.  If it's a directory, fchdir(2) gets to deal with it.
If it's not, no getdents(2).  Unless you special-case the damn thing in
said fchdir(2).

> As for whether we'd need foobarat() on such a beast or let
> /proc/self/fd/<n> be dereferenced, I think no.  So comes the argument:
>  but then we'll break all those libraries and whatnot relying on these
> constructs.  Well, sorry, so would we if we didn't introduce this in
> the first place.  That's not really breaking anything, it's just
> setting expectations.

Translation: we'll special-case that in procfs, etc., etc. and handwave
the problems away.  Lovely...

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-29 19:17                               ` Matthew Wilcox
@ 2020-08-29 19:40                                 ` Al Viro
  2020-08-29 20:12                                   ` Matthew Wilcox
  0 siblings, 1 reply; 62+ messages in thread
From: Al Viro @ 2020-08-29 19:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Dave Chinner, Christian Schoenebeck, Dr. David Alan Gilbert,
	Greg Kurz, linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan,
	dwalsh, chirantan

On Sat, Aug 29, 2020 at 08:17:51PM +0100, Matthew Wilcox wrote:

> I probably have the wrong nomenclature for what I'm proposing.
> 
> So here's a concrete API.  What questions need to be answered?
> 
> fd = open("real", O_RDWR);
> 
> // fetch stream names
> sfd = open_stream(fd, NULL);
> read(sfd, names, length);

	1) what does fstat() on sfd return?
	2) what does keeping it open do to underlying file?
	3) what happens to it if that underlying file is unlinked?
	4) what does it do to underlying filesystem?  Can it be unmounted?

> close(sfd);

> 
> // open the first one
> sfd = open_stream(fd, names);
> read(sfd, buffer, buflen);
> close(sfd);
> 
> // create a new anonymous stream
> sfd = open_stream(fd, "");
> write(sfd, buffer, buflen);
> // name it
> linkat(sfd, NULL, fd, "newstream", AT_EMPTY_PATH);

Oh, lovely - so linkat() *CAN* get that for dirfd and must somehow tell
it from the normal case.  With the semantics entirely unrelated to the normal
one.  And on top of everything else, we have
	5) what are the permissions involved?  When are they determined, BTW?

> close(sfd);
> 
>  - Stream names are NUL terminated and may contain any other character.
>    If you want to put a '/' in a stream name, that's fine, but there's
>    no hierarchy.  Ditto "//../././../../..//./."  It's just a really
>    oddly named stream.

Er...  Whatever for?

>  - linkat() will fail if 'fd' does not match where 'sfd' was created.

	6) "match" in the above being what, exactly?

Incidentally, how do you remove those?

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-29 19:40                                 ` Al Viro
@ 2020-08-29 20:12                                   ` Matthew Wilcox
  2020-08-31 14:23                                     ` Theodore Y. Ts'o
  0 siblings, 1 reply; 62+ messages in thread
From: Matthew Wilcox @ 2020-08-29 20:12 UTC (permalink / raw)
  To: Al Viro
  Cc: Dave Chinner, Christian Schoenebeck, Dr. David Alan Gilbert,
	Greg Kurz, linux-fsdevel, stefanha, mszeredi, vgoyal, gscrivan,
	dwalsh, chirantan

On Sat, Aug 29, 2020 at 08:40:42PM +0100, Al Viro wrote:
> On Sat, Aug 29, 2020 at 08:17:51PM +0100, Matthew Wilcox wrote:
> 
> > I probably have the wrong nomenclature for what I'm proposing.
> > 
> > So here's a concrete API.  What questions need to be answered?
> > 
> > fd = open("real", O_RDWR);
> > 
> > // fetch stream names
> > sfd = open_stream(fd, NULL);
> > read(sfd, names, length);
> 
> 	1) what does fstat() on sfd return?

My strawman answers:

 - st_dev, st_ino, st_uid, st_gid, st_rdev, st_blksize are those of the
   containing file
 - st_mode: S_IFREG | parent & 0777
 - st_nlink: 1
 - st_size, st_blocks, st_atime, st_mtime, st_ctime: as appropriate

> 	2) what does keeping it open do to underlying file?

I don't have a solid answer here.  Maybe it keeps a reference count on
the underlying inode?  Obviously we need to prevent the superblock from
disappearing from under it.  Maybe it needs to keep a refcount on the
struct file it was spawned from.  I haven't thought this through yet.

> 	3) what happens to it if that underlying file is unlinked?

Unlinking a file necessarily unlinks all the streams.  So the file
remains in existence until all fds on it are closed, including all
the streams.

> 	4) what does it do to underlying filesystem?  Can it be unmounted?

I think I covered that in the earlier answers.

> > // create a new anonymous stream
> > sfd = open_stream(fd, "");
> > write(sfd, buffer, buflen);
> > // name it
> > linkat(sfd, NULL, fd, "newstream", AT_EMPTY_PATH);
> 
> Oh, lovely - so linkat() *CAN* get that for dirfd and must somehow tell
> it from the normal case.  With the semantics entirely unrelated to the normal
> one.

I'm open to just using a different syscall.  link_stream(sfd, "newstream");
And, as you point out below, we need unlink_stream(fd, "stream");

> And on top of everything else, we have
> 	5) what are the permissions involved?  When are they determined, BTW?

If you can open a file, you can open its streams.  So an O_PATH file
descriptor can't be used to open streams.
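
For context on why O_PATH is the natural cut-off: on Linux an O_PATH
descriptor only names a location and already refuses data access.  The
sketch below shows read(2) on such a descriptor failing with EBADF.
read_errno_on_path_fd() is a hypothetical helper; whether open_stream()
would report the same errno is not specified anywhere in this thread.

```c
#define _GNU_SOURCE	/* for O_PATH (Linux-specific) */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open 'path' with O_PATH and attempt a read() through the descriptor.
 * Returns the errno of the failed read, 0 if the read unexpectedly
 * succeeded, or -1 if the O_PATH open itself failed.
 */
static int read_errno_on_path_fd(const char *path)
{
	char c;
	int err = 0;
	int fd;

	assert(path != NULL);
	fd = open(path, O_PATH);
	if (fd < 0)
		return -1;
	if (read(fd, &c, 1) < 0)
		err = errno;
	close(fd);
	return err;
}
```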

> > close(sfd);
> > 
> >  - Stream names are NUL terminated and may contain any other character.
> >    If you want to put a '/' in a stream name, that's fine, but there's
> >    no hierarchy.  Ditto "//../././../../..//./."  It's just a really
> >    oddly named stream.
> 
> Er...  Whatever for?

Interoperability.  If some other system creates a stream with a '/' in
it, I don't want the filesystem to have to convert.  Although, at least
Windows doesn't permit '/' in stream names [1] [2].  Of course, individual
filesystems could reject characters in names that they don't like.

[1] https://docs.microsoft.com/en-us/windows/win32/fileio/file-streams
[2] https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file

> >  - linkat() will fail if 'fd' does not match where 'sfd' was created.
> 
> 	6) "match" in the above being what, exactly?

Referring to a different inode than the one it was created in.  Although
if we just go with the link_stream() proposal above, then this point is
moot.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-29 19:25                                     ` Al Viro
@ 2020-08-30 19:05                                       ` Miklos Szeredi
  2020-08-30 19:10                                         ` Matthew Wilcox
  0 siblings, 1 reply; 62+ messages in thread
From: Miklos Szeredi @ 2020-08-30 19:05 UTC (permalink / raw)
  To: Al Viro
  Cc: Matthew Wilcox, Dave Chinner, Christian Schoenebeck,
	Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Sat, Aug 29, 2020 at 9:25 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Aug 29, 2020 at 09:13:24PM +0200, Miklos Szeredi wrote:
>
> > > d_path() is the least of the problems, actually.  Directory tree structure on
> > > those, OTOH, is a serious problem.  If you want to have getdents(2) on that
> > > shite, you want an opened descriptor that looks like a directory.  And _that_
> > > opens a large can of worms.  Because now you have fchdir(2) to cope with,
> > > lookups going through /proc/self/fd/<n>/..., etc., etc.
> >
> > Seriously, nobody wants fchdir().  And getdents() does not imply fchdir().
>
> Yes, it does.  If it's a directory, fchdir(2) gets to deal with it.
> If it's not, no getdents(2).  Unless you special-case the damn thing in
> said fchdir(2).

Huh?  f_op->iterate() is needed for getdents(2) and i_op->lookup() is
needed for fchdir(2).

Yes, open(..., O_ALT) would be special.  Let's call it open_alt(2) to
avoid confusion with normal open on a normal filesystem.   No special
casing anywhere at all.   It's a completely new interface that returns
a file which either has ->read/write() or ->iterate() and which points
to an inode with empty i_ops.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-30 19:05                                       ` Miklos Szeredi
@ 2020-08-30 19:10                                         ` Matthew Wilcox
  2020-08-31  7:34                                           ` Miklos Szeredi
  0 siblings, 1 reply; 62+ messages in thread
From: Matthew Wilcox @ 2020-08-30 19:10 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Al Viro, Dave Chinner, Christian Schoenebeck,
	Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Sun, Aug 30, 2020 at 09:05:40PM +0200, Miklos Szeredi wrote:
> Yes, open(..., O_ALT) would be special.  Let's call it open_alt(2) to
> avoid confusion with normal open on a normal filesystem.   No special
> casing anywhere at all.   It's a completely new interface that returns
> a file which either has ->read/write() or ->iterate() and which points
> to an inode with empty i_ops.

I think fiemap() should be allowed on a stream.  After all, these extents
do exist.  But I'm opposed to allowing getdents(); it'll only encourage
people to think they can have non-files as streams.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-30 19:10                                         ` Matthew Wilcox
@ 2020-08-31  7:34                                           ` Miklos Szeredi
  2020-08-31 11:37                                             ` Matthew Wilcox
  0 siblings, 1 reply; 62+ messages in thread
From: Miklos Szeredi @ 2020-08-31  7:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Al Viro, Dave Chinner, Christian Schoenebeck,
	Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Sun, Aug 30, 2020 at 9:10 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, Aug 30, 2020 at 09:05:40PM +0200, Miklos Szeredi wrote:
> > Yes, open(..., O_ALT) would be special.  Let's call it open_alt(2) to
> > avoid confusion with normal open on a normal filesystem.   No special
> > casing anywhere at all.   It's a completely new interface that returns
> > a file which either has ->read/write() or ->iterate() and which points
> > to an inode with empty i_ops.
>
> I think fiemap() should be allowed on a stream.  After all, these extents
> do exist.  But I'm opposed to allowing getdents(); it'll only encourage
> people to think they can have non-files as streams.

Call it whatever you want.  I think getdents (without lseek!!!)  is a
fine interface for enumeration.
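
To show what "getdents without lseek" looks like from userspace, here
is a sketch that enumerates an ordinary directory with the raw
getdents64 syscall, reading strictly forward until it returns 0.  The
struct layout follows getdents(2); count_entries() is a hypothetical
helper, and applying this to an ALT object is of course only the
proposal, not something that works today.

```c
#define _GNU_SOURCE	/* for syscall() and O_DIRECTORY */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64, as documented in getdents(2). */
struct linux_dirent64 {
	uint64_t	d_ino;
	int64_t		d_off;
	unsigned short	d_reclen;
	unsigned char	d_type;
	char		d_name[];
};

/*
 * Count directory entries by calling getdents64 until it returns 0.
 * No lseek() is ever issued.  Returns -1 on error.
 */
static long count_entries(const char *dirpath)
{
	_Alignas(8) char buf[4096];
	long total = 0;
	int fd;

	assert(dirpath != NULL);
	fd = open(dirpath, O_RDONLY | O_DIRECTORY);
	if (fd < 0)
		return -1;

	for (;;) {
		long n = syscall(SYS_getdents64, fd, buf, sizeof(buf));
		long off = 0;

		if (n < 0) {
			total = -1;
			break;
		}
		if (n == 0)
			break;	/* end of directory */
		while (off < n) {
			struct linux_dirent64 *d =
				(struct linux_dirent64 *)(buf + off);

			if (d->d_reclen == 0)
				break;	/* defensive: never loop forever */
			total++;
			off += d->d_reclen;
		}
	}
	close(fd);
	return total;
}
```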

Also let me stress again, that this ALT thing is not just about
streams, but a generic interface for getting OOB/meta/whatever data
for a given inode/path.  Hence it must have a depth of at least 2, but
limiting it to 2 would again be shortsighted.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-31  7:34                                           ` Miklos Szeredi
@ 2020-08-31 11:37                                             ` Matthew Wilcox
  2020-08-31 11:51                                               ` Miklos Szeredi
  0 siblings, 1 reply; 62+ messages in thread
From: Matthew Wilcox @ 2020-08-31 11:37 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Al Viro, Dave Chinner, Dr. David Alan Gilbert, Greg Kurz,
	linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal,
	Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

On Mon, Aug 31, 2020 at 09:34:20AM +0200, Miklos Szeredi wrote:
> On Sun, Aug 30, 2020 at 9:10 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Sun, Aug 30, 2020 at 09:05:40PM +0200, Miklos Szeredi wrote:
> > > Yes, open(..., O_ALT) would be special.  Let's call it open_alt(2) to
> > > avoid confusion with normal open on a normal filesystem.   No special
> > > casing anywhere at all.   It's a completely new interface that returns
> > > a file which either has ->read/write() or ->iterate() and which points
> > > to an inode with empty i_ops.
> >
> > I think fiemap() should be allowed on a stream.  After all, these extents
> > do exist.  But I'm opposed to allowing getdents(); it'll only encourage
> > people to think they can have non-files as streams.
> 
> Call it whatever you want.  I think getdents (without lseek!!!)  is a
> fine interface for enumeration.
> 
> Also let me stress again, that this ALT thing is not just about
> streams, but a generic interface for getting OOB/meta/whatever data
> for a given inode/path.  Hence it must have a depth of at least 2, but
> limiting it to 2 would again be shortsighted.

As I said to Dave, you and I have a strong difference of opinion here.
I think that what you are proposing is madness.  You're making it too
flexible which comes with too much opportunity for abuse.  I just want
to see alternate data streams for the same filename in order to support
existing use cases.  You seem to want to be able to create an entire
new world inside a file, and that's just too confusing.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-31 11:37                                             ` Matthew Wilcox
@ 2020-08-31 11:51                                               ` Miklos Szeredi
  2020-08-31 13:23                                                 ` Matthew Wilcox
  0 siblings, 1 reply; 62+ messages in thread
From: Miklos Szeredi @ 2020-08-31 11:51 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Al Viro, Dave Chinner, Dr. David Alan Gilbert, Greg Kurz,
	linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal,
	Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

On Mon, Aug 31, 2020 at 1:37 PM Matthew Wilcox <willy@infradead.org> wrote:

> As I said to Dave, you and I have a strong difference of opinion here.
> I think that what you are proposing is madness.  You're making it too
> flexible which comes with too much opportunity for abuse.

Such as?

>  I just want
> to see alternate data streams for the same filename in order to support
> existing use cases.  You seem to want to be able to create an entire
> new world inside a file, and that's just too confusing.

To whom?  I'm sure users of ancient systems with a flat directory
found directory trees very confusing.  Yet it turned out that the
hierarchical system beat the heck out of the flat one.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-31 11:51                                               ` Miklos Szeredi
@ 2020-08-31 13:23                                                 ` Matthew Wilcox
  2020-08-31 14:21                                                   ` Miklos Szeredi
                                                                     ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Matthew Wilcox @ 2020-08-31 13:23 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Al Viro, Dave Chinner, Dr. David Alan Gilbert, Greg Kurz,
	linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal,
	Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

On Mon, Aug 31, 2020 at 01:51:20PM +0200, Miklos Szeredi wrote:
> On Mon, Aug 31, 2020 at 1:37 PM Matthew Wilcox <willy@infradead.org> wrote:
> 
> > As I said to Dave, you and I have a strong difference of opinion here.
> > I think that what you are proposing is madness.  You're making it too
> > flexible which comes with too much opportunity for abuse.
> 
> Such as?

One proposal I saw earlier in this thread was to do something like
$ runalt /path/to/file ls
which would open_alt() /path/to/file, fchdir to it and run ls inside it.
That's just crazy.

> >  I just want
> > to see alternate data streams for the same filename in order to support
> > existing use cases.  You seem to want to be able to create an entire
> > new world inside a file, and that's just too confusing.
> 
> To whom?  I'm sure users of ancient systems with a flat directory
> found directory trees very confusing.  Yet it turned out that the
> hierarchical system beat the heck out of the flat one.

Which doesn't mean that multiple semi-hidden hierarchies are going to
be better than one visible hierarchy.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-31 13:23                                                 ` Matthew Wilcox
@ 2020-08-31 14:21                                                   ` Miklos Szeredi
  2020-08-31 14:25                                                   ` Theodore Y. Ts'o
  2020-08-31 18:02                                                   ` Andreas Dilger
  2 siblings, 0 replies; 62+ messages in thread
From: Miklos Szeredi @ 2020-08-31 14:21 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Al Viro, Dave Chinner, Dr. David Alan Gilbert, Greg Kurz,
	linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal,
	Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

On Mon, Aug 31, 2020 at 3:23 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Aug 31, 2020 at 01:51:20PM +0200, Miklos Szeredi wrote:
> > On Mon, Aug 31, 2020 at 1:37 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > > As I said to Dave, you and I have a strong difference of opinion here.
> > > I think that what you are proposing is madness.  You're making it too
> > > flexible which comes with too much opportunity for abuse.
> >
> > Such as?
>
> One proposal I saw earlier in this thread was to do something like
> $ runalt /path/to/file ls
> which would open_alt() /path/to/file, fchdir to it and run ls inside it.
> That's just crazy.

Indeed, I have said numerous times that fchdir() on those objects must
not happen.  But there's no law (that I know of) that says all
hierarchies of files must allow fchdir().

> > >  I just want
> > > to see alternate data streams for the same filename in order to support
> > > existing use cases.  You seem to want to be able to create an entire
> > > new world inside a file, and that's just too confusing.
> >
> > To whom?  I'm sure users of ancient systems with a flat directory
> > found directory trees very confusing.  Yet it turned out that the
> > hierarchical system beat the heck out of the flat one.
>
> Which doesn't mean that multiple semi-hidden hierarchies are going to
> be better than one visible hierarchy.

Look how metadata interfaces for inodes are already fragmented:

 - stat (zillions of versions due to field sizes)
 - statx (hopefully good for some time)
 - getxattr
 - FS_IOC_GETFLAGS
 - FS_IOC_FSGETXATTR (nothing to do with the "other" xattr)
 - FS_IOC_FIEMAP
 - all the filesystem specific stuff (encryption, compression, whatever)

And now you are proposing to add yet another interface specific to ADS.

What about a generic interface instead for most future use cases as
well as possibly duplicating some of the existing ones?  This would
simplify userspace tooling and allow for a single generic internal
interface as well.
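
The fragmentation in the list above is easy to see from userspace: the
sketch below asks one file for metadata through three of those
interfaces, each a different syscall with its own calling convention.
probe_metadata() and struct meta_probe are hypothetical names for
illustration; the xattr and flags probes may legitimately fail on
filesystems that do not support them.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>		/* FS_IOC_GETFLAGS */
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Which of the three metadata interfaces answered for this file. */
struct meta_probe {
	int stat_ok;	/* stat(2) */
	int xattr_ok;	/* listxattr(2) */
	int flags_ok;	/* ioctl(FS_IOC_GETFLAGS) */
};

/* Returns 0 if the file could be opened for the ioctl probe, else -1. */
static int probe_metadata(const char *path, struct meta_probe *out)
{
	struct stat st;
	int flags = 0;
	int fd;

	assert(path != NULL && out != NULL);

	/* Interface 1: path-based, fixed struct. */
	out->stat_ok = (stat(path, &st) == 0);

	/* Interface 2: path-based, size query with a NULL buffer. */
	out->xattr_ok = (listxattr(path, NULL, 0) >= 0);

	/* Interface 3: needs an open fd and an ioctl number. */
	out->flags_ok = 0;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	out->flags_ok = (ioctl(fd, FS_IOC_GETFLAGS, &flags) == 0);
	close(fd);
	return 0;
}
```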

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-29 20:12                                   ` Matthew Wilcox
@ 2020-08-31 14:23                                     ` Theodore Y. Ts'o
  2020-08-31 14:40                                       ` Matthew Wilcox
  2020-08-31 16:11                                       ` Christian Schoenebeck
  0 siblings, 2 replies; 62+ messages in thread
From: Theodore Y. Ts'o @ 2020-08-31 14:23 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Al Viro, Dave Chinner, Christian Schoenebeck,
	Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel, stefanha,
	mszeredi, vgoyal, gscrivan, dwalsh, chirantan

On Sat, Aug 29, 2020 at 09:12:45PM +0100, Matthew Wilcox wrote:
> > 	3) what happens to it if that underlying file is unlinked?
> 
> Unlinking a file necessarily unlinks all the streams.  So the file
> remains in existence until all fds on it are closed, including all
> the streams.

That's a bad idea, because if the fds are closed silently, then they
can be reused; and then if the userspace library tries to write to
what it *thinks* is an ADS file, not knowing that the application has
unlinked and closed the ADS file, user file data would be lost.

What we would want instead (if we want to pursue the madness of ADS,
which I don't), is something like the effects of a BSD-style revoke(2)
system call, which causes all attempts to operate on said file
descriptor to return an error and/or EOF after the fd has been
revoked.
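
The reuse hazard described above rests on the POSIX rule that open(2)
returns the lowest-numbered free descriptor.  A minimal single-threaded
sketch, with fd_number_is_reused() as a hypothetical helper: once a
descriptor is closed, the very next open gets the same number, so any
code still holding the old number would now be talking to a different
file.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open 'path', close it, and open it again.  Returns 1 if the second
 * open was handed the same descriptor number as the first, 0 if not,
 * and -1 if either open failed.  Assumes no other thread opens or
 * closes descriptors in between.
 */
static int fd_number_is_reused(const char *path)
{
	int stale, fresh;

	assert(path != NULL);
	stale = open(path, O_RDONLY);
	if (stale < 0)
		return -1;
	close(stale);		/* 'stale' is now just a dangling number */

	fresh = open(path, O_RDONLY);
	if (fresh < 0)
		return -1;
	close(fresh);
	return fresh == stale;
}
```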

				- Ted

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-31 13:23                                                 ` Matthew Wilcox
  2020-08-31 14:21                                                   ` Miklos Szeredi
@ 2020-08-31 14:25                                                   ` Theodore Y. Ts'o
  2020-08-31 14:45                                                     ` Matthew Wilcox
  2020-09-01  3:34                                                     ` Dave Chinner
  2020-08-31 18:02                                                   ` Andreas Dilger
  2 siblings, 2 replies; 62+ messages in thread
From: Theodore Y. Ts'o @ 2020-08-31 14:25 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Miklos Szeredi, Al Viro, Dave Chinner, Dr. David Alan Gilbert,
	Greg Kurz, linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi,
	Vivek Goyal, Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

On Mon, Aug 31, 2020 at 02:23:39PM +0100, Matthew Wilcox wrote:
> On Mon, Aug 31, 2020 at 01:51:20PM +0200, Miklos Szeredi wrote:
> > On Mon, Aug 31, 2020 at 1:37 PM Matthew Wilcox <willy@infradead.org> wrote:
> > 
> > > As I said to Dave, you and I have a strong difference of opinion here.
> > > I think that what you are proposing is madness.  You're making it too
> > > flexible which comes with too much opportunity for abuse.
> > 
> > Such as?
> 
> One proposal I saw earlier in this thread was to do something like
> $ runalt /path/to/file ls
> which would open_alt() /path/to/file, fchdir to it and run ls inside it.
> That's just crazy.

As I've said before, malware authors would love that feature.  Most
system administrators won't.

Oh, one other question about ADS; if a file system supports reflink,
what is supposed to happen when you reflink a file?  You have to
consider all of the ADS's to be reflinked as well?  In some ways, this
is good, because the overhead and complexity will probably cause most
file system maintainers to throw up their hands, say this is madness,
and refuse to implement it.  :-)

						- Ted

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-31 14:23                                     ` Theodore Y. Ts'o
@ 2020-08-31 14:40                                       ` Matthew Wilcox
  2020-08-31 16:11                                       ` Christian Schoenebeck
  1 sibling, 0 replies; 62+ messages in thread
From: Matthew Wilcox @ 2020-08-31 14:40 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Al Viro, Dave Chinner, Christian Schoenebeck,
	Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel, stefanha,
	mszeredi, vgoyal, gscrivan, dwalsh, chirantan

On Mon, Aug 31, 2020 at 10:23:12AM -0400, Theodore Y. Ts'o wrote:
> On Sat, Aug 29, 2020 at 09:12:45PM +0100, Matthew Wilcox wrote:
> > > 	3) what happens to it if that underlying file is unlinked?
> > 
> > Unlinking a file necessarily unlinks all the streams.  So the file
> > remains in existence until all fds on it are closed, including all
> > the streams.
> 
> That's a bad idea, because if the fds are closed silently, then they

What?  I think you completely misread me.  I never said anything about
closing file descriptors.  I'm proposing standard unix semantics;
having a file descriptor open keeps a file in existence, even after
it's unlinked.
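
For reference, the standard semantics being appealed to can be checked
from userspace with a sketch like the one below: unlink a file while a
descriptor is still open, then keep writing and reading through that
descriptor.  unlinked_file_survives() is a hypothetical helper, and it
demonstrates ordinary files only; extending the same lifetime rule to
stream descriptors is the proposal, not existing behaviour.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* mkstemp(), pread() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * 'tmpl' is a writable mkstemp() template ending in "XXXXXX".
 * Creates the file, unlinks it while the fd is open, then writes and
 * reads back through the fd.  Returns the link count fstat() reports
 * after the unlink (0 is expected), or -1 on any failure.
 */
static int unlinked_file_survives(char *tmpl)
{
	char buf[5];
	struct stat st;
	int ret = -1;
	int fd;

	assert(tmpl != NULL);
	fd = mkstemp(tmpl);
	if (fd < 0)
		return -1;

	if (unlink(tmpl) != 0)
		goto out;
	/* The name is gone, but the open fd still reaches the inode. */
	if (write(fd, "hello", 5) != 5)
		goto out;
	if (pread(fd, buf, 5, 0) != 5 || memcmp(buf, "hello", 5) != 0)
		goto out;
	if (fstat(fd, &st) != 0)
		goto out;
	ret = (int)st.st_nlink;
out:
	close(fd);	/* last reference: storage is released here */
	return ret;
}
```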


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-31 14:25                                                   ` Theodore Y. Ts'o
@ 2020-08-31 14:45                                                     ` Matthew Wilcox
  2020-08-31 14:49                                                       ` Miklos Szeredi
  2020-09-01  3:34                                                     ` Dave Chinner
  1 sibling, 1 reply; 62+ messages in thread
From: Matthew Wilcox @ 2020-08-31 14:45 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Miklos Szeredi, Al Viro, Dave Chinner, Dr. David Alan Gilbert,
	Greg Kurz, linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi,
	Vivek Goyal, Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

On Mon, Aug 31, 2020 at 10:25:32AM -0400, Theodore Y. Ts'o wrote:
> Oh, one other question about ADS; if a file system supports reflink,
> what is supposed to happen when you reflink a file?  You have to
> consider all of the ADS's to be reflinked as well?  In some ways, this
> is good, because the overhead and complexity will probably cause most
> file system maintainers to throw up their hands, say this is madness,
> and refuse to implement it.  :-)

Why is it so much harder than reflinking all the xattrs on a file?

If one thinks that Miklos' crazypants infinite hierarchy is the way to
go, then yes, this is absurdly complex.  If these are closer in spirit
to being seekable xattrs, then it looks a lot more manageable.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-31 14:45                                                     ` Matthew Wilcox
@ 2020-08-31 14:49                                                       ` Miklos Szeredi
  0 siblings, 0 replies; 62+ messages in thread
From: Miklos Szeredi @ 2020-08-31 14:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Theodore Y. Ts'o, Al Viro, Dave Chinner,
	Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	Stefan Hajnoczi, Miklos Szeredi, Vivek Goyal, Giuseppe Scrivano,
	Daniel J Walsh, Chirantan Ekbote

On Mon, Aug 31, 2020 at 4:45 PM Matthew Wilcox <willy@infradead.org> wrote:

> If one thinks that Miklos' crazypants infinite hierarchy is the way to

Oh, I care about ADS *implementation* not a wee bit.  You can
implement that as a flat structure, or not implement it at all.

What I care about is the *interface*.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-31 14:23                                     ` Theodore Y. Ts'o
  2020-08-31 14:40                                       ` Matthew Wilcox
@ 2020-08-31 16:11                                       ` Christian Schoenebeck
  1 sibling, 0 replies; 62+ messages in thread
From: Christian Schoenebeck @ 2020-08-31 16:11 UTC (permalink / raw)
  To: Theodore Y. Ts'o, Al Viro
  Cc: Dave Chinner, Dr. David Alan Gilbert, Greg Kurz, linux-fsdevel,
	stefanha, mszeredi, vgoyal, gscrivan, dwalsh, chirantan

On Sonntag, 30. August 2020 21:05:40 CEST Miklos Szeredi wrote:
> On Sat, Aug 29, 2020 at 9:25 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Sat, Aug 29, 2020 at 09:13:24PM +0200, Miklos Szeredi wrote:
> > > > d_path() is the least of the problems, actually.  Directory tree
> > > > structure on those, OTOH, is a serious problem.  If you want to have
> > > > getdents(2) on that shite, you want an opened descriptor that looks
> > > > like a directory.  And _that_ opens a large can of worms.  Because
> > > > now you have fchdir(2) to cope with, lookups going through
> > > > /proc/self/fd/<n>/..., etc., etc.
> > > 
> > > Seriously, nobody wants fchdir().  And getdents() does not imply
> > > fchdir().
> > 
> > Yes, it does.  If it's a directory, fchdir(2) gets to deal with it.
> > If it's not, no getdents(2).  Unless you special-case the damn thing in
> > said fchdir(2).
> 
> Huh?  f_op->iterate() needed for getdents(2) and i_op->lookup() needed
> for fchdir(2).
> 
> Yes, open(..., O_ALT) would be special.  Let's call it open_alt(2) to
> avoid confusion with normal open on a normal filesystem.   No special
> casing anywhere at all.   It's a completely new interface that returns
> a file which either has ->read/write() or ->iterate() and which points
> to an inode with empty i_ops.

Wouldn't it be overkill to introduce a new syscall just for that?
My {disclaimer: quick & naive} approach would be to stick a new flag
S_ALT_WHATEVER onto i_flags, maybe, and hard-code a denial in
inode_permission(MAY_EXEC) if that S_ALT_WHATEVER flag is present. Then you
can getdents() but not fchdir() into it, if I am not missing something.
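
The read-versus-search split this suggestion leans on already exists for
ordinary directories through mode bits. The following is only a userspace
analogy, not the proposed kernel change: a directory with read permission
but no execute (search) permission can be enumerated but not entered by an
unprivileged owner. The helper name and the "alt"/"stream0" names are made
up for illustration.

```python
import os
import tempfile


def list_but_not_enter(parent):
    """Create a read-only (no search bit) directory, list it, try to fchdir into it."""
    d = os.path.join(parent, "alt")          # hypothetical name
    os.mkdir(d, 0o700)
    open(os.path.join(d, "stream0"), "w").close()
    os.chmod(d, 0o400)                       # read allowed, search/execute denied

    saved_cwd = os.getcwd()
    fd = os.open(d, os.O_RDONLY | os.O_DIRECTORY)
    try:
        names = os.listdir(fd)               # enumeration needs only read permission
        try:
            os.fchdir(fd)                    # needs search (execute) permission
            entered = True
        except PermissionError:
            entered = False
    finally:
        os.chdir(saved_cwd)                  # undo a successful fchdir (e.g. as root)
        os.close(fd)
        os.chmod(d, 0o700)                   # allow later cleanup
    return names, entered


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parent:
        names, entered = list_but_not_enter(parent)
        print("entries:", names, "fchdir allowed:", entered)
```

A process with the relevant capabilities (typically root) bypasses the
mode check, so fchdir may succeed there.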

On Monday, 31 August 2020 09:34:20 CEST Miklos Szeredi wrote:
> On Sun, Aug 30, 2020 at 9:10 PM Matthew Wilcox <willy@infradead.org> wrote:
> > On Sun, Aug 30, 2020 at 09:05:40PM +0200, Miklos Szeredi wrote:
> > > Yes, open(..., O_ALT) would be special.  Let's call it open_alt(2) to
> > > avoid confusion with normal open on a normal filesystem.   No special
> > > casing anywhere at all.   It's a completely new interface that returns
> > > a file which either has ->read/write() or ->iterate() and which points
> > > to an inode with empty i_ops.
> > 
> > I think fiemap() should be allowed on a stream.  After all, these extents
> > do exist.  But I'm opposed to allowing getdents(); it'll only encourage
> > people to think they can have non-files as streams.
> 
> Call it whatever you want.  I think getdents (without lseek!!!)  is a
> fine interface for enumeration.
> 
> Also let me stress again, that this ALT thing is not just about
> streams, but a generic interface for getting OOB/meta/whatever data
> for a given inode/path.  Hence it must have a depth of at least 2, but
> limiting it to 2 would again be shortsighted.

Al, what is your feeling about these two issues?

On Monday, 31 August 2020 16:23:12 CEST Theodore Y. Ts'o wrote:
> On Sat, Aug 29, 2020 at 09:12:45PM +0100, Matthew Wilcox wrote:
> > > 	3) what happens to it if that underlying file is unlinked?
> > 
> > Unlinking a file necessarily unlinks all the streams.  So the file
> > remains in existence until all fds on it are closed, including all
> > the streams.
> 
> That's a bad idea, because if the fds are closed silently, then they
> can be reused; and then if the userspace library tries to write to
> what it *thinks* is an ADS file, not knowing that the application has
> unlinked and closed the ADS file, user file data would be lost.

Why would that be bad with ADS while it is OK with regular files right now?
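
For reference, the descriptor-number reuse described in the quoted text can
be reproduced with plain regular files. This is a minimal sketch, assuming a
single-threaded process so that nothing else opens a descriptor in between;
the helper and file names are invented for the example.

```python
import os
import tempfile


def demonstrate_fd_reuse(directory):
    """Close one file, open another, then write through the stale fd number."""
    path_a = os.path.join(directory, "a")
    path_b = os.path.join(directory, "b")

    fd_a = os.open(path_a, os.O_CREAT | os.O_WRONLY, 0o600)
    os.close(fd_a)                      # the "application" closes file a

    # open() returns the lowest unused descriptor number, so with no other
    # opens in between this is expected to be the number just released.
    fd_b = os.open(path_b, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        if fd_a == fd_b:
            # A "library" still holding the stale number writes into file b.
            os.write(fd_a, b"meant for a\n")
    finally:
        os.close(fd_b)

    with open(path_b, "rb") as f:
        return fd_a, fd_b, f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        fd_a, fd_b, data = demonstrate_fd_reuse(d)
        print("same number:", fd_a == fd_b, "file b now holds:", data)
```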

Best regards,
Christian Schoenebeck



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-31 13:23                                                 ` Matthew Wilcox
  2020-08-31 14:21                                                   ` Miklos Szeredi
  2020-08-31 14:25                                                   ` Theodore Y. Ts'o
@ 2020-08-31 18:02                                                   ` Andreas Dilger
  2020-09-01  3:48                                                     ` Dave Chinner
  2 siblings, 1 reply; 62+ messages in thread
From: Andreas Dilger @ 2020-08-31 18:02 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Miklos Szeredi, Al Viro, Dave Chinner, Dr. David Alan Gilbert,
	Greg Kurz, linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi,
	Vivek Goyal, Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

On Aug 31, 2020, at 7:23 AM, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Mon, Aug 31, 2020 at 01:51:20PM +0200, Miklos Szeredi wrote:
>> On Mon, Aug 31, 2020 at 1:37 PM Matthew Wilcox <willy@infradead.org> wrote:
>> 
>>> As I said to Dave, you and I have a strong difference of opinion here.
>>> I think that what you are proposing is madness.  You're making it too
>>> flexible which comes with too much opportunity for abuse.
>> 
>> Such as?
> 
> One proposal I saw earlier in this thread was to do something like
> $ runalt /path/to/file ls
> which would open_alt() /path/to/file, fchdir to it and run ls inside it.
> That's just crazy.
> 
>>> I just want
>>> to see alternate data streams for the same filename in order to support
> existing use cases.  You seem to want to be able to create an entire
>>> new world inside a file, and that's just too confusing.
>> 
>> To whom?  I'm sure users of ancient systems with a flat directory
>> found directory trees very confusing.  Yet it turned out that the
>> hierarchical system beat the heck out of the flat one.
> 
> Which doesn't mean that multiple semi-hidden hierarchies are going to
> be better than one visible hierarchy.

I can see the use of ADS for "additional information" about a single file
(e.g. verity Merkle tree with checksums of the file data) that are too big
to put into an xattr and/or need random updates.  However, I don't see the
benefits of attaching a whole arbitrary set of files to a single filename.

If people want a whole hierarchy of directories contained within a single
file, why not use a container (e.g. ext4 filesystem image) to hold all of
that?  That allows an arbitrary group of files/directories/permissions to
be applied to a tree of files, but the container can be copied or removed
atomically as needed?

Using a filesystem image as the container is (IMHO) preferable to using a
tarball or similar, because it can be randomly updated after creation, and
already has all of the semantics needed.

The main thing that is needed is some mechanism that users can access that
decides whether access to the image is as a file, or if processed should
automount the image and descend into the contained namespace.

Cheers, Andreas







^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-31 14:25                                                   ` Theodore Y. Ts'o
  2020-08-31 14:45                                                     ` Matthew Wilcox
@ 2020-09-01  3:34                                                     ` Dave Chinner
  2020-09-01 14:52                                                       ` Theodore Y. Ts'o
  1 sibling, 1 reply; 62+ messages in thread
From: Dave Chinner @ 2020-09-01  3:34 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Matthew Wilcox, Miklos Szeredi, Al Viro, Dr. David Alan Gilbert,
	Greg Kurz, linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi,
	Vivek Goyal, Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

On Mon, Aug 31, 2020 at 10:25:32AM -0400, Theodore Y. Ts'o wrote:
> On Mon, Aug 31, 2020 at 02:23:39PM +0100, Matthew Wilcox wrote:
> > On Mon, Aug 31, 2020 at 01:51:20PM +0200, Miklos Szeredi wrote:
> > > On Mon, Aug 31, 2020 at 1:37 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > 
> > > > As I said to Dave, you and I have a strong difference of opinion here.
> > > > I think that what you are proposing is madness.  You're making it too
> > > > flexible which comes with too much opportunity for abuse.
> > > 
> > > Such as?
> > 
> > One proposal I saw earlier in this thread was to do something like
> > $ runalt /path/to/file ls
> > which would open_alt() /path/to/file, fchdir to it and run ls inside it.
> > That's just crazy.
> 
> As I've said before, malware authors would love that feature.  Most
> system administrators won't.
> 
> Oh, one other question about ADS; if a file system supports reflink,
> what is supposed to happen when you reflink a file?  You have to
> consider all of the ADS's to be reflinked as well? 

Absolutely.

But, unlike your implication that this is -really complex and hard
to do-, it's actually relatively trivial to do with the XFS
implementation I mentioned as each ADS stream is a fully fledged
inode that can point to shared data extents. If you can do data
manipulation on a regular inode, you'll be able to do it on an ADS,
and that includes copying ADS streams via reflink.

Indeed, this actually makes the 'cp' utility able to support ADS
without modification. i.e. 'cp --reflink=always' will "copy" ADS data
automatically, without even needing to be aware they exist....

Such behaviour is almost certainly no harder to implement in XFS as
an atomic, recoverable operation than the "unlink removes all the
ADSs attached to the inode" requirement.....

> In some ways, this
> is good, because the overhead and complexity will probably cause most
> file system maintainers to throw up their hands, say this is madness,
> and refuse to implement it.  :-)

I disagree: requiring reflink to actually "copy" ADS transparently
actually makes things easier for userspace support of ADS.  Thanks
for suggesting the idea, Ted. :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-08-31 18:02                                                   ` Andreas Dilger
@ 2020-09-01  3:48                                                     ` Dave Chinner
  0 siblings, 0 replies; 62+ messages in thread
From: Dave Chinner @ 2020-09-01  3:48 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Matthew Wilcox, Miklos Szeredi, Al Viro, Dr. David Alan Gilbert,
	Greg Kurz, linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi,
	Vivek Goyal, Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

On Mon, Aug 31, 2020 at 12:02:56PM -0600, Andreas Dilger wrote:
> On Aug 31, 2020, at 7:23 AM, Matthew Wilcox <willy@infradead.org> wrote:
> > 
> > On Mon, Aug 31, 2020 at 01:51:20PM +0200, Miklos Szeredi wrote:
> >> On Mon, Aug 31, 2020 at 1:37 PM Matthew Wilcox <willy@infradead.org> wrote:
> >> 
> >>> As I said to Dave, you and I have a strong difference of opinion here.
> >>> I think that what you are proposing is madness.  You're making it too
> >>> flexible which comes with too much opportunity for abuse.
> >> 
> >> Such as?
> > 
> > One proposal I saw earlier in this thread was to do something like
> > $ runalt /path/to/file ls
> > which would open_alt() /path/to/file, fchdir to it and run ls inside it.
> > That's just crazy.
> > 
> >>> I just want
> >>> to see alternate data streams for the same filename in order to support
> >>> existing use cases.  You seem to want to be able to create an entire
> >>> new world inside a file, and that's just too confusing.
> >> 
> >> To whom?  I'm sure users of ancient systems with a flat directory
> >> found directory trees very confusing.  Yet it turned out that the
> >> hierarchical system beat the heck out of the flat one.
> > 
> > Which doesn't mean that multiple semi-hidden hierarchies are going to
> > be better than one visible hierarchy.
> 
> I can see the use of ADS for "additional information" about a single file
> (e.g. verity Merkle tree with checksums of the file data) that are too big
> to put into an xattr and/or need random updates.  However, I don't see the
> benefits of attaching a whole arbitrary set of files to a single filename.
> 
> If people want a whole hierarchy of directories contained within a single
> file, why not use a container (e.g. ext4 filesystem image) to hold all of
> that?  That allows an arbitrary group of files/directories/permissions to
> be applied to a tree of files, but the container can be copied or removed
> atomically as needed?
> 
> Using a filesystem image as the container is (IMHO) preferable to using a
> tarball or similar, because it can be randomly updated after creation, and
> already has all of the semantics needed.

Yup, that's pretty much the premise behind the XFS subvolume stuff I
was exploring a while back. The file user data fork contains a
filesystem image, and the filesystem can mount it wherever it
wants and manipulate the internal state as if it's just another
filesystem. It's essentially the equivalent of a virtual LBA address
space mapping layer above the block layer.

And if your user data fork is capable of reflink and COW, then you
have atomically snapshottable virtually mapped filesystem containers
a.k.a. subvolumes.....

> The main thing that is needed is some mechanism that users can access that
> decides whether access to the image is as a file, or if processes should
> automount the image and descend into the contained namespace.

XFS used to have an IF_UUID inode type that was intended on Irix to
be a filesystem referral indicator. Kinda like a symlink, but it
just contained a UUID rather than a path. Traversing an IF_UUID inode
in the path would result in calling out to userspace to find the
filesystem with that UUID and automounting it in place, then it
would restart the path resolution and walk directly into the
filesystem that got mounted...

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-09-01  3:34                                                     ` Dave Chinner
@ 2020-09-01 14:52                                                       ` Theodore Y. Ts'o
  2020-09-01 15:14                                                         ` Theodore Y. Ts'o
  0 siblings, 1 reply; 62+ messages in thread
From: Theodore Y. Ts'o @ 2020-09-01 14:52 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Matthew Wilcox, Miklos Szeredi, Al Viro, Dr. David Alan Gilbert,
	Greg Kurz, linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi,
	Vivek Goyal, Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

On Tue, Sep 01, 2020 at 01:34:05PM +1000, Dave Chinner wrote:
> 
> But, unlike your implication that this is -really complex and hard
> to do-, it's actually relatively trivial to do with the XFS
> implementation I mentioned as each ADS stream is a fully fledged
> inode that can point to shared data extents. If you can do data
> manipulation on a regular inode, you'll be able to do it on an ADS,
> and that includes copying ADS streams via reflink.

Is the reflink system call on a file with ADS's atomic, or not?  What
if there are a million files in the ADS hierarchy which is 100
subdirectories deep in some places, comprising several TB's worth of
data?  Is that all going to fit in a single XFS transaction?  What if
you crash in the middle of it?  Is a partially reflinked copy of an
ADS file OK?  Or a reflinked ADS file missing some portion of the
alternate data streams?

					- Ted

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-09-01 14:52                                                       ` Theodore Y. Ts'o
@ 2020-09-01 15:14                                                         ` Theodore Y. Ts'o
  2020-09-02  5:19                                                           ` Dave Chinner
  0 siblings, 1 reply; 62+ messages in thread
From: Theodore Y. Ts'o @ 2020-09-01 15:14 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Matthew Wilcox, Miklos Szeredi, Al Viro, Dr. David Alan Gilbert,
	Greg Kurz, linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi,
	Vivek Goyal, Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

On Tue, Sep 01, 2020 at 10:52:05AM -0400, Theodore Y. Ts'o wrote:
> On Tue, Sep 01, 2020 at 01:34:05PM +1000, Dave Chinner wrote:
> > 
> > But, unlike your implication that this is -really complex and hard
> > to do-, it's actually relatively trivial to do with the XFS
> > implementation I mentioned as each ADS stream is a fully fledged
> > inode that can point to shared data extents. If you can do data
> > manipulation on a regular inode, you'll be able to do it on an ADS,
> > and that includes copying ADS streams via reflink.
> 
> Is the reflink system call on a file with ADS's atomic, or not?  What
> if there are a million files in the ADS hierarchy which is 100
> subdirectories deep in some places, comprising several TB's worth of
> data?  Is that all going to fit in a single XFS transaction?  What if
> you crash in the middle of it?  Is a partially reflinked copy of an
> ADS file OK?  Or a reflinked ADS file missing some portion of the
> alternate data streams?

Oh, and if the answer is that the ADS inodes should be reflinked
individually in userspace, wonderful!  An ADS inode could then just be
a directory, like it was in the NeXT operating system, and copying an
ADS file could *also* be done in userspace, as a cp -r.   :-)

That's fine too, and keeps the file system completely out of it.  :-)

					- Ted

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: xattr names for unprivileged stacking?
  2020-09-01 15:14                                                         ` Theodore Y. Ts'o
@ 2020-09-02  5:19                                                           ` Dave Chinner
  0 siblings, 0 replies; 62+ messages in thread
From: Dave Chinner @ 2020-09-02  5:19 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Matthew Wilcox, Miklos Szeredi, Al Viro, Dr. David Alan Gilbert,
	Greg Kurz, linux-fsdevel, Stefan Hajnoczi, Miklos Szeredi,
	Vivek Goyal, Giuseppe Scrivano, Daniel J Walsh, Chirantan Ekbote

On Tue, Sep 01, 2020 at 11:14:53AM -0400, Theodore Y. Ts'o wrote:
> On Tue, Sep 01, 2020 at 10:52:05AM -0400, Theodore Y. Ts'o wrote:
> > On Tue, Sep 01, 2020 at 01:34:05PM +1000, Dave Chinner wrote:
> > > 
> > > But, unlike your implication that this is -really complex and hard
> > > to do-, it's actually relatively trivial to do with the XFS
> > > implementation I mentioned as each ADS stream is a fully fledged
> > > inode that can point to shared data extents. If you can do data
> > > manipulation on a regular inode, you'll be able to do it on an ADS,
> > > and that includes copying ADS streams via reflink.
> > 
> > Is the reflink system call on a file with ADS's atomic, or not?

We can implement it as wholly atomic if we want to, yes.

> > What
> > if there are a million files in the ADS hierarchy which is 100
> > subdirectories deep in some places, comprising several TB's worth of
> > data?  Is that all going to fit in a single XFS transaction?

No, but we solved that "unbound transaction size" problem years ago
for reflink and reverse mapping updates. We have constructs like
intents and rolling atomic transactions to allow largely unbound
modifications to run without being limited by the maximum size
of a single transaction or even the size of the journal...

> > What if
> > you crash in the middle of it?

Intents track modification progress and allow recovery to restart
from exactly where the journal says the operation had got to....
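
As a rough illustration of the idea only (this is a toy model, not how XFS
implements intents; every class and function name here is invented), an
operation too large for one transaction can be logged once as an intent and
then applied in bounded batches, with the committed progress marker telling
recovery where to resume after a crash:

```python
from dataclasses import dataclass, field
from typing import Optional


class Crash(Exception):
    """Simulated power loss between two bounded transactions."""


@dataclass
class Journal:
    intent: Optional[list] = None   # full work list, logged once up front
    done: int = 0                   # items whose completion has been committed


@dataclass
class Store:
    journal: Journal = field(default_factory=Journal)
    cloned: list = field(default_factory=list)   # committed results


def _roll(store, batch_size, crash_after_batches=None):
    j = store.journal
    batches = 0
    while j.intent is not None and j.done < len(j.intent):
        if crash_after_batches is not None and batches == crash_after_batches:
            raise Crash()
        batch = j.intent[j.done:j.done + batch_size]
        # One bounded "transaction": apply the batch and record progress together.
        store.cloned.extend(batch)
        j.done += len(batch)
        batches += 1
    j.intent, j.done = None, 0      # intent fully processed, retire it


def start_clone(store, items, batch_size, crash_after_batches=None):
    store.journal.intent = list(items)
    store.journal.done = 0
    _roll(store, batch_size, crash_after_batches)


def recover(store, batch_size):
    # Resume from the committed progress marker instead of starting over.
    _roll(store, batch_size)


if __name__ == "__main__":
    s = Store()
    try:
        start_clone(s, range(10), batch_size=3, crash_after_batches=2)
    except Crash:
        print("crashed with", len(s.cloned), "items committed")
    recover(s, batch_size=3)
    print("after recovery:", s.cloned)
```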

> > Is a partially reflinked copy of an
> > ADS file OK?

Yes.

> > Or a reflinked ADS file missing some portion of the
> > alternate data streams?

I don't know what this is referring to.

> Oh, and if the answer is that the ADS inodes should be reflinked
> individually in userspace, wonderful!

You can do that too, if you want - FICLONE/clone_file_range work on
open file descriptors, not file names, so if you can represent an
ADS as a file descriptor, you can clone it, dedupe it, etc, just
like you can with any other user data file.
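
A minimal sketch of the descriptor-based interface, assuming Linux and the
common asm-generic ioctl encoding (x86, arm64) for the FICLONE request
number from <linux/fs.h>. The constant is defined by hand here, and the
helper name is invented. Whether the clone succeeds depends on the
filesystem supporting reflink.

```python
import fcntl
import os
import tempfile

# FICLONE is _IOW(0x94, 9, int) in <linux/fs.h>. This value assumes the
# asm-generic ioctl encoding; some architectures encode it differently.
FICLONE = 0x40049409


def clone_fd(src_fd, dst_fd):
    """Ask the kernel to make dst_fd share src_fd's data extents.

    Returns True on success and False if the ioctl fails for any reason
    (commonly: no reflink support, or the files are on different mounts).
    """
    try:
        fcntl.ioctl(dst_fd, FICLONE, src_fd)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src")
        dst = os.path.join(d, "dst")
        with open(src, "wb") as f:
            f.write(b"payload")
        s = os.open(src, os.O_RDONLY)
        t = os.open(dst, os.O_CREAT | os.O_WRONLY, 0o600)
        try:
            print("cloned:", clone_fd(s, t))
        finally:
            os.close(s)
            os.close(t)
```

Only descriptors are passed to the kernel, so the same call would apply to
anything that can be opened as a file descriptor.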

> An ADS inode could then just be
> a directory, like it was in the NeXT operating system, and copying an
> ADS file could *also* be done in userspace, as a cp -r.   :-)

And then you lose all the atomicity and recoverability of reflink
copies. So, no, they are not equivalent, and it demonstrates where the
limits of representing ADS as "directories in a file" reduce the
ability to atomically manipulate ADS as a group of objects attached to
a file...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, 2020-09-02  5:19 UTC

Thread overview: 62+ messages
2020-07-28 10:55 xattr names for unprivileged stacking? Dr. David Alan Gilbert
2020-07-28 13:08 ` Greg Kurz
2020-07-28 13:55   ` Christian Schoenebeck
2020-08-04 11:28     ` Dr. David Alan Gilbert
2020-08-04 13:51       ` Christian Schoenebeck
2020-08-12 11:18         ` Dr. David Alan Gilbert
2020-08-12 13:34           ` Christian Schoenebeck
2020-08-12 14:33             ` Dr. David Alan Gilbert
2020-08-13  9:01               ` Christian Schoenebeck
2020-08-16 22:56                 ` Dave Chinner
2020-08-16 23:09                   ` Matthew Wilcox
2020-08-17  0:29                     ` Dave Chinner
2020-08-17 10:37                       ` file forks vs. xattr (was: xattr names for unprivileged stacking?) Christian Schoenebeck
2020-08-23 23:40                         ` Dave Chinner
2020-08-24 15:30                           ` Christian Schoenebeck
2020-08-24 20:01                             ` Miklos Szeredi
2020-08-24 21:26                             ` Frank van der Linden
2020-08-24 22:29                             ` Theodore Y. Ts'o
2020-08-25 15:12                               ` Christian Schoenebeck
2020-08-25 15:32                                 ` Miklos Szeredi
2020-08-27 12:02                                   ` Christian Schoenebeck
2020-08-27 12:25                                     ` Matthew Wilcox
2020-08-27 13:48                                       ` Christian Schoenebeck
2020-08-27 14:01                                         ` Matthew Wilcox
2020-08-27 14:23                                           ` Christian Schoenebeck
2020-08-27 14:25                                             ` Matthew Wilcox
2020-08-27 14:44                                             ` Al Viro
2020-08-27 16:29                                               ` Dr. David Alan Gilbert
2020-08-27 16:35                                                 ` Matthew Wilcox
2020-08-28  9:11                                                 ` Christian Schoenebeck
2020-08-28 14:46                                                   ` Theodore Y. Ts'o
2020-08-27 15:22                       ` xattr names for unprivileged stacking? Matthew Wilcox
2020-08-27 22:24                         ` Dave Chinner
2020-08-29 16:07                           ` Matthew Wilcox
2020-08-29 16:13                             ` Al Viro
2020-08-29 17:51                               ` Miklos Szeredi
2020-08-29 18:04                                 ` Al Viro
2020-08-29 18:22                                   ` Christian Schoenebeck
2020-08-29 19:13                                   ` Miklos Szeredi
2020-08-29 19:25                                     ` Al Viro
2020-08-30 19:05                                       ` Miklos Szeredi
2020-08-30 19:10                                         ` Matthew Wilcox
2020-08-31  7:34                                           ` Miklos Szeredi
2020-08-31 11:37                                             ` Matthew Wilcox
2020-08-31 11:51                                               ` Miklos Szeredi
2020-08-31 13:23                                                 ` Matthew Wilcox
2020-08-31 14:21                                                   ` Miklos Szeredi
2020-08-31 14:25                                                   ` Theodore Y. Ts'o
2020-08-31 14:45                                                     ` Matthew Wilcox
2020-08-31 14:49                                                       ` Miklos Szeredi
2020-09-01  3:34                                                     ` Dave Chinner
2020-09-01 14:52                                                       ` Theodore Y. Ts'o
2020-09-01 15:14                                                         ` Theodore Y. Ts'o
2020-09-02  5:19                                                           ` Dave Chinner
2020-08-31 18:02                                                   ` Andreas Dilger
2020-09-01  3:48                                                     ` Dave Chinner
2020-08-29 19:17                               ` Matthew Wilcox
2020-08-29 19:40                                 ` Al Viro
2020-08-29 20:12                                   ` Matthew Wilcox
2020-08-31 14:23                                     ` Theodore Y. Ts'o
2020-08-31 14:40                                       ` Matthew Wilcox
2020-08-31 16:11                                       ` Christian Schoenebeck
