From: Dave Chinner <david@fromorbit.com>
To: Christian Schoenebeck <qemu_oss@crudebyte.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Greg Kurz <groug@kaod.org>,
	linux-fsdevel@vger.kernel.org, stefanha@redhat.com,
	mszeredi@redhat.com, vgoyal@redhat.com, gscrivan@redhat.com,
	dwalsh@redhat.com, chirantan@chromium.org
Subject: Re: file forks vs. xattr (was: xattr names for unprivileged stacking?)
Date: Mon, 24 Aug 2020 09:40:06 +1000
Message-ID: <20200823234006.GD7728@dread.disaster.area>
In-Reply-To: <2859814.QYyEAd97eH@silver>

On Mon, Aug 17, 2020 at 12:37:17PM +0200, Christian Schoenebeck wrote:
> On Montag, 17. August 2020 00:56:20 CEST Dave Chinner wrote:
> > > That's yet another question: should xattrs and forks share the same data-
> > > and namespace, or rather be orthogonal to each other.
> > 
> > Completely orthogonal. Alternate data streams are not xattrs, and
> > xattrs are not ADS....
> 
> Agreed. Their key features (atomic small data vs. non-atomic large data) and 
> their typical use cases are probably too different for trying to stitch them 
> somehow in an erroneous way into a shared space. Plus it would actually be 
> beneficial if forks had their own xattrs.
> 
> On Montag, 17. August 2020 02:29:30 CEST Dave Chinner wrote:
> > I'd stop calling these "forks" already, too. The user wants
> > "alternate data streams", while a "resource fork" is an internal
> > filesystem implementation detail used to provide ADS
> > functionality...
> 
> The common terminology can certainly still be argued. I understand that from 
> fs implementation perspective "fork" is probably ambiguous. But from public 
> API (i.e. user space side) perspective the term "fork" does make sense, and so 
> far I have not seen a better general term for this. Plus the ambiguous aspects 
> on fs side are not exposed to the public side.
> 
> The term "alternate data stream" suggests that this is just about the raw data 
> stream, but that's probably not what this feature will end up being limited 
> to. E.g. I think they will have their own permissions on the long term (see 
> below). Plus the term ADS is ATM somewhat sticky to the Microsoft universe.

ADS is the Windows term, which is where the majority of people who
use or want ADS come from. Novell called them "multiple data
streams", and Solaris 9 implemented "extended attributes" (ADS)
using inode forks. Apple allows a "data fork" (user data), "resource
forks" (ADS) and now "named forks" which they then used to implement
extended attributes.  Not the Solaris ones, the Linux style fixed
length key-value xattrs.

Quite frankly, the naming in this area is a complete and utter mess,
and the only clear, unambiguous name for this feature is "alternate
data streams". I don't care that it's something that comes from an
MS background - if your only argument against it is "Microsoft!"
then you're on pretty shaky ground...

> > IOWs, with a filesystem inode fork implementation like this for ADS,
> > all we really need is for the VFS to pass a magic command to
> > ->lookup() to tell us to use the ADS namespace attached to the inode
> > rather than use the primary inode type/state to perform the
> > operation.
> 
> Starting with a minimalistic approach, the way Solaris developers 
> originally introduced forks, would IMO make sense for Linux as well:

<snip>

That's pretty much what the proposed O_ALT did, except it used a
fully qualified path name to define the ADS to open.

> - No subforks as starting point, and hence path separator '/' inside fork 
>   names would be prohibited initially to avoid future clashes.

Can't do that - changing the behaviour of the ADS name handling is
effectively an on-disk filesystem format change. i.e. if we allow it
in future kernels, then we have to mark the filesystem as "/" being
valid so that older kernels and repair utilities won't consider this
as invalid/corrupt and trash the ADS associated with the name.

IOWs, we either support it from the start, or we never support it.

> > Hence all the ADS support infrastructure is essentially dentry cache
> > infrastructure allowing a dentry to be both a file and directory,
> > and providing the pathname resolution that recognises an ADS
> > redirection. Name that however you want - we've got to do an on-disk
> > format change to support ADS, so we can tell the VFS we support ADS
> > or not. And we have no cares about existing names in the filesystem
> > conflicting with the ADS pathname identifier because it's a mkfs
> > time decision. Given that special flags are needed for the openat()
> > call to resolve an ADS (e.g. O_ALT), we know if we should parse the
> > ADS identifier as an ADS the moment it is seen...
> 
> So you think there should be a built-in fully qualified path name resolution to 
> forks right from the start? E.g. like on Windows "C:\some\where\sheet.pdf:foo" 
> -> fork "foo" of file "sheet.pdf"?

No. I really don't care how the user interface works. That's for
people who write the syscalls to argue about.

What I was describing is how the internal kernel implementation -
the interaction between the VFS and the filesystem - needs to work.
ADS needs to be supported in some way by the VFS; if ADS are going
to be seekable user data files, then they have to be implemented as
path/dentry/inode tuples that a struct file can point to. IOWs,
internally they need to be seen as first class VFS citizens, and the
VFS needs mechanisms to tell the filesystem to look up the ADS
namespace rather than the inode itself....
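
To make the namespace selection concrete, here is a user-space toy
model, not kernel code. The toy_* types and the TOY_LOOKUP_ALT flag are
hypothetical stand-ins for whatever "magic command" the VFS would pass
to ->lookup(). Each inode carries two independent name tables, and the
flag decides which one a lookup consults.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag: resolve the name in the inode's ADS namespace. */
#define TOY_LOOKUP_ALT 0x1u

struct toy_inode;

/* One name -> inode mapping in either namespace. */
struct toy_entry {
    const char *name;
    struct toy_inode *inode;
};

struct toy_inode {
    unsigned long ino;
    /* Ordinary directory entries (none for a regular file). */
    const struct toy_entry *children;
    size_t nr_children;
    /* Separate ADS namespace; may be populated on regular files too. */
    const struct toy_entry *streams;
    size_t nr_streams;
};

/* Look up `name` under `parent` in the namespace selected by `flags`. */
struct toy_inode *toy_lookup(const struct toy_inode *parent,
                             const char *name, unsigned int flags)
{
    assert(parent != NULL && name != NULL);

    const struct toy_entry *tbl =
        (flags & TOY_LOOKUP_ALT) ? parent->streams : parent->children;
    size_t n =
        (flags & TOY_LOOKUP_ALT) ? parent->nr_streams : parent->nr_children;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].name, name) == 0)
            return tbl[i].inode;
    }
    return NULL; /* no such name in the selected namespace */
}
```

In this model a regular file with a stream "foo" resolves "foo" only
when TOY_LOOKUP_ALT is passed; an ordinary lookup on the same inode
finds nothing, so the two namespaces cannot collide.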

> > > I don't understand why a fork would be permitted to have its own
> > > permissions.  That makes no sense.  Silly Solaris.
> > 
> > I can't think of a reason why, either, but the above implementation
> > for XFS would support it if the presentation layer allows it... :)
> 
> I would definitely not add this right from the start of course, but in the 
> long term it actually does make sense for them to have their own permissions, 
> simply because there are already applications for that:
> 
> E.g. on some systems forks are used to tag files for security relevant issues, 
> for instance where the file originated from (a trusted vs. untrusted source). 

Key-value data like that is what the security xattr namespace is for,
not ADS....

> If it was an untrusted source, the user is made aware about this circumstance 
> by the system when attempting to open the file. In this use case the fork 
> would probably have more restrictive permissions than the actual file.

That requires opening the user data fork to walk the ADS to find
key-value pairs that tell it it must not open the file.  We already
have infrastructure for this sort of thing via LSMs that store their
own private key-value data in the security xattrs namespace that
users can't modify. If you have security permission data that is
larger than can be stored in an xattr, then you've got bigger
problems than a lack of ADS.

OTOH, storing the merkle tree data for fsverity would be a perfect
use for a hidden ADS stream that the user cannot see or modify. The
current fsverity implementation is a nasty hack that stores the
merkle tree data in the same file but hides it beyond EOF so that
only the kernel can access it directly. That only works for a single
non-user data stream, though, so if we wanted more file-offset based
integrity or security data, we've got nowhere to put it.
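
For readers unfamiliar with the "beyond EOF" trick, the arithmetic can
be sketched as below. The 64KiB alignment is an assumption based on how
the ext4 layout is commonly documented; it is not stated in this
message, and the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed alignment of the hidden metadata region (64KiB). */
#define VERITY_METADATA_ALIGN 65536ULL

/*
 * Return the file offset at which the hidden metadata would start:
 * the file size rounded up to the next alignment boundary. Everything
 * at or beyond this offset lies past EOF as far as user space can tell.
 */
uint64_t verity_metadata_pos(uint64_t i_size)
{
    /* Guard against overflow in the round-up below. */
    assert(i_size <= UINT64_MAX - (VERITY_METADATA_ALIGN - 1));
    return (i_size + VERITY_METADATA_ALIGN - 1)
           / VERITY_METADATA_ALIGN * VERITY_METADATA_ALIGN;
}
```

Because there is only one "past EOF" region per file, this scheme gives
exactly one hidden stream, which is the limitation described above.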

IOWs, now that I think about it, we should be allowing non-user
per-ADS permissions to be set right from the start because I can
think of several filesystem/kernel internal features that could make
use of such functionality that we would want to remain hidden from
users.

> OTOH forks are used to extend existing files in non-obtrusive way. Say you 
> have some sort of (e.g. huge) master file, and a team works on that file. Then 
> the individual people would attach their changes solely as forks to the master 
> file with their ownership, probably even with complex ACLs, to prevent certain 
> users from touching (or even reading) others' changes. In this use case the 
> master file might be readonly for most people, while the individual forks 
> could be anywhere between more permissive or more restrictive.

You're demonstrating the exact reasons why ADS have traditionally
been considered harmful by Linux developers.  You can do all that
with normal directories and files - you do not need ADS to implement
a fully functional multi-user content management system.

ADS does not make constructs like this simpler or easier for
applications to implement or manage. e.g. If you use traditional
directories and files, you don't need to modify backup applications
and file manipulation tools to correctly copy such constructs....
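
A minimal sketch of the plain directories-and-files alternative, using
only ordinary POSIX calls. The layout (one directory per master file,
one "<user>.patch" file per contributor) and the function name are
hypothetical, chosen purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create <dir>/<user>.patch, readable and writable by its owner only.
 * The directory is created if it does not exist yet. Returns an open
 * file descriptor, or -1 on error (including "file already exists").
 */
int create_contribution(const char *dir, const char *user)
{
    char path[4096];

    assert(dir != NULL && user != NULL);

    if (mkdir(dir, 0755) == -1 && errno != EEXIST)
        return -1;

    int n = snprintf(path, sizeof(path), "%s/%s.patch", dir, user);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1; /* path would not fit in the buffer */

    /* O_EXCL: never clobber another contributor's existing file. */
    return open(path, O_CREAT | O_WRONLY | O_EXCL, 0600);
}
```

Ownership, mode bits and ACLs on each file then work exactly as they do
today, and cp, tar and rsync copy the whole construct without changes.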

Keep in mind that you are not going to get universal support for ADS
any time soon as most filesystems will require on-disk format
changes to support them. Further, you are going to have to wait for
the entire OS ecosystem to grow support for ADS (e.g. cp, tar,
rsync, file, etc) before you can actually use it sanely in
production systems. Even if we implement kernel support right now,
it will be years before it will be widely available and supported at
an OS/distro level...

IOWs, applications that want to do "ADS-like" stuff are going to
have to be written for the lowest common denominator (i.e. no ADS
support at all) for a long time yet.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com