linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Proposal: A new fs-verity interface
@ 2019-01-10  5:15 Theodore Y. Ts'o
  2019-01-10  5:15 ` Theodore Y. Ts'o
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Theodore Y. Ts'o @ 2019-01-10  5:15 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Dave Chinner, Darrick J. Wong,
	Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4,
	linux-f2fs-devel

The following approach is based in Darrick's suggestion:

int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg);

struct fsverity_arg {
       int fsv_donor_fd;
       u64 fsv_offset;
       u64 fsv_size;
};

fsv_offset and fsz_size must be a multiple of the file system block
size.  If the ioctl comples successfully, as a side effect the
donor_fd will have a hole punch operation on the specified range.  In
other words, the equivalent of operation of fallocate(fsv_donor_fd,
FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by
fd will be protected using fsverity.

It will be legal for fsv_donor_fd == fd, so this interface is a
superset of the original FS_IOC_ENABLE_VERITY ioctl.

This will hopefully make Christoph and Dave happy because the
interface does not presuppose how ext4 and f2fs will implement
fsverity behind the scenes.  However, it does not forbid it, and the
net cost is that ext4 and f2fs will have to implement code which
transplants the blocks from the donor_fd to fd in the case where
donor_fd != fd --- and in the case where blocks are encrypted using
fscrypt, we will have to decrypt the blocks from donor_fd and possibly
re-encrypt then in fd's per-file key, which means we'll have to add
extra complexity to implement the decrypt and re-encrypt passing
through the page cache.

But if this helps resolve Christoph and Dave's objections, it
shouldn't be _too_ much extra complexity.  Before we go ahead an
implement it, though, I'd appreciate a confirmation that this will
indeed actually resolve their complaints.

Thanks,

					- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Proposal: A new fs-verity interface
  2019-01-10  5:15 Proposal: A new fs-verity interface Theodore Y. Ts'o
@ 2019-01-10  5:15 ` Theodore Y. Ts'o
  2019-01-10 18:18 ` Darrick J. Wong
  2019-01-14 23:41 ` Dave Chinner
  2 siblings, 0 replies; 13+ messages in thread
From: Theodore Y. Ts'o @ 2019-01-10  5:15 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig, Dave Chinner, Darrick J. Wong,
	Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4,
	linux-f2fs-devel

The following approach is based in Darrick's suggestion:

int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg);

struct fsverity_arg {
       int fsv_donor_fd;
       u64 fsv_offset;
       u64 fsv_size;
};

fsv_offset and fsz_size must be a multiple of the file system block
size.  If the ioctl comples successfully, as a side effect the
donor_fd will have a hole punch operation on the specified range.  In
other words, the equivalent of operation of fallocate(fsv_donor_fd,
FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by
fd will be protected using fsverity.

It will be legal for fsv_donor_fd == fd, so this interface is a
superset of the original FS_IOC_ENABLE_VERITY ioctl.

This will hopefully make Christoph and Dave happy because the
interface does not presuppose how ext4 and f2fs will implement
fsverity behind the scenes.  However, it does not forbid it, and the
net cost is that ext4 and f2fs will have to implement code which
transplants the blocks from the donor_fd to fd in the case where
donor_fd != fd --- and in the case where blocks are encrypted using
fscrypt, we will have to decrypt the blocks from donor_fd and possibly
re-encrypt then in fd's per-file key, which means we'll have to add
extra complexity to implement the decrypt and re-encrypt passing
through the page cache.

But if this helps resolve Christoph and Dave's objections, it
shouldn't be _too_ much extra complexity.  Before we go ahead an
implement it, though, I'd appreciate a confirmation that this will
indeed actually resolve their complaints.

Thanks,

					- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Proposal: A new fs-verity interface
  2019-01-10  5:15 Proposal: A new fs-verity interface Theodore Y. Ts'o
  2019-01-10  5:15 ` Theodore Y. Ts'o
@ 2019-01-10 18:18 ` Darrick J. Wong
  2019-01-10 18:18   ` Darrick J. Wong
  2019-01-14 23:41 ` Dave Chinner
  2 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2019-01-10 18:18 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Eric Biggers, Dave Chinner, linux-f2fs-devel, Christoph Hellwig,
	linux-fscrypt, linux-fsdevel, linux-ext4, Linus Torvalds

On Thu, Jan 10, 2019 at 12:15:00AM -0500, Theodore Y. Ts'o wrote:
> The following approach is based in Darrick's suggestion:
> 
> int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg);
> 
> struct fsverity_arg {
>        int fsv_donor_fd;

Explicitly sized fields and padding here, please.  ISTR there are a few
arches that don't have alignment requirements which will make this
messy.

>        u64 fsv_offset;
>        u64 fsv_size;

You might want to allocate some reserved space for flags in case you
ever decide you need it, but otherwise it seems fine to me...

--D

> };
> 
> fsv_offset and fsz_size must be a multiple of the file system block
> size.  If the ioctl comples successfully, as a side effect the
> donor_fd will have a hole punch operation on the specified range.  In
> other words, the equivalent of operation of fallocate(fsv_donor_fd,
> FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by
> fd will be protected using fsverity.
> 
> It will be legal for fsv_donor_fd == fd, so this interface is a
> superset of the original FS_IOC_ENABLE_VERITY ioctl.
> 
> This will hopefully make Christoph and Dave happy because the
> interface does not presuppose how ext4 and f2fs will implement
> fsverity behind the scenes.  However, it does not forbid it, and the
> net cost is that ext4 and f2fs will have to implement code which
> transplants the blocks from the donor_fd to fd in the case where
> donor_fd != fd --- and in the case where blocks are encrypted using
> fscrypt, we will have to decrypt the blocks from donor_fd and possibly
> re-encrypt then in fd's per-file key, which means we'll have to add
> extra complexity to implement the decrypt and re-encrypt passing
> through the page cache.
> 
> But if this helps resolve Christoph and Dave's objections, it
> shouldn't be _too_ much extra complexity.  Before we go ahead an
> implement it, though, I'd appreciate a confirmation that this will
> indeed actually resolve their complaints.
> 
> Thanks,
> 
> 					- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Proposal: A new fs-verity interface
  2019-01-10 18:18 ` Darrick J. Wong
@ 2019-01-10 18:18   ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-01-10 18:18 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Linus Torvalds, Christoph Hellwig, Dave Chinner, Eric Biggers,
	linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel

On Thu, Jan 10, 2019 at 12:15:00AM -0500, Theodore Y. Ts'o wrote:
> The following approach is based in Darrick's suggestion:
> 
> int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg);
> 
> struct fsverity_arg {
>        int fsv_donor_fd;

Explicitly sized fields and padding here, please.  ISTR there are a few
arches that don't have alignment requirements which will make this
messy.

>        u64 fsv_offset;
>        u64 fsv_size;

You might want to allocate some reserved space for flags in case you
ever decide you need it, but otherwise it seems fine to me...

--D

> };
> 
> fsv_offset and fsz_size must be a multiple of the file system block
> size.  If the ioctl comples successfully, as a side effect the
> donor_fd will have a hole punch operation on the specified range.  In
> other words, the equivalent of operation of fallocate(fsv_donor_fd,
> FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by
> fd will be protected using fsverity.
> 
> It will be legal for fsv_donor_fd == fd, so this interface is a
> superset of the original FS_IOC_ENABLE_VERITY ioctl.
> 
> This will hopefully make Christoph and Dave happy because the
> interface does not presuppose how ext4 and f2fs will implement
> fsverity behind the scenes.  However, it does not forbid it, and the
> net cost is that ext4 and f2fs will have to implement code which
> transplants the blocks from the donor_fd to fd in the case where
> donor_fd != fd --- and in the case where blocks are encrypted using
> fscrypt, we will have to decrypt the blocks from donor_fd and possibly
> re-encrypt then in fd's per-file key, which means we'll have to add
> extra complexity to implement the decrypt and re-encrypt passing
> through the page cache.
> 
> But if this helps resolve Christoph and Dave's objections, it
> shouldn't be _too_ much extra complexity.  Before we go ahead an
> implement it, though, I'd appreciate a confirmation that this will
> indeed actually resolve their complaints.
> 
> Thanks,
> 
> 					- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Proposal: A new fs-verity interface
  2019-01-10  5:15 Proposal: A new fs-verity interface Theodore Y. Ts'o
  2019-01-10  5:15 ` Theodore Y. Ts'o
  2019-01-10 18:18 ` Darrick J. Wong
@ 2019-01-14 23:41 ` Dave Chinner
  2019-01-14 23:41   ` Dave Chinner
  2019-01-23  5:10   ` Theodore Y. Ts'o
  2 siblings, 2 replies; 13+ messages in thread
From: Dave Chinner @ 2019-01-14 23:41 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Eric Biggers, Darrick J. Wong, linux-f2fs-devel,
	Christoph Hellwig, linux-fscrypt, linux-fsdevel, linux-ext4,
	Linus Torvalds

On Thu, Jan 10, 2019 at 12:15:00AM -0500, Theodore Y. Ts'o wrote:
> The following approach is based in Darrick's suggestion:

I do not recall what that was, so this:

> int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg);
> 
> struct fsverity_arg {
>        int fsv_donor_fd;
>        u64 fsv_offset;
>        u64 fsv_size;
> };
> 
> fsv_offset and fsz_size must be a multiple of the file system block
> size.  If the ioctl comples successfully, as a side effect the
> donor_fd will have a hole punch operation on the specified range.  In
> other words, the equivalent of operation of fallocate(fsv_donor_fd,
> FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by
> fd will be protected using fsverity.

makes no sense to me. What's in {offset, size} and why do you need
to call this on that specific range? If it is the equivalent of a
hole punch, then why wouldn't you just use FALLOC_FL_PUNCH_HOLE?

Can you please write the man page for the interface so that the
description of what it does and how it should be used is crystal
clear and doesn't assume the reader knows "what darrick proposed"...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Proposal: A new fs-verity interface
  2019-01-14 23:41 ` Dave Chinner
@ 2019-01-14 23:41   ` Dave Chinner
  2019-01-23  5:10   ` Theodore Y. Ts'o
  1 sibling, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2019-01-14 23:41 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Linus Torvalds, Christoph Hellwig, Darrick J. Wong, Eric Biggers,
	linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel

On Thu, Jan 10, 2019 at 12:15:00AM -0500, Theodore Y. Ts'o wrote:
> The following approach is based in Darrick's suggestion:

I do not recall what that was, so this:

> int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg);
> 
> struct fsverity_arg {
>        int fsv_donor_fd;
>        u64 fsv_offset;
>        u64 fsv_size;
> };
> 
> fsv_offset and fsz_size must be a multiple of the file system block
> size.  If the ioctl comples successfully, as a side effect the
> donor_fd will have a hole punch operation on the specified range.  In
> other words, the equivalent of operation of fallocate(fsv_donor_fd,
> FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by
> fd will be protected using fsverity.

makes no sense to me. What's in {offset, size} and why do you need
to call this on that specific range? If it is the equivalent of a
hole punch, then why wouldn't you just use FALLOC_FL_PUNCH_HOLE?

Can you please write the man page for the interface so that the
description of what it does and how it should be used is crystal
clear and doesn't assume the reader knows "what darrick proposed"...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Proposal: A new fs-verity interface
  2019-01-14 23:41 ` Dave Chinner
  2019-01-14 23:41   ` Dave Chinner
@ 2019-01-23  5:10   ` Theodore Y. Ts'o
  2019-01-24 21:25     ` Dave Chinner
  1 sibling, 1 reply; 13+ messages in thread
From: Theodore Y. Ts'o @ 2019-01-23  5:10 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Christoph Hellwig, Darrick J. Wong, Eric Biggers,
	linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel

Apologies for the delay in getting responding; I've been on vacation
last week.

On Tue, Jan 15, 2019 at 10:41:01AM +1100, Dave Chinner wrote:
> > int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg);
> > 
> > struct fsverity_arg {
> >        int fsv_donor_fd;
> >        u64 fsv_offset;
> >        u64 fsv_size;
> > };
> > 
> > fsv_offset and fsz_size must be a multiple of the file system block
> > size.  If the ioctl comples successfully, as a side effect the
> > donor_fd will have a hole punch operation on the specified range.  In
> > other words, the equivalent of operation of fallocate(fsv_donor_fd,
> > FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by
> > fd will be protected using fsverity.
> 
> makes no sense to me. What's in {offset, size} and why do you need
> to call this on that specific range? If it is the equivalent of a
> hole punch, then why wouldn't you just use FALLOC_FL_PUNCH_HOLE?

This is how we transfer the metadata (e.g., the Merkle tree plus a
header block) from userspace to the file system.  In the original
interface, the verity metadata was appended end of the data file, and
then an ioctl would transfer the metadata to the file system.  For
ext4 and f2fs, we actually store the fsverity metadata after the data
blocks and simply adjust i_size visible to userspace so the metadata
blocks are "invisible".

You objected to this interface, because it was too tightly tied to the
implementation used in ext4 and f2fs.  So what Darrick suggested is
that it could be conveyed by passing the metadata via a second file
descriptor.  The offset and size specifies where the metadata can be
fetched from the second file.

As in the original interface, the metadata blocks are "consumed" by
the ioctl, so they disappear from the second file, as if
FALLOC_FL_PUNCH_HOLE.  It's not that they are deallocated; the blocks
in question are simply getting transferred to inode which is being
protected using fs-verity.

> Can you please write the man page for the interface so that the
> description of what it does and how it should be used is crystal
> clear and doesn't assume the reader knows "what darrick proposed"...

I can do that, but I'd like to make sure we have general agreement
approach first.

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Proposal: A new fs-verity interface
  2019-01-23  5:10   ` Theodore Y. Ts'o
@ 2019-01-24 21:25     ` Dave Chinner
  2019-01-24 21:40       ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2019-01-24 21:25 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Linus Torvalds, Christoph Hellwig, Darrick J. Wong, Eric Biggers,
	linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel

On Wed, Jan 23, 2019 at 12:10:17AM -0500, Theodore Y. Ts'o wrote:
> Apologies for the delay in getting responding; I've been on vacation
> last week.
> 
> On Tue, Jan 15, 2019 at 10:41:01AM +1100, Dave Chinner wrote:
> > > int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg);
> > > 
> > > struct fsverity_arg {
> > >        int fsv_donor_fd;
> > >        u64 fsv_offset;
> > >        u64 fsv_size;
> > > };
> > > 
> > > fsv_offset and fsz_size must be a multiple of the file system block
> > > size.  If the ioctl comples successfully, as a side effect the
> > > donor_fd will have a hole punch operation on the specified range.  In
> > > other words, the equivalent of operation of fallocate(fsv_donor_fd,
> > > FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by
> > > fd will be protected using fsverity.
> > 
> > makes no sense to me. What's in {offset, size} and why do you need
> > to call this on that specific range? If it is the equivalent of a
> > hole punch, then why wouldn't you just use FALLOC_FL_PUNCH_HOLE?
> 
> This is how we transfer the metadata (e.g., the Merkle tree plus a
> header block) from userspace to the file system. 


> In the original
> interface, the verity metadata was appended end of the data file, and
> then an ioctl would transfer the metadata to the file system.

So how is this different? You still put the data into a file,
then tell the kernel what file it belongs to, and then the
filesystem does "magic" and *consumes* that part of the file?

That's what I'm asking here - what are the semantics of this
ioctl(). I'mnot asking for justification or a unnecessary history
lesson. I'm asking you to describe the behaviour and interface
precisely.

> For
> ext4 and f2fs, we actually store the fsverity metadata after the data
> blocks and simply adjust i_size visible to userspace so the metadata
> blocks are "invisible".

Yes, I know how it's been implemented in ext4 and f2fs. I don't
care about that - I care about what how this new interface works.

> You objected to this interface, because it was too tightly tied to the
> implementation used in ext4 and f2fs.  So what Darrick suggested is
> that it could be conveyed by passing the metadata via a second file
> descriptor.  The offset and size specifies where the metadata can be
> fetched from the second file.

So it's still "file data" that the filesystem has to magically
transform into hidden metadata in some manner. What writes the
merkel tree into a file, and why can't it just write it direct to
the filesystem rather than having to use an intermediate file?

i.e. the proposed API still seems to be bleeding "merkel tree is
held in file data" implementation artifacts through it. That
you can still append the merkle tree to the target file and
then use the hack "donor_fd = fd" to tell it that is where the
merkel tree data is stored seems like a convenient API hack for the
existing implementation, rather than a well thought through, generic
API.

> As in the original interface, the metadata blocks are "consumed" by
> the ioctl, so they disappear from the second file, as if
> FALLOC_FL_PUNCH_HOLE.  It's not that they are deallocated; the blocks
> in question are simply getting transferred to inode which is being
> protected using fs-verity.

But it's not a "hole punch" in your proposal and current
implementation because a) the merkle tree blocks are istill
accounted as user data on the inode the file and hence visible to
userspace (e.g. via stat()) and b) hole punch does not change the
file size. i.e. FALLOC_FL_PUNCH_HOLE is not an equivalent operation
for the behaviour you are talking about, and the blocks aren't
actually "consumed" because they are still accounted to the same
file.

IOWs, "du -sh" now reports a significant mismatch between file sizes
and space consumed by the files. We know from experience that this
will confuse users and they'll think it's a bug:

http://xfs.org/index.php/XFS_FAQ#Q:_Why_do_files_on_XFS_use_more_data_blocks_than_expected.3F

So even though Darrick's proposal is /better/ than just appending
data and flipping the inode size to hide it, this API definition
still allows implementation details specific to "storing merkle tree
as user data beyond EOF" implementations right through it.


> 
> > Can you please write the man page for the interface so that the
> > description of what it does and how it should be used is crystal
> > clear and doesn't assume the reader knows "what darrick proposed"...
> 
> I can do that, but I'd like to make sure we have general agreement
> approach first.

This is why we suck so badly at designing new interfaces. We have to
have "agreement" before anyone even tries to flesh out the full
interface and document all aspects of it. But then we get
"agreement" and the next thing that happens is an implementation
lands in the tree and then we find later on it's full of bugs
because "agreement" doesn't mean ithe API was "fully documented,
understood, reviewed and tested".

We need to change the way we design user APIs - if you don't have a
fully documented proposal (i.e. a man page) that describes the
parameters, the behaviour of the API, the implementation
constraints and requirements, the errors that can be returned, etc,
then it isn't an API proposal we can actually review properly.

So, please, if this is the way you want to go, please document the
API properly.

However, I still have issues with the assumption that the user
application generating the merkel tree must first write the fsverity
information to a user data file before the merkel tree is passed to
the filesystem for permanent storage? Why doesn't it write the
merkel tree directly to the kernel so the filesystem can store it
where it wants to store it on the fly?

Linus wanted the fsverity information to be injected at the VFS so
it's the same for everyone, and that implies that the fsverity
information needs to be treated as a data stream rather than a file
in some filesystem. People have suggested an ADS (alternate data
stream) for this, but that's crazy talk. All we need is a file/fd
flag to say we are "writing fsverity information" so the kernel does
the right thing. e.g.

	vfd = openat(AT_FDCWD, filename, O_APPEND|O_WRONLY);
	flags = fcntl(vfd, F_GETFL, NULL);
	fcntl(vfd, F_SETFL, flags | O_FSVERITY);
	while (....) {
		write(vfd, buf, len);
	}

IOWs, we can communicate to the kernel that this data stream is
special "fs verity" metadata, hence allowing the data that to be
written to where-ever the filesystem stores the merkel tree.
Permissions will be required to set this flag.

(Discussion point: maybe should be a FD_ flag that forces FD_CLOEXEC
so that child processes can't write fsverity information)

From the tagged data stream,  we can implement filesystem indepedent
storage for the merkel tree data - it doesn't have be be written by
the filesystem at all. e.g.  the VFS could cut the incoming buffer
up into fsverity headers and 64kB data chunks and store them in
separate named xattrs. The filesystem would be none-the-wiser that
there is even fsverity data on the file - it would just work on all
filesystems. You don't even need flags in the inode to tell the fs
that it is protected by fsverity - the presence of the fsverity
xattrs is sufficient to determine that.

(Discussion point: use separate iovecs for fsverity headers and
opaque data chunks w/ pwritev2(RWF_FSVERITY|RWF_APPEND) as a defined
user/kernel fsverity format instead of just write calls which the
kernel then interprets)

This is largely impossible with the "store the merkel tree data in a
file, then use a special ioctl to swizzle it into place" API because
the filesystem has to know the on-disk format of the fsverity
information to be able to break it up sanely into xattrs or some
other internal storage type. The filesystem also needs on disk
format changes to support non-xattr format fsverity storage, while a
"VFS layer translates to xattrs" approach can be done without
needing any changes to any filesystem. It also makes the fsverity
functionality available to network filesystems as well...

IOWs, it seems to me that the API should be based around writing
fsverity data into the kernel as a defined data stream, not an API
that does "something" with opaque data in a file from
"somewhere" and assumes the filesystem will handle it all....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Proposal: A new fs-verity interface
  2019-01-24 21:25     ` Dave Chinner
@ 2019-01-24 21:40       ` Linus Torvalds
  2019-01-24 23:22         ` Theodore Y. Ts'o
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2019-01-24 21:40 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Theodore Y. Ts'o, Christoph Hellwig, Darrick J. Wong,
	Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4,
	linux-f2fs-devel

On Fri, Jan 25, 2019 at 10:25 AM Dave Chinner <david@fromorbit.com> wrote:
>
> So it's still "file data" that the filesystem has to magically
> transform into hidden metadata in some manner. What writes the
> merkel tree into a file, and why can't it just write it direct to
> the filesystem rather than having to use an intermediate file?

This does frustrate me too.

I _assume_ (but it's exactly that - just an assumption) this whole
design decision comes from basically having a transport layer that is
entirely unaware of the merle data, so the data  is brought in some
entirely traditional way that can only transfer regular file contents
(ie tar/zip/ar kind of thing, but presumably actually just in the form
of an android APK). And then the new interface is just a way to
"convert" that into the actual final security model.

One thing that is also unclear to me is whether that "secure" model
needs to be stable on disk (ie is this considered an actual write that
*modifies* the underlying filesystem, and the merkle tree data ends up
being associated long-term and over reboots), or whether it would be
acceptable to just have it be a temporary "view" of the file where the
filesystem itself can be read-only, and all that happens is that now
the merkle tree is associated with that file as long as the filesystem
is mounted (or until it is disassociated).

Maybe this was answered in some of the earlier email threads that (at
least for me) were then somewhat overshadowed by the merge window work
and the holidays. So it's possible that I repeat myself. But I do have
to say that I think I'd *still* prefer this to be something more like
an xattr, and that maybe we'd be better off actually improving out
"write to xattr" interface or something.

I understand that you don't want to load the whole merkle tree into
memory, and that is the reason that you want to point to some "stable
on disk" area, but the hole punching does seem to be a particularly
nasty part of it. It would be much better to have the merkle data in
some place where it doesn't then need to be hidden again, no?

             Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Proposal: A new fs-verity interface
  2019-01-24 21:40       ` Linus Torvalds
@ 2019-01-24 23:22         ` Theodore Y. Ts'o
  2019-01-25  0:32           ` Matthew Wilcox
  2019-01-25  0:35           ` Linus Torvalds
  0 siblings, 2 replies; 13+ messages in thread
From: Theodore Y. Ts'o @ 2019-01-24 23:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Chinner, Christoph Hellwig, Darrick J. Wong, Eric Biggers,
	linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel

On Fri, Jan 25, 2019 at 10:40:31AM +1300, Linus Torvalds wrote:
> 
> I _assume_ (but it's exactly that - just an assumption) this whole
> design decision comes from basically having a transport layer that is
> entirely unaware of the merle data, so the data  is brought in some
> entirely traditional way that can only transfer regular file contents
> (ie tar/zip/ar kind of thing, but presumably actually just in the form
> of an android APK). And then the new interface is just a way to
> "convert" that into the actual final security model.

How the transport layer is going to send the merkle data is really
unrelated (e.g., it's not necessarily going to be at the end of the
file data).

> One thing that is also unclear to me is whether that "secure" model
> needs to be stable on disk (ie is this considered an actual write that
> *modifies* the underlying filesystem, and the merkle tree data ends up
> being associated long-term and over reboots), or whether it would be
> acceptable to just have it be a temporary "view" of the file where the
> filesystem itself can be read-only, and all that happens is that now
> the merkle tree is associated with that file as long as the filesystem
> is mounted (or until it is disassociated).

It's the first.  We need to keep the Merkle tree and associated
metadata information (which might include a PKCS 7 digital signature)
permanently associated with the file.  So it has to be stored in the
file; it's associated metadata.

> Maybe this was answered in some of the earlier email threads that (at
> least for me) were then somewhat overshadowed by the merge window work
> and the holidays. So it's possible that I repeat myself. But I do have
> to say that I think I'd *still* prefer this to be something more like
> an xattr, and that maybe we'd be better off actually improving out
> "write to xattr" interface or something.

The main issue is that for a 129 MB file, the Merkle data is going to
be a Megabyte.  So using a set/get interface, ala our current xattr
interface, seems awkward.  Also, currently for most file systems,
xattrs are limited in size to around 4k to 32k, and most xattrs
relatively small (e.g., SELinux labels, ACL's).  So even if we used
the xattr interface, for many file systems, for something that might
be 1 megabyte (for a 129 MB file to be protected by fs-verity), it
would almost certainly be stored in a different location than other
xattrs.  So similarly, changing our attr interfaces for big blobs,
when the vast majority of xattrs are small ones, doesn't seem to be a
great use of time.

The other thing I'll point out is that file system developers
generally have frowned on using setting xattrs having magic side
effects, since that would mean making the xattr set/get interface
acting more lke an ioctl.  When we make an file to become fs-verity
protected, it does have a side-effect of making the file immutable.
That's not a huge side-effect, but that's another reason where it
feels like the xattr interface seems like the wrong effort.

> I understand that you don't want to load the whole merkle tree into
> memory, and that is the reason that you want to point to some "stable
> on disk" area, but the hole punching does seem to be a particularly
> nasty part of it. It would be much better to have the merkle data in
> some place where it doesn't then need to be hidden again, no?

It's not really a "hole punch", but we are moving the data around.
That's because Dave Chinner and Christoph demanded it.  The original
approach was to put it at the end of the file, and then hide it.  If
the question is "why hide the metadata", it's because it's metadata.
We certainly don't want to make it be visible as part of the file
stream.

We could store the metadata somewhere else --- for example, we could
store it in another inode.  But inodes have overhead, and that would
mean using two inodes for every fs-verity protected files --- and we
don't need all of the other metadata (mtime, ctime, etc.) for the
Merkle tree.  So that's how we got to where we were.  I think the
approach of storing it using the same extent tree where we map logical
block numbers to physical block numbers make a lot of sense for ext4
and f2fs.

It seems that some file system (which may never even implement
fs-verity) their developers hate that particular approach.  So that's
where the suggestion of using a separate file descriptor to convey the
Merkle tree data to the file system came from.  It wasn't my first
choice.

						- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Proposal: A new fs-verity interface
  2019-01-24 23:22         ` Theodore Y. Ts'o
@ 2019-01-25  0:32           ` Matthew Wilcox
  2019-01-25  0:35           ` Linus Torvalds
  1 sibling, 0 replies; 13+ messages in thread
From: Matthew Wilcox @ 2019-01-25  0:32 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Linus Torvalds, Dave Chinner, Christoph Hellwig, Darrick J. Wong,
	Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4,
	linux-f2fs-devel

On Thu, Jan 24, 2019 at 06:22:37PM -0500, Theodore Y. Ts'o wrote:
> The main issue is that for a 129 MB file, the Merkle data is going to
> be a Megabyte.

127MB ... I pointed out this error the last time the documentation
was posted.

> We could store the metadata somewhere else --- for example, we could
> store it in another inode.  But inodes have overhead, and that would
> mean using two inodes for every fs-verity protected files --- and we
> don't need all of the other metadata (mtime, ctime, etc.) for the
> Merkle tree.  So that's how we got to where we were.  I think the
> approach of storing it using the same extent tree where we map logical
> block numbers to physical block numbers make a lot of sense for ext4
> and f2fs.
> 
> It seems that some file system (which may never even implement
> fs-verity) their developers hate that particular approach.  So that's
> where the suggestion of using a separate file descriptor to convey the
> Merkle tree data to the file system came from.  It wasn't my first
> choice.

I'll reiterate an API I suggested on December 21st:

: verity_fd = ioctl(fd, FS_IOC_VERITY_FD);
: write(verity_fd, &merkle_tree);
: close(verity_fd);
: 
: At final close of that verity_fd, the filesystem behaves in the same way
: that it does on receipt of this FS_IOC_ENABLE_VERITY ioctl today.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Proposal: A new fs-verity interface
  2019-01-24 23:22         ` Theodore Y. Ts'o
  2019-01-25  0:32           ` Matthew Wilcox
@ 2019-01-25  0:35           ` Linus Torvalds
  2019-01-29 15:48             ` Theodore Y. Ts'o
  1 sibling, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2019-01-25  0:35 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Dave Chinner, Christoph Hellwig, Darrick J. Wong, Eric Biggers,
	linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel

On Fri, Jan 25, 2019 at 12:22 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
>
> > I understand that you don't want to load the whole merkle tree into
> > memory, and that is the reason that you want to point to some "stable
> > on disk" area, but the hole punching does seem to be a particularly
> > nasty part of it. It would be much better to have the merkle data in
> > some place where it doesn't then need to be hidden again, no?
>
> It's not really a "hole punch", but we are moving the data around.
> That's because Dave Chinner and Christoph demanded it.  The original
> approach was to put it at the end of the file, and then hide it.  If
> the question is "why hide the metadata", it's because it's metadata.
> We certainly don't want to make it be visible as part of the file
> stream.

But that's the whole hiding thing. Why do you feel you need to do
that? Why not just leave it alone, and leave it visible, and say "hey,
the merkle data for file X comes from here".

Punching a hole, or moving it somewhere else, or hiding or, or
whatever you want to do, all mean "you have to modify the filesystem".
They also mean "you can't back up the original sanely any more and
move it somewhere else".

Wouldn't it be much nicer to simple have a *separate* stream. If it's
not an xattr, then maybe just a separate file entirely. A file that
just contains the merkle data, and has no alignment things because
there's no "it's at the end of the file that it describes" issues,
it's just a standalone thing that is still visible so that you can
copy the original data and it's still valid (even if the merkle tree
then isn't _enabled_ in the copy, but now you could re-enable it in a
new location).

> We could store the metadata somewhere else --- for example, we could
> store it in another inode.  But inodes have overhead, and that would
> mean using two inodes for every fs-verity protected files --- and we
> don't need all of the other metadata (mtime, ctime, etc.) for the
> Merkle tree.

Well, if one of the issues is that the merkle data ends up being
megabytes, I don't see this as being  a huge issue. Particularly since
the alignemnt restrictions in the unified thing means that you still
aren't exactly "packing" the merkle data efficiently to begin with..

In fact, if you want to have merkle data for small files (where the
merkle data itself is just a  few words), then having it in a separate
file and as part of the inode inline data doesn't seem like it's
likely any worse (and might be *better*) than having it at some block
boundary due to alignment...

Hmm?

            Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Proposal: A new fs-verity interface
  2019-01-25  0:35           ` Linus Torvalds
@ 2019-01-29 15:48             ` Theodore Y. Ts'o
  0 siblings, 0 replies; 13+ messages in thread
From: Theodore Y. Ts'o @ 2019-01-29 15:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Chinner, Christoph Hellwig, Darrick J. Wong, Eric Biggers,
	linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel

On Fri, Jan 25, 2019 at 01:35:05PM +1300, Linus Torvalds wrote:
> But that's the whole hiding thing. Why do you feel you need to do
> that? Why not just leave it alone, and leave it visible, and say "hey,
> the merkle data for file X comes from here".

There are a number of downsides:

*) It's ugly that files that have to live somewhere (e.g., a dot file,
   some other directory, etc.) in the directory hierarchy, when theyt
   are fundamentally part of the file that is being protected --- that
   is, it is file metadata.

*) We don't want to allow the files to be deted, since it breaks the
   protection; that either has to make the original file useful, since
   the security policy is we can't trust the file --- which might be a
   privileged APK (think setuid binary), or we have to make the file
   immutable and it from being deleted.

*) When we delete the original file, userspace now has to manually
   clean up the Merkle data for the file.

So keeping it hidden is just cleaner.

You're right that making the Merkle data explicit available in some
form (either via an xattr or a separate file) would make it easier to
copy the file, but that's not something that is needed in practice.
So it's an advantage, but it wasn't one that we had considered
important.  For example for most executables on a desktop, they are
installed via a package manager, and they are deleted when the package
is updated.  Or in the case of an Android APK, copying it is not
something that is done once it is downloaded to the device.

> In fact, if you want to have merkle data for small files (where the
> merkle data itself is just a  few words), then having it in a separate
> file and as part of the inode inline data doesn't seem like it's
> likely any worse (and might be *better*) than having it at some block
> boundary due to alignment...
> 
> Hmm?

The default inode size is 256; and in that case "small files" is less
than 12k.  With an ext4 inode size of 1024 bytes "small files" would
be 108k --- and this is ignoring the size of the fsverity header.
With the header these numbers would be even smaller --- and given that
the most common use of this will be for APK and executables, using the
inline data (or inline xattrs) is really not practical.

       	    	       	       	  - Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-01-29 15:50 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-10  5:15 Proposal: A new fs-verity interface Theodore Y. Ts'o
2019-01-10  5:15 ` Theodore Y. Ts'o
2019-01-10 18:18 ` Darrick J. Wong
2019-01-10 18:18   ` Darrick J. Wong
2019-01-14 23:41 ` Dave Chinner
2019-01-14 23:41   ` Dave Chinner
2019-01-23  5:10   ` Theodore Y. Ts'o
2019-01-24 21:25     ` Dave Chinner
2019-01-24 21:40       ` Linus Torvalds
2019-01-24 23:22         ` Theodore Y. Ts'o
2019-01-25  0:32           ` Matthew Wilcox
2019-01-25  0:35           ` Linus Torvalds
2019-01-29 15:48             ` Theodore Y. Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).