* Proposal: A new fs-verity interface @ 2019-01-10 5:15 Theodore Y. Ts'o 2019-01-10 5:15 ` Theodore Y. Ts'o ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Theodore Y. Ts'o @ 2019-01-10 5:15 UTC (permalink / raw) To: Linus Torvalds, Christoph Hellwig, Dave Chinner, Darrick J. Wong, Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel The following approach is based in Darrick's suggestion: int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg); struct fsverity_arg { int fsv_donor_fd; u64 fsv_offset; u64 fsv_size; }; fsv_offset and fsz_size must be a multiple of the file system block size. If the ioctl comples successfully, as a side effect the donor_fd will have a hole punch operation on the specified range. In other words, the equivalent of operation of fallocate(fsv_donor_fd, FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by fd will be protected using fsverity. It will be legal for fsv_donor_fd == fd, so this interface is a superset of the original FS_IOC_ENABLE_VERITY ioctl. This will hopefully make Christoph and Dave happy because the interface does not presuppose how ext4 and f2fs will implement fsverity behind the scenes. However, it does not forbid it, and the net cost is that ext4 and f2fs will have to implement code which transplants the blocks from the donor_fd to fd in the case where donor_fd != fd --- and in the case where blocks are encrypted using fscrypt, we will have to decrypt the blocks from donor_fd and possibly re-encrypt then in fd's per-file key, which means we'll have to add extra complexity to implement the decrypt and re-encrypt passing through the page cache. But if this helps resolve Christoph and Dave's objections, it shouldn't be _too_ much extra complexity. Before we go ahead an implement it, though, I'd appreciate a confirmation that this will indeed actually resolve their complaints. Thanks, - Ted ^ permalink raw reply [flat|nested] 13+ messages in thread
* Proposal: A new fs-verity interface 2019-01-10 5:15 Proposal: A new fs-verity interface Theodore Y. Ts'o @ 2019-01-10 5:15 ` Theodore Y. Ts'o 2019-01-10 18:18 ` Darrick J. Wong 2019-01-14 23:41 ` Dave Chinner 2 siblings, 0 replies; 13+ messages in thread From: Theodore Y. Ts'o @ 2019-01-10 5:15 UTC (permalink / raw) To: Linus Torvalds, Christoph Hellwig, Dave Chinner, Darrick J. Wong, Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel The following approach is based in Darrick's suggestion: int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg); struct fsverity_arg { int fsv_donor_fd; u64 fsv_offset; u64 fsv_size; }; fsv_offset and fsz_size must be a multiple of the file system block size. If the ioctl comples successfully, as a side effect the donor_fd will have a hole punch operation on the specified range. In other words, the equivalent of operation of fallocate(fsv_donor_fd, FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by fd will be protected using fsverity. It will be legal for fsv_donor_fd == fd, so this interface is a superset of the original FS_IOC_ENABLE_VERITY ioctl. This will hopefully make Christoph and Dave happy because the interface does not presuppose how ext4 and f2fs will implement fsverity behind the scenes. However, it does not forbid it, and the net cost is that ext4 and f2fs will have to implement code which transplants the blocks from the donor_fd to fd in the case where donor_fd != fd --- and in the case where blocks are encrypted using fscrypt, we will have to decrypt the blocks from donor_fd and possibly re-encrypt then in fd's per-file key, which means we'll have to add extra complexity to implement the decrypt and re-encrypt passing through the page cache. But if this helps resolve Christoph and Dave's objections, it shouldn't be _too_ much extra complexity. Before we go ahead an implement it, though, I'd appreciate a confirmation that this will indeed actually resolve their complaints. Thanks, - Ted ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Proposal: A new fs-verity interface 2019-01-10 5:15 Proposal: A new fs-verity interface Theodore Y. Ts'o 2019-01-10 5:15 ` Theodore Y. Ts'o @ 2019-01-10 18:18 ` Darrick J. Wong 2019-01-10 18:18 ` Darrick J. Wong 2019-01-14 23:41 ` Dave Chinner 2 siblings, 1 reply; 13+ messages in thread From: Darrick J. Wong @ 2019-01-10 18:18 UTC (permalink / raw) To: Theodore Y. Ts'o Cc: Eric Biggers, Dave Chinner, linux-f2fs-devel, Christoph Hellwig, linux-fscrypt, linux-fsdevel, linux-ext4, Linus Torvalds On Thu, Jan 10, 2019 at 12:15:00AM -0500, Theodore Y. Ts'o wrote: > The following approach is based in Darrick's suggestion: > > int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg); > > struct fsverity_arg { > int fsv_donor_fd; Explicitly sized fields and padding here, please. ISTR there are a few arches that don't have alignment requirements which will make this messy. > u64 fsv_offset; > u64 fsv_size; You might want to allocate some reserved space for flags in case you ever decide you need it, but otherwise it seems fine to me... --D > }; > > fsv_offset and fsz_size must be a multiple of the file system block > size. If the ioctl comples successfully, as a side effect the > donor_fd will have a hole punch operation on the specified range. In > other words, the equivalent of operation of fallocate(fsv_donor_fd, > FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by > fd will be protected using fsverity. > > It will be legal for fsv_donor_fd == fd, so this interface is a > superset of the original FS_IOC_ENABLE_VERITY ioctl. > > This will hopefully make Christoph and Dave happy because the > interface does not presuppose how ext4 and f2fs will implement > fsverity behind the scenes. However, it does not forbid it, and the > net cost is that ext4 and f2fs will have to implement code which > transplants the blocks from the donor_fd to fd in the case where > donor_fd != fd --- and in the case where blocks are encrypted using > fscrypt, we will have to decrypt the blocks from donor_fd and possibly > re-encrypt then in fd's per-file key, which means we'll have to add > extra complexity to implement the decrypt and re-encrypt passing > through the page cache. > > But if this helps resolve Christoph and Dave's objections, it > shouldn't be _too_ much extra complexity. Before we go ahead an > implement it, though, I'd appreciate a confirmation that this will > indeed actually resolve their complaints. > > Thanks, > > - Ted ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Proposal: A new fs-verity interface 2019-01-10 18:18 ` Darrick J. Wong @ 2019-01-10 18:18 ` Darrick J. Wong 0 siblings, 0 replies; 13+ messages in thread From: Darrick J. Wong @ 2019-01-10 18:18 UTC (permalink / raw) To: Theodore Y. Ts'o Cc: Linus Torvalds, Christoph Hellwig, Dave Chinner, Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel On Thu, Jan 10, 2019 at 12:15:00AM -0500, Theodore Y. Ts'o wrote: > The following approach is based in Darrick's suggestion: > > int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg); > > struct fsverity_arg { > int fsv_donor_fd; Explicitly sized fields and padding here, please. ISTR there are a few arches that don't have alignment requirements which will make this messy. > u64 fsv_offset; > u64 fsv_size; You might want to allocate some reserved space for flags in case you ever decide you need it, but otherwise it seems fine to me... --D > }; > > fsv_offset and fsz_size must be a multiple of the file system block > size. If the ioctl comples successfully, as a side effect the > donor_fd will have a hole punch operation on the specified range. In > other words, the equivalent of operation of fallocate(fsv_donor_fd, > FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by > fd will be protected using fsverity. > > It will be legal for fsv_donor_fd == fd, so this interface is a > superset of the original FS_IOC_ENABLE_VERITY ioctl. > > This will hopefully make Christoph and Dave happy because the > interface does not presuppose how ext4 and f2fs will implement > fsverity behind the scenes. However, it does not forbid it, and the > net cost is that ext4 and f2fs will have to implement code which > transplants the blocks from the donor_fd to fd in the case where > donor_fd != fd --- and in the case where blocks are encrypted using > fscrypt, we will have to decrypt the blocks from donor_fd and possibly > re-encrypt then in fd's per-file key, which means we'll have to add > extra complexity to implement the decrypt and re-encrypt passing > through the page cache. > > But if this helps resolve Christoph and Dave's objections, it > shouldn't be _too_ much extra complexity. Before we go ahead an > implement it, though, I'd appreciate a confirmation that this will > indeed actually resolve their complaints. > > Thanks, > > - Ted ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Proposal: A new fs-verity interface 2019-01-10 5:15 Proposal: A new fs-verity interface Theodore Y. Ts'o 2019-01-10 5:15 ` Theodore Y. Ts'o 2019-01-10 18:18 ` Darrick J. Wong @ 2019-01-14 23:41 ` Dave Chinner 2019-01-14 23:41 ` Dave Chinner 2019-01-23 5:10 ` Theodore Y. Ts'o 2 siblings, 2 replies; 13+ messages in thread From: Dave Chinner @ 2019-01-14 23:41 UTC (permalink / raw) To: Theodore Y. Ts'o Cc: Eric Biggers, Darrick J. Wong, linux-f2fs-devel, Christoph Hellwig, linux-fscrypt, linux-fsdevel, linux-ext4, Linus Torvalds On Thu, Jan 10, 2019 at 12:15:00AM -0500, Theodore Y. Ts'o wrote: > The following approach is based in Darrick's suggestion: I do not recall what that was, so this: > int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg); > > struct fsverity_arg { > int fsv_donor_fd; > u64 fsv_offset; > u64 fsv_size; > }; > > fsv_offset and fsz_size must be a multiple of the file system block > size. If the ioctl comples successfully, as a side effect the > donor_fd will have a hole punch operation on the specified range. In > other words, the equivalent of operation of fallocate(fsv_donor_fd, > FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by > fd will be protected using fsverity. makes no sense to me. What's in {offset, size} and why do you need to call this on that specific range? If it is the equivalent of a hole punch, then why wouldn't you just use FALLOC_FL_PUNCH_HOLE? Can you please write the man page for the interface so that the description of what it does and how it should be used is crystal clear and doesn't assume the reader knows "what darrick proposed"... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Proposal: A new fs-verity interface 2019-01-14 23:41 ` Dave Chinner @ 2019-01-14 23:41 ` Dave Chinner 2019-01-23 5:10 ` Theodore Y. Ts'o 1 sibling, 0 replies; 13+ messages in thread From: Dave Chinner @ 2019-01-14 23:41 UTC (permalink / raw) To: Theodore Y. Ts'o Cc: Linus Torvalds, Christoph Hellwig, Darrick J. Wong, Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel On Thu, Jan 10, 2019 at 12:15:00AM -0500, Theodore Y. Ts'o wrote: > The following approach is based in Darrick's suggestion: I do not recall what that was, so this: > int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg); > > struct fsverity_arg { > int fsv_donor_fd; > u64 fsv_offset; > u64 fsv_size; > }; > > fsv_offset and fsz_size must be a multiple of the file system block > size. If the ioctl comples successfully, as a side effect the > donor_fd will have a hole punch operation on the specified range. In > other words, the equivalent of operation of fallocate(fsv_donor_fd, > FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by > fd will be protected using fsverity. makes no sense to me. What's in {offset, size} and why do you need to call this on that specific range? If it is the equivalent of a hole punch, then why wouldn't you just use FALLOC_FL_PUNCH_HOLE? Can you please write the man page for the interface so that the description of what it does and how it should be used is crystal clear and doesn't assume the reader knows "what darrick proposed"... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Proposal: A new fs-verity interface 2019-01-14 23:41 ` Dave Chinner 2019-01-14 23:41 ` Dave Chinner @ 2019-01-23 5:10 ` Theodore Y. Ts'o 2019-01-24 21:25 ` Dave Chinner 1 sibling, 1 reply; 13+ messages in thread From: Theodore Y. Ts'o @ 2019-01-23 5:10 UTC (permalink / raw) To: Dave Chinner Cc: Linus Torvalds, Christoph Hellwig, Darrick J. Wong, Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel Apologies for the delay in getting responding; I've been on vacation last week. On Tue, Jan 15, 2019 at 10:41:01AM +1100, Dave Chinner wrote: > > int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg); > > > > struct fsverity_arg { > > int fsv_donor_fd; > > u64 fsv_offset; > > u64 fsv_size; > > }; > > > > fsv_offset and fsz_size must be a multiple of the file system block > > size. If the ioctl comples successfully, as a side effect the > > donor_fd will have a hole punch operation on the specified range. In > > other words, the equivalent of operation of fallocate(fsv_donor_fd, > > FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by > > fd will be protected using fsverity. > > makes no sense to me. What's in {offset, size} and why do you need > to call this on that specific range? If it is the equivalent of a > hole punch, then why wouldn't you just use FALLOC_FL_PUNCH_HOLE? This is how we transfer the metadata (e.g., the Merkle tree plus a header block) from userspace to the file system. In the original interface, the verity metadata was appended end of the data file, and then an ioctl would transfer the metadata to the file system. For ext4 and f2fs, we actually store the fsverity metadata after the data blocks and simply adjust i_size visible to userspace so the metadata blocks are "invisible". You objected to this interface, because it was too tightly tied to the implementation used in ext4 and f2fs. So what Darrick suggested is that it could be conveyed by passing the metadata via a second file descriptor. The offset and size specifies where the metadata can be fetched from the second file. As in the original interface, the metadata blocks are "consumed" by the ioctl, so they disappear from the second file, as if FALLOC_FL_PUNCH_HOLE. It's not that they are deallocated; the blocks in question are simply getting transferred to inode which is being protected using fs-verity. > Can you please write the man page for the interface so that the > description of what it does and how it should be used is crystal > clear and doesn't assume the reader knows "what darrick proposed"... I can do that, but I'd like to make sure we have general agreement approach first. Cheers, - Ted ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Proposal: A new fs-verity interface 2019-01-23 5:10 ` Theodore Y. Ts'o @ 2019-01-24 21:25 ` Dave Chinner 2019-01-24 21:40 ` Linus Torvalds 0 siblings, 1 reply; 13+ messages in thread From: Dave Chinner @ 2019-01-24 21:25 UTC (permalink / raw) To: Theodore Y. Ts'o Cc: Linus Torvalds, Christoph Hellwig, Darrick J. Wong, Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel On Wed, Jan 23, 2019 at 12:10:17AM -0500, Theodore Y. Ts'o wrote: > Apologies for the delay in getting responding; I've been on vacation > last week. > > On Tue, Jan 15, 2019 at 10:41:01AM +1100, Dave Chinner wrote: > > > int ioctl(fd, FS_IOC_ENABLE_VERITY, struct fsverity_arg *arg); > > > > > > struct fsverity_arg { > > > int fsv_donor_fd; > > > u64 fsv_offset; > > > u64 fsv_size; > > > }; > > > > > > fsv_offset and fsz_size must be a multiple of the file system block > > > size. If the ioctl comples successfully, as a side effect the > > > donor_fd will have a hole punch operation on the specified range. In > > > other words, the equivalent of operation of fallocate(fsv_donor_fd, > > > FALLOC_FL_PUNCH_HOLE, fsv_offset, fsv_size), and the file specified by > > > fd will be protected using fsverity. > > > > makes no sense to me. What's in {offset, size} and why do you need > > to call this on that specific range? If it is the equivalent of a > > hole punch, then why wouldn't you just use FALLOC_FL_PUNCH_HOLE? > > This is how we transfer the metadata (e.g., the Merkle tree plus a > header block) from userspace to the file system. > In the original > interface, the verity metadata was appended end of the data file, and > then an ioctl would transfer the metadata to the file system. So how is this different? You still put the data into a file, then tell the kernel what file it belongs to, and then the filesystem does "magic" and *consumes* that part of the file? That's what I'm asking here - what are the semantics of this ioctl(). I'mnot asking for justification or a unnecessary history lesson. I'm asking you to describe the behaviour and interface precisely. > For > ext4 and f2fs, we actually store the fsverity metadata after the data > blocks and simply adjust i_size visible to userspace so the metadata > blocks are "invisible". Yes, I know how it's been implemented in ext4 and f2fs. I don't care about that - I care about what how this new interface works. > You objected to this interface, because it was too tightly tied to the > implementation used in ext4 and f2fs. So what Darrick suggested is > that it could be conveyed by passing the metadata via a second file > descriptor. The offset and size specifies where the metadata can be > fetched from the second file. So it's still "file data" that the filesystem has to magically transform into hidden metadata in some manner. What writes the merkel tree into a file, and why can't it just write it direct to the filesystem rather than having to use an intermediate file? i.e. the proposed API still seems to be bleeding "merkel tree is held in file data" implementation artifacts through it. That you can still append the merkle tree to the target file and then use the hack "donor_fd = fd" to tell it that is where the merkel tree data is stored seems like a convenient API hack for the existing implementation, rather than a well thought through, generic API. > As in the original interface, the metadata blocks are "consumed" by > the ioctl, so they disappear from the second file, as if > FALLOC_FL_PUNCH_HOLE. It's not that they are deallocated; the blocks > in question are simply getting transferred to inode which is being > protected using fs-verity. But it's not a "hole punch" in your proposal and current implementation because a) the merkle tree blocks are istill accounted as user data on the inode the file and hence visible to userspace (e.g. via stat()) and b) hole punch does not change the file size. i.e. FALLOC_FL_PUNCH_HOLE is not an equivalent operation for the behaviour you are talking about, and the blocks aren't actually "consumed" because they are still accounted to the same file. IOWs, "du -sh" now reports a significant mismatch between file sizes and space consumed by the files. We know from experience that this will confuse users and they'll think it's a bug: http://xfs.org/index.php/XFS_FAQ#Q:_Why_do_files_on_XFS_use_more_data_blocks_than_expected.3F So even though Darrick's proposal is /better/ than just appending data and flipping the inode size to hide it, this API definition still allows implementation details specific to "storing merkle tree as user data beyond EOF" implementations right through it. > > > Can you please write the man page for the interface so that the > > description of what it does and how it should be used is crystal > > clear and doesn't assume the reader knows "what darrick proposed"... > > I can do that, but I'd like to make sure we have general agreement > approach first. This is why we suck so badly at designing new interfaces. We have to have "agreement" before anyone even tries to flesh out the full interface and document all aspects of it. But then we get "agreement" and the next thing that happens is an implementation lands in the tree and then we find later on it's full of bugs because "agreement" doesn't mean ithe API was "fully documented, understood, reviewed and tested". We need to change the way we design user APIs - if you don't have a fully documented proposal (i.e. a man page) that describes the parameters, the behaviour of the API, the implementation constraints and requirements, the errors that can be returned, etc, then it isn't an API proposal we can actually review properly. So, please, if this is the way you want to go, please document the API properly. However, I still have issues with the assumption that the user application generating the merkel tree must first write the fsverity information to a user data file before the merkel tree is passed to the filesystem for permanent storage? Why doesn't it write the merkel tree directly to the kernel so the filesystem can store it where it wants to store it on the fly? Linus wanted the fsverity information to be injected at the VFS so it's the same for everyone, and that implies that the fsverity information needs to be treated as a data stream rather than a file in some filesystem. People have suggested an ADS (alternate data stream) for this, but that's crazy talk. All we need is a file/fd flag to say we are "writing fsverity information" so the kernel does the right thing. e.g. vfd = openat(AT_FDCWD, filename, O_APPEND|O_WRONLY); flags = fcntl(vfd, F_GETFL, NULL); fcntl(vfd, F_SETFL, flags | O_FSVERITY); while (....) { write(vfd, buf, len); } IOWs, we can communicate to the kernel that this data stream is special "fs verity" metadata, hence allowing the data that to be written to where-ever the filesystem stores the merkel tree. Permissions will be required to set this flag. (Discussion point: maybe should be a FD_ flag that forces FD_CLOEXEC so that child processes can't write fsverity information) From the tagged data stream, we can implement filesystem indepedent storage for the merkel tree data - it doesn't have be be written by the filesystem at all. e.g. the VFS could cut the incoming buffer up into fsverity headers and 64kB data chunks and store them in separate named xattrs. The filesystem would be none-the-wiser that there is even fsverity data on the file - it would just work on all filesystems. You don't even need flags in the inode to tell the fs that it is protected by fsverity - the presence of the fsverity xattrs is sufficient to determine that. (Discussion point: use separate iovecs for fsverity headers and opaque data chunks w/ pwritev2(RWF_FSVERITY|RWF_APPEND) as a defined user/kernel fsverity format instead of just write calls which the kernel then interprets) This is largely impossible with the "store the merkel tree data in a file, then use a special ioctl to swizzle it into place" API because the filesystem has to know the on-disk format of the fsverity information to be able to break it up sanely into xattrs or some other internal storage type. The filesystem also needs on disk format changes to support non-xattr format fsverity storage, while a "VFS layer translates to xattrs" approach can be done without needing any changes to any filesystem. It also makes the fsverity functionality available to network filesystems as well... IOWs, it seems to me that the API should be based around writing fsverity data into the kernel as a defined data stream, not an API that does "something" with opaque data in a file from "somewhere" and assumes the filesystem will handle it all.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Proposal: A new fs-verity interface 2019-01-24 21:25 ` Dave Chinner @ 2019-01-24 21:40 ` Linus Torvalds 2019-01-24 23:22 ` Theodore Y. Ts'o 0 siblings, 1 reply; 13+ messages in thread From: Linus Torvalds @ 2019-01-24 21:40 UTC (permalink / raw) To: Dave Chinner Cc: Theodore Y. Ts'o, Christoph Hellwig, Darrick J. Wong, Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel On Fri, Jan 25, 2019 at 10:25 AM Dave Chinner <david@fromorbit.com> wrote: > > So it's still "file data" that the filesystem has to magically > transform into hidden metadata in some manner. What writes the > merkel tree into a file, and why can't it just write it direct to > the filesystem rather than having to use an intermediate file? This does frustrate me too. I _assume_ (but it's exactly that - just an assumption) this whole design decision comes from basically having a transport layer that is entirely unaware of the merle data, so the data is brought in some entirely traditional way that can only transfer regular file contents (ie tar/zip/ar kind of thing, but presumably actually just in the form of an android APK). And then the new interface is just a way to "convert" that into the actual final security model. One thing that is also unclear to me is whether that "secure" model needs to be stable on disk (ie is this considered an actual write that *modifies* the underlying filesystem, and the merkle tree data ends up being associated long-term and over reboots), or whether it would be acceptable to just have it be a temporary "view" of the file where the filesystem itself can be read-only, and all that happens is that now the merkle tree is associated with that file as long as the filesystem is mounted (or until it is disassociated). Maybe this was answered in some of the earlier email threads that (at least for me) were then somewhat overshadowed by the merge window work and the holidays. So it's possible that I repeat myself. But I do have to say that I think I'd *still* prefer this to be something more like an xattr, and that maybe we'd be better off actually improving out "write to xattr" interface or something. I understand that you don't want to load the whole merkle tree into memory, and that is the reason that you want to point to some "stable on disk" area, but the hole punching does seem to be a particularly nasty part of it. It would be much better to have the merkle data in some place where it doesn't then need to be hidden again, no? Linus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Proposal: A new fs-verity interface 2019-01-24 21:40 ` Linus Torvalds @ 2019-01-24 23:22 ` Theodore Y. Ts'o 2019-01-25 0:32 ` Matthew Wilcox 2019-01-25 0:35 ` Linus Torvalds 0 siblings, 2 replies; 13+ messages in thread From: Theodore Y. Ts'o @ 2019-01-24 23:22 UTC (permalink / raw) To: Linus Torvalds Cc: Dave Chinner, Christoph Hellwig, Darrick J. Wong, Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel On Fri, Jan 25, 2019 at 10:40:31AM +1300, Linus Torvalds wrote: > > I _assume_ (but it's exactly that - just an assumption) this whole > design decision comes from basically having a transport layer that is > entirely unaware of the merle data, so the data is brought in some > entirely traditional way that can only transfer regular file contents > (ie tar/zip/ar kind of thing, but presumably actually just in the form > of an android APK). And then the new interface is just a way to > "convert" that into the actual final security model. How the transport layer is going to send the merkle data is really unrelated (e.g., it's not necessarily going to be at the end of the file data). > One thing that is also unclear to me is whether that "secure" model > needs to be stable on disk (ie is this considered an actual write that > *modifies* the underlying filesystem, and the merkle tree data ends up > being associated long-term and over reboots), or whether it would be > acceptable to just have it be a temporary "view" of the file where the > filesystem itself can be read-only, and all that happens is that now > the merkle tree is associated with that file as long as the filesystem > is mounted (or until it is disassociated). It's the first. We need to keep the Merkle tree and associated metadata information (which might include a PKCS 7 digital signature) permanently associated with the file. So it has to be stored in the file; it's associated metadata. > Maybe this was answered in some of the earlier email threads that (at > least for me) were then somewhat overshadowed by the merge window work > and the holidays. So it's possible that I repeat myself. But I do have > to say that I think I'd *still* prefer this to be something more like > an xattr, and that maybe we'd be better off actually improving out > "write to xattr" interface or something. The main issue is that for a 129 MB file, the Merkle data is going to be a Megabyte. So using a set/get interface, ala our current xattr interface, seems awkward. Also, currently for most file systems, xattrs are limited in size to around 4k to 32k, and most xattrs relatively small (e.g., SELinux labels, ACL's). So even if we used the xattr interface, for many file systems, for something that might be 1 megabyte (for a 129 MB file to be protected by fs-verity), it would almost certainly be stored in a different location than other xattrs. So similarly, changing our attr interfaces for big blobs, when the vast majority of xattrs are small ones, doesn't seem to be a great use of time. The other thing I'll point out is that file system developers generally have frowned on using setting xattrs having magic side effects, since that would mean making the xattr set/get interface acting more lke an ioctl. When we make an file to become fs-verity protected, it does have a side-effect of making the file immutable. That's not a huge side-effect, but that's another reason where it feels like the xattr interface seems like the wrong effort. > I understand that you don't want to load the whole merkle tree into > memory, and that is the reason that you want to point to some "stable > on disk" area, but the hole punching does seem to be a particularly > nasty part of it. It would be much better to have the merkle data in > some place where it doesn't then need to be hidden again, no? It's not really a "hole punch", but we are moving the data around. That's because Dave Chinner and Christoph demanded it. The original approach was to put it at the end of the file, and then hide it. If the question is "why hide the metadata", it's because it's metadata. We certainly don't want to make it be visible as part of the file stream. We could store the metadata somewhere else --- for example, we could store it in another inode. But inodes have overhead, and that would mean using two inodes for every fs-verity protected files --- and we don't need all of the other metadata (mtime, ctime, etc.) for the Merkle tree. So that's how we got to where we were. I think the approach of storing it using the same extent tree where we map logical block numbers to physical block numbers make a lot of sense for ext4 and f2fs. It seems that some file system (which may never even implement fs-verity) their developers hate that particular approach. So that's where the suggestion of using a separate file descriptor to convey the Merkle tree data to the file system came from. It wasn't my first choice. - Ted ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Proposal: A new fs-verity interface 2019-01-24 23:22 ` Theodore Y. Ts'o @ 2019-01-25 0:32 ` Matthew Wilcox 2019-01-25 0:35 ` Linus Torvalds 1 sibling, 0 replies; 13+ messages in thread From: Matthew Wilcox @ 2019-01-25 0:32 UTC (permalink / raw) To: Theodore Y. Ts'o Cc: Linus Torvalds, Dave Chinner, Christoph Hellwig, Darrick J. Wong, Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel On Thu, Jan 24, 2019 at 06:22:37PM -0500, Theodore Y. Ts'o wrote: > The main issue is that for a 129 MB file, the Merkle data is going to > be a Megabyte. 127MB ... I pointed out this error the last time the documentation was posted. > We could store the metadata somewhere else --- for example, we could > store it in another inode. But inodes have overhead, and that would > mean using two inodes for every fs-verity protected files --- and we > don't need all of the other metadata (mtime, ctime, etc.) for the > Merkle tree. So that's how we got to where we were. I think the > approach of storing it using the same extent tree where we map logical > block numbers to physical block numbers make a lot of sense for ext4 > and f2fs. > > It seems that some file system (which may never even implement > fs-verity) their developers hate that particular approach. So that's > where the suggestion of using a separate file descriptor to convey the > Merkle tree data to the file system came from. It wasn't my first > choice. I'll reiterate an API I suggested on December 21st: : verity_fd = ioctl(fd, FS_IOC_VERITY_FD); : write(verity_fd, &merkle_tree); : close(verity_fd); : : At final close of that verity_fd, the filesystem behaves in the same way : that it does on receipt of this FS_IOC_ENABLE_VERITY ioctl today. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Proposal: A new fs-verity interface 2019-01-24 23:22 ` Theodore Y. Ts'o 2019-01-25 0:32 ` Matthew Wilcox @ 2019-01-25 0:35 ` Linus Torvalds 2019-01-29 15:48 ` Theodore Y. Ts'o 1 sibling, 1 reply; 13+ messages in thread From: Linus Torvalds @ 2019-01-25 0:35 UTC (permalink / raw) To: Theodore Y. Ts'o Cc: Dave Chinner, Christoph Hellwig, Darrick J. Wong, Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel On Fri, Jan 25, 2019 at 12:22 PM Theodore Y. Ts'o <tytso@mit.edu> wrote: > > > > I understand that you don't want to load the whole merkle tree into > > memory, and that is the reason that you want to point to some "stable > > on disk" area, but the hole punching does seem to be a particularly > > nasty part of it. It would be much better to have the merkle data in > > some place where it doesn't then need to be hidden again, no? > > It's not really a "hole punch", but we are moving the data around. > That's because Dave Chinner and Christoph demanded it. The original > approach was to put it at the end of the file, and then hide it. If > the question is "why hide the metadata", it's because it's metadata. > We certainly don't want to make it be visible as part of the file > stream. But that's the whole hiding thing. Why do you feel you need to do that? Why not just leave it alone, and leave it visible, and say "hey, the merkle data for file X comes from here". Punching a hole, or moving it somewhere else, or hiding or, or whatever you want to do, all mean "you have to modify the filesystem". They also mean "you can't back up the original sanely any more and move it somewhere else". Wouldn't it be much nicer to simple have a *separate* stream. If it's not an xattr, then maybe just a separate file entirely. A file that just contains the merkle data, and has no alignment things because there's no "it's at the end of the file that it describes" issues, it's just a standalone thing that is still visible so that you can copy the original data and it's still valid (even if the merkle tree then isn't _enabled_ in the copy, but now you could re-enable it in a new location). > We could store the metadata somewhere else --- for example, we could > store it in another inode. But inodes have overhead, and that would > mean using two inodes for every fs-verity protected files --- and we > don't need all of the other metadata (mtime, ctime, etc.) for the > Merkle tree. Well, if one of the issues is that the merkle data ends up being megabytes, I don't see this as being a huge issue. Particularly since the alignemnt restrictions in the unified thing means that you still aren't exactly "packing" the merkle data efficiently to begin with.. In fact, if you want to have merkle data for small files (where the merkle data itself is just a few words), then having it in a separate file and as part of the inode inline data doesn't seem like it's likely any worse (and might be *better*) than having it at some block boundary due to alignment... Hmm? Linus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Proposal: A new fs-verity interface 2019-01-25 0:35 ` Linus Torvalds @ 2019-01-29 15:48 ` Theodore Y. Ts'o 0 siblings, 0 replies; 13+ messages in thread From: Theodore Y. Ts'o @ 2019-01-29 15:48 UTC (permalink / raw) To: Linus Torvalds Cc: Dave Chinner, Christoph Hellwig, Darrick J. Wong, Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4, linux-f2fs-devel On Fri, Jan 25, 2019 at 01:35:05PM +1300, Linus Torvalds wrote: > But that's the whole hiding thing. Why do you feel you need to do > that? Why not just leave it alone, and leave it visible, and say "hey, > the merkle data for file X comes from here". There are a number of downsides: *) It's ugly that files that have to live somewhere (e.g., a dot file, some other directory, etc.) in the directory hierarchy, when theyt are fundamentally part of the file that is being protected --- that is, it is file metadata. *) We don't want to allow the files to be deted, since it breaks the protection; that either has to make the original file useful, since the security policy is we can't trust the file --- which might be a privileged APK (think setuid binary), or we have to make the file immutable and it from being deleted. *) When we delete the original file, userspace now has to manually clean up the Merkle data for the file. So keeping it hidden is just cleaner. You're right that making the Merkle data explicit available in some form (either via an xattr or a separate file) would make it easier to copy the file, but that's not something that is needed in practice. So it's an advantage, but it wasn't one that we had considered important. For example for most executables on a desktop, they are installed via a package manager, and they are deleted when the package is updated. Or in the case of an Android APK, copying it is not something that is done once it is downloaded to the device. > In fact, if you want to have merkle data for small files (where the > merkle data itself is just a few words), then having it in a separate > file and as part of the inode inline data doesn't seem like it's > likely any worse (and might be *better*) than having it at some block > boundary due to alignment... > > Hmm? The default inode size is 256; and in that case "small files" is less than 12k. With an ext4 inode size of 1024 bytes "small files" would be 108k --- and this is ignoring the size of the fsverity header. With the header these numbers would be even smaller --- and given that the most common use of this will be for APK and executables, using the inline data (or inline xattrs) is really not practical. - Ted ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2019-01-29 15:50 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-01-10 5:15 Proposal: A new fs-verity interface Theodore Y. Ts'o 2019-01-10 5:15 ` Theodore Y. Ts'o 2019-01-10 18:18 ` Darrick J. Wong 2019-01-10 18:18 ` Darrick J. Wong 2019-01-14 23:41 ` Dave Chinner 2019-01-14 23:41 ` Dave Chinner 2019-01-23 5:10 ` Theodore Y. Ts'o 2019-01-24 21:25 ` Dave Chinner 2019-01-24 21:40 ` Linus Torvalds 2019-01-24 23:22 ` Theodore Y. Ts'o 2019-01-25 0:32 ` Matthew Wilcox 2019-01-25 0:35 ` Linus Torvalds 2019-01-29 15:48 ` Theodore Y. Ts'o
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).