* backup/restore of fscrypt files
From: Andreas Dilger @ 2020-11-26 7:12 UTC
To: Eric Biggers, Theodore Ts'o
Cc: linux-fscrypt, Ext4 Developers List, linux-f2fs-devel, linux-fsdevel, Sebastien Buisson

Currently it is not possible to do backup/restore of fscrypted files without the
encryption key for a number of reasons. However, it is desirable to be able to
backup/restore filesystems with encrypted files for numerous reasons.

My understanding is that there are two significant obstacles for this to work:
- the file size reported to userspace for an encrypted file is the "real" file size,
  but there is data stored beyond "i_size" that is required for decrypting the file
- the per-inode 16-byte nonce that would need to be backed up and restored for
  later decryption to be possible

I'm wondering if it makes sense for stat() to report the size rounded up to the end
of the encryption block for encrypted files, and then report the "real" size and
nonce in virtual xattrs (e.g. "trusted.fscrypt_size" and "trusted.fscrypt_nonce")
so that encrypted files can be backed up and restored using normal utilities like
tar and rsync if the xattrs are also copied.

A (small) added benefit of rounding the size of encrypted files up to the end of the
encryption block is that it makes fingerprinting of files by their size a bit harder.
Changing the size returned by stat() is not (IMHO) problematic, since it is not
currently possible to directly read encrypted files without the key anyway.

The use of "trusted" xattrs would limit the backup/restore of encrypted files to
privileged users. We could use "user" xattrs to allow backup by non-root users, but
that would re-expose the real file size to userspace (not worse than today), and
would corrupt the file if the size or nonce xattrs were modified by the user.

It isn't clear whether there is a huge benefit of users to be able to backup/restore
their own files while encrypted. For single-user systems, the user will have root
access anyway, while administrators of multi-user systems need privileged access for
shared filesystems backup/restore anyway.

I'm probably missing some issues here, but hopefully this isn't an intractable problem.

Cheers, Andreas
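
[Editorial illustration, not part of the original message.] The size-rounding idea above can be sketched in a few lines of Python. This is only a model of the arithmetic: the 4096-byte block size and the helper names `round_up_to_block` and `backup_metadata` are assumptions made for the example, and the "trusted.fscrypt_size" / "trusted.fscrypt_nonce" xattrs are the ones proposed in the message, not an existing kernel interface.

```python
def round_up_to_block(real_size: int, block_size: int = 4096) -> int:
    """Round real_size up to the next multiple of block_size.

    block_size=4096 is an assumed encryption block size for illustration.
    """
    if real_size < 0 or block_size <= 0:
        raise ValueError("real_size must be >= 0 and block_size must be > 0")
    # Ceiling division without floating point.
    return -(-real_size // block_size) * block_size


def backup_metadata(real_size: int, nonce: bytes, block_size: int = 4096) -> dict:
    """Model what a backup tool would see under the proposal (hypothetical).

    The xattr names below come from the proposal and do not exist today.
    """
    if len(nonce) != 16:
        raise ValueError("the per-inode nonce is 16 bytes")
    return {
        "st_size": round_up_to_block(real_size, block_size),
        "trusted.fscrypt_size": real_size,
        "trusted.fscrypt_nonce": nonce.hex(),
    }
```

Under these assumptions, a 5000-byte file would be reported with a size of 8192 bytes, while the real size would travel in the xattr.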
* Re: backup/restore of fscrypt files
From: Eric Biggers @ 2020-11-30 18:39 UTC
To: Andreas Dilger
Cc: Theodore Ts'o, linux-fscrypt, Ext4 Developers List, linux-f2fs-devel, linux-fsdevel, Sebastien Buisson

Hi Andreas,

On Thu, Nov 26, 2020 at 12:12:26AM -0700, Andreas Dilger wrote:
> Currently it is not possible to do backup/restore of fscrypted files without the
> encryption key for a number of reasons. However, it is desirable to be able to
> backup/restore filesystems with encrypted files for numerous reasons.
>
> My understanding is that there are two significant obstacles for this to work:
> - the file size reported to userspace for an encrypted file is the "real" file size,
>   but there is data stored beyond "i_size" that is required for decrypting the file
> - the per-inode 16-byte nonce that would need to be backed up and restored for
>   later decryption to be possible
>
> I'm wondering if it makes sense for stat() to report the size rounded up to the end
> of the encryption block for encrypted files, and then report the "real" size and
> nonce in virtual xattrs (e.g. "trusted.fscrypt_size" and "trusted.fscrypt_nonce")
> so that encrypted files can be backed up and restored using normal utilities like
> tar and rsync if the xattrs are also copied.
>
> A (small) added benefit of rounding the size of encrypted files up to the end of the
> encryption block is that it makes fingerprinting of files by their size a bit harder.
> Changing the size returned by stat() is not (IMHO) problematic, since it is not
> currently possible to directly read encrypted files without the key anyway.
>
> The use of "trusted" xattrs would limit the backup/restore of encrypted files to
> privileged users. We could use "user" xattrs to allow backup by non-root users, but
> that would re-expose the real file size to userspace (not worse than today), and
> would corrupt the file if the size or nonce xattrs were modified by the user.
>
> It isn't clear whether there is a huge benefit of users to be able to backup/restore
> their own files while encrypted. For single-user systems, the user will have root
> access anyway, while administrators of multi-user systems need privileged access for
> shared filesystems backup/restore anyway.
>
> I'm probably missing some issues here, but hopefully this isn't an intractable problem.
>

There would be a lot more to it than what you describe.

First, filenames are encrypted too. As a result, there would have to be new
ioctls to allow backing up and restoring encrypted filenames. The existing
no-key names (the names the kernel shows when you list an encrypted dir) don't
work for this, as due to the NAME_MAX limit, they don't necessarily encode the
whole ciphertext. There would have to be new APIs which operate on raw
ciphertexts (which may contain the '/' or '\0' bytes) of up to NAME_MAX bytes.

Similarly for symlinks; there would have to be new ioctls to read and create
them, as the existing readlink() and symlink() system calls won't necessarily
work. Granted, handling symlinks correctly is less critical than filenames, as
we *could* just encode the whole symlink target in base64 and say that if you
create a symlink target over 3072 bytes you're out of luck. That would be
problematic, but less so than limiting encrypted filenames to ~180 bytes...

So for that and various other reasons such as the ordering of different
operations (when restoring a directory, will it be marked as encrypted before or
after the files are created in it, etc.), I think allowing 'rsync' or 'tar' to
work transparently isn't going to be possible. Instead, a new tool that knows
how to use ioctls to back up and restore encrypted files would be needed.
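
[Editorial illustration, not part of the original message.] The base64 limits mentioned above follow from base64 turning every 3 raw bytes into 4 characters. The sketch below works through that arithmetic, assuming the usual Linux values PATH_MAX = 4096 and NAME_MAX = 255 and ignoring details such as the terminating NUL. The helper `max_raw_bytes` is a name invented for this example. The 191-byte upper bound it gives for filenames is in the same range as the "~180 bytes" figure; the exact number would depend on encoding overhead not spelled out in the message.

```python
import base64

PATH_MAX = 4096  # assumed Linux limit on a symlink target, NUL ignored
NAME_MAX = 255   # assumed Linux limit on a single filename


def max_raw_bytes(encoded_limit: int, padded: bool) -> int:
    """Largest number of raw bytes whose base64 form fits in encoded_limit chars."""
    if padded:
        # Padded base64 always emits whole 4-character groups.
        return (encoded_limit // 4) * 3
    # Unpadded base64 needs ceil(4n/3) characters for n raw bytes.
    return (encoded_limit * 3) // 4


symlink_limit = max_raw_bytes(PATH_MAX, padded=True)    # 3072
filename_limit = max_raw_bytes(NAME_MAX, padded=False)  # 191

# Cross-check against the standard library encoder.
assert len(base64.b64encode(bytes(symlink_limit))) == PATH_MAX
assert len(base64.urlsafe_b64encode(bytes(filename_limit)).rstrip(b"=")) == NAME_MAX
```

A raw-ciphertext API of the kind described in the message would avoid this 4/3 expansion entirely, which is why the filename case is the more pressing one.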
Then there is the issue of ordering and how different operations would interact
with each other. This proposal would require the ability to open() a regular
file that doesn't have its encryption key available, and read and write from it.
open() gives you a file descriptor on which lots of other things could be called
too, so we'd need to make sure to explicitly prevent a lot of things which we
didn't have to worry about before, like fallocate() and various ioctl()s. Then,
what happens if someone adds an encryption key -- when does the file's page
cache get invalidated, and how does it get synchronized with any ongoing I/O, or
memory maps that may exist, and so on. (Allowing only direct I/O on files that
don't have their encryption key available may help...)

Or what happens if an encrypted directory is "under construction", and someone
tries to access it with the key, but its fscrypt_nonce hasn't been restored yet.
And how are such directories represented on-disk -- what does the encryption
xattr actually contain. Requiring the encryption policy and nonce to be set
*before* anything is created in the directory would make things simpler, I
think... Also similarly for setting the real file size -- requiring that it be
set before anything can be written to the file may help.

As for changing the i_size reported to userspace on encrypted files without the
key to include the whole final encrypted block, I don't think that would be an
issue by itself. Note that it doesn't really "make fingerprinting of files by
their size a bit harder", as i_size would still be unencrypted on-disk.

- Eric
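
[Editorial illustration, not part of the original message.] The two ordering rules suggested above (policy and nonce before any directory entry, real size before any data) can be modelled as a small in-memory state machine. This is a toy sketch only: the classes `RestoringDir` and `RestoringFile` and the exception `RestoreOrderError` are hypothetical names for this example and do not correspond to any kernel or fscrypt API.

```python
class RestoreOrderError(Exception):
    """Raised when a restore step is attempted out of order."""


class RestoringDir:
    """Toy model of an encrypted directory being restored without the key."""

    def __init__(self) -> None:
        self.policy = None
        self.nonce = None
        self.entries = []

    def set_policy_and_nonce(self, policy: bytes, nonce: bytes) -> None:
        if self.entries:
            raise RestoreOrderError("policy must be set while the directory is empty")
        if len(nonce) != 16:
            raise ValueError("the per-inode nonce is 16 bytes")
        self.policy, self.nonce = policy, nonce

    def create_entry(self, ciphertext_name: bytes) -> None:
        if self.nonce is None:
            raise RestoreOrderError("set policy and nonce before creating entries")
        self.entries.append(ciphertext_name)


class RestoringFile:
    """Toy model of an encrypted regular file being restored without the key."""

    def __init__(self) -> None:
        self.real_size = None
        self.data = bytearray()

    def set_real_size(self, size: int) -> None:
        if self.data:
            raise RestoreOrderError("real size must be set before any data is written")
        self.real_size = size

    def write_ciphertext(self, chunk: bytes) -> None:
        if self.real_size is None:
            raise RestoreOrderError("set the real size before writing ciphertext")
        self.data += chunk
```

Rejecting out-of-order calls outright, rather than trying to fix up state afterwards, means the model never has to represent a directory with entries but no nonce, which is the "under construction" state the message is concerned about.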
* Re: backup/restore of fscrypt files
From: Eric Biggers @ 2020-11-30 19:42 UTC
To: Andreas Dilger
Cc: Theodore Ts'o, linux-fscrypt, Ext4 Developers List, linux-f2fs-devel, linux-fsdevel, Sebastien Buisson

On Mon, Nov 30, 2020 at 10:39:10AM -0800, Eric Biggers wrote:
> (Allowing only direct I/O on files that don't have their encryption key available
> may help...)

It may make sense to only provide the ciphertext when reads are done using
RWF_ENCODED
(https://lkml.kernel.org/linux-fsdevel/cover.1605723568.git.osandov@fb.com),
rather than making normal reads return ciphertext when the key is unavailable.
Ciphertext reads would always be uncached, which would avoid two conflicting
uses of the same address_space.

- Eric
* Re: backup/restore of fscrypt files
From: Theodore Y. Ts'o @ 2020-11-30 20:09 UTC
To: Eric Biggers
Cc: Andreas Dilger, linux-fscrypt, Ext4 Developers List, linux-f2fs-devel, linux-fsdevel, Sebastien Buisson

On Mon, Nov 30, 2020 at 10:39:08AM -0800, Eric Biggers wrote:
> Then there is the issue of ordering and how different operations would interact
> with each other. This proposal would require the ability to open() a regular
> file that doesn't have its encryption key available, and read and write from it.
> open() gives you a file descriptor on which lots of other things could be called
> too, so we'd need to make sure to explicitly prevent a lot of things which we
> didn't have to worry about before, like fallocate() and various ioctl()s. Then,
> what happens if someone adds an encryption key -- when does the file's page
> cache get invalidated, and how does it get synchronized with any ongoing I/O, or
> memory maps that may exist, and so on. (Allowing only direct I/O on files that
> don't have their encryption key available may help...)

I had put together a draft patch series which used a combination of
ioctls to set and get the necessary encryption metadata (including the
filename), and then allowed root to allow Direct I/O to fetch the data
blocks. But it was a mess, especially if you were backing up a
directory hierarchy, in terms of what would need to be done on
userspace side during the restore operation --- especially if one of
the requirements is that the *restore* operation had to work if you
didn't have the encryption key at restore time. (Think of an Android
tablet that had multiple users, and the person doing the backup and
restore might not have all of the encryption keys available to her.)

Fortunately, the business requirement for this disappeared, and the
patch series (which was super messy, and not tested because it would
have required writing some complex code on the restore side --- the
issue is with the fact that mkdir generates a new encryption key for
new directories, so we would need to have a way to reset the key for a
directory after it was freshly created, but before any filenames were
added --- like I said, it was a real mess), and so I was happy to let
that patch series die a natural death.

These days, we now have support for Direct I/O when the encryption is
done by hardware between the OS and the storage device, and the
addition of inline crypto and the v2 encryption keys would have made
the patch series invalid (and far more complex, if someone wanted to
reconstitute it).

So it *could* be done, but it's a huge amount of work, and without the
business justification to dedicate the software engineering time to
implement both the kernel side patches, and the userspace backup and
restore (which would be different for a traditional Linux desktop and
what might be used by say, an Android userspace application), I
suspect it's pretty unlikely to happen.

Of course, if some volunteer wants to try to do all of the work, I
suspect Eric and I could provide some design help --- but it really
isn't going to be trivial to design and implement.

Cheers,

- Ted