linux-fscrypt.vger.kernel.org archive mirror
* backup/restore of fscrypt files
@ 2020-11-26  7:12 Andreas Dilger
  2020-11-30 18:39 ` Eric Biggers
  0 siblings, 1 reply; 4+ messages in thread
From: Andreas Dilger @ 2020-11-26  7:12 UTC (permalink / raw)
  To: Eric Biggers, Theodore Ts'o
  Cc: linux-fscrypt, Ext4 Developers List, linux-f2fs-devel,
	linux-fsdevel, Sebastien Buisson

Currently it is not possible to do backup/restore of fscrypted files without the
encryption key for a number of reasons.  However, it is desirable to be able to
backup/restore filesystems with encrypted files for numerous reasons.

My understanding is that there are two significant obstacles for this to work:
- the file size reported to userspace for an encrypted file is the "real" file size,
  but there is data stored beyond "i_size" that is required for decrypting the file
- the per-inode 16-byte nonce that would need to be backed up and restored for
  later decryption to be possible

I'm wondering if it makes sense for stat() to report the size rounded up to the end
of the encryption block for encrypted files, and then report the "real" size and
nonce in virtual xattrs (e.g. "trusted.fscrypt_size" and "trusted.fscrypt_nonce")
so that encrypted files can be backed up and restored using normal utilities like
tar and rsync if the xattrs are also copied.

A (small) added benefit of rounding the size of encrypted files up to the end of the
encryption block is that it makes fingerprinting of files by their size a bit harder.
Changing the size returned by stat() is not (IMHO) problematic, since it is not
currently possible to directly read encrypted files without the key anyway.
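
A minimal sketch of the rounding proposed above.  The name padded_size() and
the 4096-byte default are assumptions made for illustration; the actual size of
"the encryption block" depends on the filesystem and encryption mode, and this
is not kernel code.

```python
def padded_size(i_size: int, block_size: int = 4096) -> int:
    """Round i_size up to the next multiple of block_size.

    block_size is an assumption: 4096 stands in for "the encryption block".
    """
    if i_size < 0 or block_size <= 0:
        raise ValueError("i_size must be >= 0 and block_size must be > 0")
    return (i_size + block_size - 1) // block_size * block_size


# Files with different real sizes can report the same rounded size, which is
# why the rounded value alone cannot be used to truncate the restored file.
for real in (5000, 8000, 8192, 8193):
    print(real, "->", padded_size(real))
```

Because the rounding loses information, the real size would still have to
travel separately, for example in the proposed "trusted.fscrypt_size" xattr.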


The use of "trusted" xattrs would limit the backup/restore of encrypted files to
privileged users.  We could use "user" xattrs to allow backup by non-root users, but
that would re-expose the real file size to userspace (not worse than today), and
would corrupt the file if the size or nonce xattrs were modified by the user.

It isn't clear whether there is a huge benefit in users being able to backup/restore
their own files while encrypted.  For single-user systems, the user will have root
access anyway, while administrators of multi-user systems need privileged access for
shared filesystems backup/restore anyway.

I'm probably missing some issues here, but hopefully this isn't an intractable problem.

Cheers, Andreas

* Re: backup/restore of fscrypt files
  2020-11-26  7:12 backup/restore of fscrypt files Andreas Dilger
@ 2020-11-30 18:39 ` Eric Biggers
  2020-11-30 19:42   ` Eric Biggers
  2020-11-30 20:09   ` Theodore Y. Ts'o
  0 siblings, 2 replies; 4+ messages in thread
From: Eric Biggers @ 2020-11-30 18:39 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Theodore Ts'o, linux-fscrypt, Ext4 Developers List,
	linux-f2fs-devel, linux-fsdevel, Sebastien Buisson

Hi Andreas,

On Thu, Nov 26, 2020 at 12:12:26AM -0700, Andreas Dilger wrote:
> Currently it is not possible to do backup/restore of fscrypted files without the
> encryption key for a number of reasons.  However, it is desirable to be able to
> backup/restore filesystems with encrypted files for numerous reasons.
> 
> My understanding is that there are two significant obstacles for this to work:
> - the file size reported to userspace for an encrypted file is the "real" file size,
>   but there is data stored beyond "i_size" that is required for decrypting the file
> - the per-inode 16-byte nonce that would need to be backed up and restored for
>   later decryption to be possible
> 
> I'm wondering if it makes sense for stat() to report the size rounded up to the end
> of the encryption block for encrypted files, and then report the "real" size and
> nonce in virtual xattrs (e.g. "trusted.fscrypt_size" and "trusted.fscrypt_nonce")
> so that encrypted files can be backed up and restored using normal utilities like
> tar and rsync if the xattrs are also copied.
> 
> A (small) added benefit of rounding the size of encrypted files up to the end of the
> encryption block is that it makes fingerprinting of files by their size a bit harder.
> Changing the size returned by stat() is not (IMHO) problematic, since it is not
> currently possible to directly read encrypted files without the key anyway.
> 
> The use of "trusted" xattrs would limit the backup/restore of encrypted files to
> privileged users.  We could use "user" xattrs to allow backup by non-root users, but
> that would re-expose the real file size to userspace (not worse than today), and
> would corrupt the file if the size or nonce xattrs were modified by the user.
> 
> It isn't clear whether there is a huge benefit in users being able to backup/restore
> their own files while encrypted.  For single-user systems, the user will have root
> access anyway, while administrators of multi-user systems need privileged access for
> shared filesystems backup/restore anyway.
> 
> I'm probably missing some issues here, but hopefully this isn't an intractable problem.
> 

There would be a lot more to it than what you describe.

First, filenames are encrypted too.  As a result, there would have to be new
ioctls to allow backing up and restoring encrypted filenames.  The existing
no-key names (the names the kernel shows when you list an encrypted dir) don't
work for this, as due to the NAME_MAX limit, they don't necessarily encode the
whole ciphertext.  There would have to be new APIs which operate on raw
ciphertexts (which may contain the '/' or '\0' bytes) of up to NAME_MAX bytes.

Similarly for symlinks; there would have to be new ioctls to read and create
them, as the existing readlink() and symlink() system calls won't necessarily
work.  Granted, handling symlinks correctly is less critical than filenames, as
we *could* just encode the whole symlink target in base64 and say that if you
create a symlink target over 3072 bytes you're out of luck.  That would be
problematic, but less so than limiting encrypted filenames to ~180 bytes...
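
The arithmetic behind these limits can be sketched as follows.  This assumes
the usual Linux values PATH_MAX = 4096 and NAME_MAX = 255, and that the 3072
figure comes from base64 turning every 3 raw bytes into 4 characters; the
helper names are made up for illustration.

```python
import base64

PATH_MAX = 4096  # assumed Linux limit on a symlink target (with terminator)
NAME_MAX = 255   # assumed Linux limit on a single filename


def base64_len(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes raw bytes."""
    return 4 * ((n_bytes + 2) // 3)


def max_raw_bytes(encoded_limit: int) -> int:
    """Largest raw length whose unpadded base64 fits in encoded_limit chars."""
    return encoded_limit * 3 // 4


# 3072 ciphertext bytes expand to exactly 4096 base64 characters.
print(base64_len(3072), len(base64.b64encode(bytes(3072))))

# Upper bound on the ciphertext bytes a NAME_MAX-long base64 name can carry,
# before any other overhead is subtracted.
print(max_raw_bytes(NAME_MAX))
```

The second figure is only an upper bound; the "~180 bytes" above is an
approximation, and its exact derivation is not spelled out here.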

So for that and various other reasons such as the ordering of different
operations (when restoring a directory, will it be marked as encrypted before or
after the files are created in it, etc.), I think allowing 'rsync' or 'tar' to
work transparently isn't going to be possible.  Instead, a new tool that knows
how to use ioctls to back up and restore encrypted files would be needed.

Then there is the issue of ordering and how different operations would interact
with each other.  This proposal would require the ability to open() a regular
file that doesn't have its encryption key available, and read and write from it.
open() gives you a file descriptor on which lots of other things could be called
too, so we'd need to make sure to explicitly prevent a lot of things which we
didn't have to worry about before, like fallocate() and various ioctl()s.  Then,
what happens if someone adds an encryption key -- when does the file's page
cache get invalidated, and how does it get synchronized with any ongoing I/O, or
memory maps that may exist, and so on.  (Allowing only direct I/O on files that
don't have their encryption key available may help...)

Or what happens if an encrypted directory is "under construction", and someone
tries to access it with the key, but its fscrypt_nonce hasn't been restored yet.
And how are such directories represented on-disk -- what does the encryption
xattr actually contain.  Requiring the encryption policy and nonce to be set
*before* anything is created in the directory would make things simpler, I
think...  Also similarly for setting the real file size -- requiring that it be
set before anything can be written to the file may help.
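
To make the suggested ordering rules concrete, here is a toy userspace model of
them.  Everything in it (class names, method names, the error type) is
hypothetical; it only illustrates "policy and nonce before any entry" and "real
size before any write", and does not correspond to any existing kernel API.

```python
class RestoreOrderError(Exception):
    """Raised when a restore step is attempted out of order."""


class DirUnderRestore:
    def __init__(self) -> None:
        self.policy = None
        self.nonce = None
        self.entries = []

    def set_policy_and_nonce(self, policy: bytes, nonce: bytes) -> None:
        if self.entries:
            raise RestoreOrderError("directory already has entries")
        if len(nonce) != 16:  # the per-inode nonce is 16 bytes
            raise ValueError("nonce must be 16 bytes")
        self.policy, self.nonce = policy, nonce

    def create_entry(self, ciphertext_name: bytes) -> None:
        if self.nonce is None:
            raise RestoreOrderError("set policy and nonce first")
        self.entries.append(ciphertext_name)


class FileUnderRestore:
    def __init__(self) -> None:
        self.real_size = None
        self.ciphertext = b""

    def set_real_size(self, size: int) -> None:
        if self.ciphertext:
            raise RestoreOrderError("data was already written")
        self.real_size = size

    def write_ciphertext(self, data: bytes) -> None:
        if self.real_size is None:
            raise RestoreOrderError("set the real size first")
        self.ciphertext += data
```

With rules like these, a half-restored directory can never contain entries
without a nonce, which sidesteps the "under construction" question above.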

As for changing the i_size reported to userspace on encrypted files without the
key to include the whole final encrypted block, I don't think that would be an
issue by itself.  Note that it doesn't really "make fingerprinting of files by
their size a bit harder", as i_size would still be unencrypted on-disk.

- Eric

* Re: backup/restore of fscrypt files
  2020-11-30 18:39 ` Eric Biggers
@ 2020-11-30 19:42   ` Eric Biggers
  2020-11-30 20:09   ` Theodore Y. Ts'o
  1 sibling, 0 replies; 4+ messages in thread
From: Eric Biggers @ 2020-11-30 19:42 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Theodore Ts'o, linux-fscrypt, Ext4 Developers List,
	linux-f2fs-devel, linux-fsdevel, Sebastien Buisson

On Mon, Nov 30, 2020 at 10:39:10AM -0800, Eric Biggers wrote:
> (Allowing only direct I/O on files that don't have their encryption key available
> may help...)

It may make sense to only provide the ciphertext when reads are done using
RWF_ENCODED
(https://lkml.kernel.org/linux-fsdevel/cover.1605723568.git.osandov@fb.com),
rather than making normal reads return ciphertext when the key is unavailable.

Ciphertext reads would always be uncached, which would avoid two conflicting
uses of the same address_space.

- Eric

* Re: backup/restore of fscrypt files
  2020-11-30 18:39 ` Eric Biggers
  2020-11-30 19:42   ` Eric Biggers
@ 2020-11-30 20:09   ` Theodore Y. Ts'o
  1 sibling, 0 replies; 4+ messages in thread
From: Theodore Y. Ts'o @ 2020-11-30 20:09 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Andreas Dilger, linux-fscrypt, Ext4 Developers List,
	linux-f2fs-devel, linux-fsdevel, Sebastien Buisson

On Mon, Nov 30, 2020 at 10:39:08AM -0800, Eric Biggers wrote:
> Then there is the issue of ordering and how different operations would interact
> with each other.  This proposal would require the ability to open() a regular
> file that doesn't have its encryption key available, and read and write from it.
> open() gives you a file descriptor on which lots of other things could be called
> too, so we'd need to make sure to explicitly prevent a lot of things which we
> didn't have to worry about before, like fallocate() and various ioctl()s.  Then,
> what happens if someone adds an encryption key -- when does the file's page
> cache get invalidated, and how does it get synchronized with any ongoing I/O, or
> memory maps that may exist, and so on.  (Allowing only direct I/O on files that
> don't have their encryption key available may help...)

I had put together a draft patch series which used a combination of
ioctls to set and get the necessary encryption metadata (including the
filename), and then allowed root to use Direct I/O to fetch the data
blocks.

But it was a mess, especially if you were backing up a directory
hierarchy, in terms of what would need to be done on userspace side
during the restore operation --- especially if one of the requirements
is that the *restore* operation had to work if you didn't have the
encryption key at restore time.  (Think of an Android tablet that had
multiple users, and the person doing the backup and restore might not
have all of the encryption keys available to her.)

Fortunately, the business requirement for this disappeared.  The patch
series was super messy, and it was not tested because that would have
required writing some complex code on the restore side.  The issue is
that mkdir generates a new encryption key for new directories, so we
would need a way to reset the key for a directory after it was freshly
created, but before any filenames were added.  Like I said, it was a
real mess, and so I was happy to let that patch series die a natural
death.

These days, we now have support for Direct I/O when the encryption is
done by hardware between the OS and the storage device, and the
addition of inline crypto and the v2 encryption keys would have made
the patch series invalid (and far more complex, if someone wanted to
reconstitute it).

So it *could* be done, but it's a huge amount of work, and without the
business justification to dedicate the software engineering time to
implement both the kernel side patches, and the userspace backup and
restore (which would be different for a traditional Linux desktop and
what might be used by say, an Android userspace application), I
suspect it's pretty unlikely to happen.

Of course, if some volunteer wants to try to do all of the work, I
suspect Eric and I could provide some design help --- but it really
isn't going to be trivial to design and implement.

Cheers,

					- Ted
