Backup/restore of fscrypt files and directories

From: Sebastien Buisson @ 2023-02-08 12:09 UTC
  To: linux-fscrypt; +Cc: linux-fsdevel, linux-ext4

I am planning to implement backup and restore for fscrypt files and 
directories and propose the following design, and would welcome feedback 
on this approach.

There is a need to preserve encrypted file data in case of storage 
failure and to allow safely moving the data between filesystems and 
systems without decrypting it, just like we would do for normal files. 
While backup and restore at the device level is sometimes an option, we
also need to be able to carry out backup/restore at the ext4 file system
level, for instance to allow changing formatting options.

The core principle we want to retain is that we must not make any clear 
text copy of encrypted files. This means backup/restore must be carried 
out without the encryption key.

The first challenge we have to address is to get access to raw encrypted 
files without the encryption key. By design, fscrypt does not allow this
kind of access, and the ext4 file system does not allow reading or
writing files flagged as encrypted if the encryption key is not
provided. This
restriction is not for security reasons, but to avoid applications 
accidentally accessing the ciphertext. A mechanism must be provided for 
access to both raw encrypted content, and raw encrypted names.

The second challenge is to deal with the encrypted file's size, when it 
is accessed with the encryption key vs. when accessed without the 
encryption key. For the backup operation to retrieve full encrypted 
content, the encrypted file size should be reported as a multiple of the 
encryption chunk size when the encryption key is not present. And the 
clear text file size (size as seen with the encryption key) must be 
backed up as well in order to properly restore encrypted files later on. 
This information cannot be inferred by any other means.
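
As a rough illustration of the size handling described above, the
sketch below rounds a clear text length up to the next multiple of the
encryption chunk size. The function name and the 4096-byte default are
assumptions made for the example; the real chunk size would depend on
the filesystem configuration.

```python
def ciphertext_size(cleartext_size: int, chunk_size: int = 4096) -> int:
    """Size a keyless reader would see: cleartext size rounded up to a
    whole number of encryption chunks (chunk_size is an assumed value)."""
    if cleartext_size < 0 or chunk_size <= 0:
        raise ValueError("sizes must be non-negative and chunk_size positive")
    chunks = -(-cleartext_size // chunk_size)  # ceiling division
    return chunks * chunk_size
```

With these assumptions, a 3012-byte file would be reported as 4096
bytes without the key, which is why the value 3012 has to travel with
the backup separately.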

The third challenge is to get access to the encryption context of files 
and directories. By design, fscrypt does not expose this information, 
internally stored as an extended attribute but with no associated 
handler. However, making a backup of the encryption context is crucial 
because it preserves the information needed to later decrypt the file 
content. And it is also a non-trivial operation to restore the 
encryption context. Indeed, fscrypt imposes that an encryption context 
can only be set on a new file or an existing but empty directory.

In order to address this need for backup/restore of encrypted files, we 
propose to make use of a special extended attribute named 
security.encdata, containing:
- encoding method used for binary data. Assume name can be up to 255 chars.
- clear text file data length in bytes (set to 0 for dirs).
- encryption context. 40 bytes for v2 encryption context.
- encrypted name. 256 bytes max.

To improve portability if we need to change the on-disk format in the 
future, and to make the archived data useful over a longer timeframe, 
the content of the security.encdata xattr is expressed as ASCII text 
with a "key: value" YAML format. As encryption context and encrypted 
file name are binary, they need to be encoded.
So the content of the security.encdata xattr would be something like:

   { encoding: base64url, size: 3012, enc_ctx: YWJjZGVmZ2hpamtsbW
   5vcHFyc3R1dnd4eXphYmNkZWZnaGlqa2xtbg, enc_name: ZmlsZXdpdGh2ZX
   J5bG9uZ25hbWVmaWxld2l0aHZlcnlsb25nbmFtZWZpbGV3aXRodmVyeWxvbmdu
   YW1lZmlsZXdpdGg }

Because base64 encoding has a 33% overhead, this gives us a maximum 
xattr size of approximately 800 characters.
This extended attribute would not be shown when listing xattrs, only 
exposed when fetched explicitly, and unmodified tools would not be able 
to access the encrypted files in any case. It would not be stored on 
disk, only computed when fetched.
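
To make the proposed xattr layout concrete, here is a minimal sketch of
building and parsing a value in the format shown above. The helper
names are hypothetical, and the parser only handles the flat
single-mapping form from the example rather than general YAML.

```python
import base64


def b64url_encode(raw: bytes) -> str:
    # base64url without '=' padding, as in the example value above
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped on encode
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def format_encdata(size: int, enc_ctx: bytes, enc_name: bytes) -> str:
    return "{ encoding: base64url, size: %d, enc_ctx: %s, enc_name: %s }" % (
        size, b64url_encode(enc_ctx), b64url_encode(enc_name))


def parse_encdata(value: str) -> dict:
    body = value.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("expected a '{ key: value, ... }' mapping")
    fields = {}
    for item in body[1:-1].split(","):
        key, sep, val = item.partition(":")
        if not sep:
            raise ValueError("malformed field: %r" % item)
        # Drop any whitespace, including line wraps inside long values
        fields[key.strip()] = "".join(val.split())
    if fields.get("encoding") != "base64url":
        raise ValueError("unsupported encoding")
    return {
        "size": int(fields["size"]),
        "enc_ctx": b64url_decode(fields["enc_ctx"]),
        "enc_name": b64url_decode(fields["enc_name"]),
    }
```

Splitting on "," and ":" is safe here only because the base64url
alphabet contains neither character.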

File and file system backups often use the tar utility either directly 
or under the covers. We propose to modify the tar utility to make it 
"encryption aware", but the same relatively small changes could be done 
with other common backup utilities like cpio as needed. When detecting 
ext4 encrypted files, tar would need to explicitly fetch the 
security.encdata extended attribute, and store it along with the backup 
file. Fetching this extended attribute would internally trigger in ext4 
a mechanism responsible for gathering the required information. Because 
we must not make any clear text copy of encrypted files, the encryption 
key must not be present. Tar would also need to use a special flag that 
would allow reading raw data without the encryption key. Such a flag 
could be named O_FILE_ENC, and would need to be coupled with O_DIRECT so 
that the page cache does not see this raw data. O_FILE_ENC could take 
the value of (O_NOCTTY | O_NDELAY) as they are unlikely to be used in 
practice and are not harmful if used incorrectly. The name of the 
backed-up file would be the encoded+digested form returned by fscrypt.
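
The backup-side open could look roughly like the sketch below. Note
that O_FILE_ENC is only the flag proposed in this message and does not
exist in any kernel; the value is the one suggested above. The function
name and the injectable opener parameter are assumptions added so the
logic can be exercised without an encrypted filesystem. Linux is
assumed for errno.ENOKEY and os.O_DIRECT.

```python
import errno
import os

# Proposed flag value from this message; not a real kernel flag.
O_FILE_ENC = os.O_NOCTTY | os.O_NDELAY


def open_for_backup(path, opener=os.open):
    """Open normally; if the key is missing, retry for raw ciphertext."""
    try:
        return opener(path, os.O_RDONLY)
    except OSError as exc:
        if exc.errno != errno.ENOKEY:
            raise
    # No key available: ask for raw encrypted data, bypassing the page cache.
    return opener(path, os.O_RDONLY | O_FILE_ENC | os.O_DIRECT)
```

Reads on the returned descriptor would still be subject to the usual
O_DIRECT alignment constraints.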

The tar utility would be used to extract a previously created tarball 
containing encrypted files. When restoring the security.encdata extended 
attribute, instead of storing the xattr as-is on disk, this would 
internally trigger in ext4 a mechanism responsible for extracting the 
required information and storing it accordingly. Tar would also need
to specify the O_FILE_ENC | O_DIRECT flags to write raw data without the 
encryption key.

To create a valid encrypted file with proper encryption context and 
encrypted name, we can implement a mechanism where the file is first 
created with O_TMPFILE in the encrypted directory to avoid triggering 
the encryption context check before setting the security.encdata xattr, 
and then atomically linking it to the namespace with the correct 
encrypted name.

From a security standpoint, doing backup and restore of encrypted files
must not compromise their security. This is the reason why we want to 
carry out these operations without the encryption key. It avoids making 
a clear text copy of encrypted files.
The security.encdata extended attribute contains the encryption context 
of the file or directory. This has a 16-byte nonce (per-file random 
value) that is used along with the master key to derive the per-file key 
using a KDF. But the master key is not stored in ext4, so
it is not backed up as part of the scenario described above, which makes 
the backup of the raw encrypted files safe.
The process of restoring encrypted files must not change the encryption 
context associated with the files. In particular, setting an encryption 
context on a file must be possible only once, when the file is restored. 
And the newly introduced capability of restoring encrypted files must 
not give the ability to set an arbitrary encryption context on files.
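
To illustrate why backing up the nonce does not expose file contents,
here is a minimal sketch of deriving a per-file key from a master key
and a 16-byte nonce. It spells out HKDF (RFC 5869) with SHA-512; the
info string and function names are illustrative assumptions, not
fscrypt's exact parameters.

```python
import hashlib
import hmac


def hkdf_sha512(master_key: bytes, info: bytes, length: int) -> bytes:
    """HKDF extract-then-expand (RFC 5869) with SHA-512 and a zero salt."""
    hash_len = hashlib.sha512().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output too long")
    # Extract: condense the master key into a pseudorandom key
    prk = hmac.new(bytes(hash_len), master_key, hashlib.sha512).digest()
    # Expand: chain HMAC blocks until enough output is produced
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_file_key(master_key: bytes, nonce: bytes) -> bytes:
    if len(nonce) != 16:
        raise ValueError("nonce must be 16 bytes")
    # Hypothetical info layout: a label followed by the per-file nonce
    return hkdf_sha512(master_key, b"example-per-file-key" + nonce, 64)
```

Without the master key, the nonce stored in the backup is not enough
to reproduce the output of derive_file_key().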

From the backup tool point of view, the only changes needed would be to
add "O_FILE_ENC" when the open fails with ENOKEY, and then explicitly 
back up the "security.encdata" xattr with the file.  On restore, if the
"security.encdata" xattr is present, then the file should be created in 
the directory with O_TMPFILE before restoring the xattrs and file data,
and then linked into the directory with link() under the encrypted
filename.

From the filesystem point of view, it needs to generate the encdata
xattr on getxattr(), and interpret it correctly on setxattr().  The VFS 
needs to allow open() and link() on encrypted files with O_FILE_ENC.

If this proposal is OK I can provide a series of patches to implement this.
