* Backup/restore of fscrypt files and directories
From: Sebastien Buisson @ 2023-02-08 12:09 UTC
To: linux-fscrypt; +Cc: linux-fsdevel, linux-ext4

I am planning to implement backup and restore for fscrypt files and directories and propose the following design, and would welcome feedback on this approach.

There is a need to preserve encrypted file data in case of storage failure and to allow safely moving the data between filesystems and systems without decrypting it, just like we would do for normal files. While backup and restore at the device level is sometimes an option, we also need to be able to carry out backup/restore at the ext4 file system level, for instance to allow changing formatting options.

The core principle we want to retain is that we must not make any clear text copy of encrypted files. This means backup/restore must be carried out without the encryption key.

The first challenge we have to address is to get access to raw encrypted files without the encryption key. By design, fscrypt does not allow such access, and the ext4 file system would not let applications read or write files flagged as encrypted if the encryption key is not provided. This restriction is not for security reasons, but to avoid applications accidentally accessing the ciphertext. A mechanism must be provided for access to both raw encrypted content and raw encrypted names.

The second challenge is to deal with the encrypted file's size, when it is accessed with the encryption key vs. when accessed without the encryption key. For the backup operation to retrieve the full encrypted content, the encrypted file size should be reported as a multiple of the encryption chunk size when the encryption key is not present. And the clear text file size (size as seen with the encryption key) must be backed up as well in order to properly restore encrypted files later on.
This information cannot be inferred by any other means.

The third challenge is to get access to the encryption context of files and directories. By design, fscrypt does not expose this information, internally stored as an extended attribute but with no associated handler. However, making a backup of the encryption context is crucial because it preserves the information needed to later decrypt the file content. Restoring the encryption context is also non-trivial, because fscrypt only allows an encryption context to be set on a new file or on an existing but empty directory.

In order to address this need for backup/restore of encrypted files, we propose to make use of a special extended attribute named security.encdata, containing:
- encoding method used for binary data. Assume name can be up to 255 chars.
- clear text file data length in bytes (set to 0 for dirs).
- encryption context. 40 bytes for v2 encryption context.
- encrypted name. 256 bytes max.

To improve portability if we need to change the on-disk format in the future, and to make the archived data useful over a longer timeframe, the content of the security.encdata xattr is expressed as ASCII text with a "key: value" YAML format. As encryption context and encrypted file name are binary, they need to be encoded.
So the content of the security.encdata xattr would be something like:

{ encoding: base64url, size: 3012, enc_ctx: YWJjZGVmZ2hpamtsbW
5vcHFyc3R1dnd4eXphYmNkZWZnaGlqa2xtbg, enc_name: ZmlsZXdpdGh2ZX
J5bG9uZ25hbWVmaWxld2l0aHZlcnlsb25nbmFtZWZpbGV3aXRodmVyeWxvbmdu
YW1lZmlsZXdpdGg }

Because base64 encoding has a 33% overhead, this gives us a maximum xattr size of approximately 800 characters.
This extended attribute would not be shown when listing xattrs, only exposed when fetched explicitly, and unmodified tools would not be able to access the encrypted files in any case. It would not be stored on disk, only computed when fetched.
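As a rough check of the ~800-character estimate, the upper bound can be computed from the field limits listed above. This is a minimal sketch under stated assumptions: unpadded base64url (4 output characters per 3 input bytes, rounded up), a 64-bit size printed in decimal (at most 20 digits), and the key/punctuation layout of the example value. The helper names b64url_len() and encdata_max_len() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Length of unpadded base64url output for n input bytes: ceil(4n / 3). */
size_t b64url_len(size_t n)
{
	return (4 * n + 2) / 3;
}

/*
 * Upper bound on the proposed security.encdata value (hypothetical
 * layout): 255-char encoding name, 20-digit size, base64url of a 40-byte
 * v2 context and of a 256-byte encrypted name, plus keys/punctuation.
 */
size_t encdata_max_len(void)
{
	/* Keys and punctuation taken from the example value, values empty. */
	const char *skeleton = "{ encoding: , size: , enc_ctx: , enc_name:  }";

	return strlen(skeleton) + 255 + 20 + b64url_len(40) + b64url_len(256);
}
```

Under these assumptions the bound is 716 characters (671 for the values, 45 for keys and punctuation), consistent with the "approximately 800" figure.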
File and file system backups often use the tar utility either directly or under the covers. We propose to modify the tar utility to make it "encryption aware", but the same relatively small changes could be done with other common backup utilities like cpio as needed. When detecting ext4 encrypted files, tar would need to explicitly fetch the security.encdata extended attribute, and store it along with the backup file. Fetching this extended attribute would internally trigger in ext4 a mechanism responsible for gathering the required information. Because we must not make any clear text copy of encrypted files, the encryption key must not be present. Tar would also need to use a special flag that would allow reading raw data without the encryption key. Such a flag could be named O_FILE_ENC, and would need to be coupled with O_DIRECT so that the page cache does not see this raw data. O_FILE_ENC could take the value of (O_NOCTTY | O_NDELAY) as they are unlikely to be used in practice and are not harmful if used incorrectly.

The name of the backed-up file would be the encoded+digested form returned by fscrypt.

The tar utility would be used to extract a previously created tarball containing encrypted files. When restoring the security.encdata extended attribute, instead of storing the xattr as-is on disk, this would internally trigger in ext4 a mechanism responsible for extracting the required information, and storing them accordingly. Tar would also need to specify the O_FILE_ENC | O_DIRECT flags to write raw data without the encryption key.

To create a valid encrypted file with proper encryption context and encrypted name, we can implement a mechanism where the file is first created with O_TMPFILE in the encrypted directory to avoid triggering the encryption context check before setting the security.encdata xattr, and then atomically linking it to the namespace with the correct encrypted name.
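The O_TMPFILE-then-link sequence can be sketched with existing Linux calls: open(2) with O_TMPFILE creates an unnamed inode, and linkat(2) with AT_SYMLINK_FOLLOW on /proc/self/fd/N gives it a name, as documented in the open(2) man page. This is only a sketch under assumptions: the "security.encdata" xattr and its kernel-side interpretation are the hypothetical interface from this proposal (pass NULL to skip that step on ordinary filesystems), the data is written with a plain write() rather than the proposed O_FILE_ENC | O_DIRECT, and restore_via_tmpfile() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/*
 * Create an unnamed file in `dir`, fill it, optionally attach the
 * (hypothetical) security.encdata xattr, then link it in as `name`.
 * Returns 0 on success or a negative errno value.
 */
int restore_via_tmpfile(const char *dir, const char *name,
			const void *data, size_t len,
			const char *encdata, size_t encdata_len)
{
	char path[64];
	ssize_t n;
	int dfd, fd, err = 0;

	dfd = open(dir, O_DIRECTORY | O_RDONLY);
	if (dfd < 0)
		return -errno;

	/* Unnamed inode: not visible in the directory until linkat(). */
	fd = openat(dfd, ".", O_TMPFILE | O_WRONLY, 0600);
	if (fd < 0) {
		err = -errno;
		goto out_dir;
	}

	/* The proposal would write raw ciphertext here (O_FILE_ENC). */
	n = write(fd, data, len);
	if (n < 0 || (size_t)n != len) {
		err = n < 0 ? -errno : -EIO;
		goto out;
	}

	/* HYPOTHETICAL: kernel would parse this into context, size, name. */
	if (encdata && fsetxattr(fd, "security.encdata", encdata,
				 encdata_len, 0) < 0) {
		err = -errno;
		goto out;
	}

	/* Give the anonymous inode a name (see open(2), O_TMPFILE). */
	snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
	if (linkat(AT_FDCWD, path, dfd, name, AT_SYMLINK_FOLLOW) < 0)
		err = -errno;
out:
	close(fd);
out_dir:
	close(dfd);
	return err;
}
```

Because the xattr is set while the inode still has no name, nothing else can look the file up before its context is in place; whether the kernel would then honour the stored ciphertext name at link time is a separate design question.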
From a security standpoint, doing backup and restore of encrypted files must not compromise their security. This is the reason why we want to carry out these operations without the encryption key. It avoids making a clear text copy of encrypted files.

The security.encdata extended attribute contains the encryption context of the file or directory. This has a 16-byte nonce (per-file random value) that is used along with the master key to derive the per-file key thanks to a KDF function. But the master key is not stored in ext4, so it is not backed up as part of the scenario described above, which makes the backup of the raw encrypted files safe.

The process of restoring encrypted files must not change the encryption context associated with the files. In particular, setting an encryption context on a file must be possible only once, when the file is restored. And the newly introduced capability of restoring encrypted files must not give the ability to set an arbitrary encryption context on files.

From the backup tool's point of view, the only changes needed would be to add "O_FILE_ENC" when the open fails with ENOKEY, and then explicitly back up the "security.encdata" xattr with the file. On restore, if the "security.encdata" xattr is present, then the file should be created in the directory with O_TMPFILE before restoring the xattrs and file data, and then linked into the directory with link() under the encrypted filename.

From the filesystem's point of view, it needs to generate the encdata xattr on getxattr(), and interpret it correctly on setxattr(). The VFS needs to allow open() and link() on encrypted files with O_FILE_ENC.

If this proposal is OK I can provide a series of patches to implement this.
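The backup-side change described above (retry with the new flag when open() fails with ENOKEY) and the rounding of the size to whole encryption units can be sketched as follows. Assumptions, stated loudly: O_FILE_ENC does not exist in any kernel, and the definition below simply reuses (O_NOCTTY | O_NDELAY) as suggested in this proposal, so on a real kernel the retry is just an ordinary O_DIRECT open that still fails with ENOKEY. The 4096-byte unit in the usage note is an illustrative filesystem block size, and backup_open() and ciphertext_len() are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * HYPOTHETICAL flag from this proposal; not a real kernel flag. Today it
 * just means O_NOCTTY | O_NONBLOCK when passed to open().
 */
#define O_FILE_ENC (O_NOCTTY | O_NDELAY)

/* Bytes of ciphertext to copy: plaintext size rounded up to whole units. */
uint64_t ciphertext_len(uint64_t plain_size, uint32_t unit)
{
	return (plain_size + unit - 1) / unit * unit;
}

/*
 * Open `path` for backup. A normal open is tried first; if it fails with
 * ENOKEY (encrypted file, key absent), retry in the proposed raw mode.
 * *raw reports which path was taken. Returns an fd or -1 with errno set.
 */
int backup_open(const char *path, int *raw)
{
	int fd = open(path, O_RDONLY);

	*raw = 0;
	if (fd >= 0 || errno != ENOKEY)
		return fd;

	*raw = 1;
	return open(path, O_RDONLY | O_FILE_ENC | O_DIRECT);
}
```

For example, a file whose plaintext size is 3012 bytes would need ciphertext_len(3012, 4096) = 4096 bytes copied, while 3012 itself still has to be recorded separately for the restore.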
* Re: Backup/restore of fscrypt files and directories
From: Eric Biggers @ 2023-02-08 19:28 UTC
To: Sebastien Buisson; +Cc: linux-fscrypt, linux-fsdevel, linux-ext4

Hi Sebastien,

On Wed, Feb 08, 2023 at 01:09:50PM +0100, Sebastien Buisson wrote:
> I am planning to implement backup and restore for fscrypt files and
> directories and propose the following design, and would welcome feedback on
> this approach.

Thanks for looking into this. Before getting too far into the details of your proposal, are you aware of the previous threads about this? Specifically:

"backup/restore of fscrypt files"
(https://lore.kernel.org/linux-fscrypt/D1AD7D55-94D6-4C19-96B4-BAD0FD33CF49@dilger.ca/T/#u)

And the discussion that happened as part of
"[PATCH RERESEND v9 0/9] fs: interface for directly reading/writing compressed data"
(https://lore.kernel.org/linux-fsdevel/CAHk-=wh74eFxL0f_HSLUEsD1OQfFNH9ccYVgCXNoV1098VCV6Q@mail.gmail.com
and its responses).

Both times before, it was brought up that the hardest part is backing up and restoring the filenames, including symlinks. I don't think your proposal really addresses that. Your proposal has a single filename in the security.encdata xattr. But actually, a file can have many names. Also, a file can have an encrypted name without being encrypted itself; that's the case for device node, socket, and FIFO files. Also, symlinks have their target encrypted.

I think that your proposal, in general, needs more detail about how *restores* will work, since that's going to be much harder than backups. It's not hard to get the filesystem to give you more information; it's much harder to make changes to a filesystem while keeping everything self-consistent!

A description of the use cases of this feature would also be helpful.
Historically, people have said they needed this feature when they really didn't.

> The third challenge is to get access to the encryption context of files and
> directories. By design, fscrypt does not expose this information, internally
> stored as an extended attribute but with no associated handler.

Actually, FS_IOC_GET_ENCRYPTION_POLICY_EX and FS_IOC_GET_ENCRYPTION_NONCE together give you all the information stored in the encryption context.

> In order to address this need for backup/restore of encrypted files, we
> propose to make use of a special extended attribute named security.encdata,
> containing:
> - encoding method used for binary data. Assume name can be up to 255 chars.
> - clear text file data length in bytes (set to 0 for dirs).

st_size already gives the plaintext file length, even while the encryption key is not present.

> - encryption context. 40 bytes for v2 encryption context.
> - encrypted name. 256 bytes max.
>
> To improve portability if we need to change the on-disk format in the
> future, and to make the archived data useful over a longer timeframe, the
> content of the security.encdata xattr is expressed as ASCII text with a
> "key: value" YAML format. As encryption context and encrypted file name are
> binary, they need to be encoded.
> So the content of the security.encdata xattr would be something like:
>
> { encoding: base64url, size: 3012, enc_ctx: YWJjZGVmZ2hpamtsbW
> 5vcHFyc3R1dnd4eXphYmNkZWZnaGlqa2xtbg, enc_name: ZmlsZXdpdGh2ZX
> J5bG9uZ25hbWVmaWxld2l0aHZlcnlsb25nbmFtZWZpbGV3aXRodmVyeWxvbmdu
> YW1lZmlsZXdpdGg }
>
> Because base64 encoding has a 33% overhead, this gives us a maximum xattr
> size of approximately 800 characters.
> This extended attribute would not be shown when listing xattrs, only exposed
> when fetched explicitly, and unmodified tools would not be able to access
> the encrypted files in any case. It would not be stored on disk, only
> computed when fetched.
An xattr containing multiple key-value pairs is quite strange, since xattrs themselves are key-value pairs. This could just be multiple xattrs.

Did you choose this design because you intend for this to be treated as an opaque blob that userspace must not interpret at all?

> File and file system backups often use the tar utility either directly or
> under the covers. We propose to modify the tar utility to make it
> "encryption aware", but the same relatively small changes could be done with
> other common backup utilities like cpio as needed. When detecting ext4
> encrypted files, tar would need to explicitly fetch the security.encdata
> extended attribute, and store it along with the backup file. Fetching this
> extended attribute would internally trigger in ext4 a mechanism responsible
> for gathering the required information. Because we must not make any clear
> text copy of encrypted files, the encryption key must not be present.

Why can't the encryption key be present during backup? Surely some people are going to want to back up encrypted files consistently in ciphertext form, regardless of whether the key happens to be present or not at the particular time the backup is being done? Consider e.g. a bunch of user home directories which are regularly being locked and unlocked, and the system administrator is taking backups of everything.

> Tar
> would also need to use a special flag that would allow reading raw data
> without the encryption key. Such a flag could be named O_FILE_ENC, and would
> need to be coupled with O_DIRECT so that the page cache does not see this
> raw data. O_FILE_ENC could take the value of (O_NOCTTY | O_NDELAY) as they
> are unlikely to be used in practice and are not harmful if used incorrectly.

Maybe call this O_CIPHERTEXT? Also note that a new RWF_* flag to preadv2, instead of a new O_* flag to open(), has been suggested before.

> The name of the backed-up file would be the encoded+digested form returned
> by fscrypt.
Does this have a meaning, since the actual name would be stored separately?

> The tar utility would be used to extract a previously created tarball
> containing encrypted files. When restoring the security.encdata extended
> attribute, instead of storing the xattr as-is on disk, this would internally
> trigger in ext4 a mechanism responsible for extracting the required
> information, and storing them accordingly. Tar would also need to specify
> the O_FILE_ENC | O_DIRECT flags to write raw data without the encryption
> key.
>
> To create a valid encrypted file with proper encryption context and
> encrypted name, we can implement a mechanism where the file is first created
> with O_TMPFILE in the encrypted directory to avoid triggering the encryption
> context check before setting the security.encdata xattr, and then atomically
> linking it to the namespace with the correct encrypted name.

How exactly does the link to the correct name happen? What if there's more than one name? What about restoring non-regular files?

> The security.encdata extended attribute contains the encryption context of
> the file or directory. This has a 16-byte nonce (per-file random value) that
> is used along with the master key to derive the per-file key thanks to a KDF
> function. But the master key is not stored in ext4, so it is not backed up
> as part of the scenario described above, which makes the backup of the raw
> encrypted files safe.

Side note: the backup/restore support will need to be disabled on files that use FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64 or FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32, since those files are tied to the filesystem they are on.

- Eric
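The alternative of a per-call RWF_* flag to preadv2() instead of a new O_* open flag could look roughly like this. preadv2() is real, and per its man page an unknown bit in flags is rejected with EOPNOTSUPP. RWF_CIPHERTEXT_HYPOTHETICAL is not a kernel flag: its value is an arbitrary placeholder so that the sketch compiles, which means current kernels reject it. read_at() and ciphertext_read_supported() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* HYPOTHETICAL per-call flag; placeholder value, defined by no kernel. */
#define RWF_CIPHERTEXT_HYPOTHETICAL 0x10000000

/* Read len bytes at off, asking for ciphertext when want_raw is set. */
ssize_t read_at(int fd, void *buf, size_t len, off_t off, int want_raw)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	return preadv2(fd, &iov, 1, off,
		       want_raw ? RWF_CIPHERTEXT_HYPOTHETICAL : 0);
}

/*
 * Probe whether the running kernel accepts the flag: 1 if accepted,
 * 0 if rejected with EOPNOTSUPP (expected today), -1 on other errors.
 */
int ciphertext_read_supported(const char *path)
{
	char byte;
	ssize_t n;
	int saved, fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	n = read_at(fd, &byte, 1, 0, 1);
	saved = errno;
	close(fd);
	if (n >= 0)
		return 1;
	return saved == EOPNOTSUPP ? 0 : -1;
}
```

A per-call flag keeps the choice of plaintext versus ciphertext on each read rather than on the open file description, which is one reason it is attractive compared with an open() flag.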
* Re: Backup/restore of fscrypt files and directories
From: Sebastien Buisson @ 2023-02-10 13:44 UTC
To: Eric Biggers; +Cc: linux-fscrypt, linux-fsdevel, linux-ext4

Hi Eric,

Thanks for your feedback.

On 08/02/2023 at 20:28, Eric Biggers wrote:
> Hi Sebastien,
>
> On Wed, Feb 08, 2023 at 01:09:50PM +0100, Sebastien Buisson wrote:
>> I am planning to implement backup and restore for fscrypt files and
>> directories and propose the following design, and would welcome feedback on
>> this approach.
> Thanks for looking into this. Before getting too far into the details of your
> proposal, are you aware of the previous threads about this? Specifically:
>
> "backup/restore of fscrypt files"
> (https://lore.kernel.org/linux-fscrypt/D1AD7D55-94D6-4C19-96B4-BAD0FD33CF49@dilger.ca/T/#u)
>
> And the discussion that happened as part of
> "[PATCH RERESEND v9 0/9] fs: interface for directly reading/writing compressed data"
> (https://lore.kernel.org/linux-fsdevel/CAHk-=wh74eFxL0f_HSLUEsD1OQfFNH9ccYVgCXNoV1098VCV6Q@mail.gmail.com
> and its responses).

I knew about the first one, but had not stumbled across the discussion that happened in the compression-related thread, thanks.

> Both times before, it was brought up that the hardest part is backing up and
> restoring the filenames, including symlinks. I don't think your proposal really
> addresses that. Your proposal has a single filename in the security.encdata
> xattr. But actually, a file can have many names. Also, a file can have an
> encrypted name without being encrypted itself; that's the case for device node,
> socket, and FIFO files. Also, symlinks have their target encrypted.

That is correct. The value of the enc_name field is the ciphertext name of the current dentry.
As with regular files, my impression was that tar (or the backup utility) would handle the hard links properly. In your view, what would make a difference between regular files and encrypted files regarding restore or hard links?

As for symlinks, you are right, I need to dig further. I think at least the security.encdata xattr would need an additional field to hold the ciphertext symlink target.

> I think that your proposal, in general, needs more detail about how *restores*
> will work, since that's going to be much harder than backups. It's not hard to
> get the filesystem to give you more information; it's much harder to make
> changes to a filesystem while keeping everything self-consistent!
>
> A description of the use cases of this feature would also be helpful.
> Historically, people have said they needed this feature when they really didn't.

There is really a need for backup/restore at the file system level. For instance, in case of storage failure, we would want to restore files to a newly formatted device, at a finer granularity that cannot be achieved with a backup/restore at the device level, or because that would allow changing formatting options. Also, it particularly makes sense to have per-directory backups, as the block devices are getting larger and larger.

The ability to back up and restore encrypted files is interesting in another use case: moving files between file systems and systems without the need to decrypt then re-encrypt.

>> The third challenge is to get access to the encryption context of files and
>> directories. By design, fscrypt does not expose this information, internally
>> stored as an extended attribute but with no associated handler.
> Actually, FS_IOC_GET_ENCRYPTION_POLICY_EX and FS_IOC_GET_ENCRYPTION_NONCE
> together give you all the information stored in the encryption context.
>
>> In order to address this need for backup/restore of encrypted files, we
>> propose to make use of a special extended attribute named security.encdata,
>> containing:
>> - encoding method used for binary data. Assume name can be up to 255 chars.
>> - clear text file data length in bytes (set to 0 for dirs).
> st_size already gives the plaintext file length, even while the encryption key
> is not present.

Exactly, and that would prevent normal utilities from reading raw encrypted content up to the end of the encryption block (if access without the key was granted).

>> - encryption context. 40 bytes for v2 encryption context.
>> - encrypted name. 256 bytes max.
>>
>> To improve portability if we need to change the on-disk format in the
>> future, and to make the archived data useful over a longer timeframe, the
>> content of the security.encdata xattr is expressed as ASCII text with a
>> "key: value" YAML format. As encryption context and encrypted file name are
>> binary, they need to be encoded.
>> So the content of the security.encdata xattr would be something like:
>>
>> { encoding: base64url, size: 3012, enc_ctx: YWJjZGVmZ2hpamtsbW
>> 5vcHFyc3R1dnd4eXphYmNkZWZnaGlqa2xtbg, enc_name: ZmlsZXdpdGh2ZX
>> J5bG9uZ25hbWVmaWxld2l0aHZlcnlsb25nbmFtZWZpbGV3aXRodmVyeWxvbmdu
>> YW1lZmlsZXdpdGg }
>>
>> Because base64 encoding has a 33% overhead, this gives us a maximum xattr
>> size of approximately 800 characters.
>> This extended attribute would not be shown when listing xattrs, only exposed
>> when fetched explicitly, and unmodified tools would not be able to access
>> the encrypted files in any case. It would not be stored on disk, only
>> computed when fetched.
> An xattr containing multiple key-value pairs is quite strange, since xattrs
> themselves are key-value pairs. This could just be multiple xattrs.
>
> Did you choose this design because you intend for this to be treated as an
> opaque blob that userspace must not interpret at all?
This format is chosen to be readable and potentially modifiable if the implementation of backup/restore of encrypted files evolves in the future. As you mention, some of the information returned in the security.encdata xattr can be retrieved by other means. But the idea of having a single xattr that holds all the information is to ease implementation in the backup/restore tools. For them, the backup operation would just consist of fetching the security.encdata xattr if dealing with an encrypted file. So from that standpoint, the content of the xattr is not supposed to be interpreted by the backup/restore tools. However, having a readable multi key-value pair format increases portability and makes it possible for other tools to convert to a newer format if the need arises in the future.

>> File and file system backups often use the tar utility either directly or
>> under the covers. We propose to modify the tar utility to make it
>> "encryption aware", but the same relatively small changes could be done with
>> other common backup utilities like cpio as needed. When detecting ext4
>> encrypted files, tar would need to explicitly fetch the security.encdata
>> extended attribute, and store it along with the backup file. Fetching this
>> extended attribute would internally trigger in ext4 a mechanism responsible
>> for gathering the required information. Because we must not make any clear
>> text copy of encrypted files, the encryption key must not be present.
> Why can't the encryption key be present during backup? Surely some people are
> going to want to back up encrypted files consistently in ciphertext form,
> regardless of whether the key happens to be present or not at the particular
> time the backup is being done? Consider e.g. a bunch of user home directories
> which are regularly being locked and unlocked, and the system administrator is
> taking backups of everything.

That is a very good question.
Of course we do not want to make clear text copies of encrypted files, but you are right that we should also support making a ciphertext backup while the key is present. I guess this is achievable thanks to a specific flag to open() or preadv2() as mentioned below.

>> Tar
>> would also need to use a special flag that would allow reading raw data
>> without the encryption key. Such a flag could be named O_FILE_ENC, and would
>> need to be coupled with O_DIRECT so that the page cache does not see this
>> raw data. O_FILE_ENC could take the value of (O_NOCTTY | O_NDELAY) as they
>> are unlikely to be used in practice and are not harmful if used incorrectly.
> Maybe call this O_CIPHERTEXT? Also note that a new RWF_* flag to preadv2,
> instead of a new O_* flag to open(), has been suggested before.
>
>> The name of the backed-up file would be the encoded+digested form returned
>> by fscrypt.
> Does this have a meaning, since the actual name would be stored separately?

But the backed-up file needs to have a name, right? Given that the encoded+digested form returned by fscrypt is unique within the directory, I thought it would be fine to use. Can you think of another name to give to backed-up files?

>> The tar utility would be used to extract a previously created tarball
>> containing encrypted files. When restoring the security.encdata extended
>> attribute, instead of storing the xattr as-is on disk, this would internally
>> trigger in ext4 a mechanism responsible for extracting the required
>> information, and storing them accordingly. Tar would also need to specify
>> the O_FILE_ENC | O_DIRECT flags to write raw data without the encryption
>> key.
>>
>> To create a valid encrypted file with proper encryption context and
>> encrypted name, we can implement a mechanism where the file is first created
>> with O_TMPFILE in the encrypted directory to avoid triggering the encryption
>> context check before setting the security.encdata xattr, and then atomically
>> linking it to the namespace with the correct encrypted name.
> How exactly does the link to the correct name happen? What if there's more than
> one name? What about restoring non-regular files?

So the restore tool first creates the file with O_TMPFILE in the encrypted directory, and writes its ciphertext content (with the special flag mentioned above). Then the tool sets the security.encdata xattr. Internally, fscrypt uses the value of the enc_ctx field to set the .c xattr on the file, and the size field to set the plaintext file length. The value of the enc_name field is stored temporarily by fscrypt in a dedicated xattr such as "ciphertextname". Then the tool calls linkat() on the file. Internally, seeing the special flag and the presence of the "ciphertextname" xattr, fscrypt uses this value as the new name. The purpose of this is to impose the provided encryption context and encrypted name, instead of having new ones generated at file creation.

In the case of hard links, I do not know how tar for instance handles this for normal files. Do you have any ideas?

Cheers,
Sebastien.

>> The security.encdata extended attribute contains the encryption context of
>> the file or directory. This has a 16-byte nonce (per-file random value) that
>> is used along with the master key to derive the per-file key thanks to a KDF
>> function. But the master key is not stored in ext4, so it is not backed up
>> as part of the scenario described above, which makes the backup of the raw
>> encrypted files safe.
> Side note: the backup/restore support will need to be disabled on files that use
> FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64 or FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32, since
> those files are tied to the filesystem they are on.
>
> - Eric
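The policy information referred to in this exchange can be read with the existing FS_IOC_GET_ENCRYPTION_POLICY_EX and FS_IOC_GET_ENCRYPTION_NONCE ioctls from <linux/fscrypt.h>, and the same policy query can be used to refuse the filesystem-bound policies mentioned in the side note. A minimal sketch, assuming kernel UAPI headers recent enough to define both ioctls and FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32. check_backup_eligible(), policy_is_fs_bound() and NONCE_SIZE are hypothetical names; the nonce size is written as 16 to match the __u8[16] in the ioctl definition. Note that on current kernels open() itself can fail with ENOKEY for a regular encrypted file whose key is absent, so this sketch does not by itself solve the keyless case.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fscrypt.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define NONCE_SIZE 16 /* matches __u8[16] in FS_IOC_GET_ENCRYPTION_NONCE */

/* Policies whose IVs depend on inode/block numbers are tied to one fs. */
int policy_is_fs_bound(unsigned int flags)
{
	return (flags & (FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64 |
			 FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32)) != 0;
}

/*
 * Fetch the policy and per-file nonce of `path`. Returns 1 if a
 * ciphertext copy could be portable, 0 if the policy is filesystem-bound,
 * or -1 with errno set (not encrypted, key needed to open, etc.).
 */
int check_backup_eligible(const char *path, unsigned char nonce[NONCE_SIZE])
{
	struct fscrypt_get_policy_ex_arg arg;
	unsigned int flags;
	int fd, saved, ret = -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	arg.policy_size = sizeof(arg.policy);
	if (ioctl(fd, FS_IOC_GET_ENCRYPTION_POLICY_EX, &arg) < 0)
		goto out;
	if (ioctl(fd, FS_IOC_GET_ENCRYPTION_NONCE, nonce) < 0)
		goto out;

	flags = arg.policy.version == FSCRYPT_POLICY_V2 ?
		arg.policy.v2.flags : arg.policy.v1.flags;
	ret = policy_is_fs_bound(flags) ? 0 : 1;
out:
	saved = errno;
	close(fd);
	errno = saved;
	return ret;
}
```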
* Re: Backup/restore of fscrypt files and directories
From: Theodore Ts'o @ 2023-02-10 20:42 UTC
To: Sebastien Buisson; +Cc: Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4

On Fri, Feb 10, 2023 at 02:44:22PM +0100, Sebastien Buisson wrote:
> As for symlinks, you are right, I need to dig further. I think at least the
> security.encdata xattr would need an additional field to hold the ciphertext
> symlink target.

So I'd caution you against the concept of using the security.encdata xattr. In your proposal, it's being used in two different ways. The first way is as a system call / ioctl like way, and that's something which is very much frowned upon, at least by many in the Kernel community. The red flag here is when you say that the xattr isn't actually stored on disk, but rather is created on the fly when the xattr is fetched. If you need to fetch information from the kernel that's not stored as part of the on-disk format, then use an ioctl or a system call. Don't try to turn the xattr interface into a system call / ioctl extension like thing.

The other way you're using the encdata is that you're presuming that this is how you'd store the information in the tar format. And how we fetch information from the kernel, and how it is stored as an exchange format, should be decoupled as much as possible.

In the case of a tar archive, the symlink target is normally stored in the data block of the tar archive. In the case where the symlink is encrypted, why should that change? We aren't storing the encrypted data in a different location, such as the encdata xattr; why should that be different in the case of the symlink target?
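Fetching a symlink target uses a different call from fetching file data: readlink(2) rather than read(2). readlink() does not NUL-terminate its result and silently truncates when the buffer is too small, so a careful caller grows the buffer until the result fits. A minimal sketch; fetch_symlink_target() is a hypothetical helper, and nothing in it is specific to encrypted symlinks.

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Return a malloc'd, NUL-terminated copy of the symlink target at `path`,
 * or NULL on error (including when `path` is not a symlink). readlink()
 * truncates silently, so grow the buffer until the result fits.
 */
char *fetch_symlink_target(const char *path)
{
	size_t size = 128;
	char *buf = NULL;

	for (;;) {
		char *tmp = realloc(buf, size);
		ssize_t n;

		if (!tmp) {
			free(buf);
			return NULL;
		}
		buf = tmp;
		n = readlink(path, buf, size);
		if (n < 0) {
			free(buf);
			return NULL;
		}
		if ((size_t)n < size) {
			buf[n] = '\0';
			return buf;
		}
		size *= 2; /* result may have been truncated: retry */
	}
}
```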
Now, how you *fetch* the encrypted symlink target might be different, just as how we fetch the contents of an unencrypted data file (via the read system call) and how we fetch an unencrypted symlink target (via the readlink system call) are different.

> > A description of the use cases of this feature would also be helpful.
> > Historically, people have said they needed this feature when they really didn't.
>
> There is really a need for backup/restore at the file system level. For
> instance, in case of storage failure, we would want to restore files to a
> newly formatted device, at a finer granularity that cannot be achieved with
> a backup/restore at the device level, or because that would allow changing
> formatting options. Also, it particularly makes sense to have per-directory
> backups, as the block devices are getting larger and larger.
>
> The ability to back up and restore encrypted files is interesting in another
> use case: moving files between file systems and systems without the need to
> decrypt then re-encrypt.

The use case of being able to restore files without needing to decrypt and re-encrypt is quite different from the use case where you want to be able to back up the files without the encryption keys present, but the keys *are* needed at restore time; the latter is quite a bit easier. For example, some of the encryption modes which use the inode number as part of the IV could be handled if keys are needed at restore time; but it would be quite a bit harder, if not impossible, if you want to be able to restore the encrypted files without doing a decrypt/re-encrypt pass.

Can you give more details about why you are interested in implementing this? Does your company have a more specific business justification for wanting to invest in this work? If so, can you say more about it?
The reason why I ask is because very often fscrypt gets used in integrated solutions, where the encryption/decryption engine is done in-line between the general purpose CPU and the storage device. In some cases, the users' encryption keys might be stored in something like ARM TrustZone or in some other specialized trusted key manager where even the kernel running in the general purpose hardware won't have access to *any* of the keys. It's for that reason that we have some of these alternate modes where the inode number is used as part of the IV, as opposed to the more traditional scheme where the user's key is used to derive a file-specific subkey.

One of the original use cases for fscrypt was for Android and ChromeOS devices. And for those devices the state tends to be synchronized across multiple devices, including web browsers. So the state ends up getting saved, unencrypted, in an application specific format, so you can recover very quickly with no data loss, even if the device gets lost or destroyed[1].

[1] https://www.youtube.com/watch?v=lm-Vnx58UYo

It was for this reason that ultimately, we decided that there really wasn't a need to back up the data in an encrypted form, since for the use case that our company was interested in addressing, well over 90% of the state was of necessity already being backed up in an unencrypted format. So it was easier to just back up the remaining bits of state, and if need be, decrypt and then re-encrypt them with a key derived from the user's login password before they are sent up to the cloud server.

You may be trying to solve the problem in the most general way possible, but sometimes that's not the best solution, especially once time to market and cost/complexity of implementation is taken into account.

As Linus Torvalds stated earlier today, when talking about splice(2) vs sendfile(2):

"...
this is also very much an example of how "generic" may be something that is revered in computer science, but is often a *horrible* thing in reality.... Special cases are often much simpler and easier, and sometimes the special cases are all you actually want." [2] [2] https://lore.kernel.org/all/CAHk-=wip9xx367bfCV8xaF9Oaw4DZ6edF9Ojv10XoxJ-iUBwhA@mail.gmail.com/ > In the case of hard links, I do not know how tar for instance handles this > for normal files. Do you have any ideas? "Tar stores hardlinks in the tarball by storing the first file (of a group of hardlinked files); the subsequent hard links to it are indicated by a special record. When untarring, encountering this record causes tar to create a hard link in the destination filesystem." [3] [3] https://forums.whirlpool.net.au/archive/2787890 Why are you assuming that tar is the best format to use for storing encrypted files? It's going to require special extensions to the tar format, which means it won't necessarily be interoperable across different tar implementations. (For example, the hard link support is specific to GNU tar.) Does your requirements (and this is why a more detailed explanation of your use case would be helpful) require supporting hard links? If it doesn't and you don't mind storing N copies of the file in the tar archive file, and not restoring the hard links when the tar file is unpacked, then life is much simpler. Which is why it's important to be very clear about use cases and requirements before trying to design a solution. Cheers, - Ted ^ permalink raw reply [flat|nested] 5+ messages in thread
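The hard-link scheme quoted above archives the data once and then emits link records for later names of the same file. That depends on recognising when two directory entries refer to the same inode. Below is a minimal sketch of that bookkeeping, keyed by the (st_dev, st_ino) pair that stat(2) reports. The link_table type, its fixed capacity, and link_table_lookup_or_add() are hypothetical and are not taken from GNU tar's sources. An archiver would presumably consult the table only for files whose st_nlink is greater than 1.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define LINK_TABLE_MAX 256 /* hypothetical fixed capacity for the sketch */

struct link_entry {
	dev_t dev;
	ino_t ino;
	const char *first_path; /* name under which the data was archived;
	                           the caller must keep this string alive */
};

struct link_table {
	struct link_entry entries[LINK_TABLE_MAX];
	size_t count;
};

/*
 * If an earlier entry has the same (dev, ino), return its recorded path:
 * the caller should then emit a link record instead of the file data.
 * On first sight, record path for later lookups and return NULL.
 */
static const char *link_table_lookup_or_add(struct link_table *t, dev_t dev,
					    ino_t ino, const char *path)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->entries[i].dev == dev && t->entries[i].ino == ino)
			return t->entries[i].first_path;

	assert(t->count < LINK_TABLE_MAX);
	t->entries[t->count].dev = dev;
	t->entries[t->count].ino = ino;
	t->entries[t->count].first_path = path;
	t->count++;
	return NULL;
}
```

On the first call for an inode, the function records the path and returns NULL, so the caller archives the file's data. A later call with the same (dev, ino) returns that first path, which the caller would write as a link record rather than a second copy of the data.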
* Re: Backup/restore of fscrypt files and directories 2023-02-10 20:42 ` Theodore Ts'o @ 2023-02-16 0:54 ` Andreas Dilger
From: Andreas Dilger @ 2023-02-16 0:54 UTC
To: Theodore Ts'o
Cc: Sebastien Buisson, Eric Biggers, linux-fscrypt, linux-fsdevel, linux-ext4

On Feb 10, 2023, at 1:42 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Fri, Feb 10, 2023 at 02:44:22PM +0100, Sebastien Buisson wrote:
>> As for symlinks, you are right, I need to dig further. I think at least the
>> security.encdata xattr would need an additional field to hold the ciphertext
>> symlink target.
>
> So I'd caution you against the concept of using the security.encdata
> xattr. In your proposal, it's being used in two different ways. The first
> way is as a system call / ioctl like way, and that's something which
> is very much frowned upon, at least by many in the kernel community.
> The red flag here is when you say that the xattr isn't actually stored
> on disk, but rather is created on the fly when the xattr is fetched.
> If you need to fetch information from the kernel that's not stored as
> part of the on-disk format, then use an ioctl or a system call. Don't
> try to turn the xattr interface into a system call / ioctl extension
> like thing.

I don't think the actual xattr format is critical to the process; it is a blob saved into the archive from the filesystem and restored later. That seems like the textbook definition of an extended attribute.

> The other way you're using the encdata is that you're presuming that
> this is how you'd store the information in the tar format. And how we
> fetch information from the kernel, and how it is stored as an exchange
> format, should be decoupled as much as possible.
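This sketch makes concrete what "computed when fetched" would mean for the security.encdata value proposed at the start of the thread. It renders the proposed key: value text from its binary inputs, and it is hypothetical throughout. encdata_format() and b64url_encode() are illustrative names. The unpadded base64url choice is inferred from the sample enc_ctx value in the proposal. The 40-byte context and 256-byte name limits are the ones stated there. Nothing here is an existing kernel or library interface, and the rendering would be the same whether the text is delivered through an xattr or, as Ted suggests, an ioctl.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Limits stated in the proposal: v2 context is 40 bytes, names up to 256. */
#define ENC_CTX_MAX  40
#define ENC_NAME_MAX 256
/* Unpadded base64 needs at most 4 chars per 3 input bytes, plus a NUL. */
#define B64_BUFSZ(n) (4 * (((n) + 2) / 3) + 1)

/* RFC 4648 "URL and filename safe" alphabet. */
static const char b64url[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Encodes len bytes as unpadded base64url into out (NUL-terminated);
 * out must hold B64_BUFSZ(len) bytes. Returns the number of chars written. */
static size_t b64url_encode(const uint8_t *in, size_t len, char *out)
{
	size_t i = 0, o = 0;

	for (; i + 2 < len; i += 3) {
		uint32_t v = ((uint32_t)in[i] << 16) | ((uint32_t)in[i + 1] << 8) | in[i + 2];
		out[o++] = b64url[(v >> 18) & 63];
		out[o++] = b64url[(v >> 12) & 63];
		out[o++] = b64url[(v >> 6) & 63];
		out[o++] = b64url[v & 63];
	}
	if (len - i == 1) {          /* one leftover byte: 2 output chars */
		uint32_t v = (uint32_t)in[i] << 16;
		out[o++] = b64url[(v >> 18) & 63];
		out[o++] = b64url[(v >> 12) & 63];
	} else if (len - i == 2) {   /* two leftover bytes: 3 output chars */
		uint32_t v = ((uint32_t)in[i] << 16) | ((uint32_t)in[i + 1] << 8);
		out[o++] = b64url[(v >> 18) & 63];
		out[o++] = b64url[(v >> 12) & 63];
		out[o++] = b64url[(v >> 6) & 63];
	}
	out[o] = '\0';
	return o;
}

/* Renders the proposed security.encdata text. Returns what snprintf()
 * returns: the full length, which exceeds bufsz - 1 if truncated. */
static int encdata_format(char *buf, size_t bufsz, uint64_t clear_size,
			  const uint8_t *ctx, size_t ctx_len,
			  const uint8_t *name, size_t name_len)
{
	char ctx_b64[B64_BUFSZ(ENC_CTX_MAX)];
	char name_b64[B64_BUFSZ(ENC_NAME_MAX)];

	assert(ctx_len <= ENC_CTX_MAX && name_len <= ENC_NAME_MAX);
	b64url_encode(ctx, ctx_len, ctx_b64);
	b64url_encode(name, name_len, name_b64);
	return snprintf(buf, bufsz,
			"{ encoding: base64url, size: %" PRIu64
			", enc_ctx: %s, enc_name: %s }",
			clear_size, ctx_b64, name_b64);
}
```

The proposal's sample enc_ctx string is reproduced by b64url_encode() when the input is a 40-byte ASCII placeholder context, the letters a to z followed by a to n. That is where the unpadded base64url assumption comes from.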
I think using an xattr to do backup/restore of the internal encryption state makes sense, both as the interface for getting these attributes from the kernel and for storing them in the archive. It seems prudent to make the userspace interface as "generic" as possible, and handling of fscrypt files should be in the filesystem that cares about it.

Avoiding the need for the encryption keys makes life *much* simpler for the sysadmin (no need to contact users for backup/restore). It is also *much* more secure from the data POV (no need to store keys at a central site or anywhere else, and data is never in plaintext, even in memory on the backup machine), as well as a lot faster (no need to decrypt the data at backup time and re-encrypt it at restore time).

In some regards, the proposed text-based format is already somewhat decoupled from how the information is stored internally, and allows abstraction from the internal details. That seems mostly for the benefit of ext4, so that it can (potentially) process older backups in case more or different information needs to be stored in the xattr.

Storing the real file size in the xattr, while exposing via st_size the size rounded up to a multiple of the encryption chunk size, would allow userspace to read/write the full encrypted data without any modification. Setting the xattr then restores the real file size and fscrypt context to the inode. tar, rsync, and commercial backup programs are already able to back up and restore xattrs today, so no changes are needed there.

Keeping the userspace changes needed to handle fscrypt files as small as possible (e.g. using an open(O_CIPHERTEXT) flag if ENOKEY is returned, and then saving/restoring an extra xattr) is IMHO a much easier sell than adding multiple fs-specific ioctl calls, and then still having to develop some *other* way to save/restore the binary fscrypt context in the archive.

> In the case of a tar archive, the symlink target is normally stored in
> the data block of the tar archive. In the case where the symlink is
> encrypted, why should that change?
> We aren't storing the encrypted data in a different location, such as
> the encdata xattr; why should that be different in the case of the
> symlink target?

Sure, it seems reasonable to save the symlink target as "file data" rather than as part of the security.encdata xattr, if that is "more normal" for how tar handles this. I haven't looked into the tar code/format to see how symlinks or hardlinks are handled.

> Now, how you *fetch* the encrypted symlink target might be different,
> just as how we fetch the contents of an unencrypted data file (via the
> read system call) and how we fetch an unencrypted symlink target (via
> the readlink system call) are different.
>
>>> A description of the use cases of this feature would also be helpful.
>>> Historically, people have said they needed this feature when they really didn't.
>>
>> There is really a need for backup/restore at the file system level. For
>> instance, in case of storage failure, we would want to restore files to a
>> newly formatted device, at a finer granularity than can be achieved with
>> a backup/restore at the device level, or because that would allow changing
>> formatting options. Also, it particularly makes sense to have per-directory
>> backups, as the block devices are getting larger and larger.
>>
>> The ability to backup and restore encrypted files is interesting in another
>> use case: moving files between file systems and systems without the need to
>> decrypt then re-encrypt.
>
> The use case of being able to restore files without needing to decrypt
> and re-encrypt is quite different from the use case where you want to
> be able to back up the files without needing the encryption keys present,
> but the encryption keys *are* needed at restore time, and the latter is
> quite a bit easier.
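Ted distinguishes between file contents, fetched with read(2), and symlink targets, fetched with readlink(2). This is what an archiver already does for unencrypted files. The sketch below shows only that existing, unencrypted behaviour using standard POSIX calls. fetch_member_data() is a hypothetical helper. Whatever mechanism eventually exposes raw ciphertext would presumably need a counterpart on each of the two paths.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copies into buf the bytes an archiver would store as
 * a member's data. Symlinks yield their target via readlink(2); regular
 * files yield their contents via read(2). Output stops at bufsz bytes and
 * is not NUL-terminated. Returns the byte count, or -1 with errno set.
 */
static ssize_t fetch_member_data(const char *path, char *buf, size_t bufsz)
{
	struct stat st;

	assert(buf != NULL);
	if (lstat(path, &st) != 0)      /* lstat: do not follow the link */
		return -1;

	if (S_ISLNK(st.st_mode))
		return readlink(path, buf, bufsz);

	if (!S_ISREG(st.st_mode)) {     /* dirs, devices, FIFOs: not handled */
		errno = EINVAL;
		return -1;
	}

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	size_t total = 0;
	while (total < bufsz) {
		ssize_t n = read(fd, buf + total, bufsz - total);
		if (n < 0 && errno == EINTR)
			continue;           /* interrupted by a signal: retry */
		if (n < 0) {
			int saved = errno;
			close(fd);
			errno = saved;
			return -1;
		}
		if (n == 0)
			break;              /* end of file */
		total += (size_t)n;
	}
	close(fd);
	return (ssize_t)total;
}
```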
It might be easier on the kernel side, but I can't imagine how requiring the user keys at restore time would simplify the life of a sysadmin trying to recover from failed storage in the middle of the night and having to contact each user in turn to enter their crypto keys as the backup is being extracted...

> For example, some of the encryption modes which use the inode number as
> part of the IV could be handled if keys are needed at restore time;
> but it would be quite a bit harder, if not impossible, if you want to
> be able to restore the encrypted files without doing a decrypt/re-encrypt
> pass.
[snip]
> Special cases are often much simpler and easier, and sometimes
> the special cases are all you actually want.

The mention of strange encryption modes and general-purpose archive formats argues for *more* complexity and special cases, but that contradicts the Linus quote...

I don't think this would need to handle *all* the different encryption types before it is useful. The "inode number is part of IV" mode seems fragile/non-portable for a few reasons (e.g. it also breaks resize2fs, e4defrag, and possibly other tools), so I would say "don't do that if you want to be able to back up your data".

This seems to target the most common use case, where the backup source and restore target filesystems are both suitably enhanced ext4 (or maybe other fscrypt filesystems with equivalent changes). IMHO that solves the critical issue of doing automated backup/restore without the key(s).

In theory, the same O_CIPHERTEXT flag could be used by other filesystems to do backup/restore *to the same target fstype*, since the xattr(s) are not processed in userspace other than to restore them later. Even the *presence* of the security.encdata xattr (or multiple xattrs) could be conditional upon the O_CIPHERTEXT flag at open time. It would be up to the underlying filesystem to interpret the xattr contents as needed.

> Can you give more details about why you are interested in implementing
> this?
> Does your company have a more specific business justification
> for wanting to invest in this work? If so, can you say more about it?

I would think that being able to automate backup/restore of a multi-user filesystem is a reasonable justification? fscrypt is well suited to multi-user data encryption (each directory can have a different master key managed by the end user). However, having a master key for backups would be a single point of failure (i.e. it could compromise all of the users' data, assuming the security policy allowed this at all). Having to enter that master key (or dozens or hundreds of separate user keys) for backup and restore, or storing it persistently for automation, is problematic, to say the least.

> The reason why I ask is because very often fscrypt gets used in
> integrated solutions, where the encryption/decryption engine is done
> in-line between the general purpose CPU and the storage device.
[snip stuff related to Android/ChromeOS fscrypt special case usage]

This relates more to "normal" ext4 filesystems using fscrypt, without any embedded hardware or ecosystem integration. I've been thinking about using fscrypt for my home file server for a while, but not being able to make a backup of that data seems like a showstopper. This would help both personal filesystems that might have an encrypted subdirectory and file servers that use ext4 for multi-user storage.

>> In the case of hard links, I do not know how tar for instance handles this
>> for normal files. Do you have any ideas?
>
> "Tar stores hardlinks in the tarball by storing the first file (of
> a group of hardlinked files); the subsequent hard links to it are
> indicated by a special record. When untarring, encountering this
> record causes tar to create a hard link in the destination
> filesystem." [3]
>
> [3] https://forums.whirlpool.net.au/archive/2787890
>
> Why are you assuming that tar is the best format to use for storing
> encrypted files?
> It's going to require special extensions to the tar
> format, which means it won't necessarily be interoperable across
> different tar implementations. (For example, the hard link support is
> specific to GNU tar.)

I would think that GNU tar is probably the most common backup tool for Linux, and it already supports all the modern filesystem features (xattrs, symlinks, hardlinks, sparse files, etc.), so it seems like a reasonable place to start. Once that is working, support can be added to other tools on an as-needed basis (ideally with minimal changes to those tools). As you quoted Linus earlier, the solution doesn't need to solve *every* problem, just enough to have a working single-filesystem backup and restore.

Cheers, Andreas
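Earlier in this message, Andreas suggests reporting a ciphertext-sized st_size to keyless readers while the real size travels in the xattr. That comes down to simple rounding. The sketch below assumes that the encryption chunk is the filesystem block size (4096 bytes in the example) and that a file's ciphertext occupies whole chunks. Both function names are hypothetical. The restore-side check only illustrates how the two sizes relate and is not part of the proposal. The 3012-byte size is taken from the proposal's sample xattr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size a keyless reader would see if st_size were rounded up to whole
 * encryption chunks, as proposed in this thread (hypothetical behaviour). */
static uint64_t ciphertext_visible_size(uint64_t clear_size, uint64_t chunk)
{
	assert(chunk > 0);
	assert(clear_size <= UINT64_MAX - (chunk - 1)); /* avoid overflow */
	return (clear_size + chunk - 1) / chunk * chunk;
}

/* Restore-time sanity check (illustrative): the clear-text size carried in
 * the xattr must round up to exactly the number of ciphertext bytes that
 * were archived as the member's data. */
static bool clear_size_matches(uint64_t clear_size, uint64_t archived_len,
			       uint64_t chunk)
{
	return ciphertext_visible_size(clear_size, chunk) == archived_len;
}
```

With 4096-byte chunks, a 3012-byte file would be archived as 4096 bytes of ciphertext. A clear-text size of 5000 would then be rejected, because it implies 8192 ciphertext bytes.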
end of thread

Thread overview: 5+ messages
2023-02-08 12:09 Backup/restore of fscrypt files and directories Sebastien Buisson
2023-02-08 19:28 ` Eric Biggers
2023-02-10 13:44 ` Sebastien Buisson
2023-02-10 20:42 ` Theodore Ts'o
2023-02-16 0:54 ` Andreas Dilger